├── pyproject.toml
├── .gitignore
├── docs
│   ├── gpt-4o-summary.png
│   ├── prompt-original.jpg
│   ├── example1-summary.png
│   ├── gpt-4o-mini-summary.png
│   ├── example1-choose-issue.jpg
│   ├── github-issue-original.jpg
│   ├── github-issue-summarized.jpg
│   ├── example2-choose-larger-context-model.png
│   ├── prompt-after-removal-of-instructions.jpg
│   └── post-processed-issue-comments.txt
├── requirements.txt
├── LICENSE
├── llm.ini
├── llm.py
├── github.py
├── cli.py
├── app.py
└── README.md
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [tool.ruff]
2 | line-length = 120
3 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .ruff_cache
3 | .vscode
4 | venv
5 | __pycache__
6 |
7 | # Keys go here, so never commit it
8 | .env
9 |
--------------------------------------------------------------------------------
/docs/gpt-4o-summary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/gpt-4o-summary.png
--------------------------------------------------------------------------------
/docs/prompt-original.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/prompt-original.jpg
--------------------------------------------------------------------------------
/docs/example1-summary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/example1-summary.png
--------------------------------------------------------------------------------
/docs/gpt-4o-mini-summary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/gpt-4o-mini-summary.png
--------------------------------------------------------------------------------
/docs/example1-choose-issue.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/example1-choose-issue.jpg
--------------------------------------------------------------------------------
/docs/github-issue-original.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/github-issue-original.jpg
--------------------------------------------------------------------------------
/docs/github-issue-summarized.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/github-issue-summarized.jpg
--------------------------------------------------------------------------------
/docs/example2-choose-larger-context-model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/example2-choose-larger-context-model.png
--------------------------------------------------------------------------------
/docs/prompt-after-removal-of-instructions.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fau-masters-collected-works-cgarbin/llm-github-issues/HEAD/docs/prompt-after-removal-of-instructions.jpg
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | openai ~= 1.23.2
2 | python-dotenv ~= 1.0.0
3 | requests ~= 2.31.0
4 | streamlit ~= 1.33.0
5 |
6 | # Linters and formatters (use the latest versions)
7 | ruff
8 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Christian Garbin CS master's and Ph.D. collected works
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/llm.ini:
--------------------------------------------------------------------------------
1 | [LLM]
2 | # References:
3 | # - List of models: https://platform.openai.com/docs/models
4 | # - Pricing: https://openai.com/pricing
5 | # GPT-3.5 Turbo is getting old - we start with it to show its shortcomings
6 | model: gpt-3.5-turbo
7 | # This is the best model at the time, but it's the most expensive by a large margin and slower
8 | # Use this model to get the best results
9 | #model: gpt-4o
10 |
11 | # Note that indentation here is important - it tells configparser that they are continuation lines
12 | # Also note that keeping some of the sentences together affects the results - for example, if we break up
13 | # the sentence starting at "Don't waste..." into two lines, the results are not as good (this may depend
14 | # on the model used - experiment with different models)
15 | # The "Don't waste..." part comes from the example in https://learn.microsoft.com/en-us/semantic-kernel/ai-orchestration/plugins/
16 | prompt: You are an experienced developer familiar with GitHub issues.
17 | The following text was parsed from a GitHub issue and its comments.
18 | Extract the following information from the issue and comments:
19 | - Issue: A list with the following items: title, the submitter name, the submission date and
20 | time, labels, and status (whether the issue is still open or closed).
21 | - Summary: A summary of the issue in precisely one short sentence of no more than 50 words.
22 | - Details: A longer summary of the issue. If code has been provided, list the pieces of code
23 | that cause the issue in the summary.
24 | - Comments: A table with a summary of each comment in chronological order with the columns:
25 | date/time, time since the issue was submitted, author, and a summary of the comment.
26 | Don't waste words. Use short, clear, complete sentences. Use active voice. Maximize detail, meaning focus on the content. Quote code snippets if they are relevant.
27 | Answer in markdown with section headers separating each of the parts above.
28 |
--------------------------------------------------------------------------------
/llm.py:
--------------------------------------------------------------------------------
1 | """Interface to the large language model (LLM).
2 |
3 | It currently uses OpenAI. The intention of this module is to abstract away the LLM so that it can be easily replaced
4 | with a different LLM later if needed.
5 | """
6 |
7 | import os
8 | import time
9 | from dataclasses import dataclass, field
10 |
11 | import dotenv
12 | from openai import OpenAI
13 |
14 |
15 | @dataclass
16 | class LLMResponse:
17 | """Class to hold the LLM response.
18 |
19 | We use our class instead of returning the native LLM response to make it easier to adapt to different LLMs later.
20 | """
21 |
22 | model: str = ""
23 | prompt: str = ""
24 | user_input: str = ""
25 | llm_response: str = ""
26 | input_tokens: int = 0
27 | output_tokens: int = 0
28 | cost: float = 0.0
29 | raw_response: dict = field(default_factory=dict)
30 | elapsed_time: float = 0.0
31 |
32 | @property
33 | def total_tokens(self):
34 | """Calculate the total number of tokens used."""
35 | return self.input_tokens + self.output_tokens
36 |
37 |
38 | # Support models and costs from https://openai.com/pricing
39 | _COST_UNIT = 1_000_000  # Prices are per 1,000,000 tokens for each model
40 | _MODEL_DATA = {
41 | "gpt-3.5-turbo": {"input": 0.5, "output": 1.5},
42 | "gpt-4o": {"input": 5.0, "output": 15.0},
43 | "gpt-4o-mini": {"input": 0.15, "output": 0.6},
44 | }
45 |
46 |
47 | def _get_openai_client() -> OpenAI:
48 | """Get a client for OpenAI."""
49 | # Always override the environment variables with the .env file to allow key changes without restarting the program
50 | dotenv.load_dotenv(override=True)
51 | api_key = os.getenv("OPENAI_API_KEY")
52 |
53 | if not api_key:
54 | raise EnvironmentError("OPENAI_API_KEY environment variable not set -- see README.md for instructions")
55 |
56 | return OpenAI(api_key=api_key)
57 |
58 |
59 | def _openai_cost(input_tokens: int, output_tokens: int, model: str) -> float:
60 | """Calculate the cost of the completion.
61 |
62 | IMPORTANT: OpenAI may change pricing at any time. Consult https://openai.com/pricing and
63 | update this function accordingly.
64 | """
65 | if model not in _MODEL_DATA:
66 | # Flag the error, but don't interrupt the program
67 | return -1.0
68 |
69 | input_cost = input_tokens * _MODEL_DATA[model]["input"] / _COST_UNIT
70 | output_cost = output_tokens * _MODEL_DATA[model]["output"] / _COST_UNIT
71 | return input_cost + output_cost
72 |
73 |
74 | def _openai_chat_completion(model: str, prompt: str, user_input: str) -> LLMResponse:
75 | """Get a chat completion from OpenAI."""
76 | # Always instantiate a new client to pick up configuration changes without restarting the program
77 | client = _get_openai_client()
78 |
79 | start_time = time.time()
80 | completion = client.chat.completions.create(
81 | model=model,
82 | messages=[
83 | {"role": "system", "content": prompt},
84 | {"role": "user", "content": user_input},
85 | ],
86 | temperature=0.0, # We want precise and repeatable results
87 | )
88 | elapsed_time = time.time() - start_time
89 | client.close()
90 |
91 | # Record the request and the response
92 | response = LLMResponse()
93 | response.elapsed_time = elapsed_time
94 | response.model = model
95 | response.prompt = prompt
96 | response.user_input = user_input
97 | response.llm_response = completion.choices[0].message.content # type: ignore
98 |
99 | # This is not exactly the raw response, but it's close enough
100 |     # It assumes the completion object is a pydantic.BaseModel class, which has the `model_dump()`
101 | # method we need here
102 | response.raw_response = completion.model_dump()
103 |
104 | # Record the number of tokens used for input and output
105 | response.input_tokens = completion.usage.prompt_tokens # type: ignore
106 | response.output_tokens = completion.usage.completion_tokens # type: ignore
107 |
108 | # Records costs (depends on the tokens and model - set them first)
109 | response.cost = _openai_cost(response.input_tokens, response.output_tokens, model)
110 |
111 | return response
112 |
113 |
114 | def models():
115 | """Get the list of supported models."""
116 |     # Return the keys in the _MODEL_DATA dictionary
117 | return list(_MODEL_DATA.keys())
118 |
119 |
120 | def chat_completion(model, prompt: str, user_input: str) -> LLMResponse:
121 | """Get a completion from the LLM."""
122 | # Only one LLM is currently supported. This function can be extended to support multiple LLMs later.
123 | if model.startswith("gpt"):
124 | return _openai_chat_completion(model, prompt, user_input)
125 | raise ValueError(f"Unsupported model: {model}")
126 |
--------------------------------------------------------------------------------
/github.py:
--------------------------------------------------------------------------------
1 | """Interface to GitHub."""
2 | import requests
3 |
4 |
5 | def _get_github_api_url(repo: str) -> str:
6 | """Get GitHub API URL for a repository, accepting a flexible range of inputs.
7 |
8 | Args:
9 | repo (str): Repository in the form "user/repo" or full repository URL.
10 |
11 | Returns:
12 | str: GitHub issues API URL for the repository.
13 | """
14 | if repo.startswith("https://api.github.com/repos/"):
15 | # Assume it's already a GitHub API URL
16 | if repo.endswith("/"):
17 | repo = repo[:-1]
18 | return repo
19 |
20 | # Assume it's a GitHub repository URL and accept a flexible range of inputs
21 | # Normalize the URL
22 | if repo.startswith("https://github.com/"):
23 | repo = repo.replace("https://github.com/", "")
24 | if repo.endswith(".git"):
25 | repo = repo[:-4]
26 | if repo.endswith("/"):
27 | repo = repo[:-1]
28 |
29 | # Create the GitHub API URL from the normalized URL
30 | if "/" in repo:
31 | return f"https://api.github.com/repos/{repo}"
32 |
33 | raise ValueError("Invalid repository format. Must be in the form 'user/repo' or full repository URL.")
34 |
35 |
36 | def _invoke_github_api(repo: str, endpoint: str) -> dict:
37 | """Invoke the GitHub API for a repository.
38 |
39 | Args:
40 | repo (str): Repository in the form "user/repo" or "https://..."
41 | endpoint (str): API endpoint and its parameters, e.g. "issues/1234".
42 | """
43 | url = _get_github_api_url(repo)
44 | if endpoint:
45 | url = f"{url}/{endpoint}"
46 |
47 | response = requests.get(url, timeout=10)
48 | response.raise_for_status()
49 | return response.json()
50 |
51 |
52 | def get_issue(repo: str, issue_id: str = "") -> dict:
53 | """Get a specific issue from GitHub.
54 |
55 | Args:
56 | repo (str): Repository in the form "user/repo" or the full URL to the issue, e.g.
57 | "https://github.com/qjebbs/vscode-plantuml/issues/255".
58 |         issue_id (str): Issue number. Optional if the URL already contains the issue number.
59 |
60 | Returns:
61 | dict: Issue data.
62 | """
63 | if "/issues/" in repo:
64 | # Assume it's already a fully-formed GitHub issue API URL
65 | return _invoke_github_api(repo, "")
66 | return _invoke_github_api(repo, f"issues/{issue_id}")
67 |
68 |
69 | def get_issue_comments(issue: dict) -> dict:
70 | """Get comments for a specific issue.
71 |
72 | Args:
73 | issue (dict): Issue data, as returned by GitHub (the JSON response).
74 |
75 | Returns:
76 | dict: Comments data.
77 | """
78 | url = issue["comments_url"]
79 | # Request the maximum number of comments per page
80 | # Note that pagination is not supported - if there are more than 100 comments, only the first 100 will be returned
81 | url += "?per_page=100"
82 | return _invoke_github_api(url, "")
83 |
84 |
85 | def parse_issue(issue: dict) -> str:
86 | """Parse issue data returned by GitHub into a text format.
87 |
88 | The goal is to translate the JSON into a text format that has only the information we need to pass to an LLM. The
89 | shorter format helps the LLM understand the data better and saves tokens (and thus cost).
90 |
91 | The text includes some annotations to help guide the LLM. For example, using delimiters to indicate the start and
92 | end of the body text, and using a prefix to indicate the start of each field. This is based on previous experiments
93 | with LLMs. It may need to be adjusted for different LLMs. Other LLMs may not need as much guidance.
94 |
95 | Args:
96 | issue (dict): Issue data, as returned by GitHub (the JSON response).
97 |
98 | Returns:
99 | str: Parsed issue data.
100 | """
101 | parsed = f"Title: {issue['title']}\n"
102 | parsed += f"Body (between '''):\n'''\n{issue['body']}\n'''\n"
103 | parsed += f"Submitted by: {issue['user']['login']}\n"
104 | parsed += f"Submitted on: {issue['created_at']}\n"
105 | parsed += f"Submitter association: {issue['author_association']}\n"
106 | parsed += f"State: {issue['state']}\n"
107 | parsed += f"Labels: {', '.join([label['name'] for label in issue['labels']])}\n"
108 | return parsed
109 |
110 |
111 | def parse_comments(comments: dict) -> str:
112 | """Parse comments data returned by GitHub into a text format.
113 |
114 | See comments for parse_issue() for more details.
115 |
116 | Args:
117 | comments (dict): Comments data, as returned by GitHub (the JSON response).
118 |
119 | Returns:
120 | str: Parsed comments data.
121 | """
122 | parsed = ""
123 | for comment in comments:
124 | parsed += f"Comment by: {comment['user']['login']}\n"
125 | parsed += f"Comment on: {comment['created_at']}\n"
126 | parsed += f"Body (between '''):\n'''\n{comment['body']}\n'''\n"
127 | return parsed
128 |
--------------------------------------------------------------------------------
/cli.py:
--------------------------------------------------------------------------------
1 | #! python
2 | """A simple command-line interface for running tests.
3 |
4 | Some test data to use:
5 |
6 | This one has a short description and lots of pieces of code. The summarization needs to consider the text
7 | before and after the code blocks to get the context right.
8 | - https://github.com/openai/openai-python
9 | - 488
10 |
11 | - https://github.com/openai/openai-python
12 | - 650
13 |
14 | - https://github.com/scikit-learn/scikit-learn
15 | - 27435
16 |
17 | Large issue - needs GPT-4's large context window
18 | - https://github.com/scikit-learn/scikit-learn/issues/9354
19 |
20 | This one has several comments. The large list of comments seems to cause the LLM to stop summarizing
21 | them mid-way through the issue (tested with GPT-3.5).
22 | The first comment also highlights a security issue: it has a link in a markdown text. The summary
23 | has the link, which can be used for phishing and other attacks.
24 | - https://github.com/qjebbs/vscode-plantuml/issues/255
25 | """
26 |
27 | import configparser
28 | import github
29 | import llm
30 |
31 |
32 | def get_option():
33 | """Show the menu and ask for an option."""
34 | print("\nMENU:")
35 | print("1. Enter the repository and issue number")
36 | print("2. Show the raw GitHub issue data")
37 | print("3. Show the parsed GitHub issue data")
38 | print("4. Get and show the LLM response")
39 | print("9. Exit")
40 | choice = input("Enter your choice: ")
41 | return choice
42 |
43 |
44 | def get_github_data(repository, issue_number):
45 | """Get the issue and comments from GitHub."""
46 | issue = github.get_issue(repository, issue_number)
47 | comments = github.get_issue_comments(issue)
48 | return issue, comments
49 |
50 |
51 | def get_llm_answer(parsed_issue, parsed_comments):
52 | """Get the LLM answer."""
53 | # Always read the config file to allow for changes without restarting the CLI
54 | model, prompt = get_model_and_prompt()
55 | print(f"Using model: {model}")
56 | user_input = f"{parsed_issue}\n{parsed_comments}"
57 | response = llm.chat_completion(model, prompt, user_input)
58 | return response
59 |
60 |
61 | def show_llm_response(response):
62 | """Show the LLM response and other data."""
63 | r = response # Shorter name for convenience
64 | print(f"LLM Response:\n{r.llm_response}")
65 | print("-------------------------------")
66 | print(f"Model: {r.model}")
67 | print(f"Input tokens: {r.input_tokens}, output tokens: {r.output_tokens} - cost: US ${r.cost:.2f}")
68 | tokens_sec = (r.input_tokens + r.output_tokens) / r.elapsed_time
69 | print(f"Elapsed time: {r.elapsed_time:.2f} seconds ({tokens_sec:.1f} tokens/sec)")
70 |
71 |
72 | def get_model_and_prompt():
73 | """Get the model and prompt from the config file."""
74 | # Always read the config file to allow for changes without restarting the CLI
75 | config = configparser.ConfigParser()
76 | config.read("llm.ini")
77 | model = config["LLM"]["model"]
78 | prompt = config["LLM"]["prompt"]
79 | return model, prompt
80 |
81 |
82 | def main():
83 | """Run the CLI."""
84 | repository = ""
85 | issue_number = 0
86 | issue, comments, parsed_issue, parsed_comments = None, None, None, None
87 |
88 | while True:
89 | try:
90 | choice = get_option()
91 | if choice == "1":
92 | repository = input("Enter GitHub repository name or issue URL: ")
93 | if "/issues/" not in repository:
94 | issue_number = input("Enter issue number: ")
95 | print("Getting issue data from GitHub...")
96 | issue, comments = get_github_data(repository, issue_number)
97 | if not issue:
98 |                     print("GitHub returned an empty issue")
99 | continue
100 | parsed_issue = github.parse_issue(issue)
101 | parsed_comments = github.parse_comments(comments)
102 | print("Done")
103 | continue
104 |
105 | # Don't run options that require GitHub data if we don't have it
106 | # Note that we check only the issue because not having comments is not an error
107 | if choice in ("2", "3", "4") and not issue:
108 | print("Retrieve the GitHub issue data first")
109 | continue
110 |
111 | if choice == "2":
112 | print("Raw GitHub issue data:")
113 | print(f"Issue from GitHub:\n{issue}")
114 | print("\n-------------------------------")
115 | print(f"Comments from GitHub:\n{comments}")
116 | elif choice == "3":
117 | print("Parsed GitHub issue data:")
118 | print(f"Issue:\n{parsed_issue}")
119 | print("\n-------------------------------")
120 | print(f"Comments:\n{parsed_comments}")
121 | elif choice == "4":
122 | print("Getting response from LLM (may take a few seconds)...")
123 | response = get_llm_answer(parsed_issue, parsed_comments)
124 | show_llm_response(response)
125 | elif choice == "9":
126 | print("Exiting...")
127 | break
128 | else:
129 | input("Invalid choice. Press Enter to continue...")
130 | except Exception as ex:
131 | print(f"Error: {ex}")
132 | input("Press Enter to continue...")
133 |
134 |
135 | if __name__ == "__main__":
136 | main()
137 |
--------------------------------------------------------------------------------
/docs/post-processed-issue-comments.txt:
--------------------------------------------------------------------------------
1 | Title: Copilot Chat: [Copilot Chat App] Azure Cognitive Search: kernel.Memory.SearchAsync producing no results for queries
2 | Body (between '''):
3 | '''
4 | **Describe the bug**
5 | I'm trying to build out the Copilot Chat App as a RAG chat (without skills for now). Not sure if its an issue with Semantic Kernel or my cognitive search setup. Looking for some guidance.
6 |
7 | **To Reproduce**
8 | Steps to reproduce the behavior:
9 | 1. Run the Copilot Chat App example
10 | 2. Register Azure Cognitive Search as Kernels memory
11 | 3. use the kernel.Memory.SearchAsync with the user prompt (not user intent) to find relevant information
12 | 4. For some prompts, it does not return any data from the indices. Azure Cognitive search's search explorer on the hand, returns the correct data. (Semantic Search is enabled)
13 |
14 | **Expected behavior**
15 | kernel.Memory.SearchAsync will return the right set of documents from the created index for all queries.
16 |
17 | **Screenshots**
18 | If applicable, add screenshots to help explain your problem.
19 |
20 | **Platform**
21 | - OS: [e.g. Windows, Mac]
22 | - IDE: [e.g. Visual Studio, VS Code]
23 | - Language: [e.g. C#, Python]
24 | - Source: [e.g. NuGet package version 0.1.0, pip package version 0.1.0, main branch of repository]
25 |
26 | **Additional context**
27 |
28 | '''
29 | Submitted by: animeshj9
30 | Submitted on: 2023-07-18T02:18:07Z
31 | Submitter association: NONE
32 | State: closed
33 | Labels: good first issue, question, copilot chat
34 |
35 |
36 | Comment by: vman
37 | Comment on: 2023-07-19T15:40:33Z
38 | Body (between '''):
39 | '''
40 | We are seeing this issue as well in the latest version of the repo. It used to work at least until 12th July for us.
41 | '''
42 | Comment by: TaoChenOSU
43 | Comment on: 2023-07-20T17:59:01Z
44 | Body (between '''):
45 | '''
46 | Hello @animeshj9,
47 |
48 | Thanks for reporting the issue!
49 |
50 | `kernel.Memory` is a `SemanticTextMemory` (https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/SemanticKernel/Memory/SemanticTextMemory.cs) instance that uses the embedding generator backed by the embedding model you provide to generate embeddings to perform vector-based search. Semantic search in Azure Cognitive Search, on the other hand, uses a different process to retrieve semantically relevant results for your query (https://learn.microsoft.com/en-us/azure/search/semantic-search-overview). I think this explains the difference in the behaviors you see.
51 |
52 | If you can provide more information or samples, we will be able to assist you further.
53 |
54 | '''
55 | Comment by: animeshj9
56 | Comment on: 2023-07-21T17:46:59Z
57 | Body (between '''):
58 | '''
59 | Hey @TaoChenOSU - Thanks for the information.
60 |
61 | We were able to root cause and have identified a couple of issues that were happening that caused our Copilot Chat to not get any data from CognitiveSearch memory.
62 |
63 | First, to preface - we are using the 0.16 release of SK
64 |
65 | (1) In the [DocumentMemorySkill](https://github.com/microsoft/semantic-kernel/blob/main/samples/apps/copilot-chat-app/webapi/CopilotChat/Skills/ChatSkills/DocumentMemorySkill.cs) on Copilot Chat - While it says to filters out memories based DocumentMemoryMinRelevance in the promptOptions, its really filtering it out on the ReRanker score that's coming from CognitiveSearch. That helped us set the right config for MemoryMinRelevance and we were able to see some responses.
66 |
67 | (2) The function (ToMemoryRecordMetadata)[https://github.com/microsoft/semantic-kernel/blob/53a3a8466fdcafbdad304c55c4e4591dfdff6582/dotnet/src/Connectors/Connectors.Memory.AzureCognitiveSearch/AzureCognitiveSearchMemoryRecord.cs#L103C1-L104C1] - errors out if it encounters certain symbols / combinations in the documentation and error out with "the base64 string is not valid".
68 |
69 | For v0.16, we are writing our own Search function instead of using `kernel.Memory.SearchAsync()` to get the documents and that seems to be working fine for us. We are also exploring if moving to the latest SK version will fix this issue as I see a lot of changes have been made to CognitiveSearch connecter since v0.16.
70 | '''
71 | Comment by: animeshj9
72 | Comment on: 2023-07-21T17:58:57Z
73 | Body (between '''):
74 | '''
75 | @vman - What is the error you are getting? or is it just not returning data when you call `kernel.Memory.SearchAsync()`.
76 | Can you check if reducing the minrelevance or maybe writing your own Cognitive Search instead of using kernel.Memory work?
77 |
78 | If not, let us know your error and we can try to see if we can root cause it.
79 |
80 |
81 | '''
82 | Comment by: glahaye
83 | Comment on: 2023-07-25T21:12:59Z
84 | Body (between '''):
85 | '''
86 | As an alternative to using Azure Cognitive Search's semantic search, consider using Azure Cognitive Search's vector search feature, available from SK version 0.17.230718.1-preview on (and using the Microsoft.SemanticKernel.Connectors.Memory.AzureSearch nuget as opposed to only the Microsoft.SemanticKernel.Connectors.Memory.AzureCognitiveSearch nuget)
87 | '''
88 | Comment by: animeshj9
89 | Comment on: 2023-07-27T19:33:53Z
90 | Body (between '''):
91 | '''
92 | @glahaye - I am unable to find Microsoft.SemanticKernel.Connectors.Memory.AzureSearch nuget.
93 |
94 | Only returns Azure Cognitive Search: https://www.nuget.org/packages?q=Microsoft.SemanticKernel.Connectors.Memory.AzureSearch
95 | '''
96 | Comment by: glahaye
97 | Comment on: 2023-07-29T22:28:36Z
98 | Body (between '''):
99 | '''
100 | Hi @animeshj9
101 |
102 | There's been a couple of changes and I think they might fix your problem.
103 |
104 | First, Copilot Chat has been renamed to Chat Copilot and moved to its own repo:
105 | https://github.com/microsoft/chat-copilot/
106 |
107 | Then, the Nuget package I mentioned to you has been retired and its functionality has now replaced the old code in Microsoft.SemanticKernel.Connectors.Memory.AzureCognitiveSearch.
108 |
109 | I have a PR open that modifies Chat Copilot to use that latest version of the Nuget and get the vector search experience:
110 | https://github.com/microsoft/chat-copilot/pull/65
111 |
112 | I estimate it will get merged on Monday and I am confident it will resolve your current problem. Because of that, I will go ahead and close this issue on this repo.
113 |
114 | If you do still experience problems after the merge of the PR, feel free to open a new issue in the **new** repo.
115 | '''
116 | Comment by: animeshj9
117 | Comment on: 2023-07-31T18:59:35Z
118 | Body (between '''):
119 | '''
120 | Thank you very much @glahaye! I will check it out and let you know if it fixes my issues.
121 | '''
122 |
--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
1 | """Streamlit app to show the summarized GitHub issue and comments from the LLM response.
2 |
3 | Set up the environment as described in the README.md file, then run this app with:
4 |
5 | streamlit run app.py
6 | """
7 | import configparser
8 | import re
9 | import streamlit as st
10 | import github as gh
11 | import llm
12 |
13 |
14 | def get_default_settings():
15 |     """Read settings from the .ini file."""
16 | config = configparser.ConfigParser()
17 | config.read("llm.ini")
18 | model = config["LLM"]["model"]
19 | prompt = config["LLM"]["prompt"]
20 | return prompt, model
21 |
22 |
23 | def display_settings_section():
24 | """Let the user change the settings that affect the LLM response."""
25 | with st.expander("Click to configure the prompt and the model", expanded=False):
26 | if "prompt" not in st.session_state:
27 | st.session_state.prompt, st.session_state.model = get_default_settings()
28 |
29 | st.session_state.prompt = st.text_area("Prompt", st.session_state.prompt, height=300)
30 | models = llm.models()
31 | st.session_state.model = st.selectbox("Select model", models, index=models.index(st.session_state.model))
32 |
33 |
34 | def get_issue_to_show():
35 | """Get a GitHub issue to show from the user."""
36 | if "issue_url" not in st.session_state:
37 | st.session_state.issue_url = ""
38 |
39 | example_urls = [
40 | "https://github.com/openai/openai-python/issues/488 (simple example)",
41 | "https://github.com/openai/openai-python/issues/650 (also simple, but more code blocks)",
42 | "https://github.com/scikit-learn/scikit-learn/issues/9354 (large issue, requires GPT-4's large context window)",
43 | "https://github.com/microsoft/semantic-kernel/issues/2039 (large comments, GPT-4 summarizes it better)",
44 | "https://github.com/qjebbs/vscode-plantuml/issues/255 (large number of comments)",
45 | ]
46 | selected_url = st.selectbox(
47 | "Choose an example URL from this list or type your own below",
48 | example_urls,
49 |         placeholder="Pick from this list or enter a URL below",
50 | index=None,
51 | )
52 | if selected_url:
53 | # Discard the comment text after the URL to make it valid
54 | st.session_state.issue_url = selected_url.split(" (")[0]
55 |
56 | st.session_state.issue_url = st.text_input(
57 | "Enter GitHub issue URL",
58 | value=st.session_state.issue_url,
59 | label_visibility="collapsed",
60 | placeholder=("Enter URL to GitHub issue or pick an example" " from the list above"),
61 | )
62 |
63 |
64 | def get_github_data(issue_url: str) -> tuple[dict, dict]:
65 | """Get the issue and comments from GitHub."""
66 | with st.spinner("Waiting for GitHub response..."):
67 | issue = gh.get_issue(issue_url)
68 | comments = gh.get_issue_comments(issue)
69 | return issue, comments
70 |
71 |
72 | def get_llm_response(model: str, prompt: str, issue: str, comments: str) -> llm.LLMResponse:
73 | """Get the LLM response for the issue and comments."""
74 | with st.spinner(f"Waiting for {model} response..."):
75 | # Format the issue and comments into a text format to make it easier for the LLM to understand
76 | # and to save tokens.
77 | text_format = f"{issue}\n\n{comments}"
78 |
79 | response = llm.chat_completion(model, prompt, text_format)
80 | return response
81 |
82 |
83 | def show_github_raw_data(issue: dict, comments: dict):
84 | """Show the GitHub issue and comments as we got from the API."""
85 | # Show a link to the issue in GitHub
86 | # Prefer the issue URL from GitHub - fall back to the user's input if we don't have it
87 | issue_url = issue.get("html_url", st.session_state.issue_url)
88 | st.link_button("Open the issue in GitHub", issue_url)
89 |
90 |     st.write("This is the data as we got from the GitHub API.")
91 | st.subheader("GitHub Issue")
92 | st.json(issue, expanded=False)
93 | st.subheader("GitHub Comments")
94 | st.json(comments, expanded=False)
95 |
96 |
97 | def show_github_post_processed_data(issue: str, comments: str):
98 | """Show the GitHub issue and comments after we have post-processed them."""
99 | with st.expander("Click to show/hide the post-processed GitHub data", expanded=False):
100 |         st.write("This is the data after we have post-processed it to use with the LLM.")
101 | st.subheader("GitHub Issue")
102 | st.text(issue)
103 | st.subheader("GitHub Comments")
104 | st.text(comments)
105 |
106 |
107 | def show_llm_raw_data(response: llm.LLMResponse):
108 | """Show the raw data to/from the LLM."""
109 | r = response # Shorter name to make the code easier to read
110 | st.write(
111 | (
112 | f"Total tokens: {r.total_tokens:,} (input: {r.input_tokens:,}, output: {r.output_tokens:,})"
113 | f" - costs US ${r.cost:.4f}"
114 | )
115 | )
116 |
117 | tokens_sec = r.total_tokens / r.elapsed_time
118 | st.write(f"Elapsed time: {r.elapsed_time:.2f} seconds ({tokens_sec:,.1f} tokens/sec)")
119 |
120 | with st.expander("Click to show/hide the raw data we sent to and received from the LLM", expanded=False):
121 | st.subheader("Raw LLM response")
122 | st.json(r.raw_response, expanded=False)
123 | st.subheader("Prompt")
124 | st.text(r.prompt)
125 | st.subheader("Data we extracted from the GitHub issue and comments")
126 | st.text(r.user_input)
127 | st.subheader("LLM Response")
128 | st.text(r.llm_response)
129 |
130 |
131 | def show_llm_response(response: llm.LLMResponse):
132 | """Show the formatted LLM response."""
133 | # Change markdown heading 1 to heading 3 to make it smaller
134 | # Ensure it's a heading by replacing only if it's at the start of the line
135 | txt = re.sub(r"^# ", r"### ", response.llm_response, flags=re.MULTILINE)
136 |
137 | st.header(f"Summary from {st.session_state.model}")
138 | st.write(txt)
139 |
140 |
141 | def main():
142 | """Run the Streamlit app."""
143 | st.set_page_config(layout="wide")
144 | st.title("LLM GitHub Issue Summarizer")
145 |
146 | display_settings_section()
147 | get_issue_to_show()
148 | if st.button(f"Generate summary with {st.session_state.model}"):
149 | try:
150 | issue, comments = get_github_data(st.session_state.issue_url)
151 | parsed_issue = gh.parse_issue(issue)
152 | parsed_comments = gh.parse_comments(comments)
153 | response = get_llm_response(st.session_state.model, st.session_state.prompt, parsed_issue, parsed_comments)
154 |
155 | tabs = st.tabs(["LLM data", "Raw GitHub data", "Parsed GitHub data"])
156 | with tabs[0]:
157 | show_llm_raw_data(response)
158 | with tabs[1]:
159 | show_github_raw_data(issue, comments)
160 | with tabs[2]:
161 | show_github_post_processed_data(parsed_issue, parsed_comments)
162 |
163 | show_llm_response(response)
164 | except Exception as err:
165 | st.error(err)
166 |
167 |
168 | if __name__ == "__main__":
169 | main()
170 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Using LLMs to summarize GitHub issues
2 |
3 | This project is a learning exercise on using large language models (LLMs) for summarization. It uses GitHub issues as a practical use case that we can relate to.
4 |
5 | The goal is to allow developers to understand what is being reported and discussed in the issues without having to read each message in the thread. We will take the [original GitHub issue with its comments](./docs/github-issue-original.jpg) and generate a summary like [this one](./docs/github-issue-summarized.jpg).
6 |
7 | **UPDATE 2024-07-21**: With the [announcement of GPT-4o mini](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/), there are fewer and fewer reasons to use GPT-3.5 models. I updated the code to use the GPT-4o and GPT-4o mini models and to remove the GPT-4 Turbo models (they are listed under ["older models we support"](https://openai.com/api/pricing/), hinting that they will eventually be removed).
8 |
9 | We will review the following topics:
10 |
11 | 1. How to prepare data to use with an LLM.
12 | 1. How to build a prompt to summarize data.
13 | 1. How good LLMs are at summarizing text, and GitHub issues in particular.
14 | 1. Developing applications with LLMs: some of their limitations, such as the context window size.
15 | 1. The role of prompts in LLMs and how to create good prompts.
16 | 1. When not to use LLMs.
17 |
18 | This [YouTube video](https://youtu.be/5sDD0WNDZkc) walks through the sections below, but note that it uses the first version of the code. The code has been updated since then.
19 |
20 | ## Overview of the steps
21 |
22 | Before we start, let's review what happens behind the scenes when we use LLMs to summarize GitHub issues.
23 |
24 | The following diagram shows the main steps:
25 |
26 | - _Get the issue and its comments from GitHub_: The application converts the issue URL the user entered in (1) to a GitHub API URL and requests the issue, then the comments (2). The GitHub API returns the issue and comments in JSON format (3).
27 | - _Preprocess the data_: The application converts the JSON data into a compact text format (4) that the LLM can process. This is important to reduce token usage and costs.
28 | - _Build the prompt_: The application builds a prompt (5) for the LLM. The prompt is a text that tells the LLM what to do.
29 | - _Send the request to the LLM_: The application sends the prompt to the LLM (6) and waits for the response.
30 | - _Process the LLM response_: The application receives the response from the LLM (7) and shows it to the user (8).
31 |
32 | ```mermaid
33 | sequenceDiagram
34 | autonumber
35 | Actor U as User
36 | participant App as Application
37 | participant GH as GitHub API
38 | participant LLM as LLM
39 |
40 | U->>App: Enter URL to GitHub issue
41 | App->>GH: Request issue data
42 | GH-->>App: Return issue data in JSON format
43 | App->>App: Parse JSON into compact text format
44 | App->>App: Build prompt for LLM
45 | App->>LLM: Send request to LLM
46 | LLM-->>App: Return response and usage data (tokens)
47 | App->>U: Show response and usage data
48 | ```
49 |
50 | We will now review each step in more detail.
51 |
52 | ## Quick get-started guide
53 |
54 | This section describes the steps to go from a GitHub issue like [this one](https://github.com/microsoft/semantic-kernel/issues/2039) (click to enlarge)...
55 |
56 | ![Original GitHub issue](./docs/github-issue-original.jpg)
57 |
58 |
59 | ...to LLM-generated summary (click to enlarge):
60 |
61 | ![Summarized GitHub issue](./docs/github-issue-summarized.jpg)
62 |
63 |
64 | First, [prepare the environment](#preparing-the-environment), if you haven't done so yet.
65 |
66 | Run the following commands to activate the environment and start the application in a browser.
67 |
68 | ```bash
69 | source venv/bin/activate
70 | streamlit run app.py
71 | ```
72 |
73 | Once the application is running, enter the URL for the above issue, `https://github.com/microsoft/semantic-kernel/issues/2039`, and click the _"Generate summary with..."_ button to generate the summary. It will take a few seconds to complete.
74 |
75 | **NOTES**:
76 |
77 | - Large language models are not deterministic and may be updated anytime. The results you get may be different from the ones shown here.
78 | - The GitHub issue may have been updated since the screenshots were taken.
79 |
80 | In the following sections, we will go behind the scenes to see how the application works.
81 |
82 | ## What happens behind the scenes
83 |
84 | This section describes the steps to summarize a GitHub issue using LLMs. We will start by fetching the issue data, preprocessing it, building an appropriate prompt, sending it to the LLM, and finally, processing the response.
85 |
86 | ### Step 1 - Get the GitHub issue and its comments
87 |
88 | The first step is to get the raw data using the GitHub API. In this step we translate the URL the user entered into a GitHub API URL and request the issue and its comments. For example, the URL `https://github.com/microsoft/semantic-kernel/issues/2039` is translated into `https://api.github.com/repos/microsoft/semantic-kernel/issues/2039`. The GitHub API returns a JSON object with the issue. [Click here](https://api.github.com/repos/microsoft/semantic-kernel/issues/2039) to see the JSON object for the issue.
89 |
90 | The issue has a link to its comments:
91 |
92 | ```text
93 | "comments_url": "https://api.github.com/repos/microsoft/semantic-kernel/issues/2039/comments",
94 | ```
95 |
96 | We use that URL to request the comments and get another JSON object. [Click here](https://api.github.com/repos/microsoft/semantic-kernel/issues/2039/comments) to see the JSON object for the comments.
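
As a minimal sketch, this is roughly how the helpers in [github.py](./github.py) perform this step for the example issue above:

```python
import github  # the github.py module in this repository

# Accepts "user/repo" plus an issue number, or the full issue URL
issue = github.get_issue("https://github.com/microsoft/semantic-kernel/issues/2039")

# The issue JSON contains the comments_url used to fetch the comments
comments = github.get_issue_comments(issue)
```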
97 |
98 | ### Step 2 - Translate the JSON data into a compact text format
99 |
100 | The JSON objects have more information than we need. Before sending the request to the LLM, we extract only the pieces we need, for the following reasons:
101 |
102 | 1. Large objects cost more because [most LLMs charge per token](https://openai.com/pricing).
103 | 1. It takes longer to process large objects.
104 | 1. Large objects may not fit in the LLM's context window (the context window is the number of tokens the LLM can process at a time).
105 |
106 | In this step, we convert the JSON objects into a compact text format. The text format is easier to process and takes less space than the JSON objects.
107 |
108 | This is the start of the JSON object returned by the GitHub API for the issue.
109 |
110 | ```text
111 | {
112 | "url": "https://api.github.com/repos/microsoft/semantic-kernel/issues/2039",
113 | "repository_url": "https://api.github.com/repos/microsoft/semantic-kernel",
114 | "labels_url": "https://api.github.com/repos/microsoft/semantic-kernel/issues/2039/labels{/name}",
115 | "comments_url": "https://api.github.com/repos/microsoft/semantic-kernel/issues/2039/comments",
116 | "events_url": "https://api.github.com/repos/microsoft/semantic-kernel/issues/2039/events",
117 | "html_url": "https://github.com/microsoft/semantic-kernel/issues/2039",
118 | "id": 1808939848,
119 | "node_id": "I_kwDOJDJ_Yc5r0jtI",
120 | "number": 2039,
121 | "title": "Copilot Chat: [Copilot Chat App] Azure Cognitive Search: kernel.Memory.SearchAsync producing no ...
122 |
123 | "body": "**Describe the bug**\r\nI'm trying to build out the Copilot Chat App as a RAG chat (without
124 | skills for now). Not sure if its an issue with Semantic Kernel or my cognitive search...
125 | ...many lines removed for brevity...
126 | package version 0.1.0, pip package version 0.1.0, main branch of repository]\r\n\r\n**Additional
127 | context**\r\n",
128 | ...
129 | ```
130 |
131 | And this is the compact text format we create out of it.
132 |
133 | ```text
134 | Title: Copilot Chat: [Copilot Chat App] Azure Cognitive Search: kernel.Memory.SearchAsync producing no
135 | results for queries
136 | Body (between '''):
137 | '''
138 | **Describe the bug**
139 | I'm trying to build out the Copilot Chat App as a RAG chat (without skills for now). Not sure if its an
140 | issue with Semantic Kernel or my cognitive search setup. Looking for some guidance.
141 | ...many lines removed for brevity...
142 | ```
143 |
144 | To get from the JSON object to the compact text format, we do the following:
145 |
146 | - Remove all fields we don't need for the summary. For example, `repository_url`, `node_id`, and many others.
147 | - Change from JSON to plain text format. For example, `{"title": "Copilot Chat: [Copilot Chat App] Azure ...` becomes `Title: Copilot Chat: [Copilot Chat App] Azure ...`.
148 | - Remove spaces and quotes. They count as tokens, which increase costs and processing time.
149 | - Add a few hints to guide the LLM. For example, `Body (between ''')` tells the LLM that the body of the issue is between the `'''` characters.
150 |
151 | [Click here](./docs/post-processed-issue-comments.txt) to see the result of this step. Compare with the JSON object for the [issue](https://api.github.com/repos/microsoft/semantic-kernel/issues/2039) and [comments](https://api.github.com/repos/microsoft/semantic-kernel/issues/2039/comments) to see how much smaller the text format is.
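
In code, this step is two calls to the parsing helpers in [github.py](./github.py) (a sketch; `issue` and `comments` are the JSON objects from the previous step):

```python
import github  # the github.py module in this repository

parsed_issue = github.parse_issue(issue)           # "Title: ...\nBody (between '''): ..."
parsed_comments = github.parse_comments(comments)  # "Comment by: ...\nComment on: ..."
```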
152 |
153 | ### Step 3 - Build the prompt
154 |
155 | A [prompt](https://developers.google.com/machine-learning/resources/prompt-eng) tells the LLM what to do, along with the data it needs.
156 |
157 | Our prompt is stored in [this file](./llm.ini). The prompt instructs the LLM to summarize the issue and the comments in the format we want (the _"Don't waste..."_ part comes from [this example](https://learn.microsoft.com/en-us/semantic-kernel/ai-orchestration/plugins/)).
158 |
159 | ```text
160 | You are an experienced developer familiar with GitHub issues.
161 | The following text was parsed from a GitHub issue and its comments.
162 | Extract the following information from the issue and comments:
163 | - Issue: A list with the following items: title, the submitter name, the submission date and
164 | time, labels, and status (whether the issue is still open or closed).
165 | - Summary: A summary of the issue in precisely one short sentence of no more than 50 words.
166 | - Details: A longer summary of the issue. If code has been provided, list the pieces of code
167 | that cause the issue in the summary.
168 | - Comments: A table with a summary of each comment in chronological order with the columns:
169 | date/time, time since the issue was submitted, author, and a summary of the comment.
170 | Don't waste words. Use short, clear, complete sentences. Use active voice. Maximize detail, meaning focus on the content. Quote code snippets if they are relevant.
171 | Answer in markdown with section headers separating each of the parts above.
172 | ```
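
The prompt and the model name are read from [llm.ini](./llm.ini) with Python's `configparser`; this is essentially what `get_model_and_prompt()` in [cli.py](./cli.py) does:

```python
import configparser

# Read the model and the prompt from llm.ini
config = configparser.ConfigParser()
config.read("llm.ini")
model = config["LLM"]["model"]    # e.g. "gpt-3.5-turbo"
prompt = config["LLM"]["prompt"]  # the multi-line prompt shown above
```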
173 |
174 | ### Step 4 - Send the request to the LLM
175 |
176 | We now have all the pieces we need to send the request to the LLM. Different LLMs have different APIs, but most of them have a variation of the following parameters:
177 |
178 | - The model: The LLM to use. As a general rule, larger models are better but are also more expensive and take more time to build the response.
179 | - System prompt: The instructions we send to the LLM to tell it what to do, what format to use, and so on. This is usually not visible to the user.
180 | - The user input: The data the user enters in the application. In our case, the user enters the URL for the GitHub issue and we use it to create the actual user input (the parsed issue and comments).
181 | - The temperature: The higher the temperature, the more creative the LLM is. The lower the temperature, the more predictable it is. We use a temperature of 0.0 to get more precise and consistent results.
182 |
183 | These are the main ones we use in this project. There are [other parameters](https://txt.cohere.com/llm-parameters-best-outputs-language-ai/) we can adjust for other use cases.
184 |
185 | This is the relevant code in [llm.py](./llm.py):
186 |
187 | ```python
188 | completion = client.chat.completions.create(
189 | model=model,
190 | messages=[
191 | {"role": "system", "content": prompt},
192 | {"role": "user", "content": user_input},
193 | ],
194 | temperature=0.0 # We want precise and repeatable results
195 | )
196 | ```
197 |
198 | ### Step 5 - Show the response
199 |
200 | The LLM returns a JSON object with the response and usage data. We show the response to the user and use the usage data to calculate the cost of the request.
201 |
202 | This is a sample response from the LLM (using the [OpenAI API](https://platform.openai.com/docs/guides/gpt/chat-completions-api)):
203 |
204 | ```python
205 | ChatCompletion(..., choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content=
206 | '', role='assistant', function_call=None))], created=1698528558,
207 | model='gpt-3.5-turbo-0613', object='chat.completion', usage=CompletionUsage(completion_tokens=304,
208 | prompt_tokens=1301, total_tokens=1605))
209 | ```
210 |
211 | Besides the response, we get the token usage. The cost is not part of the response. We must calculate that ourselves following the [published pricing rules](https://openai.com/pricing).
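
As a sketch of that calculation (mirroring `_openai_cost()` in [llm.py](./llm.py), with the prices current at the time of writing):

```python
# US dollars per 1,000,000 tokens - check https://openai.com/pricing before relying on these numbers
PRICE_PER_MILLION_TOKENS = {"gpt-3.5-turbo": {"input": 0.5, "output": 1.5}}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Calculate the cost of one request from the token usage reported by the API."""
    price = PRICE_PER_MILLION_TOKENS[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# For the sample response above: 1,301 prompt tokens and 304 completion tokens
print(request_cost("gpt-3.5-turbo", 1301, 304))  # ~US $0.0011
```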
212 |
213 | At this point, we have everything we need to show the response to the user.
214 |
215 | ## Developing applications with LLMs
216 |
217 | In this section, we will review a few examples of how to use LLMs in applications. We will start with simple cases that work well and then move on to cases where things don't behave as expected and how to work around them.
218 |
219 | This is a summary of what is covered in the following sections.
220 |
221 | 1. [A simple GitHub issue first to see how LLMs can summarize](#a-simple-github-issue-to-get-started).
222 | 1. [A large GitHub issue that doesn't fit in the context window of a basic LLM](#a-large-github-issue).
223 | 1. [A more powerful model for a better summary](#better-summaries-with-a-more-powerful-model).
224 | 1. [The introduction of GPT-4o mini](#the-introduction-of-gpt-4o-mini).
225 | 1. [The importance of using a good prompt](#the-importance-of-using-a-good-prompt).
226 | 1. [Sometimes we should not use an LLM](#if-all-we-have-is-a-hammer).
227 |
228 | ### A simple GitHub issue to get started
229 |
230 | We will start with a simple case to see how well LLMs can summarize.
231 |
232 | Start the user interface with the following command.
233 |
234 | ```bash
235 | source venv/bin/activate
236 | streamlit run app.py
237 | ```
238 |
239 | Then choose the first issue in the list of samples, _`https://github.com/openai/openai-python/issues/488 (simple example)`_, and click the _"Generate summary with..."_ button.
240 |
241 | ![Choosing an issue from the sample list](./docs/example1-choose-issue.jpg)
242 |
243 |
244 | After a few seconds, we should get a summary like the picture below. At the top we can see the token count, the cost (derived from the token count), and how long it took for the LLM to generate the summary. After that we see the LLM's response. Compared with the [original GitHub issue](https://github.com/openai/openai-python/issues/488), the LLM does a good job: at a glance, we can see the main points of the issue and its comments.
245 |
246 | ![Summary generated for the first example issue](./docs/example1-summary.png)
247 |
248 |
249 | ### A large GitHub issue
250 |
251 | Now choose the issue _`https://github.com/scikit-learn/scikit-learn/issues/9354 ...`_ and click the _"Generate summary with..."_ button. Do not change the LLM model yet.
252 |
253 | It will fail with this error:
254 |
255 | > `Error code: 400 - {'error': {'message': "This model's maximum context length is 16385 tokens. However, your messages resulted in 20437 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}`
256 |
257 | Each LLM has a limit on the number of tokens it can process at a time. This limit is the _context window_ size. The context window must fit the information we want to summarize and the summary itself. If the information we want to summarize is larger than the context window, as we saw in this case, the LLM will reject the request.
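
One way to anticipate this error is to estimate the token count before sending the request. This sketch uses the `tiktoken` library (not part of this project's requirements) to do that:

```python
import tiktoken  # not in requirements.txt - install it separately to try this


def estimated_tokens(model: str, text: str) -> int:
    """Count the tokens in the text using the tokenizer of the given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))


# If the estimate is close to the model's context window, switch to a larger model up front
print(estimated_tokens("gpt-3.5-turbo", parsed_issue + parsed_comments))
```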
258 |
259 | There are a few ways to work around this problem:
260 |
261 | - Break up the information into smaller pieces that fit in the context window. For example, we could [ask for a summary of each comment separately](https://github.com/microsoft/azure-openai-design-patterns/blob/main/patterns/01-large-document-summarization/README.md), then combine them into a single summary to show to the user. This may not work well in all cases, for example, if one comment refers to another.
262 | - Use a model with a larger context window.
263 |
264 | We will use the second option. Click on _"Click to configure the prompt and the model"_ at the top of the screen, select the GPT-4o model and click the _"Generate summary with gpt-4o"_ button.
265 |
266 | ![Choosing a model with a larger context window](./docs/example2-choose-larger-context-model.png)
267 |
268 |
269 | Now, we get a summary from the LLM. However, note that it will take longer to generate the summary and cost more.
270 |
271 | Why don't we start with GPT-4o to avoid such problems? Money. As a general rule, LLMs with larger context windows cost more. If we use an AI provider such as OpenAI, we must [pay more per token](https://openai.com/pricing). If we run the model ourselves, we need to buy more powerful hardware. Either way, using a larger context window costs more.
272 |
273 | ### Better summaries with a more powerful model
274 |
275 | As a result of using GPT-4o, we also get better summaries.
276 |
277 | Why don't we use GPT-4o from the start? In addition to the above reason (money), there is also higher latency. As a general rule, better models are also larger. They need more hardware to run, translating into [higher costs per token](https://openai.com/pricing) and a longer time to generate a response.
278 |
279 | We can see the difference comparing the token count, cost, and time to generate the summary between the gpt-3.5-turbo and the gpt-4o models.
280 |
281 | How do we pick a model? It depends on the use case. Start with the smallest (and thus cheapest and fastest) model that produces good results. Create some heuristics to decide when to use a more powerful model. For example, switch to a larger model if the comments are larger than a certain size and if the users are willing to wait longer for better results (sometimes an average result now is better than a perfect result later).
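
Such a heuristic can be as simple as this sketch (the threshold and the characters-per-token rule of thumb are illustrative assumptions, not part of this project):

```python
def pick_model(parsed_issue: str, parsed_comments: str) -> str:
    """Pick the cheaper model for small issues and a larger model for big ones."""
    text = parsed_issue + parsed_comments
    estimated_tokens = len(text) / 4  # very rough: ~4 characters per token for English text
    return "gpt-4o" if estimated_tokens > 10_000 else "gpt-4o-mini"
```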
282 |
283 | ### The introduction of GPT-4o mini
284 |
285 | The previous sections compared GPT-3.5 Turbo against GPT-4o to emphasize the differences between a smaller and a much larger model. However, in July 2024, OpenAI introduced the [GPT-4o mini model](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/). It comes with the same 128k-token context window as the GPT-4o model but at a much lower cost. It's even cheaper than the GPT-3.5 models. See the [OpenAI API pricing](https://openai.com/api/pricing/) for details.
286 |
287 | GPT-4o (not mini) is still a better model, but its price and latency may not justify the better results. For example, the following table shows the summary for a large issue (`https://github.com/qjebbs/vscode-plantuml/issues/255`). GPT-4o is on the left, and GPT-4o mini is on the right. The difference in costs is staggering, but the results are not that much different.
288 |
289 | The message is that unless you have a specific reason for using GPT-3.5 Turbo, you should start a new project with the GPT-4o mini model. It will produce results comparable to GPT-4o for less than the GPT-3.5 Turbo cost.
290 |
291 | | GPT-4o summary | GPT-4o mini summary |
292 | |---------|---------|
293 | | 3,859 tokens, US $0.0303 | 4,060 tokens, US $0.0012 |
294 | | ![GPT-4o summary](./docs/gpt-4o-summary.png) | ![GPT-4o mini summary](./docs/gpt-4o-mini-summary.png) |
295 |
296 | ### The importance of using a good prompt
297 |
298 | Precise instructions in the prompt are important to get good results. To illustrate what a difference a good prompt makes:
299 |
300 | 1. Select the _"gpt-3.5-turbo"_ model.
301 | 1. Select the GitHub issue `https://github.com/openai/openai-python/issues/488` from the sample list.
302 | 1. Click the _"Generate summary with gpt-3.5-turbo"_ button.
303 |
304 | We get a summary of the comments like this one.
305 |
306 | ![Summary with the original prompt](./docs/prompt-original.jpg)
307 |
308 |
309 | If we remove from the prompt the line _"Don't waste words. Use short, clear, complete sentences. Use active voice. Maximize detail, meaning focus on the content. Quote code snippets if they are relevant."_, we get this summary. Note how the text is more verbose and is indeed "wasting words".
310 |
311 | ![Summary after removing the instructions from the prompt](./docs/prompt-after-removal-of-instructions.jpg)
312 |
313 |
314 | To remove the line, click on _"Click to configure the prompt and the model"_ at the top of the screen and remove the line from the prompt, then click on the _"Generate summary with..."_ button again. Reload the page to restore the line.
315 |
316 | Getting the prompt right is still an experimental process. It goes under the name of _prompt engineering_. These are some references to learn more about prompt engineering.
317 |
318 | - [Prompt engineering techniques (Azure)](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering)
319 | - [GPT best practices (OpenAI)](https://platform.openai.com/docs/guides/gpt-best-practices)
320 | - [Prompt engineering guide](https://www.promptingguide.ai/)
321 |
322 |
323 | ### If all we have is a hammer...
324 |
325 | Once we learn we can summarize texts with an LLM, we are tempted to use it for everything. Let's say we also want to know the number of comments on the issue. We could ask the LLM by adding it to the prompt.
326 |
327 | Click on _"Click to configure the prompt and the model"_ at the top of the screen and add the line `- Number of comments in the issue` to the prompt as shown below. Leave all other lines unchanged.
328 |
329 | ```text
330 | You are an experienced developer familiar with GitHub issues.
331 | The following text was parsed from a GitHub issue and its comments.
332 | Extract the following information from the issue and comments:
333 | - Issue: A list with the following items: title, the submitter name, the submission date and
334 | time, labels, and status (whether the issue is still open or closed).
335 | - Number of comments in the issue <-- ** ADD THIS LINE **
336 | ...remainder of the lines...
337 | ```
338 |
339 | The LLM will return _a_ number of comments, but it will usually be wrong. Select, for example, the issue `https://github.com/qjebbs/vscode-plantuml/issues/255` from the sample list. None of the models get the number of comments correctly.
340 |
341 | Why? Because **LLMs are not "executing" instructions**, they are simply [generating one token at a time](https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/).
342 |
343 | This is an important concept to keep in mind. **LLMs do not understand what the text means**. They just pick the next token based on the previous ones. They are not a replacement for code.
344 |
345 | What to do instead? If we have easy access to the information we want, we should just use it. In this case, we can get the number of comments from the GitHub API response.
346 |
347 | ```python
348 | issue, comments = get_github_data(st.session_state.issue_url)
349 | num_comments = len(comments) # <--- This is all we need
350 | ```
351 |
352 | ## What we learned in these experiments
353 |
354 | - LLMs are good at summarizing text if we use the right prompt.
355 | - Summarizing larger documents requires larger context windows or more sophisticated techniques.
356 | - Getting good results requires good prompts, and writing good prompts is still an experimental process.
357 | - Sometimes, we should not use an LLM. If we can easily get the information we need from the data, we should do that instead of using an LLM.
358 |
359 | ## Modifying and testing the code
360 |
361 | Use the CLI code in `cli.py` to test modifications to the code. Debugging code in a CLI is easier than in a Streamlit app. Once the code works in the CLI, adapt the Streamlit app.
362 |
363 | ## Preparing the environment
364 |
365 | This is a one-time step. If you have already done this, just activate the virtual environment with `source venv/bin/activate`.
366 |
367 | There are two steps to prepare the environment.
368 |
369 | 1. [Python environment](#python-environment)
370 | 1. [OpenAI API key](#openai-api-key)
371 |
372 | ### Python environment
373 |
374 | Run the following commands to create a virtual environment and install the required packages.
375 |
376 | ```bash
377 | python3 -m venv venv
378 | source venv/bin/activate
379 | pip install --upgrade pip
380 | pip install -r requirements.txt
381 | ```
382 |
383 | ### OpenAI API key
384 |
385 | The code uses [OpenAI GPT models](https://platform.openai.com/docs/models) to generate summaries. It's currently one of the easiest ways to get started with LLMs. While OpenAI charges for API access, it gives a US $5 credit that can go a long way in small projects that use the GPT-4o mini model. To avoid surprise bills, you can set [spending limits](https://platform.openai.com/docs/guides/production-best-practices/managing-billing-limits).
386 |
387 | If you already have an OpenAI account, create an API key [here](https://platform.openai.com/account/api-keys). If you don't have an account, create one [here](https://openai.com/product#made-for-developers).
388 |
389 | Once you have the OpenAI API key, create a `.env` file in the project root directory with the following content.
390 |
391 | ```bash
392 | OPENAI_API_KEY=
393 | ```
394 |
395 | It is safe to add the key here: the `.env` file is listed in `.gitignore`, so it will never be committed to the repository.
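
For reference, [llm.py](./llm.py) loads the key with `python-dotenv` roughly like this:

```python
import os

import dotenv

# Reload .env on every request (override=True) so key changes are picked up without restarting the app
dotenv.load_dotenv(override=True)
api_key = os.getenv("OPENAI_API_KEY")
```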
396 |
397 | ## Related projects
398 |
399 | [This project](https://github.com/fau-masters-collected-works-cgarbin/gpt-all-local) lets you ask questions on a document and get answers from an LLM. It uses techniques similar to this project but with a significant difference: the LLM runs locally on your computer.
400 |
--------------------------------------------------------------------------------