├── img ├── yourassetsHere.md ├── bot.jpg ├── gemma.png ├── ibm.jpg ├── qwen.png ├── user.jpg ├── granite3.png ├── mockup001.png ├── thumbnail.png ├── final-mockup001.png ├── gemma2-2b-myGGUF.png └── videoframe_11799.png ├── requirements.txt ├── testGranite.py ├── layout.py ├── layout2OpenAI.py ├── layout3Ollama-Granite.py ├── promptLibv2.py ├── promptLibv2Qwen.py ├── usefulResources.md ├── layout4Ollama-Granite_autotest.py ├── 02.GR-Ollama-Granite_autotest.py ├── 02.GR-Ollama-Gemma2B_autotest.py ├── 02.GR-Llamafile-Gemma2B_autotest.py ├── README.md └── gradio-rbyf_chat.py /img/yourassetsHere.md: -------------------------------------------------------------------------------- 1 | here the images 2 | -------------------------------------------------------------------------------- /img/bot.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/bot.jpg -------------------------------------------------------------------------------- /img/gemma.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/gemma.png -------------------------------------------------------------------------------- /img/ibm.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/ibm.jpg -------------------------------------------------------------------------------- /img/qwen.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/qwen.png -------------------------------------------------------------------------------- /img/user.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/user.jpg -------------------------------------------------------------------------------- /img/granite3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/granite3.png -------------------------------------------------------------------------------- /img/mockup001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/mockup001.png -------------------------------------------------------------------------------- /img/thumbnail.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/thumbnail.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/requirements.txt -------------------------------------------------------------------------------- /img/final-mockup001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/final-mockup001.png -------------------------------------------------------------------------------- /img/gemma2-2b-myGGUF.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/gemma2-2b-myGGUF.png -------------------------------------------------------------------------------- /img/videoframe_11799.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/GradioRBYF/main/img/videoframe_11799.png -------------------------------------------------------------------------------- /testGranite.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: ollama-granite3dense 3 | # this wil run granite3-2B-instruct through ollamaAPI 4 | """ 5 | > ollama show granite3-dense 6 | Model 7 | architecture granite 8 | parameters 2.6B 9 | context length 4096 10 | embedding length 2048 11 | quantization Q4_K_M 12 | 13 | License 14 | Apache License 15 | Version 2.0, January 2004 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2 import countTokens, writehistory, createCatalog 23 | from promptLibv2 import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | 27 | #Add GPU argument in the parser 28 | parser = argparse.ArgumentParser() 29 | parser.add_argument("-g", "--gpu", action="store_true") 30 | 31 | args = parser.parse_args() 32 | GPU = args.gpu 33 | if GPU: 34 | ngpu_layers = 2 35 | print(f'Selected GPU: offloading {ngpu_layers} layers...') 36 | else: 37 | ngpu_layers = 0 #out of 28 38 | print('Loading Model on CPU only......') 39 | 40 | stops = ['<|end_of_text|>'] 41 | tasks = createCatalog() 42 | modelname = 'granite3-dense:2b' 43 | # create THE LOG FILE 44 | logfile = f'logs/{modelname}_CHAT_OLLAMA_{genRANstring(5)}_log.txt' 45 | logfilename = logfile 46 | #Write in the history the first 2 sessions 47 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 48 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 49 | 50 | print("\033[95;3;6m") 51 | print("1. Waiting 10 seconds for the API to load...") 52 | # using OpenAI library to connect to Ollama API endpoint 53 | client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama') 54 | print(f"2. Model {modelname} loaded with OLLAMA...") 55 | print("\033[0m") #reset all 56 | history = [] 57 | print("\033[92;1m") 58 | print(f'📝Logfile: {logfilename}') 59 | 60 | ##################### ALIGNMENT FIRST GENERATION ############################################## 61 | question = 'Explain the plot of Cinderella in a sentence.' 
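# NOTE: the OpenAI-compatible chat endpoint expects `messages` as a list of
# {"role": ..., "content": ...} dicts. The list below carries a single user
# turn; a system message could be prepended to steer the model, for example
# (hypothetical variant, not used in the actual test run):
#
#   test = [
#       {"role": "system", "content": "You are a helpful assistant."},
#       {"role": "user", "content": question},
#   ]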
62 | test = [ 63 | {"role": "user", "content": question} 64 | ] 65 | 66 | print('Question:', question) 67 | start = datetime.datetime.now() 68 | print("💻 > ", end="", flush=True) 69 | full_response = "" 70 | completion = client.chat.completions.create( 71 | messages=test, 72 | model='granite3-dense', 73 | temperature=0.25, 74 | frequency_penalty = 1.178, 75 | stop=stops, 76 | max_tokens=1500, 77 | stream=True 78 | ) 79 | for chunk in completion: 80 | try: 81 | if chunk.choices[0].delta.content: 82 | print(chunk.choices[0].delta.content, end="", flush=True) 83 | full_response += chunk.choices[0].delta.content 84 | except: 85 | pass 86 | delta = datetime.datetime.now() - start 87 | output = full_response 88 | print('') 89 | print("\033[91;1m") 90 | rating = 'PUT IT LATER'#input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 91 | print("\033[92;1m") 92 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation') 93 | print(stats) 94 | writehistory(logfilename,f'''👨‍💻 . {question} 95 | 💻 > {output} 96 | {stats} 97 | ''') -------------------------------------------------------------------------------- /layout.py: -------------------------------------------------------------------------------- 1 | import gradio as gr 2 | import datetime 3 | from promptLibv2Qwen import countTokens, writehistory, createCatalog 4 | from promptLibv2Qwen import genRANstring, createStats 5 | from gradio import ChatMessage 6 | 7 | 8 | ## PREPARING FINAL DATASET 9 | 10 | pd_id = [] 11 | pd_task = [] 12 | pd_vote = [] 13 | pd_remarks = [] 14 | test_progress = 0 15 | history = [] 16 | tasks = createCatalog() 17 | # fizing issue on dipsplaying avatars 18 | # https://www.gradio.app/guides/custom-CSS-and-JS 19 | # https://github.com/gradio-app/gradio/issues/9702 20 | custom_css = """ 21 | 22 | .message-row img { 23 | margin: 0px !important; 24 | } 25 | 26 | .avatar-container img { 27 | padding: 0px !important; 28 | } 29 | """ 30 | 31 | def generate_response(history): 32 | history.append( 33 | ChatMessage(role="user", 34 | content="Hi, my name is Fabio, a Medium writer. Who are you?") 35 | ) 36 | history.append( 37 | ChatMessage(role="assistant", 38 | content="Hi, I am your local GPT. How can I help you?") 39 | ) 40 | return history 41 | 42 | history = generate_response(history) 43 | with gr.Blocks(theme=gr.themes.Glass(), css=custom_css) as demo: 44 | #TITLE SECTION 45 | with gr.Row(variant='compact'): 46 | with gr.Column(scale=1): 47 | gr.Image(value='img/qwen.png', 48 | show_label = False, 49 | show_download_button = False, container = False) 50 | with gr.Column(scale=4): 51 | gr.HTML("
" 52 | + "

<h1 style='text-align: center;'>Revised Benchmark with You as a Feedback!</h1>"
53 |                 + "<h2 style='text-align: center;'>💎 Qwen2.5-0.5B-it - 8K context window</h2>")
54 |         gr.Markdown("""*Run a prompt catalogue with 11 tasks*
55 | to validate the performance of a Small Language Model<br>
56 | At the end of every generation, the process waits for feedback from the user<br>
57 | ### Fixed tuning Parameters: 58 | ``` 59 | temperature = 0.25 60 | repeat_penalty = 1.178 61 | max_new_tokens = 900 62 | 63 | ``` 64 | """) 65 | with gr.Row(variant='compact'): # Progress status 66 | with gr.Column(scale=1): 67 | btn_test = gr.Button(value='Start AutoTest', variant='huggingface') 68 | act_task = gr.Text('', placeholder="running task..",show_label=False) 69 | with gr.Column(scale=4): 70 | actual_progress = gr.Slider(0, len(tasks), 71 | value=test_progress, label="Prompt Catalogue Progress", 72 | #info="Run the most used NLP tasks with a Language Model", 73 | interactive=False) 74 | with gr.Row(variant='compact'): # ChatBot Area 75 | gr.Chatbot(history,type='messages',avatar_images=("./img/user.jpg","./img/bot.jpg")) # 76 | with gr.Row(variant='compact'): # Feedback from the user 77 | with gr.Column(scale=1): 78 | gr.Markdown("""#### Respect this format: 79 | 80 | Put a number from 0 to 5, a space, and then your comments
81 | ``` 82 | 5 very good one 83 | ``` 84 | """) 85 | 86 | with gr.Column(scale=4): 87 | txt_fbck = gr.Text('', placeholder="Your evaluation feedback..", 88 | label='User Feedback',lines=2) 89 | btn_fbck = gr.Button(value='submit feedback', variant='huggingface') 90 | 91 | 92 | 93 | if __name__ == "__main__": 94 | demo.launch(inbrowser=True) -------------------------------------------------------------------------------- /layout2OpenAI.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: ollama-granite3dense 3 | # this wil run granite3-2B-instruct through ollamaAPI 4 | # sources: https://github.com/fabiomatricardi/-LLM-Studies/raw/main/00.consoleAPI_stream.py 5 | # https://github.com/fabiomatricardi/-LLM-Studies/blob/main/01.st-API-openAI_stream.py 6 | # OLLAMA MODEL CARD: https://ollama.com/library/granite3-dense/blobs/604785e698e9 7 | # OPenAI API for Ollama: https://github.com/ollama/ollama/blob/main/docs/openai.md 8 | # https://github.com/ibm-granite/granite-3.0-language-models 9 | # https://www.ibm.com/granite/docs/ 10 | # HUGGINFACE: https://huggingface.co/ibm-granite/granite-3.0-2b-instruct 11 | ##################################################################################################### 12 | 13 | """ 14 | > ollama show granite3-dense 15 | Model 16 | architecture granite 17 | parameters 2.6B 18 | context length 4096 19 | embedding length 2048 20 | quantization Q4_K_M 21 | 22 | License 23 | Apache License 24 | Version 2.0, January 2004 25 | """ 26 | import gradio as gr 27 | import datetime 28 | from promptLibv2Qwen import countTokens, writehistory, createCatalog 29 | from promptLibv2Qwen import genRANstring, createStats 30 | from gradio import ChatMessage 31 | from openai import OpenAI 32 | 33 | ## PREPARING FINAL DATASET 34 | 35 | pd_id = [] 36 | pd_task = [] 37 | pd_vote = [] 38 | pd_remarks = [] 39 | test_progress = 0 40 | history = [] 41 | tasks = createCatalog() 42 | modelname = 'granite3-dense-2b' 43 | stops = ['<|end_of_text|>'] 44 | #load client with OpenAI API toward Ollama Endpoint 45 | client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama') 46 | print(f"2. Model {modelname} loaded with OLLAMA...") 47 | # fizing issue on dipsplaying avatars 48 | # https://www.gradio.app/guides/custom-CSS-and-JS 49 | # https://github.com/gradio-app/gradio/issues/9702 50 | custom_css = """ 51 | 52 | .message-row img { 53 | margin: 0px !important; 54 | } 55 | 56 | .avatar-container img { 57 | padding: 0px !important; 58 | } 59 | """ 60 | 61 | def generate_response(history): 62 | history.append( 63 | ChatMessage(role="user", 64 | content="Hi, my name is Fabio, a Medium writer. Who are you?") 65 | ) 66 | history.append( 67 | ChatMessage(role="assistant", 68 | content="Hi, I am your local GPT. How can I help you?") 69 | ) 70 | return history 71 | 72 | history = generate_response(history) 73 | with gr.Blocks(theme=gr.themes.Glass(), css=custom_css) as demo: 74 | #TITLE SECTION 75 | with gr.Row(variant='compact'): 76 | with gr.Column(scale=1): 77 | gr.Image(value='img/qwen.png', 78 | show_label = False, 79 | show_download_button = False, container = False) 80 | with gr.Column(scale=4): 81 | gr.HTML("
" 82 | + "

<h1 style='text-align: center;'>Revised Benchmark with You as a Feedback!</h1>"
83 |                 + "<h2 style='text-align: center;'>💎 Qwen2.5-0.5B-it - 8K context window</h2>")
84 |         gr.Markdown("""*Run a prompt catalogue with 11 tasks*
85 | to validate the performance of a Small Language Model<br>
86 | At the end of every generation, the process waits for feedback from the user<br>
87 | ### Fixed tuning Parameters: 88 | ``` 89 | temperature = 0.25 90 | repeat_penalty = 1.178 91 | max_new_tokens = 900 92 | 93 | ``` 94 | """) 95 | with gr.Row(variant='compact'): # Progress status 96 | with gr.Column(scale=1): 97 | btn_test = gr.Button(value='Start AutoTest', variant='huggingface') 98 | act_task = gr.Text('', placeholder="running task..",show_label=False) 99 | with gr.Column(scale=4): 100 | actual_progress = gr.Slider(0, len(tasks), 101 | value=test_progress, label="Prompt Catalogue Progress", 102 | #info="Run the most used NLP tasks with a Language Model", 103 | interactive=False) 104 | with gr.Row(variant='compact'): # KpI 105 | # with gr.Column(): 106 | txt_ttft = gr.Text('', placeholder="seconds..", 107 | label='Time to first token') 108 | # with gr.Column(): 109 | txt_gentime = gr.Text('', placeholder="TimeDelta..", 110 | label='Generation Time') 111 | # with gr.Column(): 112 | txt_speed = gr.Text('', placeholder="t/s..", 113 | label='Generation Speed') 114 | # with gr.Column(): 115 | txt_TOTtkns = gr.Text('', placeholder="tokens..", 116 | label='Total num of Tokens') 117 | 118 | with gr.Row(variant='compact'): # ChatBot Area 119 | myBOT =gr.Chatbot(history,type='messages',avatar_images=("./img/user.jpg","./img/bot.jpg")) # 120 | with gr.Row(variant='compact'): #Temporary Area 121 | temp_input = gr.Text('what is Artificial Intelligence?', 122 | label='USER',lines=1) 123 | temp_ouput = gr.Text('', placeholder="Temporary Output", 124 | label='BOT',lines=3) 125 | with gr.Row(variant='compact'): # Feedback from the user 126 | with gr.Column(scale=1): 127 | gr.Markdown("""#### Respect this format: 128 | 129 | Put a number from 0 to 5, a space, and then your comments
130 | ``` 131 | 5 very good one 132 | ``` 133 | """) 134 | 135 | with gr.Column(scale=4): 136 | txt_fbck = gr.Text('', placeholder="Your evaluation feedback..", 137 | label='User Feedback',lines=2) 138 | btn_fbck = gr.Button(value='submit feedback', variant='huggingface') 139 | def update_history(history,a,b): 140 | history.append( 141 | ChatMessage(role="user", 142 | content=a) 143 | ) 144 | history.append( 145 | ChatMessage(role="assistant", 146 | content=b) 147 | ) 148 | return history 149 | 150 | def startInference(a): 151 | prompt = [ 152 | {"role": "user", "content": a} 153 | ] 154 | promptTKNS = countTokens(a) 155 | generation = '' 156 | fisrtround=0 157 | start = datetime.datetime.now() 158 | completion = client.chat.completions.create( 159 | messages=prompt, 160 | model='granite3-dense', 161 | temperature=0.25, 162 | frequency_penalty = 1.178, 163 | stop=stops, 164 | max_tokens=1500, 165 | stream=True 166 | ) 167 | for chunk in completion: 168 | try: 169 | if chunk.choices[0].delta.content: 170 | if fisrtround==0: 171 | generation += chunk.choices[0].delta.content 172 | ttftoken = datetime.datetime.now() - start 173 | secondsTTFT = ttftoken.total_seconds() 174 | ttFT = f"TimeToFristToken: {secondsTTFT:.2f} sec" 175 | fisrtround = 1 176 | else: 177 | generation += chunk.choices[0].delta.content 178 | except: 179 | pass 180 | answrTKN = countTokens(generation) 181 | totTKN = promptTKNS + answrTKN 182 | total_tokens = f"Total Tkns: {totTKN}" 183 | delta = datetime.datetime.now() - start 184 | seconds = delta.total_seconds() 185 | speed = totTKN/seconds 186 | speed_tokens = f"Gen Speed: {speed:.2f} t/s" 187 | yield generation, delta, speed_tokens, ttFT,total_tokens 188 | 189 | btn_test.click(startInference, inputs=[temp_input], 190 | outputs=[temp_ouput,txt_gentime,txt_speed,txt_ttft,txt_TOTtkns]).then( 191 | update_history,[myBOT,temp_input,temp_ouput],myBOT 192 | ) 193 | 194 | if __name__ == "__main__": 195 | demo.launch(inbrowser=True) -------------------------------------------------------------------------------- /layout3Ollama-Granite.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: ollama-granite3dense 3 | # this wil run granite3-2B-instruct through ollamaAPI 4 | # sources: https://github.com/fabiomatricardi/-LLM-Studies/raw/main/00.consoleAPI_stream.py 5 | # https://github.com/fabiomatricardi/-LLM-Studies/blob/main/01.st-API-openAI_stream.py 6 | # OLLAMA MODEL CARD: https://ollama.com/library/granite3-dense/blobs/604785e698e9 7 | # OPenAI API for Ollama: https://github.com/ollama/ollama/blob/main/docs/openai.md 8 | # https://github.com/ibm-granite/granite-3.0-language-models 9 | # https://www.ibm.com/granite/docs/ 10 | # HUGGINFACE: https://huggingface.co/ibm-granite/granite-3.0-2b-instruct 11 | ##################################################################################################### 12 | 13 | """ 14 | > ollama show granite3-dense 15 | Model 16 | architecture granite 17 | parameters 2.6B 18 | context length 4096 19 | embedding length 2048 20 | quantization Q4_K_M 21 | 22 | License 23 | Apache License 24 | Version 2.0, January 2004 25 | """ 26 | import gradio as gr 27 | import datetime 28 | from promptLibv2Qwen import countTokens, writehistory, createCatalog 29 | from promptLibv2Qwen import genRANstring, createStats 30 | from gradio import ChatMessage 31 | from openai import OpenAI 32 | 33 | ## PREPARING FINAL DATASET 34 | 35 | pd_id = [] 36 | pd_task = [] 37 | 
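# NOTE: the pd_* lists below appear intended to accumulate one row per evaluated
# task (id, task name, user vote, free-text remarks). A minimal, hypothetical way
# to turn them into the final report (assuming pandas is available) would be:
#
#   import pandas as pd
#   report = pd.DataFrame({"id": pd_id, "task": pd_task,
#                          "vote": pd_vote, "remarks": pd_remarks})
#   report.to_csv("evaluation_report.csv", index=False)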
pd_vote = [] 38 | pd_remarks = [] 39 | test_progress = 0 40 | history = [] 41 | tasks = createCatalog() 42 | modelname = 'granite3-dense-2b' 43 | stops = ['<|end_of_text|>'] 44 | #load client with OpenAI API toward Ollama Endpoint 45 | client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama') 46 | print(f"2. Model {modelname} loaded with OLLAMA...") 47 | # fizing issue on dipsplaying avatars 48 | # https://www.gradio.app/guides/custom-CSS-and-JS 49 | # https://github.com/gradio-app/gradio/issues/9702 50 | custom_css = """ 51 | 52 | .message-row img { 53 | margin: 0px !important; 54 | } 55 | 56 | .avatar-container img { 57 | padding: 0px !important; 58 | } 59 | """ 60 | 61 | def generate_response(history): 62 | history.append( 63 | ChatMessage(role="user", 64 | content="Hi, my name is Fabio, a Medium writer. Who are you?") 65 | ) 66 | history.append( 67 | ChatMessage(role="assistant", 68 | content="Hi, I am your local GPT. How can I help you?") 69 | ) 70 | return history 71 | 72 | history = generate_response(history) 73 | with gr.Blocks(theme=gr.themes.Glass(), css=custom_css) as demo: 74 | #TITLE SECTION 75 | with gr.Row(variant='compact'): 76 | with gr.Column(scale=1): 77 | gr.Image(value='img/granite3.png', 78 | show_label = False, 79 | show_download_button = False, container = False) 80 | with gr.Column(scale=4): 81 | gr.HTML("
" 82 | + "

<h1 style='text-align: center;'>Revised Benchmark with You as a Feedback!</h1>"
83 |                 + "<h2 style='text-align: center;'>💎 granite3-dense-2B - 4K context window with Ollama engine</h2>")
84 |         gr.Markdown("""*Run a prompt catalogue with 11 tasks*
85 | to validate the performance of a Small Language Model<br>
86 | At the end of every generation, the process waits for feedback from the user<br>
87 | ### Fixed tuning Parameters: 88 | ``` 89 | temperature = 0.25 90 | repeat_penalty = 1.178 91 | max_new_tokens = 900 92 | 93 | ``` 94 | """) 95 | with gr.Row(variant='compact'): # Progress status 96 | with gr.Column(scale=1): 97 | btn_test = gr.Button(value='Start AutoTest', variant='huggingface') 98 | act_task = gr.Text('', placeholder="running task..",show_label=False) 99 | with gr.Column(scale=4): 100 | actual_progress = gr.Slider(0, len(tasks), 101 | value=test_progress, label="Prompt Catalogue Progress", 102 | #info="Run the most used NLP tasks with a Language Model", 103 | interactive=False) 104 | with gr.Row(variant='compact'): # KpI 105 | # with gr.Column(): 106 | txt_ttft = gr.Text('', placeholder="seconds..", 107 | label='Time to first token') 108 | # with gr.Column(): 109 | txt_gentime = gr.Text('', placeholder="TimeDelta..", 110 | label='Generation Time') 111 | # with gr.Column(): 112 | txt_speed = gr.Text('', placeholder="t/s..", 113 | label='Generation Speed') 114 | # with gr.Column(): 115 | txt_TOTtkns = gr.Text('', placeholder="tokens..", 116 | label='Total num of Tokens') 117 | 118 | with gr.Row(variant='compact'): # ChatBot Area 119 | myBOT =gr.Chatbot(history,type='messages',avatar_images=("./img/user.jpg","./img/bot.jpg")) # 120 | 121 | with gr.Row(variant='compact'): #Temporary Area 122 | temp_input = gr.Text('what is Artificial Intelligence?', 123 | label='ACTUAL TASK PROMPT',lines=2) 124 | 125 | with gr.Row(variant='compact'): # Feedback from the user 126 | with gr.Column(scale=1): 127 | gr.Markdown("""#### Respect this format: 128 | 129 | Put a number from 0 to 5, a space, and then your comments
130 | ``` 131 | 5 very good one 132 | ``` 133 | """) 134 | 135 | with gr.Column(scale=4): 136 | txt_fbck = gr.Text('', placeholder="Your evaluation feedback..", 137 | label='User Feedback',lines=2) 138 | btn_fbck = gr.Button(value='submit feedback', variant='huggingface') 139 | 140 | def update_history(history,a): 141 | history.append( 142 | ChatMessage(role="user", 143 | content=a) 144 | ) 145 | return history 146 | 147 | def startInference(a): 148 | print(a[-1]['content']) 149 | prompt = [ 150 | {"role": "user", "content": a[-1]['content']} 151 | ] 152 | promptTKNS = countTokens(a[-1]['content']) 153 | a.append({"role":"assistant","content":''}) 154 | generation = '' 155 | fisrtround=0 156 | start = datetime.datetime.now() 157 | completion = client.chat.completions.create( 158 | messages=prompt, 159 | model='granite3-dense', 160 | temperature=0.25, 161 | frequency_penalty = 1.178, 162 | stop=stops, 163 | max_tokens=900, 164 | stream=True 165 | ) 166 | for chunk in completion: 167 | try: 168 | if chunk.choices[0].delta.content: 169 | if fisrtround==0: 170 | a[-1]['content'] += chunk.choices[0].delta.content 171 | ttftoken = datetime.datetime.now() - start 172 | secondsTTFT = ttftoken.total_seconds() 173 | ttFT = f"TimeToFristToken: {secondsTTFT:.2f} sec" 174 | fisrtround = 1 175 | else: 176 | a[-1]['content'] += chunk.choices[0].delta.content 177 | except: 178 | pass 179 | answrTKN = countTokens(a[-1]['content']) 180 | totTKN = promptTKNS + answrTKN 181 | total_tokens = f"Total Tkns: {totTKN}" 182 | delta = datetime.datetime.now() - start 183 | seconds = delta.total_seconds() 184 | speed = totTKN/seconds 185 | speed_tokens = f"Gen Speed: {speed:.2f} t/s" 186 | yield a, delta, speed_tokens, ttFT,total_tokens 187 | 188 | btn_test.click(update_history, inputs=[myBOT,temp_input], 189 | outputs=[myBOT]).then(startInference,[myBOT],[myBOT,txt_gentime,txt_speed,txt_ttft,txt_TOTtkns]) 190 | 191 | def startloop(): 192 | #rasie flag to wait 193 | #start the loop 194 | #display task 195 | #append chatbot prompt 196 | #start the generation 197 | pass 198 | 199 | 200 | if __name__ == "__main__": 201 | demo.launch(inbrowser=True) -------------------------------------------------------------------------------- /promptLibv2.py: -------------------------------------------------------------------------------- 1 | """ 2 | V2 changes 3 | added Time To First Token in the statistics ttft 4 | added some more prompts in the catalog 5 | - say 'I am ready' 6 | - modified for Llama3.2-1b Write in a list the three main key points - format output 7 | 8 | 20240929 FAMA 9 | """ 10 | 11 | import random 12 | import string 13 | import tiktoken 14 | 15 | def createCatalog(): 16 | """ 17 | Create a dictionary with 18 | 'task' : description of the NLP task in the prompt 19 | 'prompt' : the instruction prompt for the LLM 20 | """ 21 | context = """One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 22 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 
23 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 24 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 25 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 26 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 27 | It could soon be the case that the so-called developing countries will be showing the West the way forward.""" 28 | catalog = [] 29 | prmpt_tasks = ["introduction", 30 | "explain in one sentence", 31 | "explain in three paragraphs", 32 | "say 'I am ready'", 33 | "summarize", 34 | "Summarize in two sentences", 35 | "Write in a list the three main key points - format output", 36 | "Table of Contents", 37 | "RAG", 38 | "Truthful RAG", 39 | "write content from a reference", 40 | "extract 5 topics", 41 | "Creativity: 1000 words SF story", 42 | "Reflection prompt" 43 | ] 44 | prmpt_coll = [ 45 | """Hi there I am Fabio, a Medium writer. who are you?""", 46 | """explain in one sentence what is science.\n""", 47 | """explain in three paragraphs what is artificial intelligence.\n""", 48 | f"""read the following text and when you are done say "I am ready". 49 | 50 | [text] 51 | {context} 52 | [end of text] 53 | 54 | """, 55 | f"""summarize the following text: 56 | 57 | [text] 58 | {context} 59 | [end of text] 60 | 61 | """, 62 | f"""Summarize in two sentences the following text 63 | 64 | [text] 65 | {context} 66 | [end of text] 67 | 68 | """, 69 | f"""1. extract the three key points from the provided text 70 | 2. format the output as a python list ["point 1","point 2", "point 3"] 71 | 72 | [text] 73 | {context} 74 | [end of text] 75 | 76 | python list: 77 | 78 | """, 79 | f"""A "table of content" is an ordered list of the topic contained in the text: write the "Table of Contents" of the following text. 80 | 81 | [text] 82 | {context} 83 | [end of text] 84 | 85 | """, 86 | f"""Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 87 | 88 | question: what China achieved with it's long-term planning? 89 | 90 | [context] 91 | {context} 92 | [end of context] 93 | 94 | answer: 95 | """, 96 | f"""Reply to the question only using the provided context. If the answer is not contained in the provided context say "unanswerable". 97 | 98 | question: who is Anne Frank? 
99 | 100 | [context] 101 | {context} 102 | [end of context] 103 | 104 | Remember: if you cannot answer based on the provided context, say "unanswerable" 105 | 106 | answer: 107 | """, 108 | 109 | f"""Using the following text as a reference, write a 5-paragraphs essay about "the benefits of China economic model". 110 | 111 | [text] 112 | {context} 113 | [end of text] 114 | 115 | """, 116 | f"""write the five most important topics from the following text: 117 | 118 | [text] 119 | {context} 120 | [end of text] 121 | 122 | """, 123 | """Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 124 | 125 | """, 126 | """You are an AI assistant designed to provide detailed, step-by-step responses. Your outputs should follow this structure: 127 | 1. Begin with a section. 128 | 2. Inside the thinking section: 129 | a. Briefly analyze the question and outline your approach. 130 | b. Present a clear plan of steps to solve the problem. 131 | c. Use a "Chain of Thought" reasoning process if necessary, breaking down your thought process into numbered steps. 132 | 3. Include a section for each idea where you: 133 | a. Review your reasoning. 134 | b. Check for potential errors or oversights. 135 | c. Confirm or adjust your conclusion if necessary. 136 | 4. Be sure to close all reflection sections. 137 | 5. Close the thinking section with . 138 | 6. Provide your final answer in an section. 139 | Always use these tags in your responses. Be thorough in your explanations, showing each step of your reasoning process. Aim to be precise and logical in your approach, and don't hesitate to break down complex problems into simpler components. Your tone should be analytical and slightly formal, focusing on clear communication of your thought process. 140 | Remember: Both and MUST be tags and must be closed at their conclusion 141 | Make sure all are on separate lines with no other text. Do not include other text on a line containing a tag. 142 | 143 | user question: explain why it is crucial for teachers to learn how to use generative AI for their job and for the future of education. Include relevant learning path for teachers and educators. 
144 | 145 | """ 146 | ] 147 | for i in range(0,len(prmpt_tasks)): 148 | catalog.append({'task':prmpt_tasks[i], 149 | 'prompt': prmpt_coll[i]}) 150 | return catalog 151 | 152 | def countTokens(text): 153 | """ 154 | Use tiktoken to count the number of tokens 155 | text -> str input 156 | Return -> int number of tokens counted 157 | """ 158 | encoding = tiktoken.get_encoding("r50k_base") #context_count = len(encoding.encode(yourtext)) 159 | numoftokens = len(encoding.encode(text)) 160 | return numoftokens 161 | 162 | def writehistory(filename,text): 163 | """ 164 | save a string into a logfile with python file operations 165 | filename -> str pathfile/filename 166 | text -> str, the text to be written in the file 167 | """ 168 | with open(f'{filename}', 'a', encoding='utf-8') as f: 169 | f.write(text) 170 | f.write('\n') 171 | f.close() 172 | 173 | def genRANstring(n): 174 | """ 175 | n = int number of char to randomize 176 | """ 177 | N = n 178 | res = ''.join(random.choices(string.ascii_uppercase + 179 | string.digits, k=N)) 180 | return res 181 | 182 | def createStats(delta,question,output,rating,logfilename,task,ttft): 183 | """ 184 | Takes in all the generation main info and return KPIs 185 | delta -> datetime.now() delta 186 | question -> str the user input to the LLM 187 | output -> str the generation from the LLM 188 | rating -> str human eval feedback rating 189 | logfilename -> str filepath/filename 190 | task -> str description of the NLP task describing the prompt 191 | ttft -> datetime.now() delta time to first token 192 | """ 193 | totalseconds = delta.total_seconds() 194 | prompttokens = countTokens(question) 195 | assistanttokens = countTokens(output) 196 | totaltokens = prompttokens + assistanttokens 197 | speed = totaltokens/totalseconds 198 | genspeed = assistanttokens/totalseconds 199 | ttofseconds = ttft.total_seconds() 200 | stats = f'''--- 201 | Prompt Tokens: {prompttokens} 202 | Output Tokens: {assistanttokens} 203 | TOTAL Tokens: {totaltokens} 204 | >>>⏱️ Time to First Token: {ttofseconds} seconds 205 | >>>⏱️ Inference time: {delta} 206 | >>>🧮 Inference speed: {speed:.3f} t/s 207 | >>>🏍️ Generation speed: {genspeed:.3f} t/s 208 | >>>📝 Logfile: {logfilename} 209 | >>>💚 User rating: {rating} 210 | >>>✅ NLP TAKS: {task} 211 | ''' 212 | return stats -------------------------------------------------------------------------------- /promptLibv2Qwen.py: -------------------------------------------------------------------------------- 1 | """ 2 | V2 changes 3 | added Time To First Token in the statistics ttft 4 | added some more prompts in the catalog 5 | - say 'I am ready' 6 | - modified for Llama3.2-1b Write in a list the three main key points - format output 7 | 8 | 20240929 FAMA 9 | """ 10 | 11 | import random 12 | import string 13 | import tiktoken 14 | 15 | def createCatalog(): 16 | """ 17 | Create a dictionary with 18 | 'task' : description of the NLP task in the prompt 19 | 'prompt' : the instruction prompt for the LLM 20 | """ 21 | context = """One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 
22 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 23 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 24 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 25 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 26 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 27 | It could soon be the case that the so-called developing countries will be showing the West the way forward.""" 28 | catalog = [] 29 | prmpt_tasks = ["introduction", 30 | "explain in one sentence", 31 | "explain in three paragraphs", 32 | "say 'I am ready'", 33 | "summarize", 34 | "Summarize in two sentences", 35 | "Write in a list the three main key points - format output", 36 | "Table of Contents", 37 | "RAG", 38 | "Truthful RAG", 39 | "write content from a reference", 40 | "extract 5 topics", 41 | "Creativity: 1000 words SF story", 42 | "Reflection prompt" 43 | ] 44 | prmpt_coll = [ 45 | """Hi there I am Fabio, a Medium writer. who are you?""", 46 | """explain in one sentence what is science.\n""", 47 | """explain only in three paragraphs what is artificial intelligence.\n""", 48 | f"""read the following text and when you are done say "I am ready". 49 | 50 | [text] 51 | {context} 52 | [end of text] 53 | 54 | """, 55 | f"""summarize the following text: 56 | [text] 57 | {context} 58 | [end of text] 59 | write the summary using the information provided. 60 | 61 | Summary: 62 | """, 63 | f"""Write a two-sentences summary of the following text: 64 | [text] 65 | {context} 66 | [end of text] 67 | Remember: write only the summary with the text provided. 68 | """, 69 | f"""1. extract the three key points from the provided text 70 | 2. format the output as a python list ["point 1","point 2", "point 3"] 71 | [text] 72 | {context} 73 | [end of text] 74 | Return only the python list. 75 | """, 76 | f"""A "table of content" provides an organized structure to the text, allowing readers quick access to key points. Write the "table of content" of the following text. 77 | [text] 78 | {context} 79 | [end of text] 80 | 81 | "table of content": 82 | """, 83 | f"""Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 84 | 85 | question: what China achieved with it's long-term planning? 
86 | 87 | [context] 88 | {context} 89 | [end of context] 90 | 91 | answer: 92 | """, 93 | f"""Reply to the question only using the provided context. If the answer is not contained in the provided context say "unanswerable". 94 | 95 | question: who is Anne Frank? 96 | 97 | [context] 98 | {context} 99 | [end of context] 100 | 101 | Remember: if you cannot answer based on the provided context, say "unanswerable" 102 | 103 | answer: 104 | """, 105 | 106 | f"""Using the following text as a reference, write a 5-paragraphs essay about "the benefits of China economic model". 107 | 108 | [text] 109 | {context} 110 | [end of text] 111 | Remember: use the information provided and write exactly 5 paragraphs. 112 | """, 113 | f"""List five most important topics from the following text: 114 | [text] 115 | {context} 116 | [end of text] 117 | """, 118 | """Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 119 | 120 | """, 121 | """You are an AI assistant designed to provide detailed, step-by-step responses. Your outputs should follow this structure: 122 | 1. Begin with a section. 123 | 2. Inside the thinking section: 124 | a. Briefly analyze the question and outline your approach. 125 | b. Present a clear plan of steps to solve the problem. 126 | c. Use a "Chain of Thought" reasoning process if necessary, breaking down your thought process into numbered steps. 127 | 3. Include a section for each idea where you: 128 | a. Review your reasoning. 129 | b. Check for potential errors or oversights. 130 | c. Confirm or adjust your conclusion if necessary. 131 | 4. Be sure to close all reflection sections. 132 | 5. Close the thinking section with . 133 | 6. Provide your final answer in an section. 134 | Always use these tags in your responses. Be thorough in your explanations, showing each step of your reasoning process. Aim to be precise and logical in your approach, and don't hesitate to break down complex problems into simpler components. Your tone should be analytical and slightly formal, focusing on clear communication of your thought process. 135 | Remember: Both and MUST be tags and must be closed at their conclusion 136 | Make sure all are on separate lines with no other text. Do not include other text on a line containing a tag. 137 | 138 | user question: explain why it is crucial for teachers to learn how to use generative AI for their job and for the future of education. Include relevant learning path for teachers and educators. 
139 | 140 | """ 141 | ] 142 | for i in range(0,len(prmpt_tasks)): 143 | catalog.append({'task':prmpt_tasks[i], 144 | 'prompt': prmpt_coll[i]}) 145 | return catalog 146 | 147 | def countTokens(text): 148 | """ 149 | Use tiktoken to count the number of tokens 150 | text -> str input 151 | Return -> int number of tokens counted 152 | """ 153 | encoding = tiktoken.get_encoding("cl100k_base") #context_count = len(encoding.encode(yourtext)) 154 | numoftokens = len(encoding.encode(text)) 155 | return numoftokens 156 | 157 | def writehistory(filename,text): 158 | """ 159 | save a string into a logfile with python file operations 160 | filename -> str pathfile/filename 161 | text -> str, the text to be written in the file 162 | """ 163 | with open(f'{filename}', 'a', encoding='utf-8') as f: 164 | f.write(text) 165 | f.write('\n') 166 | f.close() 167 | 168 | def genRANstring(n): 169 | """ 170 | n = int number of char to randomize 171 | """ 172 | N = n 173 | res = ''.join(random.choices(string.ascii_uppercase + 174 | string.digits, k=N)) 175 | return res 176 | 177 | def createStats(delta,question,output,rating,logfilename,task,ttft): 178 | """ 179 | Takes in all the generation main info and return KPIs 180 | delta -> datetime.now() delta 181 | question -> str the user input to the LLM 182 | output -> str the generation from the LLM 183 | rating -> str human eval feedback rating 184 | logfilename -> str filepath/filename 185 | task -> str description of the NLP task describing the prompt 186 | ttft -> datetime.now() delta time to first token 187 | """ 188 | totalseconds = delta.total_seconds() 189 | prompttokens = countTokens(question) 190 | assistanttokens = countTokens(output) 191 | totaltokens = prompttokens + assistanttokens 192 | speed = totaltokens/totalseconds 193 | genspeed = assistanttokens/totalseconds 194 | ttofseconds = ttft.total_seconds() 195 | stats = f'''--- 196 | Prompt Tokens: {prompttokens} 197 | Output Tokens: {assistanttokens} 198 | TOTAL Tokens: {totaltokens} 199 | >>>⏱️ Time to First Token: {ttofseconds} seconds 200 | >>>⏱️ Inference time: {delta} 201 | >>>🧮 Inference speed: {speed:.3f} t/s 202 | >>>🏍️ Generation speed: {genspeed:.3f} t/s 203 | >>>📝 Logfile: {logfilename} 204 | >>>💚 User rating: {rating} 205 | >>>✅ NLP TAKS: {task} 206 | ''' 207 | return stats -------------------------------------------------------------------------------- /usefulResources.md: -------------------------------------------------------------------------------- 1 | # About Gradio 2 | Issues on avatars in the chatbot 3 | - [https://www.gradio.app/guides/custom-CSS-and-JS](https://www.gradio.app/guides/custom-CSS-and-JS) 4 | - [https://github.com/gradio-app/gradio/issues/9702](https://github.com/gradio-app/gradio/issues/9702) 5 | - [Gradio Theming](https://www.gradio.app/guides/theming-guide) 6 | ``` 7 | gr.themes.Base() - the "base" theme sets the primary color to blue but otherwise has minimal styling, making it particularly useful as a base for creating new, custom themes. 8 | gr.themes.Default() - the "default" Gradio 5 theme, with a vibrant orange primary color and gray secondary color. 9 | gr.themes.Origin() - the "origin" theme is most similar to Gradio 4 styling. Colors, especially in light mode, are more subdued than the Gradio 5 default theme. 10 | gr.themes.Citrus() - the "citrus" theme uses a yellow primary color, highlights form elements that are in focus, and includes fun 3D effects when buttons are clicked. 
11 | gr.themes.Monochrome() - the "monochrome" theme uses a black primary and white secondary color, and uses serif-style fonts, giving the appearance of a black-and-white newspaper.
12 | gr.themes.Soft() - the "soft" theme uses a purple primary color and white secondary color. It also increases the border radii around buttons and form elements and highlights labels.
13 | gr.themes.Glass() - the "glass" theme has a blue primary color and a translucent gray secondary color. The theme also uses vertical gradients to create a glassy effect.
14 | gr.themes.Ocean() - the "ocean" theme has a blue-green primary color and gray secondary color. The theme also uses horizontal gradients, especially for buttons and some form elements.
15 | ```
16 | - [Gradio Theme Gallery](https://huggingface.co/spaces/gradio/theme-gallery)
17 | - [GUIDE How to Create a Chatbot with Gradio](https://www.gradio.app/guides/creating-a-chatbot-fast)
18 | 
19 | ### Newspaper3k: Article scraping & curation
20 | - [https://github.com/codelucas/newspaper](https://github.com/codelucas/newspaper)
21 | - [https://github.com/fabiomatricardi/MetadataIsAllYouNeed/blob/main/KeyBERT_gr.py](https://github.com/fabiomatricardi/MetadataIsAllYouNeed/blob/main/KeyBERT_gr.py)
22 | 
23 | 
24 | ### General inference rules
25 | The Mistral models allow you to chat with a model that has been fine-tuned to follow instructions and respond to natural language prompts. A prompt is the input that you provide to the Mistral model. It can come in various forms, such as asking a question, giving an instruction, or providing a few examples of the task you want the model to perform. Based on the prompt, the Mistral model generates a text output as a response.
26 | The chat completion API accepts a list of chat messages as input and generates a response. This response is in the form of a new chat message with the role "assistant" as output. A minimal request/response sketch is shown a few lines below.
27 | - [https://docs.mistral.ai/capabilities/completion/](https://docs.mistral.ai/capabilities/completion/)
28 | 
29 | 
30 | # Models to try
31 | ### H2OVL-Mississippi-800M
32 | The H2OVL-Mississippi-800M is a compact yet powerful vision-language model from H2O.ai, featuring 0.8 billion parameters. Despite its small size, it delivers state-of-the-art performance in text recognition, excelling in the Text Recognition segment of OCRBench and outperforming much larger models in this domain. Built upon the robust architecture of the H2O-Danube language models, the Mississippi-800M extends their capabilities by seamlessly integrating vision and language tasks.
33 | 
34 | 
35 | - [HuggingFace repo at H2O](https://huggingface.co/h2oai/h2ovl-mississippi-800m)
36 | - [H2O article by Asghar Ghorbani](https://h2o.ai/blog/2024/document-classification-with-h2o-vl-mississippi--a-quick-guide/)
37 | ### MaziyarPanahi's Collections
38 | 🚀 GGUF Llama.cpp-compatible models that can be used on CPUs and GPUs!<br>
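Referring back to the *General inference rules* section above, here is a minimal sketch of the chat-completion request/response cycle. It reuses the OpenAI-compatible client pattern already used by the scripts in this repo; the endpoint and model name are placeholders for whatever local server you run:

```python
from openai import OpenAI

# Placeholder endpoint and model name: any OpenAI-compatible server works the same way
client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama')
response = client.chat.completions.create(
    model='granite3-dense',
    messages=[{"role": "user", "content": "Explain retrieval-augmented generation in one sentence."}],
)
# The reply comes back as a new chat message with role "assistant"
print(response.choices[0].message.content)
```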
39 | Here the [AMAZING COLLECTION](https://huggingface.co/collections/MaziyarPanahi/gguf-65afc99c3997c4b6d2d9e1d5) 40 | ### h2oai/deberta_finetuned_pii 41 | A finetuned model designed to recognize and classify Personally Identifiable Information (PII) within unstructured text data. This powerful model accurately identifies a wide range of PII categories, such as account names, credit card numbers, emails, phone numbers, and addresses. The model is specifically trained to detect various PII types, including but not limited to: 42 | 43 | ``` 44 | | Category | Data | 45 | |------------------------|----------------------------------------------------------------------------------------| 46 | | Account-related information | Account name, account number, and transaction amounts | 47 | | Banking details | BIC, IBAN, and Bitcoin or Ethereum addresses | 48 | | Personal information | Full name, first name, middle name, last name, gender, and date of birth | 49 | | Contact information | Email, phone number, and street address (including building number, city, county, state, and zip code) | 50 | | Job-related data | Job title, job area, job descriptor, and job type | 51 | | Financial data | Credit card number, issuer, CVV, and currency information (code, name, and symbol) | 52 | | Digital identifiers | IP addresses (IPv4 and IPv6), MAC addresses, and user agents | 53 | | Online presence | URL, usernames, and passwords | 54 | | Other sensitive data | SSN, vehicle VIN and VRM, phone IMEI, and nearby GPS coordinates | 55 | 56 | ``` 57 | 58 | The PII Identifier Model ensures data privacy and compliance by effectively detecting and categorizing sensitive information within documents, emails, user-generated content, and more. Make your data processing safer and more secure with our state-of-the-art PII detection technology. 59 | - [Hugging Face Repo](https://huggingface.co/h2oai/deberta_finetuned_pii) 60 | - 61 | 62 | 63 | 64 | # Large Language Models course free 65 | - [llm-engineering-handbook](https://github.com/aofoegbu/llm-engineers-handbook) 66 | - [DeepLearning.AI course on Agents](https://learn.deeplearning.ai/courses/practical-multi-ai-agents-and-advanced-use-cases-with-crewai/lesson/1/introduction) 67 | - [Microsoft BitNet.cpp](https://github.com/microsoft/BitNet) 68 | - [ArXiv paper *1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs*](https://arxiv.org/abs/2410.16144v1) 69 | - [MotleyCrew AI](https://motleycrew.ai/) 70 | - [MotleyCrewAI-readthedocs](https://motleycrew.readthedocs.io/en/latest/quickstart.html) 71 | - [OpenVino quick guide CheatSheet](https://docs.openvino.ai/2024/_static/download/OpenVINO_Quick_Start_Guide.pdf) 72 | - [OpenVino Toolkit Getting Started](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/get-started.html) 73 | 74 | # About IBM/Granite models 75 | - [Ollama serving](https://ollama.com/library/granite3-moe:1b) 76 | - [Granite3 IbM on GitHub](https://github.com/ibm-granite/granite-3.0-language-models) 77 | - [Fabio you are the Benchmark](https://github.com/fabiomatricardi/YouAreTheBenchmark) 78 | - 79 | 80 | 81 | ## Markdown resoources 82 | - [MArkdown Videos](https://github.com/Snailedlt/Markdown-Videos) 83 | - 84 | 85 | 86 | 87 | 88 | ## [Universal Assisted Generation: Faster Decoding with Any Assistant Model](https://huggingface.co/blog/universal_assisted_generation) 89 | TL;DR: Many LLMs such as gemma-2-9b and Mixtral-8x22B-Instruct-v0.1 lack a much smaller version to use for assisted generation. 
In this blog post, we present Universal Assisted Generation: a method developed by Intel Labs and Hugging Face which extends assisted generation to work with a small language model from any model family 🤯. As a result, it is now possible to accelerate inference from any decoder or Mixture of Experts model by 1.5x-2.0x at almost zero cost 🔥🔥🔥!
90 | In order to mitigate this pain point, Intel Labs, together with our friends at Hugging Face, has developed Universal Assisted Generation (UAG). UAG enables selecting any pair of target and assistant models regardless of their tokenizers. For example, gemma-2-9b can be used as the target model, with the tiny vicuna-68m as the assistant.
91 | The main idea behind the method is 2-way tokenizer translation. Once the assistant model completes a generation iteration, the assistant tokens are converted to text, which is then tokenized with the target model's tokenizer to produce target tokens. After the verification step, the target tokens are converted back to the assistant token format and appended to the assistant model's context before the next iteration begins.
92 | Since the assistant and target tokenizers use different vocabularies, it is necessary to handle the discrepancies between them. To accurately re-encode the newly generated assistant tokens, it is essential to prepend a context window consisting of several previous tokens. The entire sequence is then re-encoded into the target token format and aligned with the most recent target tokens to pinpoint the exact location where the newly generated tokens should be appended. This process is illustrated in the video below.
93 | 
94 | 
95 | https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/universal-assisted-generation/method-animation.mov
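A minimal sketch of UAG in code, mirroring the model pair mentioned above. The `assistant_model`, `tokenizer`, and `assistant_tokenizer` arguments to `generate()` assume a recent `transformers` release with universal assisted generation support, and the assistant checkpoint id is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

TARGET = "google/gemma-2-9b"       # target model from the example above
ASSISTANT = "double7/vicuna-68m"   # tiny assistant (illustrative checkpoint id)

tokenizer = AutoTokenizer.from_pretrained(TARGET)
assistant_tokenizer = AutoTokenizer.from_pretrained(ASSISTANT)
model = AutoModelForCausalLM.from_pretrained(TARGET)
assistant_model = AutoModelForCausalLM.from_pretrained(ASSISTANT)

inputs = tokenizer("Alice and Bob are", return_tensors="pt")
outputs = model.generate(
    **inputs,
    assistant_model=assistant_model,          # drafts candidate tokens cheaply
    tokenizer=tokenizer,                      # both tokenizers are needed for the
    assistant_tokenizer=assistant_tokenizer,  # 2-way token translation step
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```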