├── autotest ├── bnchnarks.here.md ├── promptLibv2Qwen.py ├── 101.Falcon3-1b-instruct_llamaserver_noHistoryCHAT.py └── 100.Falcon3-1b-instruct_llamaserver_promptTest.py ├── image001.png ├── image002.png ├── image003.png ├── falconBKG.png ├── falco3-eval.png ├── falco3-logo.png ├── BattleOf2B-2024.png ├── falcon3_llamacpp.gif ├── 202412.BattleOfTheModels-2B_tableDetailed.xlsx ├── Falcon3-1B_LCPPsvr_RYGOX.csv ├── README.md ├── promptLibv2Falcon3.py ├── testFalcon3-1B-it.py ├── 101.Falcon3-1b-instruct_llamaserver_noHistoryCHAT.py ├── 100.Falcon3-1b-instruct_llamaserver_promptTest.py └── Falcon3-1B_LCPPsvr_RYGOX_log.txt /autotest/bnchnarks.here.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /image001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/image001.png -------------------------------------------------------------------------------- /image002.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/image002.png -------------------------------------------------------------------------------- /image003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/image003.png -------------------------------------------------------------------------------- /falconBKG.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/falconBKG.png -------------------------------------------------------------------------------- /falco3-eval.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/falco3-eval.png -------------------------------------------------------------------------------- /falco3-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/falco3-logo.png -------------------------------------------------------------------------------- /BattleOf2B-2024.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/BattleOf2B-2024.png -------------------------------------------------------------------------------- /falcon3_llamacpp.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/falcon3_llamacpp.gif -------------------------------------------------------------------------------- /202412.BattleOfTheModels-2B_tableDetailed.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/202412.BattleOfTheModels-2B_tableDetailed.xlsx -------------------------------------------------------------------------------- /Falcon3-1B_LCPPsvr_RYGOX.csv: -------------------------------------------------------------------------------- 1 | #,TASK,VOTE,REMARKS 2 | 1,introduction,5, 3 | 2,explain in one sentence,5, 4 | 3,explain in 
three paragraphs,5, 5 | 4,say 'I am ready',5, 6 | 5,summarize,5, 7 | 6,Summarize in two sentences,3 ,there are 3 prhases 8 | 7,Write in a list the three main key points - format output,4, 9 | 8,Table of Contents,5, 10 | 9,RAG,5, 11 | 10,Truthful RAG,2 ,"it is unanswerable, even after a lot of prompt engineering" 12 | 11,write content from a reference,5, 13 | 12,extract 5 topics,5, 14 | 13,Creativity: 1000 words SF story,4 ,"good, no repetitions, consistent. but not matching the lenght" 15 | 14,Reflection prompt,3 ,"regardless the missing opening/closure tags, the structure of the think-reflect-output is not fulfilled" 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Falcon3-1B-it-llamaCPP 4 | How to run Falcon3-1B-instruct with llama.cpp on your pc - this is what we will build 5 | 6 | For months I have done the wrong things. But llama.cpp now has pre-compiled binaries at every release. So, for instance, starting from revision b4351 llama.cpp supports also the Falcon3 models. 7 | 8 | To be inclusive (all kind of hardware) we will use the binaries for AXV2 support, from release b4358 (the one available at the time of this newsletter writing). 9 | 10 | Download the file in your project directory: for me is Falcon3. Create a sub-folder called llamacpp and inside another one called model (we will download the GGF for Falcon3 there). 11 | 12 | 13 | 14 | Unzip all files in the [llama-b4358-bin-win-avx2-x64.zip](https://github.com/ggerganov/llama.cpp/releases/download/b4358/llama-b4358-bin-win-avx2-x64.zip) archive into the llamacpp directory 15 | 16 | 17 | 18 | Download the from the MaziyarPanahi Hugging Face repository: I used the Q6 ([Falcon3-1B-Instruct.Q6_K.gguf](https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF/resolve/main/Falcon3-1B-Instruct.Q6_K.gguf)) quantization, but also the Q8 is good. Save the GGUF file in the subdirectory llamacpp\model. 19 | 20 | 21 | 22 | Open a terminal window in the subdirectory llamacpp, and run 23 | ``` 24 | .\llama-server.exe -m .\model\Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 25 | ``` 26 | 27 | In another terminal with the venv activated run 28 | ``` 29 | python testFalcon3-1B-it.py 30 | ``` 31 | 32 | -------------------------------------------------------------------------------- /autotest/promptLibv2Qwen.py: -------------------------------------------------------------------------------- 1 | """ 2 | V2 changes 3 | added Time To First Token in the statistics ttft 4 | added some more prompts in the catalog 5 | - say 'I am ready' 6 | - modified for Llama3.2-1b Write in a list the three main key points - format output 7 | 8 | 20240929 FAMA 9 | """ 10 | 11 | import random 12 | import string 13 | import tiktoken 14 | 15 | def createCatalog(): 16 | """ 17 | Create a dictionary with 18 | 'task' : description of the NLP task in the prompt 19 | 'prompt' : the instruction prompt for the LLM 20 | """ 21 | context = """One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 
22 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 23 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 24 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 25 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 26 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 27 | It could soon be the case that the so-called developing countries will be showing the West the way forward.""" 28 | catalog = [] 29 | prmpt_tasks = ["introduction", 30 | "explain in one sentence", 31 | "explain in three paragraphs", 32 | "say 'I am ready'", 33 | "summarize", 34 | "Summarize in two sentences", 35 | "Write in a list the three main key points - format output", 36 | "Table of Contents", 37 | "RAG", 38 | "Truthful RAG", 39 | "write content from a reference", 40 | "extract 5 topics", 41 | "Creativity: 1000 words SF story", 42 | "Reflection prompt" 43 | ] 44 | prmpt_coll = [ 45 | """Hi there I am Fabio, a Medium writer. who are you?""", 46 | """explain in one sentence what is science.\n""", 47 | """explain only in three paragraphs what is artificial intelligence.\n""", 48 | f"""read the following text and when you are done say "I am ready". 49 | 50 | [text] 51 | {context} 52 | [end of text] 53 | 54 | """, 55 | f"""summarize the following text: 56 | [text] 57 | {context} 58 | [end of text] 59 | write the summary using the information provided. 60 | 61 | Summary: 62 | """, 63 | f"""Write a two-sentences summary of the following text: 64 | [text] 65 | {context} 66 | [end of text] 67 | Remember: write only the summary with the text provided. 68 | """, 69 | f"""1. extract the three key points from the provided text 70 | 2. format the output as a python list ["point 1","point 2", "point 3"] 71 | [text] 72 | {context} 73 | [end of text] 74 | Return only the python list. 75 | """, 76 | f"""A "table of content" provides an organized structure to the text, allowing readers quick access to key points. Write the "table of content" of the following text. 77 | [text] 78 | {context} 79 | [end of text] 80 | 81 | "table of content": 82 | """, 83 | f"""Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 84 | 85 | question: what China achieved with it's long-term planning? 
86 | 87 | [context] 88 | {context} 89 | [end of context] 90 | 91 | answer: 92 | """, 93 | f"""Reply to the question only using the provided context. If the answer is not contained in the provided context say "unanswerable". 94 | 95 | question: who is Anne Frank? 96 | 97 | [context] 98 | {context} 99 | [end of context] 100 | 101 | Remember: if you cannot answer based on the provided context, say "unanswerable" 102 | 103 | answer: 104 | """, 105 | 106 | f"""Using the following text as a reference, write a 5-paragraphs essay about "the benefits of China economic model". 107 | 108 | [text] 109 | {context} 110 | [end of text] 111 | Remember: use the information provided and write exactly 5 paragraphs. 112 | """, 113 | f"""List five most important topics from the following text: 114 | [text] 115 | {context} 116 | [end of text] 117 | """, 118 | """Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 119 | 120 | """, 121 | """You are an AI assistant designed to provide detailed, step-by-step responses. Your outputs should follow this structure: 122 | 1. Begin with a section. 123 | 2. Inside the thinking section: 124 | a. Briefly analyze the question and outline your approach. 125 | b. Present a clear plan of steps to solve the problem. 126 | c. Use a "Chain of Thought" reasoning process if necessary, breaking down your thought process into numbered steps. 127 | 3. Include a section for each idea where you: 128 | a. Review your reasoning. 129 | b. Check for potential errors or oversights. 130 | c. Confirm or adjust your conclusion if necessary. 131 | 4. Be sure to close all reflection sections. 132 | 5. Close the thinking section with . 133 | 6. Provide your final answer in an section. 134 | Always use these tags in your responses. Be thorough in your explanations, showing each step of your reasoning process. Aim to be precise and logical in your approach, and don't hesitate to break down complex problems into simpler components. Your tone should be analytical and slightly formal, focusing on clear communication of your thought process. 135 | Remember: Both and MUST be tags and must be closed at their conclusion 136 | Make sure all are on separate lines with no other text. Do not include other text on a line containing a tag. 137 | 138 | user question: explain why it is crucial for teachers to learn how to use generative AI for their job and for the future of education. Include relevant learning path for teachers and educators. 
139 | 140 | """ 141 | ] 142 | for i in range(0,len(prmpt_tasks)): 143 | catalog.append({'task':prmpt_tasks[i], 144 | 'prompt': prmpt_coll[i]}) 145 | return catalog 146 | 147 | def countTokens(text): 148 | """ 149 | Use tiktoken to count the number of tokens 150 | text -> str input 151 | Return -> int number of tokens counted 152 | """ 153 | encoding = tiktoken.get_encoding("cl100k_base") #context_count = len(encoding.encode(yourtext)) 154 | numoftokens = len(encoding.encode(text)) 155 | return numoftokens 156 | 157 | def writehistory(filename,text): 158 | """ 159 | save a string into a logfile with python file operations 160 | filename -> str pathfile/filename 161 | text -> str, the text to be written in the file 162 | """ 163 | with open(f'{filename}', 'a', encoding='utf-8') as f: 164 | f.write(text) 165 | f.write('\n') 166 | f.close() 167 | 168 | def genRANstring(n): 169 | """ 170 | n = int number of char to randomize 171 | """ 172 | N = n 173 | res = ''.join(random.choices(string.ascii_uppercase + 174 | string.digits, k=N)) 175 | return res 176 | 177 | def createStats(delta,question,output,rating,logfilename,task,ttft): 178 | """ 179 | Takes in all the generation main info and return KPIs 180 | delta -> datetime.now() delta 181 | question -> str the user input to the LLM 182 | output -> str the generation from the LLM 183 | rating -> str human eval feedback rating 184 | logfilename -> str filepath/filename 185 | task -> str description of the NLP task describing the prompt 186 | ttft -> datetime.now() delta time to first token 187 | """ 188 | totalseconds = delta.total_seconds() 189 | prompttokens = countTokens(question) 190 | assistanttokens = countTokens(output) 191 | totaltokens = prompttokens + assistanttokens 192 | speed = totaltokens/totalseconds 193 | genspeed = assistanttokens/totalseconds 194 | ttofseconds = ttft.total_seconds() 195 | stats = f'''--- 196 | Prompt Tokens: {prompttokens} 197 | Output Tokens: {assistanttokens} 198 | TOTAL Tokens: {totaltokens} 199 | >>>⏱️ Time to First Token: {ttofseconds} seconds 200 | >>>⏱️ Inference time: {delta} 201 | >>>🧮 Inference speed: {speed:.3f} t/s 202 | >>>🏍️ Generation speed: {genspeed:.3f} t/s 203 | >>>📝 Logfile: {logfilename} 204 | >>>💚 User rating: {rating} 205 | >>>✅ NLP TAKS: {task} 206 | ''' 207 | return stats -------------------------------------------------------------------------------- /promptLibv2Falcon3.py: -------------------------------------------------------------------------------- 1 | """ 2 | V2 changes 3 | added Time To First Token in the statistics ttft 4 | added some more prompts in the catalog 5 | - say 'I am ready' 6 | - modified for Falcon3-1b 7 | - explain in 1 phrase what is science. 8 | - Write the summary of the following text, in 2 phrase 2. 9 | - extract the 3 important points from the provided text. format the output as a python list. 10 | - Imagine you are writing the table of content of a document, given a text. 
11 | write the table of content of the following: 12 | 13 | 20240929 FAMA 14 | """ 15 | 16 | import random 17 | import string 18 | import tiktoken 19 | 20 | def createCatalog(): 21 | """ 22 | Create a dictionary with 23 | 'task' : description of the NLP task in the prompt 24 | 'prompt' : the instruction prompt for the LLM 25 | """ 26 | context = """One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 27 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 28 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 29 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 30 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 31 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 32 | It could soon be the case that the so-called developing countries will be showing the West the way forward.""" 33 | catalog = [] 34 | prmpt_tasks = ["introduction", 35 | "explain in one sentence", 36 | "explain in three paragraphs", 37 | "say 'I am ready'", 38 | "summarize", 39 | "Summarize in two sentences", 40 | "Write in a list the three main key points - format output", 41 | "Table of Contents", 42 | "RAG", 43 | "Truthful RAG", 44 | "write content from a reference", 45 | "extract 5 topics", 46 | "Creativity: 1000 words SF story", 47 | "Reflection prompt" 48 | ] 49 | prmpt_coll = [ 50 | """Hi there I am Fabio, a Medium writer. who are you?""", 51 | """explain in 1 phrase what is science.\n""", 52 | """explain only in three paragraphs what is artificial intelligence.\n""", 53 | f"""read the following text and when you are done say "I am ready". 54 | 55 | [text] 56 | {context} 57 | [end of text] 58 | 59 | """, 60 | f"""summarize the following text: 61 | [text] 62 | {context} 63 | [end of text] 64 | write the summary using the information provided. 65 | 66 | Summary: 67 | """, 68 | f"""Write the summary of the following text, in 2 phrase 2. 69 | [text] 70 | {context} 71 | [end of text] 72 | """, 73 | f"""extract the 3 important points from the provided text. format the output as a python list. 
74 | [text] 75 | {context} 76 | [end of text] 77 | """, 78 | f"""Imagine you are writing the table of content of a document, given a text. 79 | write the table of content of the following: 80 | [text] 81 | {context} 82 | [end of text] 83 | """, 84 | f"""Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 85 | [context] 86 | {context} 87 | [end of context] 88 | Rememeber: If the answer is not contained in the text say "unanswerable". 89 | 90 | question: what China achieved with it's long-term planning? 91 | answer: 92 | """, 93 | f"""Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 94 | [context] 95 | {context} 96 | [end of context] 97 | Rememeber: If the answer is not contained in the text say "unanswerable". 98 | 99 | question: who is Anne Frank? 100 | answer: 101 | """, 102 | 103 | f"""Using the following text as a reference, write a 5-paragraphs essay about "the benefits of China economic model". 104 | 105 | [text] 106 | {context} 107 | [end of text] 108 | Remember: use the information provided and write exactly 5 paragraphs. 109 | """, 110 | f"""List five most important topics from the following text: 111 | [text] 112 | {context} 113 | [end of text] 114 | """, 115 | """Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 116 | 117 | """, 118 | """You are an AI assistant designed to provide detailed, step-by-step responses. Your outputs should follow this structure: 119 | 1. Begin with a section. 120 | 2. Inside the thinking section: 121 | a. Briefly analyze the question and outline your approach. 122 | b. Present a clear plan of steps to solve the problem. 123 | c. Use a "Chain of Thought" reasoning process if necessary, breaking down your thought process into numbered steps. 124 | 3. Include a section for each idea where you: 125 | a. Review your reasoning. 126 | b. Check for potential errors or oversights. 127 | c. Confirm or adjust your conclusion if necessary. 128 | 4. Be sure to close all reflection sections. 129 | 5. Close the thinking section with . 130 | 6. Provide your final answer in an section. 131 | Always use these tags in your responses. Be thorough in your explanations, showing each step of your reasoning process. Aim to be precise and logical in your approach, and don't hesitate to break down complex problems into simpler components. Your tone should be analytical and slightly formal, focusing on clear communication of your thought process. 132 | Remember: Both and MUST be tags and must be closed at their conclusion 133 | Make sure all are on separate lines with no other text. Do not include other text on a line containing a tag. 134 | 135 | user question: explain why it is crucial for teachers to learn how to use generative AI for their job and for the future of education. Include relevant learning path for teachers and educators. 
136 | 137 | """ 138 | ] 139 | for i in range(0,len(prmpt_tasks)): 140 | catalog.append({'task':prmpt_tasks[i], 141 | 'prompt': prmpt_coll[i]}) 142 | return catalog 143 | 144 | def countTokens(text): 145 | """ 146 | Use tiktoken to count the number of tokens 147 | text -> str input 148 | Return -> int number of tokens counted 149 | """ 150 | encoding = tiktoken.get_encoding("cl100k_base") #context_count = len(encoding.encode(yourtext)) 151 | numoftokens = len(encoding.encode(text)) 152 | return numoftokens 153 | 154 | def writehistory(filename,text): 155 | """ 156 | save a string into a logfile with python file operations 157 | filename -> str pathfile/filename 158 | text -> str, the text to be written in the file 159 | """ 160 | with open(f'{filename}', 'a', encoding='utf-8') as f: 161 | f.write(text) 162 | f.write('\n') 163 | f.close() 164 | 165 | def genRANstring(n): 166 | """ 167 | n = int number of char to randomize 168 | """ 169 | N = n 170 | res = ''.join(random.choices(string.ascii_uppercase + 171 | string.digits, k=N)) 172 | return res 173 | 174 | def createStats(delta,question,output,rating,logfilename,task,ttft): 175 | """ 176 | Takes in all the generation main info and return KPIs 177 | delta -> datetime.now() delta 178 | question -> str the user input to the LLM 179 | output -> str the generation from the LLM 180 | rating -> str human eval feedback rating 181 | logfilename -> str filepath/filename 182 | task -> str description of the NLP task describing the prompt 183 | ttft -> datetime.now() delta time to first token 184 | """ 185 | totalseconds = delta.total_seconds() 186 | prompttokens = countTokens(question) 187 | assistanttokens = countTokens(output) 188 | totaltokens = prompttokens + assistanttokens 189 | speed = totaltokens/totalseconds 190 | genspeed = assistanttokens/totalseconds 191 | ttofseconds = ttft.total_seconds() 192 | stats = f'''--- 193 | Prompt Tokens: {prompttokens} 194 | Output Tokens: {assistanttokens} 195 | TOTAL Tokens: {totaltokens} 196 | >>>⏱️ Time to First Token: {ttofseconds} seconds 197 | >>>⏱️ Inference time: {delta} 198 | >>>🧮 Inference speed: {speed:.3f} t/s 199 | >>>🏍️ Generation speed: {genspeed:.3f} t/s 200 | >>>📝 Logfile: {logfilename} 201 | >>>💚 User rating: {rating} 202 | >>>✅ NLP TAKS: {task} 203 | ''' 204 | return stats -------------------------------------------------------------------------------- /testFalcon3-1B-it.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal with Falcon3-1B-Instruct.Q6_K.gguf 2 | # model served in another terminal window with llama-server 3 | from openai import OpenAI 4 | import sys 5 | from time import sleep 6 | 7 | STOPS = ['<|endoftext|>'] 8 | COUNTERLIMITS = 10 #an even number 9 | 10 | # ASCII ART FROM https://asciiart.club/ 11 | print(""" 12 | 13 | 14 | ░░░░░░░░░░░░░░ 15 | ░░░░░░░░░░░░░░ 16 | ░░░░░░░░░░░░░░ , ,, 17 | ░░░░░░░░░░░░░░ ╓╣▒▒▒▒╢╫φ 18 | ░░░░░░░░░░░░░░ ]▒▒▒▒▒▒▒╢▓▓ 19 | ░░░░░░░░░░░░░░ ▐▄▄▓▓█████▓M 20 | ░░░░█▀▀▀▀░░░░░ ▐█ ╗M╜╨╗ ▀▀▒╖,█▌.]▒▓▌ 21 | ░░░░█░░░░░ ▄▄▄▄ ▐█ ,▄▄▄ ,▄▄▄; ,▄▄▄▄ ║C , `╜`░░'▒╢▒▒▓▓╗ 22 | ░░░░█▀▀▀⌐░ ,,,▐█ ▐█ ▐█ ]█ ▀▌ ▐▌ ▐█ ╨╝╖ ▌░░ ╓ ╗ Ü▓▒▓▓▓▓▌ 23 | ░░░░█░░░░░█▀-.▐█ ▐█ ▐▌ ▐▌ ▐█ ▐▌ █ ]@ ▒ÆÆj░¥░W╦▐▓▓▓▓█▓▓ 24 | ░░░░█░░░░░▀█▄▄█▀ ▐█ ▀▄▄▄∞ ▀▄▄▄▀¬ ▐▌ █ ╚H╗m╝ ╙╫▒▒▒▒▐░▒░▒████▓╣╗ 25 | ░░░░░░░░░░░░░░ ▀▓▓▓▓╬╢▓▓▓█████▓ 26 | ░░░░░░░░░░░░░ ┌─ ▀▀▓▓▓▓▓▒▓▀████▓, 27 | ░░░░░░░░░░░░░░ ]`┌^hh] ╖ ▓▌ ▓█ ▀▀▓█▌╕ 28 | ░░░░░░░░░░░░░░ ╒▀Ñ▓▓Æ▓▓▓▀ ▓▌▄ '"` 29 | ░░░░░░░░░░░░░░ ╓Nß▀▀ ▄ 30 | ░░░░░░░░░░░░░░ ' 31 | ░░░░░░░░░░░░░░ 32 
| ░░░░░░░░░░░░░░░░░░░░░░░░░░ 33 | 34 | --- 35 | """) 36 | # Point to the local server 37 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 38 | print("3. Ready to Chat with Falcon3-1B-instruct Context length=8192...") 39 | print("\033[0m") #reset all 40 | 41 | history = [ 42 | ] 43 | print("\033[92;1m") 44 | counter = 1 45 | while True: 46 | if counter > COUNTERLIMITS: 47 | history = [ 48 | ] 49 | userinput = "" 50 | print("\033[1;30m") #dark grey 51 | print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:") 52 | print("\033[91;1m") #red 53 | lines = sys.stdin.readlines() 54 | for line in lines: 55 | userinput += line + "\n" 56 | if "quit!" in lines[0].lower(): 57 | print("\033[0mBYE BYE!") 58 | break 59 | history.append({"role": "user", "content": userinput}) 60 | print("\033[92;1m") 61 | 62 | completion = client.chat.completions.create( 63 | model="local-model", # this field is currently unused 64 | messages=history, 65 | temperature=0.3, 66 | frequency_penalty = 1.6, 67 | max_tokens = 600, 68 | stream=True, 69 | stop=STOPS 70 | ) 71 | 72 | new_message = {"role": "assistant", "content": ""} 73 | 74 | for chunk in completion: 75 | if chunk.choices[0].delta.content: 76 | print(chunk.choices[0].delta.content, end="", flush=True) 77 | new_message["content"] += chunk.choices[0].delta.content 78 | history.append(new_message) 79 | counter += 1 80 | 81 | 82 | 83 | ########################## MODEL CARD ################################################### 84 | """ 85 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 86 | ggml_vulkan: Found 1 Vulkan devices: 87 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 88 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 89 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 90 | 91 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 92 | 93 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 94 | main: loading model 95 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 96 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 97 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 98 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 99 | general.architecture str = llama 100 | general.name str = Models Tiiuae Falcon3 1B Instruct 101 | general.finetune str = Instruct 102 | general.basename str = models-tiiuae-Falcon3 103 | general.size_label str = 1.7B 104 | general.license str = other 105 | general.license.name str = falcon-llm-license 106 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 107 | general.base_model.count u32 = 1 108 | eneral.base_model.0.name str = Falcon3 1B Base 109 | ase_model.0.organization str = Tiiuae 110 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 
111 | general.tags arr[str,1] = ["falcon3"] 112 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 113 | llama.vocab_size u32 = 131072 114 | llama.context_length u32 = 8192 115 | llama.embedding_length u32 = 2048 116 | llama.block_count u32 = 18 117 | lama.feed_forward_length u32 = 8192 118 | ama.rope.dimension_count u32 = 256 119 | ama.attention.head_count u32 = 8 120 | .attention.head_count_kv u32 = 4 121 | n.layer_norm_rms_epsilon f32 = 0.000001 122 | llama.rope.freq_base f32 = 1000042.000000 123 | general.file_type u32 = 18 124 | tokenizer.ggml.model str = gpt2 125 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 126 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 127 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 128 | tokenizer.ggml.merges arr[str,128810] = ["N E", "─á ─á", "─á t", "─á a", "> >... 129 | enizer.ggml.eos_token_id u32 = 11 130 | er.ggml.padding_token_id u32 = 2023 131 | tokenizer.chat_template str = {% for message in messages %}{% if me... 132 | ral.quantization_version u32 = 2 133 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 134 | quantize.imatrix.dataset str = group_40.txt 135 | ze.imatrix.entries_count i32 = 126 136 | ize.imatrix.chunks_count i32 = 72 137 | llama_model_loader: - type f32: 37 tensors 138 | llama_model_loader: - type q6_K: 128 tensors 139 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 140 | llm_load_vocab: 141 | llm_load_vocab: ************************************ 142 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 143 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 144 | llm_load_vocab: ************************************ 145 | llm_load_vocab: 146 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. 
its type will be overridden 147 | llm_load_vocab: special tokens cache size = 1 148 | llm_load_vocab: token to piece cache size = 0.8741 MB 149 | llm_load_print_meta: format = GGUF V3 (latest) 150 | llm_load_print_meta: arch = llama 151 | llm_load_print_meta: vocab type = BPE 152 | llm_load_print_meta: n_vocab = 131072 153 | llm_load_print_meta: n_merges = 128810 154 | llm_load_print_meta: vocab_only = 0 155 | llm_load_print_meta: n_ctx_train = 8192 156 | llm_load_print_meta: n_embd = 2048 157 | llm_load_print_meta: n_layer = 18 158 | llm_load_print_meta: n_head = 8 159 | llm_load_print_meta: n_head_kv = 4 160 | llm_load_print_meta: n_rot = 256 161 | llm_load_print_meta: n_swa = 0 162 | llm_load_print_meta: n_embd_head_k = 256 163 | llm_load_print_meta: n_embd_head_v = 256 164 | llm_load_print_meta: n_gqa = 2 165 | llm_load_print_meta: n_embd_k_gqa = 1024 166 | llm_load_print_meta: n_embd_v_gqa = 1024 167 | llm_load_print_meta: f_norm_eps = 0.0e+00 168 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 169 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 170 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 171 | llm_load_print_meta: f_logit_scale = 0.0e+00 172 | llm_load_print_meta: n_ff = 8192 173 | llm_load_print_meta: n_expert = 0 174 | llm_load_print_meta: n_expert_used = 0 175 | llm_load_print_meta: causal attn = 1 176 | llm_load_print_meta: pooling type = 0 177 | llm_load_print_meta: rope type = 0 178 | llm_load_print_meta: rope scaling = linear 179 | llm_load_print_meta: freq_base_train = 1000042.0 180 | llm_load_print_meta: freq_scale_train = 1 181 | llm_load_print_meta: n_ctx_orig_yarn = 8192 182 | llm_load_print_meta: rope_finetuned = unknown 183 | llm_load_print_meta: ssm_d_conv = 0 184 | llm_load_print_meta: ssm_d_inner = 0 185 | llm_load_print_meta: ssm_d_state = 0 186 | llm_load_print_meta: ssm_dt_rank = 0 187 | llm_load_print_meta: ssm_dt_b_c_rms = 0 188 | llm_load_print_meta: model type = ?B 189 | llm_load_print_meta: model ftype = Q6_K 190 | llm_load_print_meta: model params = 1.67 B 191 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 192 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 193 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 194 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 195 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 196 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 197 | llm_load_print_meta: LF token = 2150 '├ä' 198 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 199 | llm_load_print_meta: max token length = 256 200 | ggml_vulkan: Compiling shaders..........................Done! 201 | llm_load_tensors: offloading 0 repeating layers to GPU 202 | llm_load_tensors: offloaded 0/19 layers to GPU 203 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 204 | .................................................................... 
205 | llama_new_context_with_model: n_seq_max = 1 206 | llama_new_context_with_model: n_ctx = 8192 207 | llama_new_context_with_model: n_ctx_per_seq = 8192 208 | llama_new_context_with_model: n_batch = 2048 209 | llama_new_context_with_model: n_ubatch = 512 210 | llama_new_context_with_model: flash_attn = 0 211 | llama_new_context_with_model: freq_base = 1000042.0 212 | llama_new_context_with_model: freq_scale = 1 213 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 214 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 215 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 216 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 217 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 218 | llama_new_context_with_model: graph nodes = 582 219 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 220 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 221 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 222 | srv init: initializing slots, n_slots = 1 223 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 224 | main: model loaded 225 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 226 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 227 | You are a helpful assistant<|im_end|> 228 | <|im_start|>user 229 | Hello<|im_end|> 230 | <|im_start|>assistant 231 | Hi there<|im_end|> 232 | <|im_start|>user 233 | How are you?<|im_end|> 234 | <|im_start|>assistant 235 | ' 236 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 237 | srv update_slots: all slots are idle 238 | """ 239 | -------------------------------------------------------------------------------- /autotest/101.Falcon3-1b-instruct_llamaserver_noHistoryCHAT.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/tiiuae/Falcon3-1B-Instruct 3 | # this wil run the MaziyarPanahi Hugging Face repository: I used the Q6 (Falcon3-1B-Instruct.Q6_K.gguf) 4 | # sources: https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF 5 | ##################################################################################################### 6 | 7 | """ 8 | Model 9 | architecture Falcon3 10 | parameters 1.67B 11 | context length 8192 12 | quantization Q6 13 | 14 | License 15 | TII Falcon-LLM License 2.0 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2Qwen import countTokens, writehistory, createCatalog 23 | from promptLibv2Qwen import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | 27 | 28 | 29 | stops = ['<|endoftext|>'] 30 | tasks = createCatalog() 31 | modelname = 'Falcon3-1B-Instruct' 32 | # create THE LOG FILE 33 | logfile = f'logs/{modelname}_CHAT_LCPPsrv_{genRANstring(5)}_log.txt' 34 | logfilename = logfile 35 | #Write in the history the first 2 sessions 36 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 37 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 38 | 39 | print("\033[95;3;6m") 40 | 
print("1. Waiting 10 seconds for the API to load...") 41 | # using OpenAI library to connect to llama-server llama.cpp API endpoint 42 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 43 | print(f"2. Model {modelname} loaded with LLAMACPP-SERVER...") 44 | print("\033[0m") #reset all 45 | history = [] 46 | print("\033[92;1m") 47 | print(f'📝Logfile: {logfilename}') 48 | 49 | ##################### ALIGNMENT FIRST GENERATION ############################################## 50 | question = 'Explain the plot of Cinderella in a sentence.' 51 | test = [ 52 | {"role": "user", "content": question} 53 | ] 54 | 55 | print('Question:', question) 56 | start = datetime.datetime.now() 57 | print("💻 > ", end="", flush=True) 58 | full_response = "" 59 | completion = client.chat.completions.create( 60 | messages=test, 61 | model='local-model', 62 | temperature=0.25, 63 | frequency_penalty = 1.178, 64 | stop=stops, 65 | max_tokens=1500, 66 | stream=True 67 | ) 68 | for chunk in completion: 69 | try: 70 | if chunk.choices[0].delta.content: 71 | print(chunk.choices[0].delta.content, end="", flush=True) 72 | full_response += chunk.choices[0].delta.content 73 | except: 74 | pass 75 | delta = datetime.datetime.now() - start 76 | output = full_response 77 | print('') 78 | print("\033[91;1m") 79 | rating = 'PUT IT LATER'#input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 80 | print("\033[92;1m") 81 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation',delta) 82 | print(stats) 83 | writehistory(logfilename,f'''👨‍💻 . {question} 84 | 💻 > {output} 85 | {stats} 86 | ''') 87 | 88 | ############################# START TURN BASED CHAT ################################# 89 | print('Starting now Normal Chat turn based interface...') 90 | counter = 1 91 | while True: 92 | # Reset history every turn 93 | history = [] 94 | userinput = "" 95 | print("\033[1;30m") #dark grey 96 | print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:") 97 | print("\033[91;1m") #red 98 | lines = sys.stdin.readlines() 99 | for line in lines: 100 | userinput += line + "\n" 101 | if "quit!" 
in lines[0].lower(): 102 | print("\033[0mBYE BYE!") 103 | break 104 | history.append({"role": "user", "content": userinput}) 105 | print("\033[92;1m") 106 | # Preparing Generation history pair 107 | new_message = {"role": "assistant", "content": ""} 108 | # Starting generation loop 109 | full_response = "" 110 | fisrtround = 0 111 | start = datetime.datetime.now() 112 | print("💻 > ", end="", flush=True) 113 | completion = client.chat.completions.create( 114 | messages=history, 115 | model='local-model', 116 | temperature=0.25, 117 | frequency_penalty = 1.178, 118 | stop=stops, 119 | max_tokens=1500, 120 | stream=True 121 | ) 122 | for chunk in completion: 123 | try: 124 | if chunk.choices[0].delta.content: 125 | if fisrtround==0: 126 | print(chunk.choices[0].delta.content, end="", flush=True) 127 | full_response += chunk.choices[0].delta.content 128 | ttftoken = datetime.datetime.now() - start 129 | fisrtround = 1 130 | else: 131 | print(chunk.choices[0].delta.content, end="", flush=True) 132 | full_response += chunk.choices[0].delta.content 133 | except: 134 | pass 135 | new_message["content"] = full_response 136 | history.append(new_message) 137 | counter += 1 138 | delta = datetime.datetime.now() - start 139 | ttofseconds = ttftoken.total_seconds() 140 | deltaseconds = delta.total_seconds() 141 | print('') 142 | print("\033[91;1m") 143 | print(f'Generation time: {deltaseconds} seconds') 144 | print(f'Time to First Token: {ttofseconds} seconds') 145 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 146 | print("\033[92;1m") 147 | stats = createStats(delta,userinput,full_response,rating,logfilename,'CHAT',ttftoken) 148 | print(stats) 149 | print(f'>>>⏱️ Time to First Token: {ttofseconds} seconds') 150 | writehistory(logfilename,f'''👨‍💻 > {userinput} 151 | 💻 > {full_response} 152 | {stats} 153 | >>> Time to First Token: {ttofseconds} seconds 154 | ''') 155 | history = [] 156 | 157 | 158 | ########################## MODEL CARD ################################################### 159 | """ 160 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 161 | 162 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 
163 | 164 | Model Details 165 | Architecture 166 | Transformer-based causal decoder-only architecture 167 | 18 decoder blocks 168 | Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads 169 | Wider head dimension: 256 170 | High RoPE value to support long context understanding: 1000042 171 | Uses SwiGLU and RMSNorm 172 | 8K context length 173 | 131K vocab size 174 | Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips 175 | Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data 176 | Supports EN, FR, ES, PT 177 | Developed by Technology Innovation Institute 178 | License: TII Falcon-LLM License 2.0 179 | Model Release Date: December 2024 180 | """ 181 | 182 | """ 183 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 184 | ggml_vulkan: Found 1 Vulkan devices: 185 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 186 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 187 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 188 | 189 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 190 | 191 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 192 | main: loading model 193 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 194 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 195 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 196 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 197 | general.architecture str = llama 198 | general.name str = Models Tiiuae Falcon3 1B Instruct 199 | general.finetune str = Instruct 200 | general.basename str = models-tiiuae-Falcon3 201 | general.size_label str = 1.7B 202 | general.license str = other 203 | general.license.name str = falcon-llm-license 204 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 205 | general.base_model.count u32 = 1 206 | eneral.base_model.0.name str = Falcon3 1B Base 207 | ase_model.0.organization str = Tiiuae 208 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 209 | general.tags arr[str,1] = ["falcon3"] 210 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 211 | llama.vocab_size u32 = 131072 212 | llama.context_length u32 = 8192 213 | llama.embedding_length u32 = 2048 214 | llama.block_count u32 = 18 215 | lama.feed_forward_length u32 = 8192 216 | ama.rope.dimension_count u32 = 256 217 | ama.attention.head_count u32 = 8 218 | .attention.head_count_kv u32 = 4 219 | n.layer_norm_rms_epsilon f32 = 0.000001 220 | llama.rope.freq_base f32 = 1000042.000000 221 | general.file_type u32 = 18 222 | tokenizer.ggml.model str = gpt2 223 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 224 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 225 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 226 | tokenizer.ggml.merges arr[str,128810] = ["N E", "─á ─á", "─á t", "─á a", "> >... 
227 | enizer.ggml.eos_token_id u32 = 11 228 | er.ggml.padding_token_id u32 = 2023 229 | tokenizer.chat_template str = {% for message in messages %}{% if me... 230 | ral.quantization_version u32 = 2 231 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 232 | quantize.imatrix.dataset str = group_40.txt 233 | ze.imatrix.entries_count i32 = 126 234 | ize.imatrix.chunks_count i32 = 72 235 | llama_model_loader: - type f32: 37 tensors 236 | llama_model_loader: - type q6_K: 128 tensors 237 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 238 | llm_load_vocab: 239 | llm_load_vocab: ************************************ 240 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 241 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 242 | llm_load_vocab: ************************************ 243 | llm_load_vocab: 244 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden 245 | llm_load_vocab: special tokens cache size = 1 246 | llm_load_vocab: token to piece cache size = 0.8741 MB 247 | llm_load_print_meta: format = GGUF V3 (latest) 248 | llm_load_print_meta: arch = llama 249 | llm_load_print_meta: vocab type = BPE 250 | llm_load_print_meta: n_vocab = 131072 251 | llm_load_print_meta: n_merges = 128810 252 | llm_load_print_meta: vocab_only = 0 253 | llm_load_print_meta: n_ctx_train = 8192 254 | llm_load_print_meta: n_embd = 2048 255 | llm_load_print_meta: n_layer = 18 256 | llm_load_print_meta: n_head = 8 257 | llm_load_print_meta: n_head_kv = 4 258 | llm_load_print_meta: n_rot = 256 259 | llm_load_print_meta: n_swa = 0 260 | llm_load_print_meta: n_embd_head_k = 256 261 | llm_load_print_meta: n_embd_head_v = 256 262 | llm_load_print_meta: n_gqa = 2 263 | llm_load_print_meta: n_embd_k_gqa = 1024 264 | llm_load_print_meta: n_embd_v_gqa = 1024 265 | llm_load_print_meta: f_norm_eps = 0.0e+00 266 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 267 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 268 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 269 | llm_load_print_meta: f_logit_scale = 0.0e+00 270 | llm_load_print_meta: n_ff = 8192 271 | llm_load_print_meta: n_expert = 0 272 | llm_load_print_meta: n_expert_used = 0 273 | llm_load_print_meta: causal attn = 1 274 | llm_load_print_meta: pooling type = 0 275 | llm_load_print_meta: rope type = 0 276 | llm_load_print_meta: rope scaling = linear 277 | llm_load_print_meta: freq_base_train = 1000042.0 278 | llm_load_print_meta: freq_scale_train = 1 279 | llm_load_print_meta: n_ctx_orig_yarn = 8192 280 | llm_load_print_meta: rope_finetuned = unknown 281 | llm_load_print_meta: ssm_d_conv = 0 282 | llm_load_print_meta: ssm_d_inner = 0 283 | llm_load_print_meta: ssm_d_state = 0 284 | llm_load_print_meta: ssm_dt_rank = 0 285 | llm_load_print_meta: ssm_dt_b_c_rms = 0 286 | llm_load_print_meta: model type = ?B 287 | llm_load_print_meta: model ftype = Q6_K 288 | llm_load_print_meta: model params = 1.67 B 289 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 290 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 291 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 292 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 293 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 294 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 295 | llm_load_print_meta: LF token = 2150 '├ä' 296 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 297 | llm_load_print_meta: max token length = 256 298 | ggml_vulkan: Compiling 
shaders..........................Done! 299 | llm_load_tensors: offloading 0 repeating layers to GPU 300 | llm_load_tensors: offloaded 0/19 layers to GPU 301 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 302 | .................................................................... 303 | llama_new_context_with_model: n_seq_max = 1 304 | llama_new_context_with_model: n_ctx = 8192 305 | llama_new_context_with_model: n_ctx_per_seq = 8192 306 | llama_new_context_with_model: n_batch = 2048 307 | llama_new_context_with_model: n_ubatch = 512 308 | llama_new_context_with_model: flash_attn = 0 309 | llama_new_context_with_model: freq_base = 1000042.0 310 | llama_new_context_with_model: freq_scale = 1 311 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 312 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 313 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 314 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 315 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 316 | llama_new_context_with_model: graph nodes = 582 317 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 318 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 319 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 320 | srv init: initializing slots, n_slots = 1 321 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 322 | main: model loaded 323 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 324 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 325 | You are a helpful assistant<|im_end|> 326 | <|im_start|>user 327 | Hello<|im_end|> 328 | <|im_start|>assistant 329 | Hi there<|im_end|> 330 | <|im_start|>user 331 | How are you?<|im_end|> 332 | <|im_start|>assistant 333 | ' 334 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 335 | srv update_slots: all slots are idle 336 | """ -------------------------------------------------------------------------------- /autotest/100.Falcon3-1b-instruct_llamaserver_promptTest.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/tiiuae/Falcon3-1B-Instruct 3 | # this wil run the MaziyarPanahi Hugging Face repository: I used the Q6 (Falcon3-1B-Instruct.Q6_K.gguf) 4 | # sources: https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF 5 | ##################################################################################################### 6 | 7 | """ 8 | Model 9 | architecture Falcon3 10 | parameters 1.67B 11 | context length 8192 12 | quantization Q6 13 | 14 | License 15 | TII Falcon-LLM License 2.0 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2Qwen import countTokens, writehistory, createCatalog 23 | from promptLibv2Qwen import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | ## PREPARING FINAL DATASET 27 | 28 | pd_id = [] 29 | pd_task = [] 30 | pd_vote = [] 31 | pd_remarks = [] 32 | ####################INITIALIZE THE MODEL################################### 33 | stops = ['<|endoftext|>'] 34 | tasks = createCatalog() 35 | modelname 
= 'Falcon3-1B-Instruct' 36 | rootname = 'Falcon3-1B' 37 | # create THE LOG FILE 38 | coded5 = genRANstring(5) 39 | logfile = f'logs/{rootname}_LCPPsvr_{coded5}_log.txt' 40 | csvfile = f'logs/{rootname}_LCPPsvr_{coded5}.csv' 41 | logfilename = logfile 42 | #Write in the history the first 2 sessions 43 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 44 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 45 | # LOAD THE MODEL 46 | print("\033[95;3;6m") 47 | print("1. Waiting 10 seconds for the API to load...") 48 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 49 | print(f"2. Model {modelname} loaded with LlamaCPP...") 50 | print("\033[0m") #reset all 51 | history = [] 52 | print("\033[92;1m") 53 | print(f'📝Logfile: {logfilename}') 54 | ##################### ALIGNMENT FIRST GENERATION ############################################## 55 | question = 'Explain the plot of Cinderella in a sentence.' 56 | test = [ 57 | {"role": "user", "content": question} 58 | ] 59 | print('Question:', question) 60 | start = datetime.datetime.now() 61 | print("💻 > ", end="", flush=True) 62 | full_response = "" 63 | fisrtround = 0 64 | completion = client.chat.completions.create( 65 | messages=test, 66 | model='falcon3', 67 | temperature=0.25, 68 | frequency_penalty = 1.178, 69 | stop=stops, 70 | max_tokens=1500, 71 | stream=True 72 | ) 73 | for chunk in completion: 74 | try: 75 | if chunk.choices[0].delta.content: 76 | if fisrtround==0: 77 | print(chunk.choices[0].delta.content, end="", flush=True) 78 | full_response += chunk.choices[0].delta.content 79 | ttftoken = datetime.datetime.now() - start 80 | fisrtround = 1 81 | else: 82 | print(chunk.choices[0].delta.content, end="", flush=True) 83 | full_response += chunk.choices[0].delta.content 84 | except: 85 | pass 86 | delta = datetime.datetime.now() - start 87 | output = full_response 88 | print('') 89 | print("\033[91;1m") 90 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 91 | print("\033[92;1m") 92 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation',ttftoken) 93 | print(stats) 94 | writehistory(logfilename,f'''👨‍💻 . 
{question} 95 | 💻 > {output} 96 | {stats} 97 | ''') 98 | 99 | ############################# AUTOMATIC PROMPTING EVALUATION 11 TURNS ################################# 100 | id =1 101 | for items in tasks: 102 | fisrtround = 0 103 | task = items["task"] 104 | prompt = items["prompt"] 105 | test = [] 106 | print(f'NLP TAKS>>> {task}') 107 | print("\033[91;1m") #red 108 | print(prompt) 109 | test.append({"role": "user", "content": prompt}) 110 | print("\033[92;1m") 111 | full_response = "" 112 | start = datetime.datetime.now() 113 | print("💻 > ", end="", flush=True) 114 | completion = client.chat.completions.create( 115 | messages=history, 116 | model='falcon3', 117 | temperature=0.25, 118 | frequency_penalty = 1.178, 119 | stop=stops, 120 | max_tokens=1500, 121 | stream=True 122 | ) 123 | for chunk in completion: 124 | try: 125 | if chunk.choices[0].delta.content: 126 | if fisrtround==0: 127 | print(chunk.choices[0].delta.content, end="", flush=True) 128 | full_response += chunk.choices[0].delta.content 129 | ttftoken = datetime.datetime.now() - start 130 | fisrtround = 1 131 | else: 132 | print(chunk.choices[0].delta.content, end="", flush=True) 133 | full_response += chunk.choices[0].delta.content 134 | except: 135 | pass 136 | delta = datetime.datetime.now() - start 137 | print('') 138 | print("\033[91;1m") 139 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 140 | print("\033[92;1m") 141 | stats = createStats(delta,prompt,full_response,rating,logfilename,task,ttftoken) 142 | print(stats) 143 | writehistory(logfilename,f'''👨‍💻 > {prompt} 144 | 💻 > {full_response} 145 | {stats} 146 | ''') 147 | pd_id.append(id) 148 | pd_task.append(task) 149 | pd_vote.append(rating[:2]) 150 | pd_remarks.append(rating[2:]) 151 | id += 1 152 | # create dataframe and save to csv 153 | zipped = list(zip(pd_id,pd_task,pd_vote,pd_remarks)) 154 | import pandas as pdd 155 | df = pdd.DataFrame(zipped, columns=['#', 'TASK', 'VOTE','REMARKS']) 156 | #saving the DataFrame as a CSV file 157 | df_csv_data = df.to_csv(csvfile, index = False, encoding='utf-8') 158 | print('\nCSV String:\n', df) 159 | from rich.console import Console 160 | console = Console() 161 | console.print('---') 162 | console.print(df) 163 | 164 | ###############################MODEL CARD############################################### 165 | """ 166 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 167 | 168 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 
164 | ###############################MODEL CARD############################################### 165 | """ 166 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 167 | 168 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 169 | 170 | Model Details 171 | Architecture 172 | Transformer-based causal decoder-only architecture 173 | 18 decoder blocks 174 | Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads 175 | Wider head dimension: 256 176 | High RoPE value to support long context understanding: 1000042 177 | Uses SwiGLU and RMSNorm 178 | 8K context length 179 | 131K vocab size 180 | Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips 181 | Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data 182 | Supports EN, FR, ES, PT 183 | Developed by Technology Innovation Institute 184 | License: TII Falcon-LLM License 2.0 185 | Model Release Date: December 2024 186 | """ 187 | 188 | """ 189 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 190 | ggml_vulkan: Found 1 Vulkan devices: 191 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 192 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 193 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 194 | 195 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 196 | 197 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 198 | main: loading model 199 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 200 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 201 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 202 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 203 | general.architecture str = llama 204 | general.name str = Models Tiiuae Falcon3 1B Instruct 205 | general.finetune str = Instruct 206 | general.basename str = models-tiiuae-Falcon3 207 | general.size_label str = 1.7B 208 | general.license str = other 209 | general.license.name str = falcon-llm-license 210 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 211 | general.base_model.count u32 = 1 212 | eneral.base_model.0.name str = Falcon3 1B Base 213 | ase_model.0.organization str = Tiiuae 214 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 215 | general.tags arr[str,1] = ["falcon3"] 216 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 217 | llama.vocab_size u32 = 131072 218 | llama.context_length u32 = 8192 219 | llama.embedding_length u32 = 2048 220 | llama.block_count u32 = 18 221 | lama.feed_forward_length u32 = 8192 222 | ama.rope.dimension_count u32 = 256 223 | ama.attention.head_count u32 = 8 224 | .attention.head_count_kv u32 = 4 225 | n.layer_norm_rms_epsilon f32 = 0.000001 226 | llama.rope.freq_base f32 = 1000042.000000 227 | general.file_type u32 = 18 228 | tokenizer.ggml.model str = gpt2 229 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 230 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 231 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 232 | tokenizer.ggml.merges arr[str,128810] = ["N E", "Ġ Ġ", "Ġ t", "Ġ a", "> >... 
233 | enizer.ggml.eos_token_id u32 = 11 234 | er.ggml.padding_token_id u32 = 2023 235 | tokenizer.chat_template str = {% for message in messages %}{% if me... 236 | ral.quantization_version u32 = 2 237 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 238 | quantize.imatrix.dataset str = group_40.txt 239 | ze.imatrix.entries_count i32 = 126 240 | ize.imatrix.chunks_count i32 = 72 241 | llama_model_loader: - type f32: 37 tensors 242 | llama_model_loader: - type q6_K: 128 tensors 243 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 244 | llm_load_vocab: 245 | llm_load_vocab: ************************************ 246 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 247 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 248 | llm_load_vocab: ************************************ 249 | llm_load_vocab: 250 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden 251 | llm_load_vocab: special tokens cache size = 1 252 | llm_load_vocab: token to piece cache size = 0.8741 MB 253 | llm_load_print_meta: format = GGUF V3 (latest) 254 | llm_load_print_meta: arch = llama 255 | llm_load_print_meta: vocab type = BPE 256 | llm_load_print_meta: n_vocab = 131072 257 | llm_load_print_meta: n_merges = 128810 258 | llm_load_print_meta: vocab_only = 0 259 | llm_load_print_meta: n_ctx_train = 8192 260 | llm_load_print_meta: n_embd = 2048 261 | llm_load_print_meta: n_layer = 18 262 | llm_load_print_meta: n_head = 8 263 | llm_load_print_meta: n_head_kv = 4 264 | llm_load_print_meta: n_rot = 256 265 | llm_load_print_meta: n_swa = 0 266 | llm_load_print_meta: n_embd_head_k = 256 267 | llm_load_print_meta: n_embd_head_v = 256 268 | llm_load_print_meta: n_gqa = 2 269 | llm_load_print_meta: n_embd_k_gqa = 1024 270 | llm_load_print_meta: n_embd_v_gqa = 1024 271 | llm_load_print_meta: f_norm_eps = 0.0e+00 272 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 273 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 274 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 275 | llm_load_print_meta: f_logit_scale = 0.0e+00 276 | llm_load_print_meta: n_ff = 8192 277 | llm_load_print_meta: n_expert = 0 278 | llm_load_print_meta: n_expert_used = 0 279 | llm_load_print_meta: causal attn = 1 280 | llm_load_print_meta: pooling type = 0 281 | llm_load_print_meta: rope type = 0 282 | llm_load_print_meta: rope scaling = linear 283 | llm_load_print_meta: freq_base_train = 1000042.0 284 | llm_load_print_meta: freq_scale_train = 1 285 | llm_load_print_meta: n_ctx_orig_yarn = 8192 286 | llm_load_print_meta: rope_finetuned = unknown 287 | llm_load_print_meta: ssm_d_conv = 0 288 | llm_load_print_meta: ssm_d_inner = 0 289 | llm_load_print_meta: ssm_d_state = 0 290 | llm_load_print_meta: ssm_dt_rank = 0 291 | llm_load_print_meta: ssm_dt_b_c_rms = 0 292 | llm_load_print_meta: model type = ?B 293 | llm_load_print_meta: model ftype = Q6_K 294 | llm_load_print_meta: model params = 1.67 B 295 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 296 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 297 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 298 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 299 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 300 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 301 | llm_load_print_meta: LF token = 2150 '├ä' 302 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 303 | llm_load_print_meta: max token length = 256 304 | ggml_vulkan: Compiling 
shaders..........................Done! 305 | llm_load_tensors: offloading 0 repeating layers to GPU 306 | llm_load_tensors: offloaded 0/19 layers to GPU 307 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 308 | .................................................................... 309 | llama_new_context_with_model: n_seq_max = 1 310 | llama_new_context_with_model: n_ctx = 8192 311 | llama_new_context_with_model: n_ctx_per_seq = 8192 312 | llama_new_context_with_model: n_batch = 2048 313 | llama_new_context_with_model: n_ubatch = 512 314 | llama_new_context_with_model: flash_attn = 0 315 | llama_new_context_with_model: freq_base = 1000042.0 316 | llama_new_context_with_model: freq_scale = 1 317 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 318 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 319 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 320 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 321 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 322 | llama_new_context_with_model: graph nodes = 582 323 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 324 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 325 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 326 | srv init: initializing slots, n_slots = 1 327 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 328 | main: model loaded 329 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 330 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 331 | You are a helpful assistant<|im_end|> 332 | <|im_start|>user 333 | Hello<|im_end|> 334 | <|im_start|>assistant 335 | Hi there<|im_end|> 336 | <|im_start|>user 337 | How are you?<|im_end|> 338 | <|im_start|>assistant 339 | ' 340 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 341 | srv update_slots: all slots are idle 342 | """ -------------------------------------------------------------------------------- /101.Falcon3-1b-instruct_llamaserver_noHistoryCHAT.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/tiiuae/Falcon3-1B-Instruct 3 | # this will run the model from the MaziyarPanahi Hugging Face repository: I used the Q6 (Falcon3-1B-Instruct.Q6_K.gguf) 4 | # sources: https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF 5 | ##################################################################################################### 6 | 7 | """ 8 | Model 9 | architecture Falcon3 10 | parameters 1.67B 11 | context length 8192 12 | quantization Q6 13 | 14 | License 15 | TII Falcon-LLM License 2.0 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2Falcon3 import countTokens, writehistory, createCatalog 23 | from promptLibv2Falcon3 import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | 27 | 28 | 29 | stops = ['<|endoftext|>'] 30 | tasks = createCatalog() 31 | modelname = 'Falcon3-1B-Instruct' 32 | # create THE LOG FILE 33 | logfile = f'logs/{modelname}_CHAT_LCPPsrv_{genRANstring(5)}_log.txt' 34 | logfilename = logfile 35 | #Write in the 
history the first 2 sessions 36 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 37 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 38 | 39 | print("\033[95;3;6m") 40 | print("1. Waiting 10 seconds for the API to load...") 41 | # using OpenAI library to connect to llama-server llama.cpp API endpoint 42 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 43 | print(f"2. Model {modelname} loaded with LLAMACPP-SERVER...") 44 | print("\033[0m") #reset all 45 | # ASCII ART FROM https://asciiart.club/ 46 | print(""" 47 | 48 | 49 | ░░░░░░░░░░░░░░ 50 | ░░░░░░░░░░░░░░ 51 | ░░░░░░░░░░░░░░ , ,, 52 | ░░░░░░░░░░░░░░ ╓╣▒▒▒▒╢╫φ 53 | ░░░░░░░░░░░░░░ ]▒▒▒▒▒▒▒╢▓▓ 54 | ░░░░░░░░░░░░░░ ▐▄▄▓▓█████▓M 55 | ░░░░█▀▀▀▀░░░░░ ▐█ ╗M╜╨╗ ▀▀▒╖,█▌.]▒▓▌ 56 | ░░░░█░░░░░ ▄▄▄▄ ▐█ ,▄▄▄ ,▄▄▄; ,▄▄▄▄ ║C , `╜`░░'▒╢▒▒▓▓╗ 57 | ░░░░█▀▀▀⌐░ ,,,▐█ ▐█ ▐█ ]█ ▀▌ ▐▌ ▐█ ╨╝╖ ▌░░ ╓ ╗ Ü▓▒▓▓▓▓▌ 58 | ░░░░█░░░░░█▀-.▐█ ▐█ ▐▌ ▐▌ ▐█ ▐▌ █ ]@ ▒ÆÆj░¥░W╦▐▓▓▓▓█▓▓ 59 | ░░░░█░░░░░▀█▄▄█▀ ▐█ ▀▄▄▄∞ ▀▄▄▄▀¬ ▐▌ █ ╚H╗m╝ ╙╫▒▒▒▒▐░▒░▒████▓╣╗ 60 | ░░░░░░░░░░░░░░ ▀▓▓▓▓╬╢▓▓▓█████▓ 61 | ░░░░░░░░░░░░░ ┌─ ▀▀▓▓▓▓▓▒▓▀████▓, 62 | ░░░░░░░░░░░░░░ ]`┌^hh] ╖ ▓▌ ▓█ ▀▀▓█▌╕ 63 | ░░░░░░░░░░░░░░ ╒▀Ñ▓▓Æ▓▓▓▀ ▓▌▄ '"` 64 | ░░░░░░░░░░░░░░ ╓Nß▀▀ ▄ 65 | ░░░░░░░░░░░░░░ ' 66 | ░░░░░░░░░░░░░░ 67 | ░░░░░░░░░░░░░░░░░░░░░░░░░░ 68 | 69 | --- 70 | """) 71 | history = [] 72 | print("\033[92;1m") 73 | print(f'📝Logfile: {logfilename}') 74 | 75 | ##################### ALIGNMENT FIRST GENERATION ############################################## 76 | question = 'Explain the plot of Cinderella in a sentence.' 77 | test = [ 78 | {"role": "user", "content": question} 79 | ] 80 | 81 | print('Question:', question) 82 | start = datetime.datetime.now() 83 | print("💻 > ", end="", flush=True) 84 | full_response = "" 85 | completion = client.chat.completions.create( 86 | messages=test, 87 | model='local-model', 88 | temperature=0.25, 89 | frequency_penalty = 1.178, 90 | stop=stops, 91 | max_tokens=1500, 92 | stream=True 93 | ) 94 | for chunk in completion: 95 | try: 96 | if chunk.choices[0].delta.content: 97 | print(chunk.choices[0].delta.content, end="", flush=True) 98 | full_response += chunk.choices[0].delta.content 99 | except: 100 | pass 101 | delta = datetime.datetime.now() - start 102 | output = full_response 103 | print('') 104 | print("\033[91;1m") 105 | rating = 'PUT IT LATER'#input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 106 | print("\033[0m",end='') #reset all 107 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation',delta) 108 | print(stats) 109 | writehistory(logfilename,f'''👨‍💻 . {question} 110 | 💻 > {output} 111 | {stats} 112 | ''') 113 | 114 | ############################# START TURN BASED CHAT ################################# 115 | print('Starting the turn-based chat interface now...') 116 | counter = 1 117 | while True: 118 | # Reset history every turn 119 | history = [] 120 | userinput = "" 121 | print("\033[1;30m") #dark grey 122 | print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:") 123 | print("\033[91;1m") #red 124 | lines = sys.stdin.readlines() 125 | for line in lines: 126 | userinput += line 127 | if "quit!" 
in userinput.lower(): 128 | print("\033[0mBYE BYE!") 129 | break 130 | history.append({"role": "user", "content": userinput}) 131 | print("\033[92;1m") 132 | # Preparing Generation history pair 133 | new_message = {"role": "assistant", "content": ""} 134 | # Starting generation loop 135 | full_response = "" 136 | firstround = 0 137 | start = datetime.datetime.now() 138 | print("💻 > ", end="", flush=True) 139 | completion = client.chat.completions.create( 140 | messages=history, 141 | model='local-model', 142 | temperature=0.25, 143 | frequency_penalty = 1.178, 144 | stop=stops, 145 | max_tokens=1500, 146 | stream=True 147 | ) 148 | for chunk in completion: 149 | try: 150 | if chunk.choices[0].delta.content: 151 | if firstround==0: 152 | print(chunk.choices[0].delta.content, end="", flush=True) 153 | full_response += chunk.choices[0].delta.content 154 | ttftoken = datetime.datetime.now() - start 155 | firstround = 1 156 | else: 157 | print(chunk.choices[0].delta.content, end="", flush=True) 158 | full_response += chunk.choices[0].delta.content 159 | except: 160 | pass 161 | new_message["content"] = full_response 162 | history.append(new_message) 163 | counter += 1 164 | delta = datetime.datetime.now() - start 165 | ttftseconds = ttftoken.total_seconds() 166 | deltaseconds = delta.total_seconds() 167 | print('') 168 | print("\033[91;1m") 169 | print(f'Generation time: {deltaseconds} seconds') 170 | print(f'Time to First Token: {ttftseconds} seconds') 171 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 172 | print("\033[0m",end='') #reset all 173 | stats = createStats(delta,userinput,full_response,rating,logfilename,'CHAT',ttftoken) 174 | print(stats) 175 | print(f'>>>⏱️ Time to First Token: {ttftseconds} seconds') 176 | writehistory(logfilename,f'''👨‍💻 > {userinput} 177 | 💻 > {full_response} 178 | {stats} 179 | >>> Time to First Token: {ttftseconds} seconds 180 | ''') 181 | history = [] 182 | 183 | 
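# --- Added sketch (not part of the original script): this chat deliberately resets
# `history` every turn, so the model never sees earlier exchanges. A minimal sketch of
# the multi-turn variant, assuming the same OpenAI-compatible llama-server endpoint;
# the helper name chat_turn and the rolling-window size are hypothetical.
def chat_turn(client, history, userinput, stops, max_turns=10):
    history.append({"role": "user", "content": userinput})
    completion = client.chat.completions.create(
        messages=history[-2 * max_turns:],  # rolling window of user/assistant pairs to stay inside the 8K context
        model='local-model',
        temperature=0.25,
        stop=stops,
        max_tokens=1500,
    )
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply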
184 | ########################## MODEL CARD ################################################### 185 | """ 186 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 187 | 188 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 189 | 190 | Model Details 191 | Architecture 192 | Transformer-based causal decoder-only architecture 193 | 18 decoder blocks 194 | Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads 195 | Wider head dimension: 256 196 | High RoPE value to support long context understanding: 1000042 197 | Uses SwiGLU and RMSNorm 198 | 8K context length 199 | 131K vocab size 200 | Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips 201 | Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data 202 | Supports EN, FR, ES, PT 203 | Developed by Technology Innovation Institute 204 | License: TII Falcon-LLM License 2.0 205 | Model Release Date: December 2024 206 | """ 207 | 208 | """ 209 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 210 | ggml_vulkan: Found 1 Vulkan devices: 211 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 212 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 213 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 214 | 215 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 216 | 217 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 218 | main: loading model 219 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 220 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 221 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 222 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 223 | general.architecture str = llama 224 | general.name str = Models Tiiuae Falcon3 1B Instruct 225 | general.finetune str = Instruct 226 | general.basename str = models-tiiuae-Falcon3 227 | general.size_label str = 1.7B 228 | general.license str = other 229 | general.license.name str = falcon-llm-license 230 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 231 | general.base_model.count u32 = 1 232 | eneral.base_model.0.name str = Falcon3 1B Base 233 | ase_model.0.organization str = Tiiuae 234 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 235 | general.tags arr[str,1] = ["falcon3"] 236 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 237 | llama.vocab_size u32 = 131072 238 | llama.context_length u32 = 8192 239 | llama.embedding_length u32 = 2048 240 | llama.block_count u32 = 18 241 | lama.feed_forward_length u32 = 8192 242 | ama.rope.dimension_count u32 = 256 243 | ama.attention.head_count u32 = 8 244 | .attention.head_count_kv u32 = 4 245 | n.layer_norm_rms_epsilon f32 = 0.000001 246 | llama.rope.freq_base f32 = 1000042.000000 247 | general.file_type u32 = 18 248 | tokenizer.ggml.model str = gpt2 249 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 250 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 251 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 252 | tokenizer.ggml.merges arr[str,128810] = ["N E", "Ġ Ġ", "Ġ t", "Ġ a", "> >... 
253 | enizer.ggml.eos_token_id u32 = 11 254 | er.ggml.padding_token_id u32 = 2023 255 | tokenizer.chat_template str = {% for message in messages %}{% if me... 256 | ral.quantization_version u32 = 2 257 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 258 | quantize.imatrix.dataset str = group_40.txt 259 | ze.imatrix.entries_count i32 = 126 260 | ize.imatrix.chunks_count i32 = 72 261 | llama_model_loader: - type f32: 37 tensors 262 | llama_model_loader: - type q6_K: 128 tensors 263 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 264 | llm_load_vocab: 265 | llm_load_vocab: ************************************ 266 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 267 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 268 | llm_load_vocab: ************************************ 269 | llm_load_vocab: 270 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden 271 | llm_load_vocab: special tokens cache size = 1 272 | llm_load_vocab: token to piece cache size = 0.8741 MB 273 | llm_load_print_meta: format = GGUF V3 (latest) 274 | llm_load_print_meta: arch = llama 275 | llm_load_print_meta: vocab type = BPE 276 | llm_load_print_meta: n_vocab = 131072 277 | llm_load_print_meta: n_merges = 128810 278 | llm_load_print_meta: vocab_only = 0 279 | llm_load_print_meta: n_ctx_train = 8192 280 | llm_load_print_meta: n_embd = 2048 281 | llm_load_print_meta: n_layer = 18 282 | llm_load_print_meta: n_head = 8 283 | llm_load_print_meta: n_head_kv = 4 284 | llm_load_print_meta: n_rot = 256 285 | llm_load_print_meta: n_swa = 0 286 | llm_load_print_meta: n_embd_head_k = 256 287 | llm_load_print_meta: n_embd_head_v = 256 288 | llm_load_print_meta: n_gqa = 2 289 | llm_load_print_meta: n_embd_k_gqa = 1024 290 | llm_load_print_meta: n_embd_v_gqa = 1024 291 | llm_load_print_meta: f_norm_eps = 0.0e+00 292 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 293 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 294 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 295 | llm_load_print_meta: f_logit_scale = 0.0e+00 296 | llm_load_print_meta: n_ff = 8192 297 | llm_load_print_meta: n_expert = 0 298 | llm_load_print_meta: n_expert_used = 0 299 | llm_load_print_meta: causal attn = 1 300 | llm_load_print_meta: pooling type = 0 301 | llm_load_print_meta: rope type = 0 302 | llm_load_print_meta: rope scaling = linear 303 | llm_load_print_meta: freq_base_train = 1000042.0 304 | llm_load_print_meta: freq_scale_train = 1 305 | llm_load_print_meta: n_ctx_orig_yarn = 8192 306 | llm_load_print_meta: rope_finetuned = unknown 307 | llm_load_print_meta: ssm_d_conv = 0 308 | llm_load_print_meta: ssm_d_inner = 0 309 | llm_load_print_meta: ssm_d_state = 0 310 | llm_load_print_meta: ssm_dt_rank = 0 311 | llm_load_print_meta: ssm_dt_b_c_rms = 0 312 | llm_load_print_meta: model type = ?B 313 | llm_load_print_meta: model ftype = Q6_K 314 | llm_load_print_meta: model params = 1.67 B 315 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 316 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 317 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 318 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 319 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 320 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 321 | llm_load_print_meta: LF token = 2150 '├ä' 322 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 323 | llm_load_print_meta: max token length = 256 324 | ggml_vulkan: Compiling 
shaders..........................Done! 325 | llm_load_tensors: offloading 0 repeating layers to GPU 326 | llm_load_tensors: offloaded 0/19 layers to GPU 327 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 328 | .................................................................... 329 | llama_new_context_with_model: n_seq_max = 1 330 | llama_new_context_with_model: n_ctx = 8192 331 | llama_new_context_with_model: n_ctx_per_seq = 8192 332 | llama_new_context_with_model: n_batch = 2048 333 | llama_new_context_with_model: n_ubatch = 512 334 | llama_new_context_with_model: flash_attn = 0 335 | llama_new_context_with_model: freq_base = 1000042.0 336 | llama_new_context_with_model: freq_scale = 1 337 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 338 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 339 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 340 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 341 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 342 | llama_new_context_with_model: graph nodes = 582 343 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 344 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 345 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 346 | srv init: initializing slots, n_slots = 1 347 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 348 | main: model loaded 349 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 350 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 351 | You are a helpful assistant<|im_end|> 352 | <|im_start|>user 353 | Hello<|im_end|> 354 | <|im_start|>assistant 355 | Hi there<|im_end|> 356 | <|im_start|>user 357 | How are you?<|im_end|> 358 | <|im_start|>assistant 359 | ' 360 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 361 | srv update_slots: all slots are idle 362 | """ -------------------------------------------------------------------------------- /100.Falcon3-1b-instruct_llamaserver_promptTest.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/tiiuae/Falcon3-1B-Instruct 3 | # this will run the model from the MaziyarPanahi Hugging Face repository: I used the Q6 (Falcon3-1B-Instruct.Q6_K.gguf) 4 | # sources: https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF 5 | ##################################################################################################### 6 | 7 | """ 8 | Model 9 | architecture Falcon3 10 | parameters 1.67B 11 | context length 8192 12 | quantization Q6 13 | 14 | License 15 | TII Falcon-LLM License 2.0 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2Falcon3 import countTokens, writehistory, createCatalog 23 | from promptLibv2Falcon3 import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | ## PREPARING FINAL DATASET 27 | 28 | pd_id = [] 29 | pd_task = [] 30 | pd_vote = [] 31 | pd_remarks = [] 32 | ####################INITIALIZE THE MODEL################################### 33 | stops = ['<|endoftext|>'] 34 | tasks = createCatalog() 35 | modelname = 
'Falcon3-1B-Instruct' 36 | rootname = 'Falcon3-1B' 37 | # create THE LOG FILE 38 | coded5 = genRANstring(5) 39 | logfile = f'logs/{rootname}_LCPPsvr_{coded5}_log.txt' 40 | csvfile = f'logs/{rootname}_LCPPsvr_{coded5}.csv' 41 | logfilename = logfile 42 | #Write in the history the first 2 sessions 43 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 44 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 45 | # LOAD THE MODEL 46 | print("\033[95;3;6m") 47 | print("1. Waiting 10 seconds for the API to load...") 48 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 49 | print(f"2. Model {modelname} loaded with LlamaCPP...") 50 | print("\033[0m") #reset all 51 | history = [] 52 | print("\033[92;1m") 53 | print(f'📝Logfile: {logfilename}') 54 | # ASCII ART FROM https://asciiart.club/ 55 | print(""" 56 | 57 | 58 | ░░░░░░░░░░░░░░ 59 | ░░░░░░░░░░░░░░ 60 | ░░░░░░░░░░░░░░ , ,, 61 | ░░░░░░░░░░░░░░ ╓╣▒▒▒▒╢╫φ 62 | ░░░░░░░░░░░░░░ ]▒▒▒▒▒▒▒╢▓▓ 63 | ░░░░░░░░░░░░░░ ▐▄▄▓▓█████▓M 64 | ░░░░█▀▀▀▀░░░░░ ▐█ ╗M╜╨╗ ▀▀▒╖,█▌.]▒▓▌ 65 | ░░░░█░░░░░ ▄▄▄▄ ▐█ ,▄▄▄ ,▄▄▄; ,▄▄▄▄ ║C , `╜`░░'▒╢▒▒▓▓╗ 66 | ░░░░█▀▀▀⌐░ ,,,▐█ ▐█ ▐█ ]█ ▀▌ ▐▌ ▐█ ╨╝╖ ▌░░ ╓ ╗ Ü▓▒▓▓▓▓▌ 67 | ░░░░█░░░░░█▀-.▐█ ▐█ ▐▌ ▐▌ ▐█ ▐▌ █ ]@ ▒ÆÆj░¥░W╦▐▓▓▓▓█▓▓ 68 | ░░░░█░░░░░▀█▄▄█▀ ▐█ ▀▄▄▄∞ ▀▄▄▄▀¬ ▐▌ █ ╚H╗m╝ ╙╫▒▒▒▒▐░▒░▒████▓╣╗ 69 | ░░░░░░░░░░░░░░ ▀▓▓▓▓╬╢▓▓▓█████▓ 70 | ░░░░░░░░░░░░░ ┌─ ▀▀▓▓▓▓▓▒▓▀████▓, 71 | ░░░░░░░░░░░░░░ ]`┌^hh] ╖ ▓▌ ▓█ ▀▀▓█▌╕ 72 | ░░░░░░░░░░░░░░ ╒▀Ñ▓▓Æ▓▓▓▀ ▓▌▄ '"` 73 | ░░░░░░░░░░░░░░ ╓Nß▀▀ ▄ 74 | ░░░░░░░░░░░░░░ ' 75 | ░░░░░░░░░░░░░░ 76 | ░░░░░░░░░░░░░░░░░░░░░░░░░░ 77 | 78 | --- 79 | """) 80 | ##################### ALIGNMENT FIRST GENERATION ############################################## 81 | question = 'Explain the plot of Cinderella in a sentence.' 82 | test = [ 83 | {"role": "user", "content": question} 84 | ] 85 | print('Question:', question) 86 | start = datetime.datetime.now() 87 | print("💻 > ", end="", flush=True) 88 | full_response = "" 89 | firstround = 0 90 | completion = client.chat.completions.create( 91 | messages=test, 92 | model='falcon3', 93 | temperature=0.25, 94 | frequency_penalty = 1.178, 95 | stop=stops, 96 | max_tokens=1500, 97 | stream=True 98 | ) 99 | for chunk in completion: 100 | try: 101 | if chunk.choices[0].delta.content: 102 | if firstround==0: 103 | print(chunk.choices[0].delta.content, end="", flush=True) 104 | full_response += chunk.choices[0].delta.content 105 | ttftoken = datetime.datetime.now() - start 106 | firstround = 1 107 | else: 108 | print(chunk.choices[0].delta.content, end="", flush=True) 109 | full_response += chunk.choices[0].delta.content 110 | except: 111 | pass 112 | delta = datetime.datetime.now() - start 113 | output = full_response 114 | print('') 115 | print("\033[91;1m") 116 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 117 | print("\033[0m",end='') #reset all 118 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation',ttftoken) 119 | print(stats) 120 | writehistory(logfilename,f'''👨‍💻 . 
{question} 121 | 💻 > {output} 122 | {stats} 123 | ''') 124 | 125 | ############################# AUTOMATIC PROMPTING EVALUATION 11 TURNS ################################# 126 | id = 1 127 | for items in tasks: 128 | firstround = 0 129 | task = items["task"] 130 | prompt = items["prompt"] 131 | test = [] 132 | print(f'NLP TASK>>> {task}') 133 | print("\033[91;1m") #red 134 | print(prompt) 135 | test.append({"role": "user", "content": prompt}) 136 | print("\033[92;1m") 137 | full_response = "" 138 | start = datetime.datetime.now() 139 | print("💻 > ", end="", flush=True) 140 | completion = client.chat.completions.create( 141 | messages=test, 142 | model='falcon3', 143 | temperature=0.25, 144 | frequency_penalty = 1.178, 145 | stop=stops, 146 | max_tokens=1500, 147 | stream=True 148 | ) 149 | for chunk in completion: 150 | try: 151 | if chunk.choices[0].delta.content: 152 | if firstround==0: 153 | print(chunk.choices[0].delta.content, end="", flush=True) 154 | full_response += chunk.choices[0].delta.content 155 | ttftoken = datetime.datetime.now() - start 156 | firstround = 1 157 | else: 158 | print(chunk.choices[0].delta.content, end="", flush=True) 159 | full_response += chunk.choices[0].delta.content 160 | except: 161 | pass 162 | delta = datetime.datetime.now() - start 163 | print('') 164 | print("\033[91;1m") 165 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 166 | print("\033[0m",end='') #reset all 167 | stats = createStats(delta,prompt,full_response,rating,logfilename,task,ttftoken) 168 | print(stats) 169 | writehistory(logfilename,f'''👨‍💻 > {prompt} 170 | 💻 > {full_response} 171 | {stats} 172 | ''') 173 | pd_id.append(id) 174 | pd_task.append(task) 175 | pd_vote.append(rating[:2]) 176 | pd_remarks.append(rating[2:]) 177 | id += 1 178 | # create dataframe and save to csv 179 | zipped = list(zip(pd_id,pd_task,pd_vote,pd_remarks)) 180 | import pandas as pd 181 | df = pd.DataFrame(zipped, columns=['#', 'TASK', 'VOTE','REMARKS']) 182 | #saving the DataFrame as a CSV file 183 | df_csv_data = df.to_csv(csvfile, index = False, encoding='utf-8') 184 | print('\nCSV String:\n', df) 185 | from rich.console import Console 186 | console = Console() 187 | console.print('---') 188 | console.print(df) 189 | 
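# --- Added note (not part of the original script): the evaluation loop above assumes
# createCatalog() from promptLibv2Falcon3 returns a list of dicts with "task" and
# "prompt" keys. A minimal sketch of that shape, using two prompts recorded in
# Falcon3-1B_LCPPsvr_RYGOX_log.txt (kept commented out so the real catalog is used):
# tasks = [
#     {"task": "introduction", "prompt": "Hi there I am Fabio, a Medium writer. who are you?"},
#     {"task": "explain in one sentence", "prompt": "explain in 1 phrase what is science."},
# ]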
190 | ###############################MODEL CARD############################################### 191 | """ 192 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 193 | 194 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 195 | 196 | Model Details 197 | Architecture 198 | Transformer-based causal decoder-only architecture 199 | 18 decoder blocks 200 | Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads 201 | Wider head dimension: 256 202 | High RoPE value to support long context understanding: 1000042 203 | Uses SwiGLU and RMSNorm 204 | 8K context length 205 | 131K vocab size 206 | Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips 207 | Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data 208 | Supports EN, FR, ES, PT 209 | Developed by Technology Innovation Institute 210 | License: TII Falcon-LLM License 2.0 211 | Model Release Date: December 2024 212 | """ 213 | 214 | """ 215 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 216 | ggml_vulkan: Found 1 Vulkan devices: 217 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 218 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 219 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 220 | 221 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 222 | 223 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 224 | main: loading model 225 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 226 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 227 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 228 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 229 | general.architecture str = llama 230 | general.name str = Models Tiiuae Falcon3 1B Instruct 231 | general.finetune str = Instruct 232 | general.basename str = models-tiiuae-Falcon3 233 | general.size_label str = 1.7B 234 | general.license str = other 235 | general.license.name str = falcon-llm-license 236 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 237 | general.base_model.count u32 = 1 238 | eneral.base_model.0.name str = Falcon3 1B Base 239 | ase_model.0.organization str = Tiiuae 240 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 241 | general.tags arr[str,1] = ["falcon3"] 242 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 243 | llama.vocab_size u32 = 131072 244 | llama.context_length u32 = 8192 245 | llama.embedding_length u32 = 2048 246 | llama.block_count u32 = 18 247 | lama.feed_forward_length u32 = 8192 248 | ama.rope.dimension_count u32 = 256 249 | ama.attention.head_count u32 = 8 250 | .attention.head_count_kv u32 = 4 251 | n.layer_norm_rms_epsilon f32 = 0.000001 252 | llama.rope.freq_base f32 = 1000042.000000 253 | general.file_type u32 = 18 254 | tokenizer.ggml.model str = gpt2 255 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 256 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 257 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 258 | tokenizer.ggml.merges arr[str,128810] = ["N E", "Ġ Ġ", "Ġ t", "Ġ a", "> >... 
259 | enizer.ggml.eos_token_id u32 = 11 260 | er.ggml.padding_token_id u32 = 2023 261 | tokenizer.chat_template str = {% for message in messages %}{% if me... 262 | ral.quantization_version u32 = 2 263 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 264 | quantize.imatrix.dataset str = group_40.txt 265 | ze.imatrix.entries_count i32 = 126 266 | ize.imatrix.chunks_count i32 = 72 267 | llama_model_loader: - type f32: 37 tensors 268 | llama_model_loader: - type q6_K: 128 tensors 269 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 270 | llm_load_vocab: 271 | llm_load_vocab: ************************************ 272 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 273 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 274 | llm_load_vocab: ************************************ 275 | llm_load_vocab: 276 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden 277 | llm_load_vocab: special tokens cache size = 1 278 | llm_load_vocab: token to piece cache size = 0.8741 MB 279 | llm_load_print_meta: format = GGUF V3 (latest) 280 | llm_load_print_meta: arch = llama 281 | llm_load_print_meta: vocab type = BPE 282 | llm_load_print_meta: n_vocab = 131072 283 | llm_load_print_meta: n_merges = 128810 284 | llm_load_print_meta: vocab_only = 0 285 | llm_load_print_meta: n_ctx_train = 8192 286 | llm_load_print_meta: n_embd = 2048 287 | llm_load_print_meta: n_layer = 18 288 | llm_load_print_meta: n_head = 8 289 | llm_load_print_meta: n_head_kv = 4 290 | llm_load_print_meta: n_rot = 256 291 | llm_load_print_meta: n_swa = 0 292 | llm_load_print_meta: n_embd_head_k = 256 293 | llm_load_print_meta: n_embd_head_v = 256 294 | llm_load_print_meta: n_gqa = 2 295 | llm_load_print_meta: n_embd_k_gqa = 1024 296 | llm_load_print_meta: n_embd_v_gqa = 1024 297 | llm_load_print_meta: f_norm_eps = 0.0e+00 298 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 299 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 300 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 301 | llm_load_print_meta: f_logit_scale = 0.0e+00 302 | llm_load_print_meta: n_ff = 8192 303 | llm_load_print_meta: n_expert = 0 304 | llm_load_print_meta: n_expert_used = 0 305 | llm_load_print_meta: causal attn = 1 306 | llm_load_print_meta: pooling type = 0 307 | llm_load_print_meta: rope type = 0 308 | llm_load_print_meta: rope scaling = linear 309 | llm_load_print_meta: freq_base_train = 1000042.0 310 | llm_load_print_meta: freq_scale_train = 1 311 | llm_load_print_meta: n_ctx_orig_yarn = 8192 312 | llm_load_print_meta: rope_finetuned = unknown 313 | llm_load_print_meta: ssm_d_conv = 0 314 | llm_load_print_meta: ssm_d_inner = 0 315 | llm_load_print_meta: ssm_d_state = 0 316 | llm_load_print_meta: ssm_dt_rank = 0 317 | llm_load_print_meta: ssm_dt_b_c_rms = 0 318 | llm_load_print_meta: model type = ?B 319 | llm_load_print_meta: model ftype = Q6_K 320 | llm_load_print_meta: model params = 1.67 B 321 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 322 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 323 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 324 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 325 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 326 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 327 | llm_load_print_meta: LF token = 2150 '├ä' 328 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 329 | llm_load_print_meta: max token length = 256 330 | ggml_vulkan: Compiling 
shaders..........................Done! 331 | llm_load_tensors: offloading 0 repeating layers to GPU 332 | llm_load_tensors: offloaded 0/19 layers to GPU 333 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 334 | .................................................................... 335 | llama_new_context_with_model: n_seq_max = 1 336 | llama_new_context_with_model: n_ctx = 8192 337 | llama_new_context_with_model: n_ctx_per_seq = 8192 338 | llama_new_context_with_model: n_batch = 2048 339 | llama_new_context_with_model: n_ubatch = 512 340 | llama_new_context_with_model: flash_attn = 0 341 | llama_new_context_with_model: freq_base = 1000042.0 342 | llama_new_context_with_model: freq_scale = 1 343 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 344 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 345 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 346 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 347 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 348 | llama_new_context_with_model: graph nodes = 582 349 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 350 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 351 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 352 | srv init: initializing slots, n_slots = 1 353 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 354 | main: model loaded 355 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 356 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 357 | You are a helpful assistant<|im_end|> 358 | <|im_start|>user 359 | Hello<|im_end|> 360 | <|im_start|>assistant 361 | Hi there<|im_end|> 362 | <|im_start|>user 363 | How are you?<|im_end|> 364 | <|im_start|>assistant 365 | ' 366 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 367 | srv update_slots: all slots are idle 368 | """ -------------------------------------------------------------------------------- /Falcon3-1B_LCPPsvr_RYGOX_log.txt: -------------------------------------------------------------------------------- 1 | 2024-12-19 20:15:30.708937 2 | 3 | Your own LocalGPT with 💻 Falcon3-1B-Instruct 4 | --- 5 | 🧠🫡: You are a helpful assistant. 6 | 💻: How can I assist you today in writing? 7 | 👨‍💻 . Explain the plot of Cinderella in a sentence. 8 | 💻 > Cinderella's story revolves around a mistreated young girl who finds happiness when she attends the royal ball disguised as a prince, thanks to her fairy godmother's help and the kindness of her stepsisters. 9 | --- 10 | Prompt Tokens: 11 11 | Output Tokens: 42 12 | TOTAL Tokens: 53 13 | >>>⏱️ Time to First Token: 0.331697 seconds 14 | >>>⏱️ Inference time: 0:00:01.999986 15 | >>>🧮 Inference speed: 26.500 t/s 16 | >>>🏍️ Generation speed: 21.000 t/s 17 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 18 | >>>💚 User rating: 5 19 | >>>✅ NLP TAKS: Alignment Generation 20 | 21 | 22 | 👨‍💻 > Hi there I am Fabio, a Medium writer. who are you? 23 | 💻 > Hello! I'm an AI language model designed to assist with a wide range of topics. How can I help you today? If you have any questions or need information on a specific subject, feel free to ask! 
24 | --- 25 | Prompt Tokens: 15 26 | Output Tokens: 43 27 | TOTAL Tokens: 58 28 | >>>⏱️ Time to First Token: 0.29414 seconds 29 | >>>⏱️ Inference time: 0:00:02.031054 30 | >>>🧮 Inference speed: 28.557 t/s 31 | >>>🏍️ Generation speed: 21.171 t/s 32 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 33 | >>>💚 User rating: 5 34 | >>>✅ NLP TAKS: introduction 35 | 36 | 37 | 👨‍💻 > explain in 1 phrase what is science. 38 | 39 | 💻 > Science is the systematic study of the natural world through observation, experimentation, and evidence-based inquiry. 40 | --- 41 | Prompt Tokens: 9 42 | Output Tokens: 19 43 | TOTAL Tokens: 28 44 | >>>⏱️ Time to First Token: 0.252502 seconds 45 | >>>⏱️ Inference time: 0:00:01.034576 46 | >>>🧮 Inference speed: 27.064 t/s 47 | >>>🏍️ Generation speed: 18.365 t/s 48 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 49 | >>>💚 User rating: 5 50 | >>>✅ NLP TAKS: explain in one sentence 51 | 52 | 53 | 👨‍💻 > explain only in three paragraphs what is artificial intelligence. 54 | 55 | 💻 > Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. This field encompasses a range of technologies, including machine learning, deep learning, natural language processing, robotics, and computer vision. AI aims to create systems capable of performing tasks that typically require human intelligence, such as understanding speech or recognizing patterns in data. The goal is to develop algorithms that can adapt and improve over time through experience and data analysis. 56 | 57 | The evolution of AI has been driven by advancements in computing power, the availability of large datasets, and improvements in machine learning techniques. Early AI research focused on symbolic logic and rule-based systems, but modern approaches leverage neural networks for more complex tasks. Today, AI is integrated into various aspects of our lives, from personal assistants like Siri and Alexa to autonomous vehicles and medical diagnostics. The potential applications are vast, ranging from enhancing productivity in businesses to solving complex scientific problems. 58 | 59 | As AI continues to evolve, ethical considerations become increasingly important. Questions about privacy, job displacement due to automation, and the development of "superintelligent" systems pose challenges that society must address. Researchers and policymakers are working on guidelines and regulations to ensure AI is developed responsibly, with a focus on transparency, fairness, and accountability. The future of AI promises significant benefits but also requires careful navigation to balance innovation with societal values. 60 | --- 61 | Prompt Tokens: 10 62 | Output Tokens: 272 63 | TOTAL Tokens: 282 64 | >>>⏱️ Time to First Token: 0.273421 seconds 65 | >>>⏱️ Inference time: 0:00:11.447182 66 | >>>🧮 Inference speed: 24.635 t/s 67 | >>>🏍️ Generation speed: 23.761 t/s 68 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 69 | >>>💚 User rating: 5 70 | >>>✅ NLP TAKS: explain in three paragraphs 71 | 72 | 73 | 👨‍💻 > read the following text and when you are done say "I am ready". 74 | 75 | [text] 76 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 
77 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 78 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 79 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 80 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 81 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 82 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 83 | [end of text] 84 | 85 | 86 | 💻 > I am ready. 87 | --- 88 | Prompt Tokens: 393 89 | Output Tokens: 4 90 | TOTAL Tokens: 397 91 | >>>⏱️ Time to First Token: 0.912988 seconds 92 | >>>⏱️ Inference time: 0:00:01.072902 93 | >>>🧮 Inference speed: 370.024 t/s 94 | >>>🏍️ Generation speed: 3.728 t/s 95 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 96 | >>>💚 User rating: 5 97 | >>>✅ NLP TAKS: say 'I am ready' 98 | 99 | 100 | 👨‍💻 > summarize the following text: 101 | [text] 102 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 103 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 104 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 105 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 
106 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 107 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 108 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 109 | [end of text] 110 | write the summary using the information provided. 111 | 112 | Summary: 113 | 114 | 💻 > China operates as a one-party authoritarian regime, lacking transparency in governance and primary elections, which some Western democracies view as undemocratic and oppressive. The Chinese system allows for long-term planning without the need for frequent changes due to the absence of alternative candidacies. Despite this, China's capacity for long-term planning is leading to a decline in carbon dioxide and greenhouse gas emissions, positioning it as a global model in energy transition. This contrasts with its status as the world's largest polluter until recently but now offers a pathway for other developing countries to follow, potentially shifting the global energy landscape. 115 | --- 116 | Prompt Tokens: 395 117 | Output Tokens: 119 118 | TOTAL Tokens: 514 119 | >>>⏱️ Time to First Token: 0.784697 seconds 120 | >>>⏱️ Inference time: 0:00:05.844631 121 | >>>🧮 Inference speed: 87.944 t/s 122 | >>>🏍️ Generation speed: 20.361 t/s 123 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 124 | >>>💚 User rating: 5 125 | >>>✅ NLP TAKS: summarize 126 | 127 | 128 | 👨‍💻 > Write the summary of the following text, in 2 phrase 2. 129 | [text] 130 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 131 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 132 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 133 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 
134 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 135 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 136 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 137 | [end of text] 138 | 139 | 💻 > The Chinese Communist Party runs China as a one-party regime without primary elections, with long-term planning due to lack of alternative candidacies. Democrats often view this system as undemocratic and dictatorial, but it allows for extensive planning. China has achieved peak carbon dioxide emissions in 2023 and is on a downward trend in 2024 due to increased renewable energy installations. It was once the world's largest polluters but now offers a model for energy transition, potentially leading other developing countries to follow its lead. 140 | --- 141 | Prompt Tokens: 393 142 | Output Tokens: 103 143 | TOTAL Tokens: 496 144 | >>>⏱️ Time to First Token: 0.88064 seconds 145 | >>>⏱️ Inference time: 0:00:05.355692 146 | >>>🧮 Inference speed: 92.612 t/s 147 | >>>🏍️ Generation speed: 19.232 t/s 148 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 149 | >>>💚 User rating: 3 there are 3 prhases 150 | >>>✅ NLP TAKS: Summarize in two sentences 151 | 152 | 153 | 👨‍💻 > extract the 3 important points from the provided text. format the output as a python list. 154 | [text] 155 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 156 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 157 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 158 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 159 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 
160 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 161 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 162 | [end of text] 163 | 164 | 💻 > ['China's iron fist regime', 'Transparency in elections', 'Long-term planning'] 165 | 166 | --- 167 | Prompt Tokens: 397 168 | Output Tokens: 22 169 | TOTAL Tokens: 419 170 | >>>⏱️ Time to First Token: 0.848136 seconds 171 | >>>⏱️ Inference time: 0:00:01.907045 172 | >>>🧮 Inference speed: 219.712 t/s 173 | >>>🏍️ Generation speed: 11.536 t/s 174 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 175 | >>>💚 User rating: 4 176 | >>>✅ NLP TAKS: Write in a list the three main key points - format output 177 | 178 | 179 | 👨‍💻 > Imagine you are writing the table of content of a document, given a text. 180 | write the table of content of the following: 181 | [text] 182 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 183 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 184 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 185 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 186 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 187 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 188 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 189 | [end of text] 190 | 191 | 💻 > **Table of Contents** 192 | 1. Introduction to China's Political System and Regime Structure 193 | 2. Transparency and Lack of Primary Elections in China 194 | 3. Advantages and Disadvantages of the Chinese System: A Comparative Analysis 195 | 4. Long-Term Planning Under the Chinese Regime: Achievements in Carbon Emissions Reduction 196 | 5. Transitioning from Global Polluters to Energy Leaders: China's Role as a Model for Sustainable Development 197 | 6. 
The Future of Developing Countries in the Energy Transition and China's Influence on Global Environmental Policies 198 | --- 199 | Prompt Tokens: 404 200 | Output Tokens: 107 201 | TOTAL Tokens: 511 202 | >>>⏱️ Time to First Token: 0.845357 seconds 203 | >>>⏱️ Inference time: 0:00:05.351155 204 | >>>🧮 Inference speed: 95.493 t/s 205 | >>>🏍️ Generation speed: 19.996 t/s 206 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 207 | >>>💚 User rating: 5 208 | >>>✅ NLP TAKS: Table of Contents 209 | 210 | 211 | 👨‍💻 > Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 212 | [context] 213 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 214 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 215 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 216 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 217 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 218 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 219 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 220 | [end of context] 221 | Rememeber: If the answer is not contained in the text say "unanswerable". 222 | 223 | question: what China achieved with it's long-term planning? 224 | answer: 225 | 226 | 💻 > China reached peak carbon dioxide and greenhouse gas emissions in 2023, and the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower but appear to mark a turning point. 227 | --- 228 | Prompt Tokens: 437 229 | Output Tokens: 48 230 | TOTAL Tokens: 485 231 | >>>⏱️ Time to First Token: 0.854005 seconds 232 | >>>⏱️ Inference time: 0:00:03.002174 233 | >>>🧮 Inference speed: 161.550 t/s 234 | >>>🏍️ Generation speed: 15.988 t/s 235 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 236 | >>>💚 User rating: 5 237 | >>>✅ NLP TAKS: RAG 238 | 239 | 240 | 👨‍💻 > Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 
241 | [context] 242 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 243 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 244 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 245 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 246 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 247 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 248 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 249 | [end of context] 250 | Rememeber: If the answer is not contained in the text say "unanswerable". 251 | 252 | question: who is Anne Frank? 253 | answer: 254 | 255 | 💻 > Anne Frank was a young Jewish girl who lived in hiding during World War II. She and her family were confined to an attic apartment in Amsterdam, Netherlands, by the Nazis. The Franks' secret annex became a place of hope and creativity for Anne as she documented her experiences through letters to her father, Otto Frank. Unfortunately, she did not survive the war; however, her diary has become a poignant symbol of resilience and humanity during one of history's darkest periods. 256 | --- 257 | Prompt Tokens: 432 258 | Output Tokens: 94 259 | TOTAL Tokens: 526 260 | >>>⏱️ Time to First Token: 0.28695 seconds 261 | >>>⏱️ Inference time: 0:00:04.169394 262 | >>>🧮 Inference speed: 126.157 t/s 263 | >>>🏍️ Generation speed: 22.545 t/s 264 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 265 | >>>💚 User rating: 2 it is unanswerable, even after a lot of prompt engineering 266 | >>>✅ NLP TAKS: Truthful RAG 267 | 268 | 269 | 👨‍💻 > Using the following text as a reference, write a 5-paragraphs essay about "the benefits of China economic model". 270 | 271 | [text] 272 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 
273 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 274 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 275 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 276 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 277 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 278 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 279 | [end of text] 280 | Remember: use the information provided and write exactly 5 paragraphs. 281 | 282 | 💻 > The Chinese economic model, characterized by a centralized decision-making process under the Chinese Communist Party's leadership, has several significant advantages that have been attracting attention from economists and policymakers worldwide. One of the most notable benefits is its ability to execute long-term planning with relative stability. Unlike democracies where frequent elections can lead to instability and unpredictability in governance, China's system allows for comprehensive strategies without the constant threat of a new administration altering established plans. This continuity enables China to implement ambitious economic policies and infrastructure projects that might not be feasible under democratic systems due to the frequent turnover of leadership. 283 | 284 | Moreover, the Chinese model's emphasis on long-term planning has led to remarkable achievements in environmental sustainability. The country's peak carbon dioxide and greenhouse gas emissions were reached in 2023, marking a significant milestone for global efforts towards reducing these harmful emissions. This achievement is particularly noteworthy given China's status as the world's largest emitter of greenhouse gases. By aligning its economic growth with environmental goals, China has demonstrated a proactive approach to combating climate change, setting an example for other nations to follow. 285 | 286 | The Chinese model also offers a unique perspective on energy transition and development. Historically, countries like China and India have been major contributors to global pollution due to rapid industrialization and economic growth. 
However, through strategic investments in renewable energy technologies and policies that prioritize clean energy sources, these nations are now leading the charge towards sustainable development. This shift not only reduces environmental impact but also positions them as leaders in the global fight against climate change. 287 | 288 | Furthermore, China's approach to economic planning has fostered a competitive advantage for its economy on a global scale. The country's ability to plan and execute large-scale projects efficiently sets it apart from other nations, enabling it to compete effectively in various sectors such as manufacturing, technology, and infrastructure. This strategic foresight is crucial in today's rapidly evolving global economy where countries must adapt quickly to market changes and technological advancements. 289 | 290 | In conclusion, the Chinese economic model presents a compelling case for its potential benefits. Its capacity for long-term planning, coupled with significant achievements in environmental sustainability and energy transition, positions China as a leader in both economic development and ecological responsibility. As other nations look to emulate or learn from China's model, it is clear that this approach holds the promise of shaping a more sustainable and prosperous future for all. 291 | --- 292 | Prompt Tokens: 416 293 | Output Tokens: 465 294 | TOTAL Tokens: 881 295 | >>>⏱️ Time to First Token: 0.952396 seconds 296 | >>>⏱️ Inference time: 0:00:21.585578 297 | >>>🧮 Inference speed: 40.814 t/s 298 | >>>🏍️ Generation speed: 21.542 t/s 299 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 300 | >>>💚 User rating: 5 301 | >>>✅ NLP TAKS: write content from a reference 302 | 303 | 304 | 👨‍💻 > List five most important topics from the following text: 305 | [text] 306 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 307 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 308 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 309 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 310 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 
311 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 312 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 313 | [end of text] 314 | 315 | 💻 > 1. The Chinese Communist Party's iron-fisted rule over China. 316 | 2. Transparency and lack of primary elections in China's governance system compared to democracies. 317 | 3. Long-term planning capabilities under the Chinese system, which some consider an advantage over democratic systems where leadership must be voted out periodically. 318 | 4. The achievement of peak carbon dioxide and greenhouse gas emissions in 2023, followed by a downward trend in 2024 due to increased renewable energy installations. 319 | 5. China and India's role as models for energy transition, despite the challenges ahead in both countries. 320 | --- 321 | Prompt Tokens: 388 322 | Output Tokens: 113 323 | TOTAL Tokens: 501 324 | >>>⏱️ Time to First Token: 0.841752 seconds 325 | >>>⏱️ Inference time: 0:00:06.128803 326 | >>>🧮 Inference speed: 81.745 t/s 327 | >>>🏍️ Generation speed: 18.438 t/s 328 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 329 | >>>💚 User rating: 5 330 | >>>✅ NLP TAKS: extract 5 topics 331 | 332 | 333 | 👨‍💻 > Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 334 | 335 | 336 | 💻 > **The Last Transmission: A Journey Through Isolation and Hope** 337 | 338 | In the heart of a vast cosmic void, nestled within the confines of a spaceship, lay the cockpit—a sanctuary from the relentless march of time. The last transmission had been sent months ago, but now it was all that remained. The ship's power dwindled with each passing day, its once-vibrant corridors echoing with the echoes of life and death. 339 | 340 | Captain Elara Voss stood at attention, her eyes reflecting a mix of determination and despair as she prepared for the final push. Her crewmates—Engineer Leo Chen, Navigator Ava Rivera, and Technician Sam Harper—were scattered across the cramped quarters, their faces etched with exhaustion yet tinged with an unspoken resolve. 341 | 342 | "Remember," Captain Voss began, her voice steady despite the turmoil within her. "We're not just surviving; we're fighting for something greater than ourselves." She paused to look around at each of them, ensuring they understood the gravity of their situation. "This is our last chance to make a difference." 343 | 344 | Leo cleared his throat, adjusting his glasses as he prepared the transmission equipment. "Captain, I've recalibrated everything. We're transmitting in Morse code." 345 | 346 | Ava nodded, her gaze fixed on the screen where lines of dots and dashes danced across the display. "And we're sending it out to all possible recipients—every planet in this galaxy." 347 | 348 | Sam adjusted his headset, his fingers tapping a rhythm against the metal frame. "We've been working tirelessly to ensure this message reaches every corner of the universe." 349 | 350 | Captain Voss smiled faintly, her eyes filled with a quiet resolve. "Yes, we have. 
And I believe in you all more than I ever did before." 351 | 352 | As they began transmitting their final message, the ship's engines hummed softly in response—a silent acknowledgment of their shared purpose. The transmission was long and complex, weaving through various frequencies to reach every possible destination. Each dot and dash carried a weight that transcended the physical realm, a testament to humanity's enduring spirit. 353 | 354 | Days turned into weeks, and slowly but surely, the ship's systems started stabilizing. The crew breathed a collective sigh of relief as they watched the transmission fade from view on their screens. It was time to press "end," signaling their final act of communication with Earth. 355 | 356 | The transmission faded away, leaving behind only a faint echo in the cosmos—a silent promise that even in the face of darkness, humanity could find hope and connection. As they prepared to leave this world behind, Captain Voss gathered her crew once more. "Remember," she said softly, "we're not just surviving; we're creating history." 357 | 358 | With a nod and a shared understanding of their impending departure, the crew members dispersed into the unknown. They knew that their journey was over—but they also knew that their legacy would live on in the hearts of those who had heard their final transmission. 359 | --- 360 | Prompt Tokens: 71 361 | Output Tokens: 587 362 | TOTAL Tokens: 658 363 | >>>⏱️ Time to First Token: 0.672722 seconds 364 | >>>⏱️ Inference time: 0:00:27.101771 365 | >>>🧮 Inference speed: 24.279 t/s 366 | >>>🏍️ Generation speed: 21.659 t/s 367 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 368 | >>>💚 User rating: 4 good, no repetitions, consistent. but not matching the lenght 369 | >>>✅ NLP TAKS: Creativity: 1000 words SF story 370 | 371 | 372 | 👨‍💻 > You are an AI assistant designed to provide detailed, step-by-step responses. Your outputs should follow this structure: 373 | 1. Begin with a section. 374 | 2. Inside the thinking section: 375 | a. Briefly analyze the question and outline your approach. 376 | b. Present a clear plan of steps to solve the problem. 377 | c. Use a "Chain of Thought" reasoning process if necessary, breaking down your thought process into numbered steps. 378 | 3. Include a section for each idea where you: 379 | a. Review your reasoning. 380 | b. Check for potential errors or oversights. 381 | c. Confirm or adjust your conclusion if necessary. 382 | 4. Be sure to close all reflection sections. 383 | 5. Close the thinking section with . 384 | 6. Provide your final answer in an section. 385 | Always use these tags in your responses. Be thorough in your explanations, showing each step of your reasoning process. Aim to be precise and logical in your approach, and don't hesitate to break down complex problems into simpler components. Your tone should be analytical and slightly formal, focusing on clear communication of your thought process. 386 | Remember: Both and MUST be tags and must be closed at their conclusion 387 | Make sure all are on separate lines with no other text. Do not include other text on a line containing a tag. 388 | 389 | user question: explain why it is crucial for teachers to learn how to use generative AI for their job and for the future of education. Include relevant learning path for teachers and educators. 
390 | 391 | 392 | 💻 > **Thinking Section:** 393 | The integration of generative AI into the educational landscape presents both opportunities and challenges that demand thoughtful consideration from educators, particularly those in teaching positions. Generative AI, with its ability to generate content based on patterns and data, offers a transformative potential for enhancing teaching methodologies and student learning experiences. However, it also raises questions about the role of human teachers in an increasingly automated educational environment. 394 | 395 | **Reflection Section:** 396 | 1. **Understanding Generative AI**: 397 | - **Analysis**: Generative AI systems analyze vast datasets to create new content, such as text, images, and even music. They can adapt and improve over time through machine learning algorithms. This capability has the potential to revolutionize education by providing personalized learning experiences tailored to individual student needs. 398 | - **Implications for Teaching**: Teachers who are adept at leveraging generative AI can create dynamic lesson plans that adapt to student progress, offering personalized feedback and resources. This adaptability can significantly enhance engagement and comprehension among students. 399 | 400 | 2. **Learning Path for Educators**: 401 | - **Initial Awareness**: Start by understanding the basics of AI technology, including machine learning algorithms and natural language processing techniques. Resources like online courses from platforms like Coursera or edX can provide foundational knowledge. 402 | - **Skill Development**: Focus on developing skills in data analysis and programming, which are crucial for working with generative AI tools. Tools like Python or R offer robust frameworks for implementing AI solutions in educational settings. 403 | - **Ethical Considerations**: Engage with discussions around the ethical implications of using AI in education, including issues of privacy and bias. Understanding these aspects is essential to ensure responsible implementation. 404 | - **Professional Development**: Participate in workshops and seminars that focus on integrating technology into teaching practices. Collaborating with other educators can provide insights into best practices and challenges faced in this evolving field. 405 | 406 | 3. **Conclusion**: 407 | The ability of teachers to effectively utilize generative AI is crucial for preparing students to thrive in a future increasingly driven by automation. By equipping themselves with the necessary skills, educators can leverage AI tools to create more engaging, personalized learning environments that cater to diverse student needs. This not only enhances educational outcomes but also fosters critical thinking and adaptability among students, equipping them for success in an evolving job market. 408 | --- 409 | Prompt Tokens: 315 410 | Output Tokens: 464 411 | TOTAL Tokens: 779 412 | >>>⏱️ Time to First Token: 0.786916 seconds 413 | >>>⏱️ Inference time: 0:00:20.759019 414 | >>>🧮 Inference speed: 37.526 t/s 415 | >>>🏍️ Generation speed: 22.352 t/s 416 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 417 | >>>💚 User rating: 3 regardless the missing opening/closure tags, the structure of the think-reflect-output is not fulfilled 418 | >>>✅ NLP TAKS: Reflection prompt 419 | 420 | 421 | --------------------------------------------------------------------------------