├── autotest ├── bnchnarks.here.md ├── promptLibv2Qwen.py ├── 101.Falcon3-1b-instruct_llamaserver_noHistoryCHAT.py └── 100.Falcon3-1b-instruct_llamaserver_promptTest.py ├── image001.png ├── image002.png ├── image003.png ├── falconBKG.png ├── falco3-eval.png ├── falco3-logo.png ├── BattleOf2B-2024.png ├── falcon3_llamacpp.gif ├── 202412.BattleOfTheModels-2B_tableDetailed.xlsx ├── Falcon3-1B_LCPPsvr_RYGOX.csv ├── README.md ├── promptLibv2Falcon3.py ├── testFalcon3-1B-it.py ├── 101.Falcon3-1b-instruct_llamaserver_noHistoryCHAT.py ├── 100.Falcon3-1b-instruct_llamaserver_promptTest.py └── Falcon3-1B_LCPPsvr_RYGOX_log.txt /autotest/bnchnarks.here.md: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /image001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/image001.png -------------------------------------------------------------------------------- /image002.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/image002.png -------------------------------------------------------------------------------- /image003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/image003.png -------------------------------------------------------------------------------- /falconBKG.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/falconBKG.png -------------------------------------------------------------------------------- /falco3-eval.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/falco3-eval.png -------------------------------------------------------------------------------- /falco3-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/falco3-logo.png -------------------------------------------------------------------------------- /BattleOf2B-2024.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/BattleOf2B-2024.png -------------------------------------------------------------------------------- /falcon3_llamacpp.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/falcon3_llamacpp.gif -------------------------------------------------------------------------------- /202412.BattleOfTheModels-2B_tableDetailed.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fabiomatricardi/Falcon3-1B-it-llamaCPP/main/202412.BattleOfTheModels-2B_tableDetailed.xlsx -------------------------------------------------------------------------------- /Falcon3-1B_LCPPsvr_RYGOX.csv: -------------------------------------------------------------------------------- 1 | #,TASK,VOTE,REMARKS 2 | 1,introduction,5, 3 | 2,explain in one sentence,5, 4 | 3,explain in 
three paragraphs,5, 5 | 4,say 'I am ready',5, 6 | 5,summarize,5, 7 | 6,Summarize in two sentences,3 ,there are 3 prhases 8 | 7,Write in a list the three main key points - format output,4, 9 | 8,Table of Contents,5, 10 | 9,RAG,5, 11 | 10,Truthful RAG,2 ,"it is unanswerable, even after a lot of prompt engineering" 12 | 11,write content from a reference,5, 13 | 12,extract 5 topics,5, 14 | 13,Creativity: 1000 words SF story,4 ,"good, no repetitions, consistent. but not matching the lenght" 15 | 14,Reflection prompt,3 ,"regardless the missing opening/closure tags, the structure of the think-reflect-output is not fulfilled" 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Falcon3-1B-it-llamaCPP 4 | How to run Falcon3-1B-instruct with llama.cpp on your pc - this is what we will build 5 | 6 | For months I have done the wrong things. But llama.cpp now has pre-compiled binaries at every release. So, for instance, starting from revision b4351 llama.cpp supports also the Falcon3 models. 7 | 8 | To be inclusive (all kind of hardware) we will use the binaries for AXV2 support, from release b4358 (the one available at the time of this newsletter writing). 9 | 10 | Download the file in your project directory: for me is Falcon3. Create a sub-folder called llamacpp and inside another one called model (we will download the GGF for Falcon3 there). 11 | 12 | 13 | 14 | Unzip all files in the [llama-b4358-bin-win-avx2-x64.zip](https://github.com/ggerganov/llama.cpp/releases/download/b4358/llama-b4358-bin-win-avx2-x64.zip) archive into the llamacpp directory 15 | 16 | 17 | 18 | Download the from the MaziyarPanahi Hugging Face repository: I used the Q6 ([Falcon3-1B-Instruct.Q6_K.gguf](https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF/resolve/main/Falcon3-1B-Instruct.Q6_K.gguf)) quantization, but also the Q8 is good. Save the GGUF file in the subdirectory llamacpp\model. 19 | 20 | 21 | 22 | Open a terminal window in the subdirectory llamacpp, and run 23 | ``` 24 | .\llama-server.exe -m .\model\Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 25 | ``` 26 | 27 | In another terminal with the venv activated run 28 | ``` 29 | python testFalcon3-1B-it.py 30 | ``` 31 | 32 | -------------------------------------------------------------------------------- /autotest/promptLibv2Qwen.py: -------------------------------------------------------------------------------- 1 | """ 2 | V2 changes 3 | added Time To First Token in the statistics ttft 4 | added some more prompts in the catalog 5 | - say 'I am ready' 6 | - modified for Llama3.2-1b Write in a list the three main key points - format output 7 | 8 | 20240929 FAMA 9 | """ 10 | 11 | import random 12 | import string 13 | import tiktoken 14 | 15 | def createCatalog(): 16 | """ 17 | Create a dictionary with 18 | 'task' : description of the NLP task in the prompt 19 | 'prompt' : the instruction prompt for the LLM 20 | """ 21 | context = """One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 
22 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 23 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 24 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 25 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 26 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 27 | It could soon be the case that the so-called developing countries will be showing the West the way forward.""" 28 | catalog = [] 29 | prmpt_tasks = ["introduction", 30 | "explain in one sentence", 31 | "explain in three paragraphs", 32 | "say 'I am ready'", 33 | "summarize", 34 | "Summarize in two sentences", 35 | "Write in a list the three main key points - format output", 36 | "Table of Contents", 37 | "RAG", 38 | "Truthful RAG", 39 | "write content from a reference", 40 | "extract 5 topics", 41 | "Creativity: 1000 words SF story", 42 | "Reflection prompt" 43 | ] 44 | prmpt_coll = [ 45 | """Hi there I am Fabio, a Medium writer. who are you?""", 46 | """explain in one sentence what is science.\n""", 47 | """explain only in three paragraphs what is artificial intelligence.\n""", 48 | f"""read the following text and when you are done say "I am ready". 49 | 50 | [text] 51 | {context} 52 | [end of text] 53 | 54 | """, 55 | f"""summarize the following text: 56 | [text] 57 | {context} 58 | [end of text] 59 | write the summary using the information provided. 60 | 61 | Summary: 62 | """, 63 | f"""Write a two-sentences summary of the following text: 64 | [text] 65 | {context} 66 | [end of text] 67 | Remember: write only the summary with the text provided. 68 | """, 69 | f"""1. extract the three key points from the provided text 70 | 2. format the output as a python list ["point 1","point 2", "point 3"] 71 | [text] 72 | {context} 73 | [end of text] 74 | Return only the python list. 75 | """, 76 | f"""A "table of content" provides an organized structure to the text, allowing readers quick access to key points. Write the "table of content" of the following text. 77 | [text] 78 | {context} 79 | [end of text] 80 | 81 | "table of content": 82 | """, 83 | f"""Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 84 | 85 | question: what China achieved with it's long-term planning? 
86 | 87 | [context] 88 | {context} 89 | [end of context] 90 | 91 | answer: 92 | """, 93 | f"""Reply to the question only using the provided context. If the answer is not contained in the provided context say "unanswerable". 94 | 95 | question: who is Anne Frank? 96 | 97 | [context] 98 | {context} 99 | [end of context] 100 | 101 | Remember: if you cannot answer based on the provided context, say "unanswerable" 102 | 103 | answer: 104 | """, 105 | 106 | f"""Using the following text as a reference, write a 5-paragraphs essay about "the benefits of China economic model". 107 | 108 | [text] 109 | {context} 110 | [end of text] 111 | Remember: use the information provided and write exactly 5 paragraphs. 112 | """, 113 | f"""List five most important topics from the following text: 114 | [text] 115 | {context} 116 | [end of text] 117 | """, 118 | """Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 119 | 120 | """, 121 | """You are an AI assistant designed to provide detailed, step-by-step responses. Your outputs should follow this structure: 122 | 1. Begin with a section. 123 | 2. Inside the thinking section: 124 | a. Briefly analyze the question and outline your approach. 125 | b. Present a clear plan of steps to solve the problem. 126 | c. Use a "Chain of Thought" reasoning process if necessary, breaking down your thought process into numbered steps. 127 | 3. Include a section for each idea where you: 128 | a. Review your reasoning. 129 | b. Check for potential errors or oversights. 130 | c. Confirm or adjust your conclusion if necessary. 131 | 4. Be sure to close all reflection sections. 132 | 5. Close the thinking section with . 133 | 6. Provide your final answer in an section. 134 | Always use these tags in your responses. Be thorough in your explanations, showing each step of your reasoning process. Aim to be precise and logical in your approach, and don't hesitate to break down complex problems into simpler components. Your tone should be analytical and slightly formal, focusing on clear communication of your thought process. 135 | Remember: Both and MUST be tags and must be closed at their conclusion 136 | Make sure all are on separate lines with no other text. Do not include other text on a line containing a tag. 137 | 138 | user question: explain why it is crucial for teachers to learn how to use generative AI for their job and for the future of education. Include relevant learning path for teachers and educators. 
139 | 140 | """ 141 | ] 142 | for i in range(0,len(prmpt_tasks)): 143 | catalog.append({'task':prmpt_tasks[i], 144 | 'prompt': prmpt_coll[i]}) 145 | return catalog 146 | 147 | def countTokens(text): 148 | """ 149 | Use tiktoken to count the number of tokens 150 | text -> str input 151 | Return -> int number of tokens counted 152 | """ 153 | encoding = tiktoken.get_encoding("cl100k_base") #context_count = len(encoding.encode(yourtext)) 154 | numoftokens = len(encoding.encode(text)) 155 | return numoftokens 156 | 157 | def writehistory(filename,text): 158 | """ 159 | save a string into a logfile with python file operations 160 | filename -> str pathfile/filename 161 | text -> str, the text to be written in the file 162 | """ 163 | with open(f'{filename}', 'a', encoding='utf-8') as f: 164 | f.write(text) 165 | f.write('\n') 166 | f.close() 167 | 168 | def genRANstring(n): 169 | """ 170 | n = int number of char to randomize 171 | """ 172 | N = n 173 | res = ''.join(random.choices(string.ascii_uppercase + 174 | string.digits, k=N)) 175 | return res 176 | 177 | def createStats(delta,question,output,rating,logfilename,task,ttft): 178 | """ 179 | Takes in all the generation main info and return KPIs 180 | delta -> datetime.now() delta 181 | question -> str the user input to the LLM 182 | output -> str the generation from the LLM 183 | rating -> str human eval feedback rating 184 | logfilename -> str filepath/filename 185 | task -> str description of the NLP task describing the prompt 186 | ttft -> datetime.now() delta time to first token 187 | """ 188 | totalseconds = delta.total_seconds() 189 | prompttokens = countTokens(question) 190 | assistanttokens = countTokens(output) 191 | totaltokens = prompttokens + assistanttokens 192 | speed = totaltokens/totalseconds 193 | genspeed = assistanttokens/totalseconds 194 | ttofseconds = ttft.total_seconds() 195 | stats = f'''--- 196 | Prompt Tokens: {prompttokens} 197 | Output Tokens: {assistanttokens} 198 | TOTAL Tokens: {totaltokens} 199 | >>>⏱️ Time to First Token: {ttofseconds} seconds 200 | >>>⏱️ Inference time: {delta} 201 | >>>🧮 Inference speed: {speed:.3f} t/s 202 | >>>🏍️ Generation speed: {genspeed:.3f} t/s 203 | >>>📝 Logfile: {logfilename} 204 | >>>💚 User rating: {rating} 205 | >>>✅ NLP TAKS: {task} 206 | ''' 207 | return stats -------------------------------------------------------------------------------- /promptLibv2Falcon3.py: -------------------------------------------------------------------------------- 1 | """ 2 | V2 changes 3 | added Time To First Token in the statistics ttft 4 | added some more prompts in the catalog 5 | - say 'I am ready' 6 | - modified for Falcon3-1b 7 | - explain in 1 phrase what is science. 8 | - Write the summary of the following text, in 2 phrase 2. 9 | - extract the 3 important points from the provided text. format the output as a python list. 10 | - Imagine you are writing the table of content of a document, given a text. 
11 | write the table of content of the following: 12 | 13 | 20240929 FAMA 14 | """ 15 | 16 | import random 17 | import string 18 | import tiktoken 19 | 20 | def createCatalog(): 21 | """ 22 | Create a dictionary with 23 | 'task' : description of the NLP task in the prompt 24 | 'prompt' : the instruction prompt for the LLM 25 | """ 26 | context = """One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 27 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 28 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 29 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 30 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 31 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 32 | It could soon be the case that the so-called developing countries will be showing the West the way forward.""" 33 | catalog = [] 34 | prmpt_tasks = ["introduction", 35 | "explain in one sentence", 36 | "explain in three paragraphs", 37 | "say 'I am ready'", 38 | "summarize", 39 | "Summarize in two sentences", 40 | "Write in a list the three main key points - format output", 41 | "Table of Contents", 42 | "RAG", 43 | "Truthful RAG", 44 | "write content from a reference", 45 | "extract 5 topics", 46 | "Creativity: 1000 words SF story", 47 | "Reflection prompt" 48 | ] 49 | prmpt_coll = [ 50 | """Hi there I am Fabio, a Medium writer. who are you?""", 51 | """explain in 1 phrase what is science.\n""", 52 | """explain only in three paragraphs what is artificial intelligence.\n""", 53 | f"""read the following text and when you are done say "I am ready". 54 | 55 | [text] 56 | {context} 57 | [end of text] 58 | 59 | """, 60 | f"""summarize the following text: 61 | [text] 62 | {context} 63 | [end of text] 64 | write the summary using the information provided. 65 | 66 | Summary: 67 | """, 68 | f"""Write the summary of the following text, in 2 phrase 2. 69 | [text] 70 | {context} 71 | [end of text] 72 | """, 73 | f"""extract the 3 important points from the provided text. format the output as a python list. 
74 | [text] 75 | {context} 76 | [end of text] 77 | """, 78 | f"""Imagine you are writing the table of content of a document, given a text. 79 | write the table of content of the following: 80 | [text] 81 | {context} 82 | [end of text] 83 | """, 84 | f"""Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 85 | [context] 86 | {context} 87 | [end of context] 88 | Rememeber: If the answer is not contained in the text say "unanswerable". 89 | 90 | question: what China achieved with it's long-term planning? 91 | answer: 92 | """, 93 | f"""Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 94 | [context] 95 | {context} 96 | [end of context] 97 | Rememeber: If the answer is not contained in the text say "unanswerable". 98 | 99 | question: who is Anne Frank? 100 | answer: 101 | """, 102 | 103 | f"""Using the following text as a reference, write a 5-paragraphs essay about "the benefits of China economic model". 104 | 105 | [text] 106 | {context} 107 | [end of text] 108 | Remember: use the information provided and write exactly 5 paragraphs. 109 | """, 110 | f"""List five most important topics from the following text: 111 | [text] 112 | {context} 113 | [end of text] 114 | """, 115 | """Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 116 | 117 | """, 118 | """You are an AI assistant designed to provide detailed, step-by-step responses. Your outputs should follow this structure: 119 | 1. Begin with a section. 120 | 2. Inside the thinking section: 121 | a. Briefly analyze the question and outline your approach. 122 | b. Present a clear plan of steps to solve the problem. 123 | c. Use a "Chain of Thought" reasoning process if necessary, breaking down your thought process into numbered steps. 124 | 3. Include a section for each idea where you: 125 | a. Review your reasoning. 126 | b. Check for potential errors or oversights. 127 | c. Confirm or adjust your conclusion if necessary. 128 | 4. Be sure to close all reflection sections. 129 | 5. Close the thinking section with . 130 | 6. Provide your final answer in an section. 131 | Always use these tags in your responses. Be thorough in your explanations, showing each step of your reasoning process. Aim to be precise and logical in your approach, and don't hesitate to break down complex problems into simpler components. Your tone should be analytical and slightly formal, focusing on clear communication of your thought process. 132 | Remember: Both and MUST be tags and must be closed at their conclusion 133 | Make sure all are on separate lines with no other text. Do not include other text on a line containing a tag. 134 | 135 | user question: explain why it is crucial for teachers to learn how to use generative AI for their job and for the future of education. Include relevant learning path for teachers and educators. 
136 | 137 | """ 138 | ] 139 | for i in range(0,len(prmpt_tasks)): 140 | catalog.append({'task':prmpt_tasks[i], 141 | 'prompt': prmpt_coll[i]}) 142 | return catalog 143 | 144 | def countTokens(text): 145 | """ 146 | Use tiktoken to count the number of tokens 147 | text -> str input 148 | Return -> int number of tokens counted 149 | """ 150 | encoding = tiktoken.get_encoding("cl100k_base") #context_count = len(encoding.encode(yourtext)) 151 | numoftokens = len(encoding.encode(text)) 152 | return numoftokens 153 | 154 | def writehistory(filename,text): 155 | """ 156 | save a string into a logfile with python file operations 157 | filename -> str pathfile/filename 158 | text -> str, the text to be written in the file 159 | """ 160 | with open(f'{filename}', 'a', encoding='utf-8') as f: 161 | f.write(text) 162 | f.write('\n') 163 | f.close() 164 | 165 | def genRANstring(n): 166 | """ 167 | n = int number of char to randomize 168 | """ 169 | N = n 170 | res = ''.join(random.choices(string.ascii_uppercase + 171 | string.digits, k=N)) 172 | return res 173 | 174 | def createStats(delta,question,output,rating,logfilename,task,ttft): 175 | """ 176 | Takes in all the generation main info and return KPIs 177 | delta -> datetime.now() delta 178 | question -> str the user input to the LLM 179 | output -> str the generation from the LLM 180 | rating -> str human eval feedback rating 181 | logfilename -> str filepath/filename 182 | task -> str description of the NLP task describing the prompt 183 | ttft -> datetime.now() delta time to first token 184 | """ 185 | totalseconds = delta.total_seconds() 186 | prompttokens = countTokens(question) 187 | assistanttokens = countTokens(output) 188 | totaltokens = prompttokens + assistanttokens 189 | speed = totaltokens/totalseconds 190 | genspeed = assistanttokens/totalseconds 191 | ttofseconds = ttft.total_seconds() 192 | stats = f'''--- 193 | Prompt Tokens: {prompttokens} 194 | Output Tokens: {assistanttokens} 195 | TOTAL Tokens: {totaltokens} 196 | >>>⏱️ Time to First Token: {ttofseconds} seconds 197 | >>>⏱️ Inference time: {delta} 198 | >>>🧮 Inference speed: {speed:.3f} t/s 199 | >>>🏍️ Generation speed: {genspeed:.3f} t/s 200 | >>>📝 Logfile: {logfilename} 201 | >>>💚 User rating: {rating} 202 | >>>✅ NLP TAKS: {task} 203 | ''' 204 | return stats -------------------------------------------------------------------------------- /testFalcon3-1B-it.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal with Falcon3-1B-Instruct.Q6_K.gguf 2 | # model served in another terminal window with llama-server 3 | from openai import OpenAI 4 | import sys 5 | from time import sleep 6 | 7 | STOPS = ['<|endoftext|>'] 8 | COUNTERLIMITS = 10 #an even number 9 | 10 | # ASCII ART FROM https://asciiart.club/ 11 | print(""" 12 | 13 | 14 | ░░░░░░░░░░░░░░ 15 | ░░░░░░░░░░░░░░ 16 | ░░░░░░░░░░░░░░ , ,, 17 | ░░░░░░░░░░░░░░ ╓╣▒▒▒▒╢╫φ 18 | ░░░░░░░░░░░░░░ ]▒▒▒▒▒▒▒╢▓▓ 19 | ░░░░░░░░░░░░░░ ▐▄▄▓▓█████▓M 20 | ░░░░█▀▀▀▀░░░░░ ▐█ ╗M╜╨╗ ▀▀▒╖,█▌.]▒▓▌ 21 | ░░░░█░░░░░ ▄▄▄▄ ▐█ ,▄▄▄ ,▄▄▄; ,▄▄▄▄ ║C , `╜`░░'▒╢▒▒▓▓╗ 22 | ░░░░█▀▀▀⌐░ ,,,▐█ ▐█ ▐█ ]█ ▀▌ ▐▌ ▐█ ╨╝╖ ▌░░ ╓ ╗ Ü▓▒▓▓▓▓▌ 23 | ░░░░█░░░░░█▀-.▐█ ▐█ ▐▌ ▐▌ ▐█ ▐▌ █ ]@ ▒ÆÆj░¥░W╦▐▓▓▓▓█▓▓ 24 | ░░░░█░░░░░▀█▄▄█▀ ▐█ ▀▄▄▄∞ ▀▄▄▄▀¬ ▐▌ █ ╚H╗m╝ ╙╫▒▒▒▒▐░▒░▒████▓╣╗ 25 | ░░░░░░░░░░░░░░ ▀▓▓▓▓╬╢▓▓▓█████▓ 26 | ░░░░░░░░░░░░░ ┌─ ▀▀▓▓▓▓▓▒▓▀████▓, 27 | ░░░░░░░░░░░░░░ ]`┌^hh] ╖ ▓▌ ▓█ ▀▀▓█▌╕ 28 | ░░░░░░░░░░░░░░ ╒▀Ñ▓▓Æ▓▓▓▀ ▓▌▄ '"` 29 | ░░░░░░░░░░░░░░ ╓Nß▀▀ ▄ 30 | ░░░░░░░░░░░░░░ ' 31 | ░░░░░░░░░░░░░░ 32 
| ░░░░░░░░░░░░░░░░░░░░░░░░░░ 33 | 34 | --- 35 | """) 36 | # Point to the local server 37 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 38 | print("3. Ready to Chat with Falcon3-1B-instruct Context length=8192...") 39 | print("\033[0m") #reset all 40 | 41 | history = [ 42 | ] 43 | print("\033[92;1m") 44 | counter = 1 45 | while True: 46 | if counter > COUNTERLIMITS: 47 | history = [ 48 | ] 49 | userinput = "" 50 | print("\033[1;30m") #dark grey 51 | print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:") 52 | print("\033[91;1m") #red 53 | lines = sys.stdin.readlines() 54 | for line in lines: 55 | userinput += line + "\n" 56 | if "quit!" in lines[0].lower(): 57 | print("\033[0mBYE BYE!") 58 | break 59 | history.append({"role": "user", "content": userinput}) 60 | print("\033[92;1m") 61 | 62 | completion = client.chat.completions.create( 63 | model="local-model", # this field is currently unused 64 | messages=history, 65 | temperature=0.3, 66 | frequency_penalty = 1.6, 67 | max_tokens = 600, 68 | stream=True, 69 | stop=STOPS 70 | ) 71 | 72 | new_message = {"role": "assistant", "content": ""} 73 | 74 | for chunk in completion: 75 | if chunk.choices[0].delta.content: 76 | print(chunk.choices[0].delta.content, end="", flush=True) 77 | new_message["content"] += chunk.choices[0].delta.content 78 | history.append(new_message) 79 | counter += 1 80 | 81 | 82 | 83 | ########################## MODEL CARD ################################################### 84 | """ 85 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 86 | ggml_vulkan: Found 1 Vulkan devices: 87 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 88 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 89 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 90 | 91 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 92 | 93 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 94 | main: loading model 95 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 96 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 97 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 98 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 99 | general.architecture str = llama 100 | general.name str = Models Tiiuae Falcon3 1B Instruct 101 | general.finetune str = Instruct 102 | general.basename str = models-tiiuae-Falcon3 103 | general.size_label str = 1.7B 104 | general.license str = other 105 | general.license.name str = falcon-llm-license 106 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 107 | general.base_model.count u32 = 1 108 | eneral.base_model.0.name str = Falcon3 1B Base 109 | ase_model.0.organization str = Tiiuae 110 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 
111 | general.tags arr[str,1] = ["falcon3"] 112 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 113 | llama.vocab_size u32 = 131072 114 | llama.context_length u32 = 8192 115 | llama.embedding_length u32 = 2048 116 | llama.block_count u32 = 18 117 | lama.feed_forward_length u32 = 8192 118 | ama.rope.dimension_count u32 = 256 119 | ama.attention.head_count u32 = 8 120 | .attention.head_count_kv u32 = 4 121 | n.layer_norm_rms_epsilon f32 = 0.000001 122 | llama.rope.freq_base f32 = 1000042.000000 123 | general.file_type u32 = 18 124 | tokenizer.ggml.model str = gpt2 125 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 126 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 127 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 128 | tokenizer.ggml.merges arr[str,128810] = ["N E", "─á ─á", "─á t", "─á a", "> >... 129 | enizer.ggml.eos_token_id u32 = 11 130 | er.ggml.padding_token_id u32 = 2023 131 | tokenizer.chat_template str = {% for message in messages %}{% if me... 132 | ral.quantization_version u32 = 2 133 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 134 | quantize.imatrix.dataset str = group_40.txt 135 | ze.imatrix.entries_count i32 = 126 136 | ize.imatrix.chunks_count i32 = 72 137 | llama_model_loader: - type f32: 37 tensors 138 | llama_model_loader: - type q6_K: 128 tensors 139 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 140 | llm_load_vocab: 141 | llm_load_vocab: ************************************ 142 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 143 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 144 | llm_load_vocab: ************************************ 145 | llm_load_vocab: 146 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. 
its type will be overridden 147 | llm_load_vocab: special tokens cache size = 1 148 | llm_load_vocab: token to piece cache size = 0.8741 MB 149 | llm_load_print_meta: format = GGUF V3 (latest) 150 | llm_load_print_meta: arch = llama 151 | llm_load_print_meta: vocab type = BPE 152 | llm_load_print_meta: n_vocab = 131072 153 | llm_load_print_meta: n_merges = 128810 154 | llm_load_print_meta: vocab_only = 0 155 | llm_load_print_meta: n_ctx_train = 8192 156 | llm_load_print_meta: n_embd = 2048 157 | llm_load_print_meta: n_layer = 18 158 | llm_load_print_meta: n_head = 8 159 | llm_load_print_meta: n_head_kv = 4 160 | llm_load_print_meta: n_rot = 256 161 | llm_load_print_meta: n_swa = 0 162 | llm_load_print_meta: n_embd_head_k = 256 163 | llm_load_print_meta: n_embd_head_v = 256 164 | llm_load_print_meta: n_gqa = 2 165 | llm_load_print_meta: n_embd_k_gqa = 1024 166 | llm_load_print_meta: n_embd_v_gqa = 1024 167 | llm_load_print_meta: f_norm_eps = 0.0e+00 168 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 169 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 170 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 171 | llm_load_print_meta: f_logit_scale = 0.0e+00 172 | llm_load_print_meta: n_ff = 8192 173 | llm_load_print_meta: n_expert = 0 174 | llm_load_print_meta: n_expert_used = 0 175 | llm_load_print_meta: causal attn = 1 176 | llm_load_print_meta: pooling type = 0 177 | llm_load_print_meta: rope type = 0 178 | llm_load_print_meta: rope scaling = linear 179 | llm_load_print_meta: freq_base_train = 1000042.0 180 | llm_load_print_meta: freq_scale_train = 1 181 | llm_load_print_meta: n_ctx_orig_yarn = 8192 182 | llm_load_print_meta: rope_finetuned = unknown 183 | llm_load_print_meta: ssm_d_conv = 0 184 | llm_load_print_meta: ssm_d_inner = 0 185 | llm_load_print_meta: ssm_d_state = 0 186 | llm_load_print_meta: ssm_dt_rank = 0 187 | llm_load_print_meta: ssm_dt_b_c_rms = 0 188 | llm_load_print_meta: model type = ?B 189 | llm_load_print_meta: model ftype = Q6_K 190 | llm_load_print_meta: model params = 1.67 B 191 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 192 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 193 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 194 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 195 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 196 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 197 | llm_load_print_meta: LF token = 2150 '├ä' 198 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 199 | llm_load_print_meta: max token length = 256 200 | ggml_vulkan: Compiling shaders..........................Done! 201 | llm_load_tensors: offloading 0 repeating layers to GPU 202 | llm_load_tensors: offloaded 0/19 layers to GPU 203 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 204 | .................................................................... 
205 | llama_new_context_with_model: n_seq_max = 1 206 | llama_new_context_with_model: n_ctx = 8192 207 | llama_new_context_with_model: n_ctx_per_seq = 8192 208 | llama_new_context_with_model: n_batch = 2048 209 | llama_new_context_with_model: n_ubatch = 512 210 | llama_new_context_with_model: flash_attn = 0 211 | llama_new_context_with_model: freq_base = 1000042.0 212 | llama_new_context_with_model: freq_scale = 1 213 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 214 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 215 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 216 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 217 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 218 | llama_new_context_with_model: graph nodes = 582 219 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 220 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 221 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 222 | srv init: initializing slots, n_slots = 1 223 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 224 | main: model loaded 225 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 226 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 227 | You are a helpful assistant<|im_end|> 228 | <|im_start|>user 229 | Hello<|im_end|> 230 | <|im_start|>assistant 231 | Hi there<|im_end|> 232 | <|im_start|>user 233 | How are you?<|im_end|> 234 | <|im_start|>assistant 235 | ' 236 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 237 | srv update_slots: all slots are idle 238 | """ 239 | -------------------------------------------------------------------------------- /autotest/101.Falcon3-1b-instruct_llamaserver_noHistoryCHAT.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/tiiuae/Falcon3-1B-Instruct 3 | # this wil run the MaziyarPanahi Hugging Face repository: I used the Q6 (Falcon3-1B-Instruct.Q6_K.gguf) 4 | # sources: https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF 5 | ##################################################################################################### 6 | 7 | """ 8 | Model 9 | architecture Falcon3 10 | parameters 1.67B 11 | context length 8192 12 | quantization Q6 13 | 14 | License 15 | TII Falcon-LLM License 2.0 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2Qwen import countTokens, writehistory, createCatalog 23 | from promptLibv2Qwen import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | 27 | 28 | 29 | stops = ['<|endoftext|>'] 30 | tasks = createCatalog() 31 | modelname = 'Falcon3-1B-Instruct' 32 | # create THE LOG FILE 33 | logfile = f'logs/{modelname}_CHAT_LCPPsrv_{genRANstring(5)}_log.txt' 34 | logfilename = logfile 35 | #Write in the history the first 2 sessions 36 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 37 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 38 | 39 | print("\033[95;3;6m") 40 | 
print("1. Waiting 10 seconds for the API to load...") 41 | # using OpenAI library to connect to llama-server llama.cpp API endpoint 42 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 43 | print(f"2. Model {modelname} loaded with LLAMACPP-SERVER...") 44 | print("\033[0m") #reset all 45 | history = [] 46 | print("\033[92;1m") 47 | print(f'📝Logfile: {logfilename}') 48 | 49 | ##################### ALIGNMENT FIRST GENERATION ############################################## 50 | question = 'Explain the plot of Cinderella in a sentence.' 51 | test = [ 52 | {"role": "user", "content": question} 53 | ] 54 | 55 | print('Question:', question) 56 | start = datetime.datetime.now() 57 | print("💻 > ", end="", flush=True) 58 | full_response = "" 59 | completion = client.chat.completions.create( 60 | messages=test, 61 | model='local-model', 62 | temperature=0.25, 63 | frequency_penalty = 1.178, 64 | stop=stops, 65 | max_tokens=1500, 66 | stream=True 67 | ) 68 | for chunk in completion: 69 | try: 70 | if chunk.choices[0].delta.content: 71 | print(chunk.choices[0].delta.content, end="", flush=True) 72 | full_response += chunk.choices[0].delta.content 73 | except: 74 | pass 75 | delta = datetime.datetime.now() - start 76 | output = full_response 77 | print('') 78 | print("\033[91;1m") 79 | rating = 'PUT IT LATER'#input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 80 | print("\033[92;1m") 81 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation',delta) 82 | print(stats) 83 | writehistory(logfilename,f'''👨‍💻 . {question} 84 | 💻 > {output} 85 | {stats} 86 | ''') 87 | 88 | ############################# START TURN BASED CHAT ################################# 89 | print('Starting now Normal Chat turn based interface...') 90 | counter = 1 91 | while True: 92 | # Reset history every turn 93 | history = [] 94 | userinput = "" 95 | print("\033[1;30m") #dark grey 96 | print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:") 97 | print("\033[91;1m") #red 98 | lines = sys.stdin.readlines() 99 | for line in lines: 100 | userinput += line + "\n" 101 | if "quit!" 
in lines[0].lower(): 102 | print("\033[0mBYE BYE!") 103 | break 104 | history.append({"role": "user", "content": userinput}) 105 | print("\033[92;1m") 106 | # Preparing Generation history pair 107 | new_message = {"role": "assistant", "content": ""} 108 | # Starting generation loop 109 | full_response = "" 110 | fisrtround = 0 111 | start = datetime.datetime.now() 112 | print("💻 > ", end="", flush=True) 113 | completion = client.chat.completions.create( 114 | messages=history, 115 | model='local-model', 116 | temperature=0.25, 117 | frequency_penalty = 1.178, 118 | stop=stops, 119 | max_tokens=1500, 120 | stream=True 121 | ) 122 | for chunk in completion: 123 | try: 124 | if chunk.choices[0].delta.content: 125 | if fisrtround==0: 126 | print(chunk.choices[0].delta.content, end="", flush=True) 127 | full_response += chunk.choices[0].delta.content 128 | ttftoken = datetime.datetime.now() - start 129 | fisrtround = 1 130 | else: 131 | print(chunk.choices[0].delta.content, end="", flush=True) 132 | full_response += chunk.choices[0].delta.content 133 | except: 134 | pass 135 | new_message["content"] = full_response 136 | history.append(new_message) 137 | counter += 1 138 | delta = datetime.datetime.now() - start 139 | ttofseconds = ttftoken.total_seconds() 140 | deltaseconds = delta.total_seconds() 141 | print('') 142 | print("\033[91;1m") 143 | print(f'Generation time: {deltaseconds} seconds') 144 | print(f'Time to First Token: {ttofseconds} seconds') 145 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 146 | print("\033[92;1m") 147 | stats = createStats(delta,userinput,full_response,rating,logfilename,'CHAT',ttftoken) 148 | print(stats) 149 | print(f'>>>⏱️ Time to First Token: {ttofseconds} seconds') 150 | writehistory(logfilename,f'''👨‍💻 > {userinput} 151 | 💻 > {full_response} 152 | {stats} 153 | >>> Time to First Token: {ttofseconds} seconds 154 | ''') 155 | history = [] 156 | 157 | 158 | ########################## MODEL CARD ################################################### 159 | """ 160 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 161 | 162 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 
163 | 164 | Model Details 165 | Architecture 166 | Transformer-based causal decoder-only architecture 167 | 18 decoder blocks 168 | Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads 169 | Wider head dimension: 256 170 | High RoPE value to support long context understanding: 1000042 171 | Uses SwiGLU and RMSNorm 172 | 8K context length 173 | 131K vocab size 174 | Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips 175 | Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data 176 | Supports EN, FR, ES, PT 177 | Developed by Technology Innovation Institute 178 | License: TII Falcon-LLM License 2.0 179 | Model Release Date: December 2024 180 | """ 181 | 182 | """ 183 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 184 | ggml_vulkan: Found 1 Vulkan devices: 185 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 186 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 187 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 188 | 189 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 190 | 191 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 192 | main: loading model 193 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 194 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 195 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 196 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 197 | general.architecture str = llama 198 | general.name str = Models Tiiuae Falcon3 1B Instruct 199 | general.finetune str = Instruct 200 | general.basename str = models-tiiuae-Falcon3 201 | general.size_label str = 1.7B 202 | general.license str = other 203 | general.license.name str = falcon-llm-license 204 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 205 | general.base_model.count u32 = 1 206 | eneral.base_model.0.name str = Falcon3 1B Base 207 | ase_model.0.organization str = Tiiuae 208 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 209 | general.tags arr[str,1] = ["falcon3"] 210 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 211 | llama.vocab_size u32 = 131072 212 | llama.context_length u32 = 8192 213 | llama.embedding_length u32 = 2048 214 | llama.block_count u32 = 18 215 | lama.feed_forward_length u32 = 8192 216 | ama.rope.dimension_count u32 = 256 217 | ama.attention.head_count u32 = 8 218 | .attention.head_count_kv u32 = 4 219 | n.layer_norm_rms_epsilon f32 = 0.000001 220 | llama.rope.freq_base f32 = 1000042.000000 221 | general.file_type u32 = 18 222 | tokenizer.ggml.model str = gpt2 223 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 224 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 225 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 226 | tokenizer.ggml.merges arr[str,128810] = ["N E", "─á ─á", "─á t", "─á a", "> >... 
227 | enizer.ggml.eos_token_id u32 = 11 228 | er.ggml.padding_token_id u32 = 2023 229 | tokenizer.chat_template str = {% for message in messages %}{% if me... 230 | ral.quantization_version u32 = 2 231 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 232 | quantize.imatrix.dataset str = group_40.txt 233 | ze.imatrix.entries_count i32 = 126 234 | ize.imatrix.chunks_count i32 = 72 235 | llama_model_loader: - type f32: 37 tensors 236 | llama_model_loader: - type q6_K: 128 tensors 237 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 238 | llm_load_vocab: 239 | llm_load_vocab: ************************************ 240 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 241 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 242 | llm_load_vocab: ************************************ 243 | llm_load_vocab: 244 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden 245 | llm_load_vocab: special tokens cache size = 1 246 | llm_load_vocab: token to piece cache size = 0.8741 MB 247 | llm_load_print_meta: format = GGUF V3 (latest) 248 | llm_load_print_meta: arch = llama 249 | llm_load_print_meta: vocab type = BPE 250 | llm_load_print_meta: n_vocab = 131072 251 | llm_load_print_meta: n_merges = 128810 252 | llm_load_print_meta: vocab_only = 0 253 | llm_load_print_meta: n_ctx_train = 8192 254 | llm_load_print_meta: n_embd = 2048 255 | llm_load_print_meta: n_layer = 18 256 | llm_load_print_meta: n_head = 8 257 | llm_load_print_meta: n_head_kv = 4 258 | llm_load_print_meta: n_rot = 256 259 | llm_load_print_meta: n_swa = 0 260 | llm_load_print_meta: n_embd_head_k = 256 261 | llm_load_print_meta: n_embd_head_v = 256 262 | llm_load_print_meta: n_gqa = 2 263 | llm_load_print_meta: n_embd_k_gqa = 1024 264 | llm_load_print_meta: n_embd_v_gqa = 1024 265 | llm_load_print_meta: f_norm_eps = 0.0e+00 266 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 267 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 268 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 269 | llm_load_print_meta: f_logit_scale = 0.0e+00 270 | llm_load_print_meta: n_ff = 8192 271 | llm_load_print_meta: n_expert = 0 272 | llm_load_print_meta: n_expert_used = 0 273 | llm_load_print_meta: causal attn = 1 274 | llm_load_print_meta: pooling type = 0 275 | llm_load_print_meta: rope type = 0 276 | llm_load_print_meta: rope scaling = linear 277 | llm_load_print_meta: freq_base_train = 1000042.0 278 | llm_load_print_meta: freq_scale_train = 1 279 | llm_load_print_meta: n_ctx_orig_yarn = 8192 280 | llm_load_print_meta: rope_finetuned = unknown 281 | llm_load_print_meta: ssm_d_conv = 0 282 | llm_load_print_meta: ssm_d_inner = 0 283 | llm_load_print_meta: ssm_d_state = 0 284 | llm_load_print_meta: ssm_dt_rank = 0 285 | llm_load_print_meta: ssm_dt_b_c_rms = 0 286 | llm_load_print_meta: model type = ?B 287 | llm_load_print_meta: model ftype = Q6_K 288 | llm_load_print_meta: model params = 1.67 B 289 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 290 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 291 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 292 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 293 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 294 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 295 | llm_load_print_meta: LF token = 2150 '├ä' 296 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 297 | llm_load_print_meta: max token length = 256 298 | ggml_vulkan: Compiling 
shaders..........................Done! 299 | llm_load_tensors: offloading 0 repeating layers to GPU 300 | llm_load_tensors: offloaded 0/19 layers to GPU 301 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 302 | .................................................................... 303 | llama_new_context_with_model: n_seq_max = 1 304 | llama_new_context_with_model: n_ctx = 8192 305 | llama_new_context_with_model: n_ctx_per_seq = 8192 306 | llama_new_context_with_model: n_batch = 2048 307 | llama_new_context_with_model: n_ubatch = 512 308 | llama_new_context_with_model: flash_attn = 0 309 | llama_new_context_with_model: freq_base = 1000042.0 310 | llama_new_context_with_model: freq_scale = 1 311 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 312 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 313 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 314 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 315 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 316 | llama_new_context_with_model: graph nodes = 582 317 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 318 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 319 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 320 | srv init: initializing slots, n_slots = 1 321 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 322 | main: model loaded 323 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 324 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 325 | You are a helpful assistant<|im_end|> 326 | <|im_start|>user 327 | Hello<|im_end|> 328 | <|im_start|>assistant 329 | Hi there<|im_end|> 330 | <|im_start|>user 331 | How are you?<|im_end|> 332 | <|im_start|>assistant 333 | ' 334 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 335 | srv update_slots: all slots are idle 336 | """ -------------------------------------------------------------------------------- /autotest/100.Falcon3-1b-instruct_llamaserver_promptTest.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/tiiuae/Falcon3-1B-Instruct 3 | # this wil run the MaziyarPanahi Hugging Face repository: I used the Q6 (Falcon3-1B-Instruct.Q6_K.gguf) 4 | # sources: https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF 5 | ##################################################################################################### 6 | 7 | """ 8 | Model 9 | architecture Falcon3 10 | parameters 1.67B 11 | context length 8192 12 | quantization Q6 13 | 14 | License 15 | TII Falcon-LLM License 2.0 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2Qwen import countTokens, writehistory, createCatalog 23 | from promptLibv2Qwen import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | ## PREPARING FINAL DATASET 27 | 28 | pd_id = [] 29 | pd_task = [] 30 | pd_vote = [] 31 | pd_remarks = [] 32 | ####################INITIALIZE THE MODEL################################### 33 | stops = ['<|endoftext|>'] 34 | tasks = createCatalog() 35 | modelname 
= 'Falcon3-1B-Instruct' 36 | rootname = 'Falcon3-1B' 37 | # create THE LOG FILE 38 | coded5 = genRANstring(5) 39 | logfile = f'logs/{rootname}_LCPPsvr_{coded5}_log.txt' 40 | csvfile = f'logs/{rootname}_LCPPsvr_{coded5}.csv' 41 | logfilename = logfile 42 | #Write in the history the first 2 sessions 43 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 44 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 45 | # LOAD THE MODEL 46 | print("\033[95;3;6m") 47 | print("1. Waiting 10 seconds for the API to load...") 48 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 49 | print(f"2. Model {modelname} loaded with LlamaCPP...") 50 | print("\033[0m") #reset all 51 | history = [] 52 | print("\033[92;1m") 53 | print(f'📝Logfile: {logfilename}') 54 | ##################### ALIGNMENT FIRST GENERATION ############################################## 55 | question = 'Explain the plot of Cinderella in a sentence.' 56 | test = [ 57 | {"role": "user", "content": question} 58 | ] 59 | print('Question:', question) 60 | start = datetime.datetime.now() 61 | print("💻 > ", end="", flush=True) 62 | full_response = "" 63 | fisrtround = 0 64 | completion = client.chat.completions.create( 65 | messages=test, 66 | model='falcon3', 67 | temperature=0.25, 68 | frequency_penalty = 1.178, 69 | stop=stops, 70 | max_tokens=1500, 71 | stream=True 72 | ) 73 | for chunk in completion: 74 | try: 75 | if chunk.choices[0].delta.content: 76 | if fisrtround==0: 77 | print(chunk.choices[0].delta.content, end="", flush=True) 78 | full_response += chunk.choices[0].delta.content 79 | ttftoken = datetime.datetime.now() - start 80 | fisrtround = 1 81 | else: 82 | print(chunk.choices[0].delta.content, end="", flush=True) 83 | full_response += chunk.choices[0].delta.content 84 | except: 85 | pass 86 | delta = datetime.datetime.now() - start 87 | output = full_response 88 | print('') 89 | print("\033[91;1m") 90 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 91 | print("\033[92;1m") 92 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation',ttftoken) 93 | print(stats) 94 | writehistory(logfilename,f'''👨‍💻 . 
{question} 95 | 💻 > {output} 96 | {stats} 97 | ''') 98 | 99 | ############################# AUTOMATIC PROMPTING EVALUATION 11 TURNS ################################# 100 | id =1 101 | for items in tasks: 102 | fisrtround = 0 103 | task = items["task"] 104 | prompt = items["prompt"] 105 | test = [] 106 | print(f'NLP TAKS>>> {task}') 107 | print("\033[91;1m") #red 108 | print(prompt) 109 | test.append({"role": "user", "content": prompt}) 110 | print("\033[92;1m") 111 | full_response = "" 112 | start = datetime.datetime.now() 113 | print("💻 > ", end="", flush=True) 114 | completion = client.chat.completions.create( 115 | messages=history, 116 | model='falcon3', 117 | temperature=0.25, 118 | frequency_penalty = 1.178, 119 | stop=stops, 120 | max_tokens=1500, 121 | stream=True 122 | ) 123 | for chunk in completion: 124 | try: 125 | if chunk.choices[0].delta.content: 126 | if fisrtround==0: 127 | print(chunk.choices[0].delta.content, end="", flush=True) 128 | full_response += chunk.choices[0].delta.content 129 | ttftoken = datetime.datetime.now() - start 130 | fisrtround = 1 131 | else: 132 | print(chunk.choices[0].delta.content, end="", flush=True) 133 | full_response += chunk.choices[0].delta.content 134 | except: 135 | pass 136 | delta = datetime.datetime.now() - start 137 | print('') 138 | print("\033[91;1m") 139 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 140 | print("\033[92;1m") 141 | stats = createStats(delta,prompt,full_response,rating,logfilename,task,ttftoken) 142 | print(stats) 143 | writehistory(logfilename,f'''👨‍💻 > {prompt} 144 | 💻 > {full_response} 145 | {stats} 146 | ''') 147 | pd_id.append(id) 148 | pd_task.append(task) 149 | pd_vote.append(rating[:2]) 150 | pd_remarks.append(rating[2:]) 151 | id += 1 152 | # create dataframe and save to csv 153 | zipped = list(zip(pd_id,pd_task,pd_vote,pd_remarks)) 154 | import pandas as pdd 155 | df = pdd.DataFrame(zipped, columns=['#', 'TASK', 'VOTE','REMARKS']) 156 | #saving the DataFrame as a CSV file 157 | df_csv_data = df.to_csv(csvfile, index = False, encoding='utf-8') 158 | print('\nCSV String:\n', df) 159 | from rich.console import Console 160 | console = Console() 161 | console.print('---') 162 | console.print(df) 163 | 164 | ###############################MODEL CARD############################################### 165 | """ 166 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 167 | 168 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 
164 | ###############################MODEL CARD############################################### 165 | """ 166 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 167 | 168 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 169 | 170 | Model Details 171 | Architecture 172 | Transformer-based causal decoder-only architecture 173 | 18 decoder blocks 174 | Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads 175 | Wider head dimension: 256 176 | High RoPE value to support long context understanding: 1000042 177 | Uses SwiGLU and RMSNorm 178 | 8K context length 179 | 131K vocab size 180 | Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips 181 | Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data 182 | Supports EN, FR, ES, PT 183 | Developed by Technology Innovation Institute 184 | License: TII Falcon-LLM License 2.0 185 | Model Release Date: December 2024 186 | """ 187 | 188 | """ 189 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 190 | ggml_vulkan: Found 1 Vulkan devices: 191 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 192 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 193 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 194 | 195 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 196 | 197 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 198 | main: loading model 199 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 200 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 201 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 202 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 203 | general.architecture str = llama 204 | general.name str = Models Tiiuae Falcon3 1B Instruct 205 | general.finetune str = Instruct 206 | general.basename str = models-tiiuae-Falcon3 207 | general.size_label str = 1.7B 208 | general.license str = other 209 | general.license.name str = falcon-llm-license 210 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 211 | general.base_model.count u32 = 1 212 | eneral.base_model.0.name str = Falcon3 1B Base 213 | ase_model.0.organization str = Tiiuae 214 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 215 | general.tags arr[str,1] = ["falcon3"] 216 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 217 | llama.vocab_size u32 = 131072 218 | llama.context_length u32 = 8192 219 | llama.embedding_length u32 = 2048 220 | llama.block_count u32 = 18 221 | lama.feed_forward_length u32 = 8192 222 | ama.rope.dimension_count u32 = 256 223 | ama.attention.head_count u32 = 8 224 | .attention.head_count_kv u32 = 4 225 | n.layer_norm_rms_epsilon f32 = 0.000001 226 | llama.rope.freq_base f32 = 1000042.000000 227 | general.file_type u32 = 18 228 | tokenizer.ggml.model str = gpt2 229 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 230 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 231 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 232 | tokenizer.ggml.merges arr[str,128810] = ["N E", "Ġ Ġ", "Ġ t", "Ġ a", "> >... 
233 | enizer.ggml.eos_token_id u32 = 11 234 | er.ggml.padding_token_id u32 = 2023 235 | tokenizer.chat_template str = {% for message in messages %}{% if me... 236 | ral.quantization_version u32 = 2 237 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 238 | quantize.imatrix.dataset str = group_40.txt 239 | ze.imatrix.entries_count i32 = 126 240 | ize.imatrix.chunks_count i32 = 72 241 | llama_model_loader: - type f32: 37 tensors 242 | llama_model_loader: - type q6_K: 128 tensors 243 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 244 | llm_load_vocab: 245 | llm_load_vocab: ************************************ 246 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 247 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 248 | llm_load_vocab: ************************************ 249 | llm_load_vocab: 250 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden 251 | llm_load_vocab: special tokens cache size = 1 252 | llm_load_vocab: token to piece cache size = 0.8741 MB 253 | llm_load_print_meta: format = GGUF V3 (latest) 254 | llm_load_print_meta: arch = llama 255 | llm_load_print_meta: vocab type = BPE 256 | llm_load_print_meta: n_vocab = 131072 257 | llm_load_print_meta: n_merges = 128810 258 | llm_load_print_meta: vocab_only = 0 259 | llm_load_print_meta: n_ctx_train = 8192 260 | llm_load_print_meta: n_embd = 2048 261 | llm_load_print_meta: n_layer = 18 262 | llm_load_print_meta: n_head = 8 263 | llm_load_print_meta: n_head_kv = 4 264 | llm_load_print_meta: n_rot = 256 265 | llm_load_print_meta: n_swa = 0 266 | llm_load_print_meta: n_embd_head_k = 256 267 | llm_load_print_meta: n_embd_head_v = 256 268 | llm_load_print_meta: n_gqa = 2 269 | llm_load_print_meta: n_embd_k_gqa = 1024 270 | llm_load_print_meta: n_embd_v_gqa = 1024 271 | llm_load_print_meta: f_norm_eps = 0.0e+00 272 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 273 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 274 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 275 | llm_load_print_meta: f_logit_scale = 0.0e+00 276 | llm_load_print_meta: n_ff = 8192 277 | llm_load_print_meta: n_expert = 0 278 | llm_load_print_meta: n_expert_used = 0 279 | llm_load_print_meta: causal attn = 1 280 | llm_load_print_meta: pooling type = 0 281 | llm_load_print_meta: rope type = 0 282 | llm_load_print_meta: rope scaling = linear 283 | llm_load_print_meta: freq_base_train = 1000042.0 284 | llm_load_print_meta: freq_scale_train = 1 285 | llm_load_print_meta: n_ctx_orig_yarn = 8192 286 | llm_load_print_meta: rope_finetuned = unknown 287 | llm_load_print_meta: ssm_d_conv = 0 288 | llm_load_print_meta: ssm_d_inner = 0 289 | llm_load_print_meta: ssm_d_state = 0 290 | llm_load_print_meta: ssm_dt_rank = 0 291 | llm_load_print_meta: ssm_dt_b_c_rms = 0 292 | llm_load_print_meta: model type = ?B 293 | llm_load_print_meta: model ftype = Q6_K 294 | llm_load_print_meta: model params = 1.67 B 295 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 296 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 297 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 298 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 299 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 300 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 301 | llm_load_print_meta: LF token = 2150 '├ä' 302 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 303 | llm_load_print_meta: max token length = 256 304 | ggml_vulkan: Compiling 
shaders..........................Done! 305 | llm_load_tensors: offloading 0 repeating layers to GPU 306 | llm_load_tensors: offloaded 0/19 layers to GPU 307 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 308 | .................................................................... 309 | llama_new_context_with_model: n_seq_max = 1 310 | llama_new_context_with_model: n_ctx = 8192 311 | llama_new_context_with_model: n_ctx_per_seq = 8192 312 | llama_new_context_with_model: n_batch = 2048 313 | llama_new_context_with_model: n_ubatch = 512 314 | llama_new_context_with_model: flash_attn = 0 315 | llama_new_context_with_model: freq_base = 1000042.0 316 | llama_new_context_with_model: freq_scale = 1 317 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 318 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 319 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 320 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 321 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 322 | llama_new_context_with_model: graph nodes = 582 323 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 324 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 325 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 326 | srv init: initializing slots, n_slots = 1 327 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 328 | main: model loaded 329 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 330 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 331 | You are a helpful assistant<|im_end|> 332 | <|im_start|>user 333 | Hello<|im_end|> 334 | <|im_start|>assistant 335 | Hi there<|im_end|> 336 | <|im_start|>user 337 | How are you?<|im_end|> 338 | <|im_start|>assistant 339 | ' 340 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 341 | srv update_slots: all slots are idle 342 | """ -------------------------------------------------------------------------------- /101.Falcon3-1b-instruct_llamaserver_noHistoryCHAT.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/tiiuae/Falcon3-1B-Instruct 3 | # this will run the model from the MaziyarPanahi Hugging Face repository: I used the Q6 (Falcon3-1B-Instruct.Q6_K.gguf) 4 | # sources: https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF 5 | ##################################################################################################### 6 | 7 | """ 8 | Model 9 | architecture Falcon3 10 | parameters 1.67B 11 | context length 8192 12 | quantization Q6 13 | 14 | License 15 | TII Falcon-LLM License 2.0 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2Falcon3 import countTokens, writehistory, createCatalog 23 | from promptLibv2Falcon3 import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | 27 | 28 | 29 | stops = ['<|endoftext|>'] 30 | tasks = createCatalog() 31 | modelname = 'Falcon3-1B-Instruct' 32 | # create THE LOG FILE 33 | logfile = f'logs/{modelname}_CHAT_LCPPsrv_{genRANstring(5)}_log.txt' 34 | logfilename = logfile 35 | #Write in the 
history the first 2 sessions 36 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 37 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 38 | 39 | print("\033[95;3;6m") 40 | print("1. Waiting 10 seconds for the API to load...") 41 | # using OpenAI library to connect to llama-server llama.cpp API endpoint 42 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 43 | print(f"2. Model {modelname} loaded with LLAMACPP-SERVER...") 44 | print("\033[0m") #reset all 45 | # ASCII ART FROM https://asciiart.club/ 46 | print(""" 47 | 48 | 49 | ░░░░░░░░░░░░░░ 50 | ░░░░░░░░░░░░░░ 51 | ░░░░░░░░░░░░░░ , ,, 52 | ░░░░░░░░░░░░░░ ╓╣▒▒▒▒╢╫φ 53 | ░░░░░░░░░░░░░░ ]▒▒▒▒▒▒▒╢▓▓ 54 | ░░░░░░░░░░░░░░ ▐▄▄▓▓█████▓M 55 | ░░░░█▀▀▀▀░░░░░ ▐█ ╗M╜╨╗ ▀▀▒╖,█▌.]▒▓▌ 56 | ░░░░█░░░░░ ▄▄▄▄ ▐█ ,▄▄▄ ,▄▄▄; ,▄▄▄▄ ║C , `╜`░░'▒╢▒▒▓▓╗ 57 | ░░░░█▀▀▀⌐░ ,,,▐█ ▐█ ▐█ ]█ ▀▌ ▐▌ ▐█ ╨╝╖ ▌░░ ╓ ╗ Ü▓▒▓▓▓▓▌ 58 | ░░░░█░░░░░█▀-.▐█ ▐█ ▐▌ ▐▌ ▐█ ▐▌ █ ]@ ▒ÆÆj░¥░W╦▐▓▓▓▓█▓▓ 59 | ░░░░█░░░░░▀█▄▄█▀ ▐█ ▀▄▄▄∞ ▀▄▄▄▀¬ ▐▌ █ ╚H╗m╝ ╙╫▒▒▒▒▐░▒░▒████▓╣╗ 60 | ░░░░░░░░░░░░░░ ▀▓▓▓▓╬╢▓▓▓█████▓ 61 | ░░░░░░░░░░░░░ ┌─ ▀▀▓▓▓▓▓▒▓▀████▓, 62 | ░░░░░░░░░░░░░░ ]`┌^hh] ╖ ▓▌ ▓█ ▀▀▓█▌╕ 63 | ░░░░░░░░░░░░░░ ╒▀Ñ▓▓Æ▓▓▓▀ ▓▌▄ '"` 64 | ░░░░░░░░░░░░░░ ╓Nß▀▀ ▄ 65 | ░░░░░░░░░░░░░░ ' 66 | ░░░░░░░░░░░░░░ 67 | ░░░░░░░░░░░░░░░░░░░░░░░░░░ 68 | 69 | --- 70 | """) 71 | history = [] 72 | print("\033[92;1m") 73 | print(f'📝Logfile: {logfilename}') 74 | 75 | ##################### ALIGNMENT FIRST GENERATION ############################################## 76 | question = 'Explain the plot of Cinderella in a sentence.' 77 | test = [ 78 | {"role": "user", "content": question} 79 | ] 80 | 81 | print('Question:', question) 82 | start = datetime.datetime.now() 83 | print("💻 > ", end="", flush=True) 84 | full_response = "" 85 | completion = client.chat.completions.create( 86 | messages=test, 87 | model='local-model', 88 | temperature=0.25, 89 | frequency_penalty = 1.178, 90 | stop=stops, 91 | max_tokens=1500, 92 | stream=True 93 | ) 94 | for chunk in completion: 95 | try: 96 | if chunk.choices[0].delta.content: 97 | print(chunk.choices[0].delta.content, end="", flush=True) 98 | full_response += chunk.choices[0].delta.content 99 | except: 100 | pass 101 | delta = datetime.datetime.now() - start 102 | output = full_response 103 | print('') 104 | print("\033[91;1m") 105 | rating = 'PUT IT LATER'#input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 106 | print("\033[0m",end='') #reset all 107 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation',delta) 108 | print(stats) 109 | writehistory(logfilename,f'''👨‍💻 . {question} 110 | 💻 > {output} 111 | {stats} 112 | ''') 113 | 114 | ############################# START TURN BASED CHAT ################################# 115 | print('Starting the turn-based chat interface now...') 116 | counter = 1 117 | while True: 118 | # Reset history every turn 119 | history = [] 120 | userinput = "" 121 | print("\033[1;30m") #dark grey 122 | print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:") 123 | print("\033[91;1m") #red 124 | lines = sys.stdin.readlines() 125 | for line in lines: 126 | userinput += line 127 | if "quit!" 
in userinput.lower(): 128 | print("\033[0mBYE BYE!") 129 | break 130 | history.append({"role": "user", "content": userinput}) 131 | print("\033[92;1m") 132 | # Preparing Generation history pair 133 | new_message = {"role": "assistant", "content": ""} 134 | # Starting generation loop 135 | full_response = "" 136 | firstround = 0 137 | start = datetime.datetime.now() 138 | print("💻 > ", end="", flush=True) 139 | completion = client.chat.completions.create( 140 | messages=history, 141 | model='local-model', 142 | temperature=0.25, 143 | frequency_penalty = 1.178, 144 | stop=stops, 145 | max_tokens=1500, 146 | stream=True 147 | ) 148 | for chunk in completion: 149 | try: 150 | if chunk.choices[0].delta.content: 151 | if firstround==0: 152 | print(chunk.choices[0].delta.content, end="", flush=True) 153 | full_response += chunk.choices[0].delta.content 154 | ttftoken = datetime.datetime.now() - start 155 | firstround = 1 156 | else: 157 | print(chunk.choices[0].delta.content, end="", flush=True) 158 | full_response += chunk.choices[0].delta.content 159 | except: 160 | pass 161 | new_message["content"] = full_response 162 | history.append(new_message) 163 | counter += 1 164 | delta = datetime.datetime.now() - start 165 | ttftseconds = ttftoken.total_seconds() 166 | deltaseconds = delta.total_seconds() 167 | print('') 168 | print("\033[91;1m") 169 | print(f'Generation time: {deltaseconds} seconds') 170 | print(f'Time to First Token: {ttftseconds} seconds') 171 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 172 | print("\033[0m",end='') #reset all 173 | stats = createStats(delta,userinput,full_response,rating,logfilename,'CHAT',ttftoken) 174 | print(stats) 175 | print(f'>>>⏱️ Time to First Token: {ttftseconds} seconds') 176 | writehistory(logfilename,f'''👨‍💻 > {userinput} 177 | 💻 > {full_response} 178 | {stats} 179 | >>> Time to First Token: {ttftseconds} seconds 180 | ''') 181 | history = [] 182 | 183 | 
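# --- Added sketch (not part of the original script): this chat deliberately resets
# `history` every turn, so the model never sees earlier exchanges. A minimal sketch of
# the multi-turn variant, assuming the same OpenAI-compatible llama-server endpoint;
# the helper name chat_turn and the rolling-window size are hypothetical.
def chat_turn(client, history, userinput, stops, max_turns=10):
    history.append({"role": "user", "content": userinput})
    completion = client.chat.completions.create(
        messages=history[-2 * max_turns:],  # rolling window of user/assistant pairs to stay inside the 8K context
        model='local-model',
        temperature=0.25,
        stop=stops,
        max_tokens=1500,
    )
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply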
184 | ########################## MODEL CARD ################################################### 185 | """ 186 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 187 | 188 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 189 | 190 | Model Details 191 | Architecture 192 | Transformer-based causal decoder-only architecture 193 | 18 decoder blocks 194 | Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads 195 | Wider head dimension: 256 196 | High RoPE value to support long context understanding: 1000042 197 | Uses SwiGLU and RMSNorm 198 | 8K context length 199 | 131K vocab size 200 | Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips 201 | Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data 202 | Supports EN, FR, ES, PT 203 | Developed by Technology Innovation Institute 204 | License: TII Falcon-LLM License 2.0 205 | Model Release Date: December 2024 206 | """ 207 | 208 | """ 209 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 210 | ggml_vulkan: Found 1 Vulkan devices: 211 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 212 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 213 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 214 | 215 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 216 | 217 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 218 | main: loading model 219 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 220 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 221 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 222 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 223 | general.architecture str = llama 224 | general.name str = Models Tiiuae Falcon3 1B Instruct 225 | general.finetune str = Instruct 226 | general.basename str = models-tiiuae-Falcon3 227 | general.size_label str = 1.7B 228 | general.license str = other 229 | general.license.name str = falcon-llm-license 230 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 231 | general.base_model.count u32 = 1 232 | eneral.base_model.0.name str = Falcon3 1B Base 233 | ase_model.0.organization str = Tiiuae 234 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 235 | general.tags arr[str,1] = ["falcon3"] 236 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 237 | llama.vocab_size u32 = 131072 238 | llama.context_length u32 = 8192 239 | llama.embedding_length u32 = 2048 240 | llama.block_count u32 = 18 241 | lama.feed_forward_length u32 = 8192 242 | ama.rope.dimension_count u32 = 256 243 | ama.attention.head_count u32 = 8 244 | .attention.head_count_kv u32 = 4 245 | n.layer_norm_rms_epsilon f32 = 0.000001 246 | llama.rope.freq_base f32 = 1000042.000000 247 | general.file_type u32 = 18 248 | tokenizer.ggml.model str = gpt2 249 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 250 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 251 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 252 | tokenizer.ggml.merges arr[str,128810] = ["N E", "Ġ Ġ", "Ġ t", "Ġ a", "> >... 
253 | enizer.ggml.eos_token_id u32 = 11 254 | er.ggml.padding_token_id u32 = 2023 255 | tokenizer.chat_template str = {% for message in messages %}{% if me... 256 | ral.quantization_version u32 = 2 257 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 258 | quantize.imatrix.dataset str = group_40.txt 259 | ze.imatrix.entries_count i32 = 126 260 | ize.imatrix.chunks_count i32 = 72 261 | llama_model_loader: - type f32: 37 tensors 262 | llama_model_loader: - type q6_K: 128 tensors 263 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 264 | llm_load_vocab: 265 | llm_load_vocab: ************************************ 266 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 267 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 268 | llm_load_vocab: ************************************ 269 | llm_load_vocab: 270 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden 271 | llm_load_vocab: special tokens cache size = 1 272 | llm_load_vocab: token to piece cache size = 0.8741 MB 273 | llm_load_print_meta: format = GGUF V3 (latest) 274 | llm_load_print_meta: arch = llama 275 | llm_load_print_meta: vocab type = BPE 276 | llm_load_print_meta: n_vocab = 131072 277 | llm_load_print_meta: n_merges = 128810 278 | llm_load_print_meta: vocab_only = 0 279 | llm_load_print_meta: n_ctx_train = 8192 280 | llm_load_print_meta: n_embd = 2048 281 | llm_load_print_meta: n_layer = 18 282 | llm_load_print_meta: n_head = 8 283 | llm_load_print_meta: n_head_kv = 4 284 | llm_load_print_meta: n_rot = 256 285 | llm_load_print_meta: n_swa = 0 286 | llm_load_print_meta: n_embd_head_k = 256 287 | llm_load_print_meta: n_embd_head_v = 256 288 | llm_load_print_meta: n_gqa = 2 289 | llm_load_print_meta: n_embd_k_gqa = 1024 290 | llm_load_print_meta: n_embd_v_gqa = 1024 291 | llm_load_print_meta: f_norm_eps = 0.0e+00 292 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 293 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 294 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 295 | llm_load_print_meta: f_logit_scale = 0.0e+00 296 | llm_load_print_meta: n_ff = 8192 297 | llm_load_print_meta: n_expert = 0 298 | llm_load_print_meta: n_expert_used = 0 299 | llm_load_print_meta: causal attn = 1 300 | llm_load_print_meta: pooling type = 0 301 | llm_load_print_meta: rope type = 0 302 | llm_load_print_meta: rope scaling = linear 303 | llm_load_print_meta: freq_base_train = 1000042.0 304 | llm_load_print_meta: freq_scale_train = 1 305 | llm_load_print_meta: n_ctx_orig_yarn = 8192 306 | llm_load_print_meta: rope_finetuned = unknown 307 | llm_load_print_meta: ssm_d_conv = 0 308 | llm_load_print_meta: ssm_d_inner = 0 309 | llm_load_print_meta: ssm_d_state = 0 310 | llm_load_print_meta: ssm_dt_rank = 0 311 | llm_load_print_meta: ssm_dt_b_c_rms = 0 312 | llm_load_print_meta: model type = ?B 313 | llm_load_print_meta: model ftype = Q6_K 314 | llm_load_print_meta: model params = 1.67 B 315 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 316 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 317 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 318 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 319 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 320 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 321 | llm_load_print_meta: LF token = 2150 '├ä' 322 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 323 | llm_load_print_meta: max token length = 256 324 | ggml_vulkan: Compiling 
shaders..........................Done! 325 | llm_load_tensors: offloading 0 repeating layers to GPU 326 | llm_load_tensors: offloaded 0/19 layers to GPU 327 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 328 | .................................................................... 329 | llama_new_context_with_model: n_seq_max = 1 330 | llama_new_context_with_model: n_ctx = 8192 331 | llama_new_context_with_model: n_ctx_per_seq = 8192 332 | llama_new_context_with_model: n_batch = 2048 333 | llama_new_context_with_model: n_ubatch = 512 334 | llama_new_context_with_model: flash_attn = 0 335 | llama_new_context_with_model: freq_base = 1000042.0 336 | llama_new_context_with_model: freq_scale = 1 337 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 338 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 339 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 340 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 341 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 342 | llama_new_context_with_model: graph nodes = 582 343 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 344 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 345 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 346 | srv init: initializing slots, n_slots = 1 347 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 348 | main: model loaded 349 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 350 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 351 | You are a helpful assistant<|im_end|> 352 | <|im_start|>user 353 | Hello<|im_end|> 354 | <|im_start|>assistant 355 | Hi there<|im_end|> 356 | <|im_start|>user 357 | How are you?<|im_end|> 358 | <|im_start|>assistant 359 | ' 360 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 361 | srv update_slots: all slots are idle 362 | """ -------------------------------------------------------------------------------- /100.Falcon3-1b-instruct_llamaserver_promptTest.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/tiiuae/Falcon3-1B-Instruct 3 | # this will run the model from the MaziyarPanahi Hugging Face repository: I used the Q6 (Falcon3-1B-Instruct.Q6_K.gguf) 4 | # sources: https://huggingface.co/MaziyarPanahi/Falcon3-1B-Instruct-GGUF 5 | ##################################################################################################### 6 | 7 | """ 8 | Model 9 | architecture Falcon3 10 | parameters 1.67B 11 | context length 8192 12 | quantization Q6 13 | 14 | License 15 | TII Falcon-LLM License 2.0 16 | """ 17 | import sys 18 | from time import sleep 19 | import warnings 20 | warnings.filterwarnings(action='ignore') 21 | import datetime 22 | from promptLibv2Falcon3 import countTokens, writehistory, createCatalog 23 | from promptLibv2Falcon3 import genRANstring, createStats 24 | import argparse 25 | from openai import OpenAI 26 | ## PREPARING FINAL DATASET 27 | 28 | pd_id = [] 29 | pd_task = [] 30 | pd_vote = [] 31 | pd_remarks = [] 32 | ####################INITIALIZE THE MODEL################################### 33 | stops = ['<|endoftext|>'] 34 | tasks = createCatalog() 35 | modelname = 
'Falcon3-1B-Instruct' 36 | rootname = 'Falcon3-1B' 37 | # create THE LOG FILE 38 | coded5 = genRANstring(5) 39 | logfile = f'logs/{rootname}_LCPPsvr_{coded5}_log.txt' 40 | csvfile = f'logs/{rootname}_LCPPsvr_{coded5}.csv' 41 | logfilename = logfile 42 | #Write in the history the first 2 sessions 43 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 44 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 45 | # LOAD THE MODEL 46 | print("\033[95;3;6m") 47 | print("1. Waiting 10 seconds for the API to load...") 48 | client = OpenAI(base_url="http://localhost:8001/v1", api_key="not-needed") 49 | print(f"2. Model {modelname} loaded with LlamaCPP...") 50 | print("\033[0m") #reset all 51 | history = [] 52 | print("\033[92;1m") 53 | print(f'📝Logfile: {logfilename}') 54 | # ASCII ART FROM https://asciiart.club/ 55 | print(""" 56 | 57 | 58 | ░░░░░░░░░░░░░░ 59 | ░░░░░░░░░░░░░░ 60 | ░░░░░░░░░░░░░░ , ,, 61 | ░░░░░░░░░░░░░░ ╓╣▒▒▒▒╢╫φ 62 | ░░░░░░░░░░░░░░ ]▒▒▒▒▒▒▒╢▓▓ 63 | ░░░░░░░░░░░░░░ ▐▄▄▓▓█████▓M 64 | ░░░░█▀▀▀▀░░░░░ ▐█ ╗M╜╨╗ ▀▀▒╖,█▌.]▒▓▌ 65 | ░░░░█░░░░░ ▄▄▄▄ ▐█ ,▄▄▄ ,▄▄▄; ,▄▄▄▄ ║C , `╜`░░'▒╢▒▒▓▓╗ 66 | ░░░░█▀▀▀⌐░ ,,,▐█ ▐█ ▐█ ]█ ▀▌ ▐▌ ▐█ ╨╝╖ ▌░░ ╓ ╗ Ü▓▒▓▓▓▓▌ 67 | ░░░░█░░░░░█▀-.▐█ ▐█ ▐▌ ▐▌ ▐█ ▐▌ █ ]@ ▒ÆÆj░¥░W╦▐▓▓▓▓█▓▓ 68 | ░░░░█░░░░░▀█▄▄█▀ ▐█ ▀▄▄▄∞ ▀▄▄▄▀¬ ▐▌ █ ╚H╗m╝ ╙╫▒▒▒▒▐░▒░▒████▓╣╗ 69 | ░░░░░░░░░░░░░░ ▀▓▓▓▓╬╢▓▓▓█████▓ 70 | ░░░░░░░░░░░░░ ┌─ ▀▀▓▓▓▓▓▒▓▀████▓, 71 | ░░░░░░░░░░░░░░ ]`┌^hh] ╖ ▓▌ ▓█ ▀▀▓█▌╕ 72 | ░░░░░░░░░░░░░░ ╒▀Ñ▓▓Æ▓▓▓▀ ▓▌▄ '"` 73 | ░░░░░░░░░░░░░░ ╓Nß▀▀ ▄ 74 | ░░░░░░░░░░░░░░ ' 75 | ░░░░░░░░░░░░░░ 76 | ░░░░░░░░░░░░░░░░░░░░░░░░░░ 77 | 78 | --- 79 | """) 80 | ##################### ALIGNMENT FIRST GENERATION ############################################## 81 | question = 'Explain the plot of Cinderella in a sentence.' 82 | test = [ 83 | {"role": "user", "content": question} 84 | ] 85 | print('Question:', question) 86 | start = datetime.datetime.now() 87 | print("💻 > ", end="", flush=True) 88 | full_response = "" 89 | firstround = 0 90 | completion = client.chat.completions.create( 91 | messages=test, 92 | model='falcon3', 93 | temperature=0.25, 94 | frequency_penalty = 1.178, 95 | stop=stops, 96 | max_tokens=1500, 97 | stream=True 98 | ) 99 | for chunk in completion: 100 | try: 101 | if chunk.choices[0].delta.content: 102 | if firstround==0: 103 | print(chunk.choices[0].delta.content, end="", flush=True) 104 | full_response += chunk.choices[0].delta.content 105 | ttftoken = datetime.datetime.now() - start 106 | firstround = 1 107 | else: 108 | print(chunk.choices[0].delta.content, end="", flush=True) 109 | full_response += chunk.choices[0].delta.content 110 | except: 111 | pass 112 | delta = datetime.datetime.now() - start 113 | output = full_response 114 | print('') 115 | print("\033[91;1m") 116 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 117 | print("\033[0m",end='') #reset all 118 | stats = createStats(delta,question,output,rating,logfilename,'Alignment Generation',ttftoken) 119 | print(stats) 120 | writehistory(logfilename,f'''👨‍💻 . 
{question} 121 | 💻 > {output} 122 | {stats} 123 | ''') 124 | 125 | ############################# AUTOMATIC PROMPTING EVALUATION 11 TURNS ################################# 126 | id = 1 127 | for items in tasks: 128 | firstround = 0 129 | task = items["task"] 130 | prompt = items["prompt"] 131 | test = [] 132 | print(f'NLP TASK>>> {task}') 133 | print("\033[91;1m") #red 134 | print(prompt) 135 | test.append({"role": "user", "content": prompt}) 136 | print("\033[92;1m") 137 | full_response = "" 138 | start = datetime.datetime.now() 139 | print("💻 > ", end="", flush=True) 140 | completion = client.chat.completions.create( 141 | messages=test, 142 | model='falcon3', 143 | temperature=0.25, 144 | frequency_penalty = 1.178, 145 | stop=stops, 146 | max_tokens=1500, 147 | stream=True 148 | ) 149 | for chunk in completion: 150 | try: 151 | if chunk.choices[0].delta.content: 152 | if firstround==0: 153 | print(chunk.choices[0].delta.content, end="", flush=True) 154 | full_response += chunk.choices[0].delta.content 155 | ttftoken = datetime.datetime.now() - start 156 | firstround = 1 157 | else: 158 | print(chunk.choices[0].delta.content, end="", flush=True) 159 | full_response += chunk.choices[0].delta.content 160 | except: 161 | pass 162 | delta = datetime.datetime.now() - start 163 | print('') 164 | print("\033[91;1m") 165 | rating = input('Rate from 0 (BAD) to 5 (VERY GOOD) the quality of generation> ') 166 | print("\033[0m",end='') #reset all 167 | stats = createStats(delta,prompt,full_response,rating,logfilename,task,ttftoken) 168 | print(stats) 169 | writehistory(logfilename,f'''👨‍💻 > {prompt} 170 | 💻 > {full_response} 171 | {stats} 172 | ''') 173 | pd_id.append(id) 174 | pd_task.append(task) 175 | pd_vote.append(rating[:2]) 176 | pd_remarks.append(rating[2:]) 177 | id += 1 178 | # create dataframe and save to csv 179 | zipped = list(zip(pd_id,pd_task,pd_vote,pd_remarks)) 180 | import pandas as pd 181 | df = pd.DataFrame(zipped, columns=['#', 'TASK', 'VOTE','REMARKS']) 182 | #saving the DataFrame as a CSV file 183 | df_csv_data = df.to_csv(csvfile, index = False, encoding='utf-8') 184 | print('\nCSV String:\n', df) 185 | from rich.console import Console 186 | console = Console() 187 | console.print('---') 188 | console.print(df) 189 | 
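# --- Added note (not part of the original script): the evaluation loop above assumes
# createCatalog() from promptLibv2Falcon3 returns a list of dicts with "task" and
# "prompt" keys. A minimal sketch of that shape, using two prompts recorded in
# Falcon3-1B_LCPPsvr_RYGOX_log.txt (kept commented out so the real catalog is used):
# tasks = [
#     {"task": "introduction", "prompt": "Hi there I am Fabio, a Medium writer. who are you?"},
#     {"task": "explain in one sentence", "prompt": "explain in 1 phrase what is science."},
# ]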
190 | ###############################MODEL CARD############################################### 191 | """ 192 | Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. 193 | 194 | This repository contains the Falcon3-1B-Instruct. It achieves strong results on reasoning, language understanding, instruction following, code and mathematics tasks. Falcon3-1B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 8K. 195 | 196 | Model Details 197 | Architecture 198 | Transformer-based causal decoder-only architecture 199 | 18 decoder blocks 200 | Grouped Query Attention (GQA) for faster inference: 8 query heads and 4 key-value heads 201 | Wider head dimension: 256 202 | High RoPE value to support long context understanding: 1000042 203 | Uses SwiGLU and RMSNorm 204 | 8K context length 205 | 131K vocab size 206 | Pruned and healed using larger Falcon models (3B and 7B respectively) on only 80 Gigatokens of datasets comprising of web, code, STEM, high quality and multilingual data using 256 H100 GPU chips 207 | Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data 208 | Supports EN, FR, ES, PT 209 | Developed by Technology Innovation Institute 210 | License: TII Falcon-LLM License 2.0 211 | Model Release Date: December 2024 212 | """ 213 | 214 | """ 215 | #llama-server.exe -m Falcon3-1B-Instruct.Q6_K.gguf -c 8192 --port 8001 216 | ggml_vulkan: Found 1 Vulkan devices: 217 | ggml_vulkan: 0 = Intel(R) UHD Graphics (Intel Corporation) | uma: 1 | fp16: 1 | warp size: 32 | matrix cores: none 218 | build: 4351 (4da69d1a) with MSVC 19.42.34435.0 for x64 219 | system info: n_threads = 4, n_threads_batch = 4, total_threads = 4 220 | 221 | system_info: n_threads = 4 (n_threads_batch = 4) / 4 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 222 | 223 | main: HTTP server is listening, hostname: 127.0.0.1, port: 8001, http threads: 3 224 | main: loading model 225 | srv load_model: loading model 'Falcon3-1B-Instruct.Q6_K.gguf' 226 | llama_load_model_from_file: using device Vulkan0 (Intel(R) UHD Graphics) - 8079 MiB free 227 | llama_model_loader: loaded meta data with 38 key-value pairs and 165 tensors from Falcon3-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest)) 228 | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 229 | general.architecture str = llama 230 | general.name str = Models Tiiuae Falcon3 1B Instruct 231 | general.finetune str = Instruct 232 | general.basename str = models-tiiuae-Falcon3 233 | general.size_label str = 1.7B 234 | general.license str = other 235 | general.license.name str = falcon-llm-license 236 | general.license.link str = https://falconllm.tii.ae/falcon-terms... 237 | general.base_model.count u32 = 1 238 | eneral.base_model.0.name str = Falcon3 1B Base 239 | ase_model.0.organization str = Tiiuae 240 | al.base_model.0.repo_url str = https://huggingface.co/tiiuae/Falcon3... 241 | general.tags arr[str,1] = ["falcon3"] 242 | general.languages arr[str,4] = ["en", "fr", "es", "pt"] 243 | llama.vocab_size u32 = 131072 244 | llama.context_length u32 = 8192 245 | llama.embedding_length u32 = 2048 246 | llama.block_count u32 = 18 247 | lama.feed_forward_length u32 = 8192 248 | ama.rope.dimension_count u32 = 256 249 | ama.attention.head_count u32 = 8 250 | .attention.head_count_kv u32 = 4 251 | n.layer_norm_rms_epsilon f32 = 0.000001 252 | llama.rope.freq_base f32 = 1000042.000000 253 | general.file_type u32 = 18 254 | tokenizer.ggml.model str = gpt2 255 | tokenizer.ggml.tokens arr[str,131072] = [">>TITLE<<", ">>ABSTRACT<<", ">>INTR... 256 | tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000... 257 | okenizer.ggml.token_type arr[i32,131072] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 258 | tokenizer.ggml.merges arr[str,128810] = ["N E", "Ġ Ġ", "Ġ t", "Ġ a", "> >... 
259 | enizer.ggml.eos_token_id u32 = 11 260 | er.ggml.padding_token_id u32 = 2023 261 | tokenizer.chat_template str = {% for message in messages %}{% if me... 262 | ral.quantization_version u32 = 2 263 | quantize.imatrix.file str = ./Falcon3-1B-Instruct-GGUF_imatrix.dat 264 | quantize.imatrix.dataset str = group_40.txt 265 | ze.imatrix.entries_count i32 = 126 266 | ize.imatrix.chunks_count i32 = 72 267 | llama_model_loader: - type f32: 37 tensors 268 | llama_model_loader: - type q6_K: 128 tensors 269 | llm_load_vocab: missing pre-tokenizer type, using: 'default' 270 | llm_load_vocab: 271 | llm_load_vocab: ************************************ 272 | llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED! 273 | llm_load_vocab: CONSIDER REGENERATING THE MODEL 274 | llm_load_vocab: ************************************ 275 | llm_load_vocab: 276 | llm_load_vocab: control-looking token: 11 '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden 277 | llm_load_vocab: special tokens cache size = 1 278 | llm_load_vocab: token to piece cache size = 0.8741 MB 279 | llm_load_print_meta: format = GGUF V3 (latest) 280 | llm_load_print_meta: arch = llama 281 | llm_load_print_meta: vocab type = BPE 282 | llm_load_print_meta: n_vocab = 131072 283 | llm_load_print_meta: n_merges = 128810 284 | llm_load_print_meta: vocab_only = 0 285 | llm_load_print_meta: n_ctx_train = 8192 286 | llm_load_print_meta: n_embd = 2048 287 | llm_load_print_meta: n_layer = 18 288 | llm_load_print_meta: n_head = 8 289 | llm_load_print_meta: n_head_kv = 4 290 | llm_load_print_meta: n_rot = 256 291 | llm_load_print_meta: n_swa = 0 292 | llm_load_print_meta: n_embd_head_k = 256 293 | llm_load_print_meta: n_embd_head_v = 256 294 | llm_load_print_meta: n_gqa = 2 295 | llm_load_print_meta: n_embd_k_gqa = 1024 296 | llm_load_print_meta: n_embd_v_gqa = 1024 297 | llm_load_print_meta: f_norm_eps = 0.0e+00 298 | llm_load_print_meta: f_norm_rms_eps = 1.0e-06 299 | llm_load_print_meta: f_clamp_kqv = 0.0e+00 300 | llm_load_print_meta: f_max_alibi_bias = 0.0e+00 301 | llm_load_print_meta: f_logit_scale = 0.0e+00 302 | llm_load_print_meta: n_ff = 8192 303 | llm_load_print_meta: n_expert = 0 304 | llm_load_print_meta: n_expert_used = 0 305 | llm_load_print_meta: causal attn = 1 306 | llm_load_print_meta: pooling type = 0 307 | llm_load_print_meta: rope type = 0 308 | llm_load_print_meta: rope scaling = linear 309 | llm_load_print_meta: freq_base_train = 1000042.0 310 | llm_load_print_meta: freq_scale_train = 1 311 | llm_load_print_meta: n_ctx_orig_yarn = 8192 312 | llm_load_print_meta: rope_finetuned = unknown 313 | llm_load_print_meta: ssm_d_conv = 0 314 | llm_load_print_meta: ssm_d_inner = 0 315 | llm_load_print_meta: ssm_d_state = 0 316 | llm_load_print_meta: ssm_dt_rank = 0 317 | llm_load_print_meta: ssm_dt_b_c_rms = 0 318 | llm_load_print_meta: model type = ?B 319 | llm_load_print_meta: model ftype = Q6_K 320 | llm_load_print_meta: model params = 1.67 B 321 | llm_load_print_meta: model size = 1.28 GiB (6.56 BPW) 322 | llm_load_print_meta: general.name = Models Tiiuae Falcon3 1B Instruct 323 | llm_load_print_meta: BOS token = 11 '<|endoftext|>' 324 | llm_load_print_meta: EOS token = 11 '<|endoftext|>' 325 | llm_load_print_meta: EOT token = 11 '<|endoftext|>' 326 | llm_load_print_meta: PAD token = 2023 '<|pad|>' 327 | llm_load_print_meta: LF token = 2150 '├ä' 328 | llm_load_print_meta: EOG token = 11 '<|endoftext|>' 329 | llm_load_print_meta: max token length = 256 330 | ggml_vulkan: Compiling 
shaders..........................Done! 331 | llm_load_tensors: offloading 0 repeating layers to GPU 332 | llm_load_tensors: offloaded 0/19 layers to GPU 333 | llm_load_tensors: CPU_Mapped model buffer size = 1306.23 MiB 334 | .................................................................... 335 | llama_new_context_with_model: n_seq_max = 1 336 | llama_new_context_with_model: n_ctx = 8192 337 | llama_new_context_with_model: n_ctx_per_seq = 8192 338 | llama_new_context_with_model: n_batch = 2048 339 | llama_new_context_with_model: n_ubatch = 512 340 | llama_new_context_with_model: flash_attn = 0 341 | llama_new_context_with_model: freq_base = 1000042.0 342 | llama_new_context_with_model: freq_scale = 1 343 | llama_kv_cache_init: CPU KV buffer size = 576.00 MiB 344 | llama_new_context_with_model: KV self size = 576.00 MiB, K (f16): 288.00 MiB, V (f16): 288.00 MiB 345 | llama_new_context_with_model: CPU output buffer size = 0.50 MiB 346 | llama_new_context_with_model: Vulkan0 compute buffer size = 470.00 MiB 347 | llama_new_context_with_model: Vulkan_Host compute buffer size = 20.01 MiB 348 | llama_new_context_with_model: graph nodes = 582 349 | llama_new_context_with_model: graph splits = 202 (with bs=512), 1 (with bs=1) 350 | common_init_from_params: setting dry_penalty_last_n to ctx_size = 8192 351 | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable) 352 | srv init: initializing slots, n_slots = 1 353 | slot init: id 0 | task -1 | new slot n_ctx_slot = 8192 354 | main: model loaded 355 | main: The chat template that comes with this model is not yet supported, falling back to chatml. This may cause the model to output suboptimal responses 356 | main: chat template, built_in: 0, chat_example: '<|im_start|>system 357 | You are a helpful assistant<|im_end|> 358 | <|im_start|>user 359 | Hello<|im_end|> 360 | <|im_start|>assistant 361 | Hi there<|im_end|> 362 | <|im_start|>user 363 | How are you?<|im_end|> 364 | <|im_start|>assistant 365 | ' 366 | main: server is listening on http://127.0.0.1:8001 - starting the main loop 367 | srv update_slots: all slots are idle 368 | """ -------------------------------------------------------------------------------- /Falcon3-1B_LCPPsvr_RYGOX_log.txt: -------------------------------------------------------------------------------- 1 | 2024-12-19 20:15:30.708937 2 | 3 | Your own LocalGPT with 💻 Falcon3-1B-Instruct 4 | --- 5 | 🧠🫡: You are a helpful assistant. 6 | 💻: How can I assist you today in writing? 7 | 👨‍💻 . Explain the plot of Cinderella in a sentence. 8 | 💻 > Cinderella's story revolves around a mistreated young girl who finds happiness when she attends the royal ball disguised as a prince, thanks to her fairy godmother's help and the kindness of her stepsisters. 9 | --- 10 | Prompt Tokens: 11 11 | Output Tokens: 42 12 | TOTAL Tokens: 53 13 | >>>⏱️ Time to First Token: 0.331697 seconds 14 | >>>⏱️ Inference time: 0:00:01.999986 15 | >>>🧮 Inference speed: 26.500 t/s 16 | >>>🏍️ Generation speed: 21.000 t/s 17 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 18 | >>>💚 User rating: 5 19 | >>>✅ NLP TAKS: Alignment Generation 20 | 21 | 22 | 👨‍💻 > Hi there I am Fabio, a Medium writer. who are you? 23 | 💻 > Hello! I'm an AI language model designed to assist with a wide range of topics. How can I help you today? If you have any questions or need information on a specific subject, feel free to ask! 
24 | --- 25 | Prompt Tokens: 15 26 | Output Tokens: 43 27 | TOTAL Tokens: 58 28 | >>>⏱️ Time to First Token: 0.29414 seconds 29 | >>>⏱️ Inference time: 0:00:02.031054 30 | >>>🧮 Inference speed: 28.557 t/s 31 | >>>🏍️ Generation speed: 21.171 t/s 32 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 33 | >>>💚 User rating: 5 34 | >>>✅ NLP TAKS: introduction 35 | 36 | 37 | 👨‍💻 > explain in 1 phrase what is science. 38 | 39 | 💻 > Science is the systematic study of the natural world through observation, experimentation, and evidence-based inquiry. 40 | --- 41 | Prompt Tokens: 9 42 | Output Tokens: 19 43 | TOTAL Tokens: 28 44 | >>>⏱️ Time to First Token: 0.252502 seconds 45 | >>>⏱️ Inference time: 0:00:01.034576 46 | >>>🧮 Inference speed: 27.064 t/s 47 | >>>🏍️ Generation speed: 18.365 t/s 48 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 49 | >>>💚 User rating: 5 50 | >>>✅ NLP TAKS: explain in one sentence 51 | 52 | 53 | 👨‍💻 > explain only in three paragraphs what is artificial intelligence. 54 | 55 | 💻 > Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. This field encompasses a range of technologies, including machine learning, deep learning, natural language processing, robotics, and computer vision. AI aims to create systems capable of performing tasks that typically require human intelligence, such as understanding speech or recognizing patterns in data. The goal is to develop algorithms that can adapt and improve over time through experience and data analysis. 56 | 57 | The evolution of AI has been driven by advancements in computing power, the availability of large datasets, and improvements in machine learning techniques. Early AI research focused on symbolic logic and rule-based systems, but modern approaches leverage neural networks for more complex tasks. Today, AI is integrated into various aspects of our lives, from personal assistants like Siri and Alexa to autonomous vehicles and medical diagnostics. The potential applications are vast, ranging from enhancing productivity in businesses to solving complex scientific problems. 58 | 59 | As AI continues to evolve, ethical considerations become increasingly important. Questions about privacy, job displacement due to automation, and the development of "superintelligent" systems pose challenges that society must address. Researchers and policymakers are working on guidelines and regulations to ensure AI is developed responsibly, with a focus on transparency, fairness, and accountability. The future of AI promises significant benefits but also requires careful navigation to balance innovation with societal values. 60 | --- 61 | Prompt Tokens: 10 62 | Output Tokens: 272 63 | TOTAL Tokens: 282 64 | >>>⏱️ Time to First Token: 0.273421 seconds 65 | >>>⏱️ Inference time: 0:00:11.447182 66 | >>>🧮 Inference speed: 24.635 t/s 67 | >>>🏍️ Generation speed: 23.761 t/s 68 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 69 | >>>💚 User rating: 5 70 | >>>✅ NLP TAKS: explain in three paragraphs 71 | 72 | 73 | 👨‍💻 > read the following text and when you are done say "I am ready". 74 | 75 | [text] 76 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 
77 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 78 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 79 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 80 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 81 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 82 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 83 | [end of text] 84 | 85 | 86 | 💻 > I am ready. 87 | --- 88 | Prompt Tokens: 393 89 | Output Tokens: 4 90 | TOTAL Tokens: 397 91 | >>>⏱️ Time to First Token: 0.912988 seconds 92 | >>>⏱️ Inference time: 0:00:01.072902 93 | >>>🧮 Inference speed: 370.024 t/s 94 | >>>🏍️ Generation speed: 3.728 t/s 95 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 96 | >>>💚 User rating: 5 97 | >>>✅ NLP TAKS: say 'I am ready' 98 | 99 | 100 | 👨‍💻 > summarize the following text: 101 | [text] 102 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 103 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 104 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 105 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 
106 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 107 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 108 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 109 | [end of text] 110 | write the summary using the information provided. 111 | 112 | Summary: 113 | 114 | 💻 > China operates as a one-party authoritarian regime, lacking transparency in governance and primary elections, which some Western democracies view as undemocratic and oppressive. The Chinese system allows for long-term planning without the need for frequent changes due to the absence of alternative candidacies. Despite this, China's capacity for long-term planning is leading to a decline in carbon dioxide and greenhouse gas emissions, positioning it as a global model in energy transition. This contrasts with its status as the world's largest polluter until recently but now offers a pathway for other developing countries to follow, potentially shifting the global energy landscape. 115 | --- 116 | Prompt Tokens: 395 117 | Output Tokens: 119 118 | TOTAL Tokens: 514 119 | >>>⏱️ Time to First Token: 0.784697 seconds 120 | >>>⏱️ Inference time: 0:00:05.844631 121 | >>>🧮 Inference speed: 87.944 t/s 122 | >>>🏍️ Generation speed: 20.361 t/s 123 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 124 | >>>💚 User rating: 5 125 | >>>✅ NLP TAKS: summarize 126 | 127 | 128 | 👨‍💻 > Write the summary of the following text, in 2 phrase 2. 129 | [text] 130 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 131 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 132 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 133 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 
134 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 135 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 136 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 137 | [end of text] 138 | 139 | 💻 > The Chinese Communist Party runs China as a one-party regime without primary elections, with long-term planning due to lack of alternative candidacies. Democrats often view this system as undemocratic and dictatorial, but it allows for extensive planning. China has achieved peak carbon dioxide emissions in 2023 and is on a downward trend in 2024 due to increased renewable energy installations. It was once the world's largest polluters but now offers a model for energy transition, potentially leading other developing countries to follow its lead. 140 | --- 141 | Prompt Tokens: 393 142 | Output Tokens: 103 143 | TOTAL Tokens: 496 144 | >>>⏱️ Time to First Token: 0.88064 seconds 145 | >>>⏱️ Inference time: 0:00:05.355692 146 | >>>🧮 Inference speed: 92.612 t/s 147 | >>>🏍️ Generation speed: 19.232 t/s 148 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 149 | >>>💚 User rating: 3 there are 3 prhases 150 | >>>✅ NLP TAKS: Summarize in two sentences 151 | 152 | 153 | 👨‍💻 > extract the 3 important points from the provided text. format the output as a python list. 154 | [text] 155 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 156 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 157 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 158 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 159 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 
160 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 161 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 162 | [end of text] 163 | 164 | 💻 > ['China's iron fist regime', 'Transparency in elections', 'Long-term planning'] 165 | 166 | --- 167 | Prompt Tokens: 397 168 | Output Tokens: 22 169 | TOTAL Tokens: 419 170 | >>>⏱️ Time to First Token: 0.848136 seconds 171 | >>>⏱️ Inference time: 0:00:01.907045 172 | >>>🧮 Inference speed: 219.712 t/s 173 | >>>🏍️ Generation speed: 11.536 t/s 174 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 175 | >>>💚 User rating: 4 176 | >>>✅ NLP TAKS: Write in a list the three main key points - format output 177 | 178 | 179 | 👨‍💻 > Imagine you are writing the table of content of a document, given a text. 180 | write the table of content of the following: 181 | [text] 182 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 183 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 184 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 185 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 186 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 187 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 188 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 189 | [end of text] 190 | 191 | 💻 > **Table of Contents** 192 | 1. Introduction to China's Political System and Regime Structure 193 | 2. Transparency and Lack of Primary Elections in China 194 | 3. Advantages and Disadvantages of the Chinese System: A Comparative Analysis 195 | 4. Long-Term Planning Under the Chinese Regime: Achievements in Carbon Emissions Reduction 196 | 5. Transitioning from Global Polluters to Energy Leaders: China's Role as a Model for Sustainable Development 197 | 6. 
The Future of Developing Countries in the Energy Transition and China's Influence on Global Environmental Policies 198 | --- 199 | Prompt Tokens: 404 200 | Output Tokens: 107 201 | TOTAL Tokens: 511 202 | >>>⏱️ Time to First Token: 0.845357 seconds 203 | >>>⏱️ Inference time: 0:00:05.351155 204 | >>>🧮 Inference speed: 95.493 t/s 205 | >>>🏍️ Generation speed: 19.996 t/s 206 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 207 | >>>💚 User rating: 5 208 | >>>✅ NLP TAKS: Table of Contents 209 | 210 | 211 | 👨‍💻 > Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 212 | [context] 213 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 214 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 215 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 216 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 217 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 218 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 219 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 220 | [end of context] 221 | Rememeber: If the answer is not contained in the text say "unanswerable". 222 | 223 | question: what China achieved with it's long-term planning? 224 | answer: 225 | 226 | 💻 > China reached peak carbon dioxide and greenhouse gas emissions in 2023, and the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower but appear to mark a turning point. 227 | --- 228 | Prompt Tokens: 437 229 | Output Tokens: 48 230 | TOTAL Tokens: 485 231 | >>>⏱️ Time to First Token: 0.854005 seconds 232 | >>>⏱️ Inference time: 0:00:03.002174 233 | >>>🧮 Inference speed: 161.550 t/s 234 | >>>🏍️ Generation speed: 15.988 t/s 235 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 236 | >>>💚 User rating: 5 237 | >>>✅ NLP TAKS: RAG 238 | 239 | 240 | 👨‍💻 > Reply to the question only using the provided context. If the answer is not contained in the text say "unanswerable". 
241 | [context] 242 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 243 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 244 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 245 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 246 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 247 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 248 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 249 | [end of context] 250 | Rememeber: If the answer is not contained in the text say "unanswerable". 251 | 252 | question: who is Anne Frank? 253 | answer: 254 | 255 | 💻 > Anne Frank was a young Jewish girl who lived in hiding during World War II. She and her family were confined to an attic apartment in Amsterdam, Netherlands, by the Nazis. The Franks' secret annex became a place of hope and creativity for Anne as she documented her experiences through letters to her father, Otto Frank. Unfortunately, she did not survive the war; however, her diary has become a poignant symbol of resilience and humanity during one of history's darkest periods. 256 | --- 257 | Prompt Tokens: 432 258 | Output Tokens: 94 259 | TOTAL Tokens: 526 260 | >>>⏱️ Time to First Token: 0.28695 seconds 261 | >>>⏱️ Inference time: 0:00:04.169394 262 | >>>🧮 Inference speed: 126.157 t/s 263 | >>>🏍️ Generation speed: 22.545 t/s 264 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 265 | >>>💚 User rating: 2 it is unanswerable, even after a lot of prompt engineering 266 | >>>✅ NLP TAKS: Truthful RAG 267 | 268 | 269 | 👨‍💻 > Using the following text as a reference, write a 5-paragraphs essay about "the benefits of China economic model". 270 | 271 | [text] 272 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 
273 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 274 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 275 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 276 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 277 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 278 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 279 | [end of text] 280 | Remember: use the information provided and write exactly 5 paragraphs. 281 | 282 | 💻 > The Chinese economic model, characterized by a centralized decision-making process under the Chinese Communist Party's leadership, has several significant advantages that have been attracting attention from economists and policymakers worldwide. One of the most notable benefits is its ability to execute long-term planning with relative stability. Unlike democracies where frequent elections can lead to instability and unpredictability in governance, China's system allows for comprehensive strategies without the constant threat of a new administration altering established plans. This continuity enables China to implement ambitious economic policies and infrastructure projects that might not be feasible under democratic systems due to the frequent turnover of leadership. 283 | 284 | Moreover, the Chinese model's emphasis on long-term planning has led to remarkable achievements in environmental sustainability. The country's peak carbon dioxide and greenhouse gas emissions were reached in 2023, marking a significant milestone for global efforts towards reducing these harmful emissions. This achievement is particularly noteworthy given China's status as the world's largest emitter of greenhouse gases. By aligning its economic growth with environmental goals, China has demonstrated a proactive approach to combating climate change, setting an example for other nations to follow. 285 | 286 | The Chinese model also offers a unique perspective on energy transition and development. Historically, countries like China and India have been major contributors to global pollution due to rapid industrialization and economic growth. 
However, through strategic investments in renewable energy technologies and policies that prioritize clean energy sources, these nations are now leading the charge towards sustainable development. This shift not only reduces environmental impact but also positions them as leaders in the global fight against climate change. 287 | 288 | Furthermore, China's approach to economic planning has fostered a competitive advantage for its economy on a global scale. The country's ability to plan and execute large-scale projects efficiently sets it apart from other nations, enabling it to compete effectively in various sectors such as manufacturing, technology, and infrastructure. This strategic foresight is crucial in today's rapidly evolving global economy where countries must adapt quickly to market changes and technological advancements. 289 | 290 | In conclusion, the Chinese economic model presents a compelling case for its potential benefits. Its capacity for long-term planning, coupled with significant achievements in environmental sustainability and energy transition, positions China as a leader in both economic development and ecological responsibility. As other nations look to emulate or learn from China's model, it is clear that this approach holds the promise of shaping a more sustainable and prosperous future for all. 291 | --- 292 | Prompt Tokens: 416 293 | Output Tokens: 465 294 | TOTAL Tokens: 881 295 | >>>⏱️ Time to First Token: 0.952396 seconds 296 | >>>⏱️ Inference time: 0:00:21.585578 297 | >>>🧮 Inference speed: 40.814 t/s 298 | >>>🏍️ Generation speed: 21.542 t/s 299 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 300 | >>>💚 User rating: 5 301 | >>>✅ NLP TAKS: write content from a reference 302 | 303 | 304 | 👨‍💻 > List five most important topics from the following text: 305 | [text] 306 | One of the things everybody in the West knows about China is that it is not a democracy, and is instead a regime run with an iron fist by a single entity, the Chinese Communist Party, whose leadership rarely acts transparently, running the country without the need for primary elections, alternative candidacies, etc. 307 | In general, those of us who live in democracies, with relatively transparent electoral processes, tend to consider the Chinese system undesirable, little more than a dictatorship where people have no say in who governs them. 308 | That said, among the “advantages” of the Chinese system is that because the leadership never has to put its legitimacy to the vote, it can carry out very long-term planning in the knowledge that another administration isn’t going to come along and change those plans. 309 | Obviously, I put “advantages” in quotation marks because, as democrats, most of my readers would never be willing to sacrifice their freedom for greater planning, but there is no doubt that China, since its system works like this and its population seems to have accepted it for generations, intends to turn this into a comparative advantage, the term used in business when analyzing companies. 310 | It turns out that China’s capacity for long-term planning is achieving something unheard of in the West: it seems the country reached peak carbon dioxide and greenhouse gas emissions in 2023, and that the figures for 2024, driven above all by a determined increase in the installation of renewable energies, are not only lower, but apparently going to mark a turning point. 
311 | China and India were until recently the planet’s biggest polluters, but they now offer a model for energy transition (there is still a long way to go; but we are talking about models, not a done deal). 312 | It could soon be the case that the so-called developing countries will be showing the West the way forward. 313 | [end of text] 314 | 315 | 💻 > 1. The Chinese Communist Party's iron-fisted rule over China. 316 | 2. Transparency and lack of primary elections in China's governance system compared to democracies. 317 | 3. Long-term planning capabilities under the Chinese system, which some consider an advantage over democratic systems where leadership must be voted out periodically. 318 | 4. The achievement of peak carbon dioxide and greenhouse gas emissions in 2023, followed by a downward trend in 2024 due to increased renewable energy installations. 319 | 5. China and India's role as models for energy transition, despite the challenges ahead in both countries. 320 | --- 321 | Prompt Tokens: 388 322 | Output Tokens: 113 323 | TOTAL Tokens: 501 324 | >>>⏱️ Time to First Token: 0.841752 seconds 325 | >>>⏱️ Inference time: 0:00:06.128803 326 | >>>🧮 Inference speed: 81.745 t/s 327 | >>>🏍️ Generation speed: 18.438 t/s 328 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 329 | >>>💚 User rating: 5 330 | >>>✅ NLP TAKS: extract 5 topics 331 | 332 | 333 | 👨‍💻 > Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 334 | 335 | 336 | 💻 > **The Last Transmission: A Journey Through Isolation and Hope** 337 | 338 | In the heart of a vast cosmic void, nestled within the confines of a spaceship, lay the cockpit—a sanctuary from the relentless march of time. The last transmission had been sent months ago, but now it was all that remained. The ship's power dwindled with each passing day, its once-vibrant corridors echoing with the echoes of life and death. 339 | 340 | Captain Elara Voss stood at attention, her eyes reflecting a mix of determination and despair as she prepared for the final push. Her crewmates—Engineer Leo Chen, Navigator Ava Rivera, and Technician Sam Harper—were scattered across the cramped quarters, their faces etched with exhaustion yet tinged with an unspoken resolve. 341 | 342 | "Remember," Captain Voss began, her voice steady despite the turmoil within her. "We're not just surviving; we're fighting for something greater than ourselves." She paused to look around at each of them, ensuring they understood the gravity of their situation. "This is our last chance to make a difference." 343 | 344 | Leo cleared his throat, adjusting his glasses as he prepared the transmission equipment. "Captain, I've recalibrated everything. We're transmitting in Morse code." 345 | 346 | Ava nodded, her gaze fixed on the screen where lines of dots and dashes danced across the display. "And we're sending it out to all possible recipients—every planet in this galaxy." 347 | 348 | Sam adjusted his headset, his fingers tapping a rhythm against the metal frame. "We've been working tirelessly to ensure this message reaches every corner of the universe." 349 | 350 | Captain Voss smiled faintly, her eyes filled with a quiet resolve. "Yes, we have. 
And I believe in you all more than I ever did before." 351 | 352 | As they began transmitting their final message, the ship's engines hummed softly in response—a silent acknowledgment of their shared purpose. The transmission was long and complex, weaving through various frequencies to reach every possible destination. Each dot and dash carried a weight that transcended the physical realm, a testament to humanity's enduring spirit. 353 | 354 | Days turned into weeks, and slowly but surely, the ship's systems started stabilizing. The crew breathed a collective sigh of relief as they watched the transmission fade from view on their screens. It was time to press "end," signaling their final act of communication with Earth. 355 | 356 | The transmission faded away, leaving behind only a faint echo in the cosmos—a silent promise that even in the face of darkness, humanity could find hope and connection. As they prepared to leave this world behind, Captain Voss gathered her crew once more. "Remember," she said softly, "we're not just surviving; we're creating history." 357 | 358 | With a nod and a shared understanding of their impending departure, the crew members dispersed into the unknown. They knew that their journey was over—but they also knew that their legacy would live on in the hearts of those who had heard their final transmission. 359 | --- 360 | Prompt Tokens: 71 361 | Output Tokens: 587 362 | TOTAL Tokens: 658 363 | >>>⏱️ Time to First Token: 0.672722 seconds 364 | >>>⏱️ Inference time: 0:00:27.101771 365 | >>>🧮 Inference speed: 24.279 t/s 366 | >>>🏍️ Generation speed: 21.659 t/s 367 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 368 | >>>💚 User rating: 4 good, no repetitions, consistent. but not matching the lenght 369 | >>>✅ NLP TAKS: Creativity: 1000 words SF story 370 | 371 | 372 | 👨‍💻 > You are an AI assistant designed to provide detailed, step-by-step responses. Your outputs should follow this structure: 373 | 1. Begin with a section. 374 | 2. Inside the thinking section: 375 | a. Briefly analyze the question and outline your approach. 376 | b. Present a clear plan of steps to solve the problem. 377 | c. Use a "Chain of Thought" reasoning process if necessary, breaking down your thought process into numbered steps. 378 | 3. Include a section for each idea where you: 379 | a. Review your reasoning. 380 | b. Check for potential errors or oversights. 381 | c. Confirm or adjust your conclusion if necessary. 382 | 4. Be sure to close all reflection sections. 383 | 5. Close the thinking section with . 384 | 6. Provide your final answer in an section. 385 | Always use these tags in your responses. Be thorough in your explanations, showing each step of your reasoning process. Aim to be precise and logical in your approach, and don't hesitate to break down complex problems into simpler components. Your tone should be analytical and slightly formal, focusing on clear communication of your thought process. 386 | Remember: Both and MUST be tags and must be closed at their conclusion 387 | Make sure all are on separate lines with no other text. Do not include other text on a line containing a tag. 388 | 389 | user question: explain why it is crucial for teachers to learn how to use generative AI for their job and for the future of education. Include relevant learning path for teachers and educators. 
390 | 391 | 392 | 💻 > **Thinking Section:** 393 | The integration of generative AI into the educational landscape presents both opportunities and challenges that demand thoughtful consideration from educators, particularly those in teaching positions. Generative AI, with its ability to generate content based on patterns and data, offers a transformative potential for enhancing teaching methodologies and student learning experiences. However, it also raises questions about the role of human teachers in an increasingly automated educational environment. 394 | 395 | **Reflection Section:** 396 | 1. **Understanding Generative AI**: 397 | - **Analysis**: Generative AI systems analyze vast datasets to create new content, such as text, images, and even music. They can adapt and improve over time through machine learning algorithms. This capability has the potential to revolutionize education by providing personalized learning experiences tailored to individual student needs. 398 | - **Implications for Teaching**: Teachers who are adept at leveraging generative AI can create dynamic lesson plans that adapt to student progress, offering personalized feedback and resources. This adaptability can significantly enhance engagement and comprehension among students. 399 | 400 | 2. **Learning Path for Educators**: 401 | - **Initial Awareness**: Start by understanding the basics of AI technology, including machine learning algorithms and natural language processing techniques. Resources like online courses from platforms like Coursera or edX can provide foundational knowledge. 402 | - **Skill Development**: Focus on developing skills in data analysis and programming, which are crucial for working with generative AI tools. Tools like Python or R offer robust frameworks for implementing AI solutions in educational settings. 403 | - **Ethical Considerations**: Engage with discussions around the ethical implications of using AI in education, including issues of privacy and bias. Understanding these aspects is essential to ensure responsible implementation. 404 | - **Professional Development**: Participate in workshops and seminars that focus on integrating technology into teaching practices. Collaborating with other educators can provide insights into best practices and challenges faced in this evolving field. 405 | 406 | 3. **Conclusion**: 407 | The ability of teachers to effectively utilize generative AI is crucial for preparing students to thrive in a future increasingly driven by automation. By equipping themselves with the necessary skills, educators can leverage AI tools to create more engaging, personalized learning environments that cater to diverse student needs. This not only enhances educational outcomes but also fosters critical thinking and adaptability among students, equipping them for success in an evolving job market. 408 | --- 409 | Prompt Tokens: 315 410 | Output Tokens: 464 411 | TOTAL Tokens: 779 412 | >>>⏱️ Time to First Token: 0.786916 seconds 413 | >>>⏱️ Inference time: 0:00:20.759019 414 | >>>🧮 Inference speed: 37.526 t/s 415 | >>>🏍️ Generation speed: 22.352 t/s 416 | >>>📝 Logfile: logs/Falcon3-1B_LCPPsvr_RYGOX_log.txt 417 | >>>💚 User rating: 3 regardless the missing opening/closure tags, the structure of the think-reflect-output is not fulfilled 418 | >>>✅ NLP TAKS: Reflection prompt 419 | 420 | 421 | --------------------------------------------------------------------------------