├── images
│   ├── assetsHere.md
│   ├── tasks001.png
│   ├── tasks002.png
│   ├── tasks003.png
│   ├── output_image.png
│   ├── twittpost001.png
│   ├── createHFtoken.png
│   ├── 2024-12-03 13 41 29.png
│   ├── 2024-12-03 13 42 08.png
│   ├── 2024-12-03 13 45 47.png
│   └── tweetPOST_workflow.png
└── README.md
/images/assetsHere.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/images/tasks001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/tasks001.png
--------------------------------------------------------------------------------
/images/tasks002.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/tasks002.png
--------------------------------------------------------------------------------
/images/tasks003.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/tasks003.png
--------------------------------------------------------------------------------
/images/output_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/output_image.png
--------------------------------------------------------------------------------
/images/twittpost001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/twittpost001.png
--------------------------------------------------------------------------------
/images/createHFtoken.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/createHFtoken.png
--------------------------------------------------------------------------------
/images/2024-12-03 13 41 29.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/2024-12-03 13 41 29.png
--------------------------------------------------------------------------------
/images/2024-12-03 13 42 08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/2024-12-03 13 42 08.png
--------------------------------------------------------------------------------
/images/2024-12-03 13 45 47.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/2024-12-03 13 45 47.png
--------------------------------------------------------------------------------
/images/tweetPOST_workflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/tweetPOST_workflow.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | # HF-API-tutorial
4 | How to use Hugging Face Inference API calls from your Python code.
5 |
6 |
7 | Follow the full tutorial in my Medium article.
8 |
9 | [ADD THE LINK HERE](#)
10 |
11 | ### Requirements
12 | ```
13 | pip install --no-cache-dir transformers huggingface_hub[inference] pillow tiktoken gradio_client streamlit
14 | ```
15 |
16 | ### Usage
17 | Replace **`hf_xxxx`** with the Hugging Face Access Token you created.
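Hardcoding the token works for quick tests, but a safer pattern is to read it from an environment variable. A minimal sketch (the variable name `HF_TOKEN` is just a convention used here, not something the library requires):

```python
import os

# Read the token from the environment, falling back to the placeholder
# so the snippet still runs; export HF_TOKEN to use your real token.
hftoken = os.environ.get("HF_TOKEN", "hf_xxxx")
```

This keeps the token out of your source files and version control.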
18 |
19 | Using `meta-llama/Meta-Llama-3-8B-Instruct`:
20 | ```python
21 | from huggingface_hub import InferenceClient
22 | hftoken = 'hf_xxxx'
23 | # create the client bound to the model once
24 | client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct", token=hftoken)
25 | 
26 | messages = [{"role": "user", "content": "Explain what is Science in 3 paragraphs."}]
27 | output = client.chat_completion(messages, max_tokens=500)
28 | print(output.choices[0].message.content)
29 | ```
30 |
31 | ---
32 |
33 | Using https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 with a `streaming effect`
34 | ```python
35 | from huggingface_hub import InferenceClient
36 | hftoken = 'hf_xxxx'
37 | client = InferenceClient(token=hftoken)
38 | messages = [{"role": "user", "content": "Explain what is Science in 3 paragraphs."}]
39 | stream = client.chat.completions.create(
40 | model="mistralai/Mixtral-8x7B-Instruct-v0.1",
41 | messages=messages,
42 | max_tokens=500,
43 | stream=True)
44 | for chunk in stream:
45 | print(chunk.choices[0].delta.content, end="")
46 | print('')
47 | ```
48 |
49 | ---
50 |
51 | Using https://huggingface.co/Qwen/QwQ-32B-Preview with a `streaming effect`
52 | ```python
53 | from huggingface_hub import InferenceClient
54 | hftoken = 'hf_xxxx'
55 | client = InferenceClient(token=hftoken)
56 | prompt = input('user: ')
57 | messages = [{"role": "user", "content": prompt}]
58 | stream = client.chat.completions.create(
59 | model="Qwen/QwQ-32B-Preview",
60 | messages=messages,
61 | max_tokens=500,
62 | stream=True)
63 | for chunk in stream:
64 | print(chunk.choices[0].delta.content, end="")
65 | print('')
66 | ```
67 |
68 | ---
69 |
70 | Using https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1 with a `streaming effect`
71 | ```python
72 | from huggingface_hub import InferenceClient
73 | hftoken = 'hf_xxxx'
74 | client = InferenceClient(token=hftoken)
75 | prompt = input('user: ')
76 | messages = [{"role": "user", "content": prompt}]
77 | stream = client.chat.completions.create(
78 | model="mistralai/Mixtral-8x22B-Instruct-v0.1",
79 | messages=messages,
80 | max_tokens=500,
81 | stream=True)
82 | for chunk in stream:
83 | print(chunk.choices[0].delta.content, end="")
84 | print('')
85 | ```
86 |
87 | ---
88 |
89 | # Using Gradio_client
90 | The plain Inference API does not expose additional generation parameters, which become useful when dealing with image generation.
91 | 
92 | In those cases `gradio_client` can come in handy.
93 |
94 | The text used here:
95 | ```
96 | headers = """Do you think LLM can THINK? Beyond the Hype lies the future of Generative AI.
97 | Exploring the limitations of current Models and anticipating architectural shifts"""
98 | bruteText = """Do you think LLM can THINK? Beyond the Hype lies the future of Generative AI.
99 | Exploring the limitations of current Models and anticipating architectural shifts
100 | FABIO
101 | DEC 02, 2024
102 | Share
103 |
104 | Before you learn how to write well with AI, you need to know how to write well.
105 |
106 | Before you learn how to add your own voice to an AI’s output, you need to know your own voice.
107 |
108 | Before you learn how to check whether AI summarizes a text correctly, you need to know how to read.
109 |
110 | Before you learn how to check AI for hallucinations, you need to know how to research.
111 |
112 | Artificial Intelligence is evolving so fast that generative models, particularly Large Language Models (LLMs), have captured the imagination of tech enthusiasts and industry leaders alike... and most probably even yours!
113 |
114 | However, beneath the surface of this excitement lies a complex reality that rings many bells! We need a closer examination of many warning signs.
115 |
116 | This newsletter, part 1 of a series, gives an introduction into the current state of GenAI, highlighting its limitations and giving you some glimpse into the future.
117 |
118 | what to read — by lexica.art
119 |
120 | In times of turmoil and controversy, listen to the quiet ones; the rest are picking sides to build their identity — Alberto Romero
121 |
122 | So, in case you are already bored, I will say the most important things first: start following the quiet voices, people who thinks differently, that are able to speak up and disprove the hypes. Here a short list of the ones I trust:
123 |
124 | The Algorithmic Bridge | Gary Marcus | The Kaitchup – AI on a Budget | Planet Earth & Beyond
125 |
126 | The Hype vs. Reality
127 |
128 | The prevailing narrative around GenAI is one of boundless potential, with promises of revolutionizing industries from content creation to customer service. Yet, a closer look reveals that while these models are impressive, they are not without flaws.
129 |
130 | A recent survey by CNBC indicated that while 79% of respondents have tried Microsoft's Copilot, only 25% found it worthwhile. This discrepancy suggests that the expectations set by the hype may not align with the current capabilities of these tools.
131 |
132 | It is a consolidated fact now, even though many have tried to throw smoke screens. Trying to keep the hype and secure founds, blindly bowing to the scaling law god as a sort of religion.
133 |
134 | LLMs are not near AGI, not even close. We are living the beginning, an experimental phase of what it can become. But considering the huge amount of money invested in it, the so called gurus of AI don’t want to admit it.
135 |
136 | But you don’t need to be a Machine Learning guru to see this coming!
137 |
138 | What you need… is to look where modern prophets are pointing, where outliers meet.
139 |
140 | Coming out of the metaphor. For months (now years) Big Tech has blindly moved capitals and resources promising the World the panacea of all evils. Generative AI has been pointed out as the new savior: and every bug and issue was washed away with a faithful (and blind) statement:
141 |
142 | we simply need more GPUs and more training data
143 |
144 | The reality is quite different. The Transformers architecture is an amazing achievement, it works wonders… but it has fundamental limits. It is a stepping stone, but it cannot be the final destination.
145 |
146 | Fundamental Limitations
147 |
148 | Current LLMs, predominantly based on the Transformer architecture, excel in many areas but struggle with several key issues:
149 |
150 | Sequential Data Handling: These models have difficulty managing time-dependent sequences, which is crucial for tasks involving causality and temporal relationships.
151 |
152 | Observability: There's a lack of transparency in how these models arrive at their outputs, making it hard to understand and trust their decision-making processes.
153 |
154 | Computational Complexity: The quadratic increase in computational requirements with longer sequences poses significant scalability challenges.
155 |
156 | Scaling Law Illusion: The belief that larger models and more data inevitably lead to better performance is being questioned, as returns diminish with increased scale.
157 |
158 | The Need for Architectural Innovation
159 |
160 | Given these limitations, there is a growing consensus that a radical shift in architectural design is necessary.
161 |
162 | Pioneering efforts are already underway, with startups like Liquid.ai exploring non-Transformer architectures and academic institutions re-investigating older models like Recurrent Neural Networks (RNNs).
163 |
164 | One notable example is the RWKV model, which aims to combine the strengths of Transformers and RNNs, offering efficient parallel training and linear scaling during inference.
165 |
166 | There are more other attempts all over the AI community: another innovation came from NVIDIA. They presented exactly 1 week ago (with paper published on November the 20th) a new small hybrid model: Hymba is a 1.5B hybrid Mamba x Attention Model that outperforms other small LLMs like Llama-3.2 or SmolLM v2 being trained on only 1.5T Tokens. An efficient and innovative blend that uses a new hybrid architecture with Mamba and Attention heads running in parallel with additional meta tokens, to improve the efficacy of the model… and it can be used commercially (This model is released under the NVIDIA Open Model License Agreement).
167 |
168 | Such innovations signal a move towards more efficient and capable AI architectures.
169 |
170 | Conclusion
171 |
172 | While current GenAI tools are remarkable, they represent just the beginning of a longer journey towards more sophisticated and reliable AI systems.
173 |
174 | But we need to be intellectually honest, and acknowledge their limitations. Once this first step is done, it becomes so easy to invest our time and resources in architectural innovation.
175 |
176 | I want to be part of the next generation of AI that overcomes today's challenges and unlocks new possibilities: won’t you?
177 |
178 | Stay tuned for our next newsletter, where we'll dive deeper into the latest research and developments in AI architecture.
179 |
180 | GIFT of the day: and article showing that there is plenty of amazing people, sharing their amazing results to solve daily real-world problems. And this is not something new!
181 | """
182 |
183 | bruteText = bruteText.replace('\n\n','\n')
184 | ```
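The `replace('\n\n', '\n')` call collapses the blank lines between paragraphs, shortening the text sent to the model. A minimal illustration of what it does:

```python
# Collapse every blank line (double newline) into a single newline,
# exactly as done with bruteText above.
sample = "Paragraph one.\n\nParagraph two.\n\nParagraph three."
compact = sample.replace("\n\n", "\n")
print(compact)
```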
185 |
186 | The resulting `bruteText`:
187 | ```
188 | """
189 | Do you think LLM can THINK? Beyond the Hype lies the future of Generative AI.
190 | Exploring the limitations of current Models and anticipating architectural shifts
191 | FABIO
192 | DEC 02, 2024
193 | Share
194 | Before you learn how to write well with AI, you need to know how to write well.
195 | Before you learn how to add your own voice to an AI’s output, you need to know your own voice.
196 | Before you learn how to check whether AI summarizes a text correctly, you need to know how to read.
197 | Before you learn how to check AI for hallucinations, you need to know how to research.
198 | Artificial Intelligence is evolving so fast that generative models, particularly Large Language Models (LLMs), have captured the imagination of tech enthusiasts and industry leaders alike... and most probably even yours!
199 | However, beneath the surface of this excitement lies a complex reality that rings many bells! We need a closer examination of many warning signs.
200 | This newsletter, part 1 of a series, gives an introduction into the current state of GenAI, highlighting its limitations and giving you some glimpse into the future.
201 | what to read — by lexica.art
202 | In times of turmoil and controversy, listen to the quiet ones; the rest are picking sides to build their identity — Alberto Romero
203 | So, in case you are already bored, I will say the most important things first: start following the quiet voices, people who thinks differently, that are able to speak up and disprove the hypes. Here a short list of the ones I trust:
204 | The Algorithmic Bridge | Gary Marcus | The Kaitchup – AI on a Budget | Planet Earth & Beyond
205 | The Hype vs. Reality
206 | The prevailing narrative around GenAI is one of boundless potential, with promises of revolutionizing industries from content creation to customer service. Yet, a closer look reveals that while these models are impressive, they are not without flaws.
207 | A recent survey by CNBC indicated that while 79% of respondents have tried Microsoft's Copilot, only 25% found it worthwhile. This discrepancy suggests that the expectations set by the hype may not align with the current capabilities of these tools.
208 | It is a consolidated fact now, even though many have tried to throw smoke screens. Trying to keep the hype and secure founds, blindly bowing to the scaling law god as a sort of religion.
209 | LLMs are not near AGI, not even close. We are living the beginning, an experimental phase of what it can become. But considering the huge amount of money invested in it, the so called gurus of AI don’t want to admit it.
210 | But you don’t need to be a Machine Learning guru to see this coming!
211 | What you need… is to look where modern prophets are pointing, where outliers meet.
212 | Coming out of the metaphor. For months (now years) Big Tech has blindly moved capitals and resources promising the World the panacea of all evils. Generative AI has been pointed out as the new savior: and every bug and issue was washed away with a faithful (and blind) statement:
213 | we simply need more GPUs and more training data
214 | The reality is quite different. The Transformers architecture is an amazing achievement, it works wonders… but it has fundamental limits. It is a stepping stone, but it cannot be the final destination.
215 | Fundamental Limitations
216 | Current LLMs, predominantly based on the Transformer architecture, excel in many areas but struggle with several key issues:
217 | Sequential Data Handling: These models have difficulty managing time-dependent sequences, which is crucial for tasks involving causality and temporal relationships.
218 | Observability: There's a lack of transparency in how these models arrive at their outputs, making it hard to understand and trust their decision-making processes.
219 | Computational Complexity: The quadratic increase in computational requirements with longer sequences poses significant scalability challenges.
220 | Scaling Law Illusion: The belief that larger models and more data inevitably lead to better performance is being questioned, as returns diminish with increased scale.
221 | The Need for Architectural Innovation
222 | Given these limitations, there is a growing consensus that a radical shift in architectural design is necessary.
223 | Pioneering efforts are already underway, with startups like Liquid.ai exploring non-Transformer architectures and academic institutions re-investigating older models like Recurrent Neural Networks (RNNs).
224 | One notable example is the RWKV model, which aims to combine the strengths of Transformers and RNNs, offering efficient parallel training and linear scaling during inference.
225 | There are more other attempts all over the AI community: another innovation came from NVIDIA. They presented exactly 1 week ago (with paper published on November the 20th) a new small hybrid model: Hymba is a 1.5B hybrid Mamba x Attention Model that outperforms other small LLMs like Llama-3.2 or SmolLM v2 being trained on only 1.5T Tokens. An efficient and innovative blend that uses a new hybrid architecture with Mamba and Attention heads running in parallel with additional meta tokens, to improve the efficacy of the model… and it can be used commercially (This model is released under the NVIDIA Open Model License Agreement).
226 | Such innovations signal a move towards more efficient and capable AI architectures.
227 | Conclusion
228 | While current GenAI tools are remarkable, they represent just the beginning of a longer journey towards more sophisticated and reliable AI systems.
229 | But we need to be intellectually honest, and acknowledge their limitations. Once this first step is done, it becomes so easy to invest our time and resources in architectural innovation.
230 | I want to be part of the next generation of AI that overcomes today's challenges and unlocks new possibilities: won’t you?
231 | Stay tuned for our next newsletter, where we'll dive deeper into the latest research and developments in AI architecture.
232 | GIFT of the day: and article showing that there is plenty of amazing people, sharing their amazing results to solve daily real-world problems. And this is not something new!
233 | """
234 | ```
235 |
236 | ### Create the prompt for Stable Diffusion
237 | ```python
238 | SD_prompt = f'Create a prompt for Stable Diffusion based on the information below. Return only the prompt.\n---\n{headers}\n\nPROMPT:'
239 | client = InferenceClient(token=hftoken)
240 | messages = [{"role": "user", "content": SD_prompt}]
241 | completion = client.chat.completions.create(
242 | model="Qwen/Qwen2.5-72B-Instruct",
243 | messages=messages,
244 | max_tokens=500
245 | )
246 | ImageGEN_prompt = completion.choices[0].message.content
247 | ```
248 | Or generate the image directly with the Inference API:
249 | ```python
250 | from PIL import Image
251 | client = InferenceClient("strangerzonehf/Flux-Midjourney-Mix2-LoRA", token=hftoken)
252 | # output is a PIL.Image object
253 | image = client.text_to_image(ImageGEN_prompt)
254 | image.save("twittpost002.png")
255 | image.show()
256 | ```
257 |
258 |
259 |
260 | As you can see, we have no control over the generation parameters. With `gradio_client` we can set them explicitly:
261 | ```python
262 | from gradio_client import Client
263 | from PIL import Image
264 | 
265 | client = Client("stabilityai/stable-diffusion-3.5-large")
266 | result = client.predict(
267 |     prompt=ImageGEN_prompt,
268 |     negative_prompt='blur',
269 |     seed=0,
270 |     randomize_seed=True,
271 |     width=1360,
272 |     height=768,
273 |     guidance_scale=4.5,
274 |     num_inference_steps=30,
275 |     api_name="/infer"
276 | )
277 | temp = result[0]  # local path of the generated image file
278 | image = Image.open(temp)
279 | image.save("output_image.png")
280 | image.show()
281 | ```
278 |
279 |
280 | ### Generate 3 Tweets from the newsletter
281 | ```python
282 | Tweet_prompt = f"""Read the following newsletter. Rewrite it into 3 Twitter posts in English, in progression.
283 | ---
284 | {bruteText}"""
285 | ```
286 | Now we call the same `Qwen/Qwen2.5-72B-Instruct` endpoint and split the response into 3 tweets.
287 | ```python
288 | client = InferenceClient(token=hftoken)
289 | messages = [{"role": "user", "content": Tweet_prompt}]
290 | completion = client.chat.completions.create(
291 | model="Qwen/Qwen2.5-72B-Instruct",
292 | messages=messages,
293 | max_tokens=500
294 | )
295 | from rich.console import Console
296 | console = Console(width=80)
297 | tweet1 = completion.choices[0].message.content.split('1:')[1].split('\n\n')[0]
298 | tweet2 = completion.choices[0].message.content.split('2:')[1].split('\n\n')[0]
299 | tweet3 = completion.choices[0].message.content.split('3:')[1]
300 | console.print(tweet1)
301 | console.rule()
302 | console.print(tweet2)
303 | console.rule()
304 | console.print(tweet3)
305 | console.rule()
306 | ```
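Chained `.split('1:')` calls work here, but they break if the model numbers the headers differently. A slightly more robust sketch, assuming the `### Tweet N:` pattern shown below (the `raw` string is a hypothetical model output for illustration):

```python
import re

# Hypothetical model output following the "### Tweet N:" header pattern.
raw = (
    "### Tweet 1:\nFirst post. #GenAI\n\n"
    "### Tweet 2:\nSecond post. #AIReality\n\n"
    "### Tweet 3:\nThird post. #AIFuture"
)

# Split on the numbered headers; tolerates varying '#' counts and spacing.
tweets = [t.strip() for t in re.split(r"#+\s*Tweet\s*\d+:", raw) if t.strip()]
for t in tweets:
    print(t)
```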
307 | The raw model output looks like this:
308 | ```
309 | ORIGINAL RESULT...
310 |
311 | ### Tweet 1:
312 | Before diving into AI writing, know your own writing voice. Before trusting AI summaries, learn to read critically. Before dismissing AI hallucinations, master research. AI is evolving, but it starts with you. #GenAI #LLMs #TechTrends
313 |
314 | ### Tweet 2:
315 | The hype around Generative AI is real, but it's crucial to see beyond the surface. LLMs, while impressive, have significant limitations in handling sequential data, transparency, and computational complexity. The scaling law isn’t a magic solution. #AIReality #TechDebate
316 |
317 | ### Tweet 3:
318 | Innovation in AI architecture is on the horizon. Startups and researchers are exploring non-Transformer models like RWKV and Hymba, which offer efficient parallel training and linear scaling. The future of AI is about more than just size—it's about smarter design. Stay tuned for more! #AIFuture #TechInnovation
319 | ```
320 |
321 | The final formatted output:
322 | ```
323 | Before diving into AI writing, know your own writing voice. Before trusting AI
324 | summaries, learn to read critically. Before dismissing AI hallucinations, master
325 | research. AI is evolving, but it starts with you. #GenAI #LLMs #TechTrends
326 | ────────────────────────────────────────────────────────────────────────────────
327 | The hype around Generative AI is real, but it's crucial to see beyond the
328 | surface. LLMs, while impressive, have significant limitations in handling
329 | sequential data, transparency, and computational complexity. The scaling law
330 | isn’t a magic solution. #AIReality #TechDebate
331 | ────────────────────────────────────────────────────────────────────────────────
332 | Innovation in AI architecture is on the horizon. Startups and researchers are
333 | exploring non-Transformer models like RWKV and Hymba, which offer efficient
334 | parallel training and linear scaling. The future of AI is about more than just
335 | size—it's about smarter design. Stay tuned for more! #AIFuture #TechInnovation
336 |
337 | ```
338 |
--------------------------------------------------------------------------------