├── images
│   ├── assetsHere.md
│   ├── tasks001.png
│   ├── tasks002.png
│   ├── tasks003.png
│   ├── output_image.png
│   ├── twittpost001.png
│   ├── createHFtoken.png
│   ├── 2024-12-03 13 41 29.png
│   ├── 2024-12-03 13 42 08.png
│   ├── 2024-12-03 13 45 47.png
│   └── tweetPOST_workflow.png
└── README.md

--------------------------------------------------------------------------------
/images/assetsHere.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/images/tasks001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/tasks001.png
--------------------------------------------------------------------------------
/images/tasks002.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/tasks002.png
--------------------------------------------------------------------------------
/images/tasks003.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/tasks003.png
--------------------------------------------------------------------------------
/images/output_image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/output_image.png
--------------------------------------------------------------------------------
/images/twittpost001.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/twittpost001.png
--------------------------------------------------------------------------------
/images/createHFtoken.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/createHFtoken.png
--------------------------------------------------------------------------------
/images/2024-12-03 13 41 29.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/2024-12-03 13 41 29.png
--------------------------------------------------------------------------------
/images/2024-12-03 13 42 08.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/2024-12-03 13 42 08.png
--------------------------------------------------------------------------------
/images/2024-12-03 13 45 47.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/2024-12-03 13 45 47.png
--------------------------------------------------------------------------------
/images/tweetPOST_workflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fabiomatricardi/HF-API-tutorial/main/images/tweetPOST_workflow.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# HF-API-tutorial
How to use Hugging Face Inference API calls from your Python code

Follow the full tutorial in my Medium article.

[ADD THE LINK HERE](#)

### Requirements
```
pip install --no-cache-dir transformers "huggingface_hub[inference]" pillow tiktoken gradio_client streamlit rich
```

### Usage
Replace **`hf_xxxx`** with your Hugging Face access token.

Using https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
```python
from huggingface_hub import InferenceClient

hftoken = 'hf_xxxx'
client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct", token=hftoken)

messages = [{"role": "user", "content": "Explain what is Science in 3 paragraphs."}]
output = client.chat_completion(messages, max_tokens=500)
print(output.choices[0].message.content)
```
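Hardcoding the token in the script makes it easy to leak by accident. A safer pattern is to read it from the environment; here is a minimal sketch, assuming you have exported it beforehand as `HF_TOKEN` (the variable name here is just a convention, not a requirement):

```python
import os
from huggingface_hub import InferenceClient

# Assumes: export HF_TOKEN=hf_xxxx   (set in your shell, never committed to git)
hftoken = os.environ["HF_TOKEN"]
client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct", token=hftoken)
```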
---

Using https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 with streaming
```python
from huggingface_hub import InferenceClient

hftoken = 'hf_xxxx'
client = InferenceClient(token=hftoken)
messages = [{"role": "user", "content": "Explain what is Science in 3 paragraphs."}]
stream = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=messages,
    max_tokens=500,
    stream=True)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
print('')
```

---

Using https://huggingface.co/Qwen/QwQ-32B-Preview with streaming
```python
from huggingface_hub import InferenceClient

hftoken = 'hf_xxxx'
client = InferenceClient(token=hftoken)
prompt = input('user: ')
messages = [{"role": "user", "content": prompt}]
stream = client.chat.completions.create(
    model="Qwen/QwQ-32B-Preview",
    messages=messages,
    max_tokens=500,
    stream=True)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
print('')
```

---

Using https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1 with streaming
```python
from huggingface_hub import InferenceClient

hftoken = 'hf_xxxx'
client = InferenceClient(token=hftoken)
prompt = input('user: ')
messages = [{"role": "user", "content": prompt}]
stream = client.chat.completions.create(
    model="mistralai/Mixtral-8x22B-Instruct-v0.1",
    messages=messages,
    max_tokens=500,
    stream=True)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
print('')
```
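The three streaming snippets above differ only in the model id, so they collapse naturally into one helper. A minimal sketch (the function name `stream_chat` is mine, not part of the library):

```python
from huggingface_hub import InferenceClient

def stream_chat(model: str, prompt: str, hftoken: str, max_tokens: int = 500) -> None:
    """Stream a chat completion from any Inference API model to stdout."""
    client = InferenceClient(token=hftoken)
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True)
    for chunk in stream:
        print(chunk.choices[0].delta.content, end="")
    print('')

# Same call, different models:
# stream_chat("mistralai/Mixtral-8x7B-Instruct-v0.1", "Explain what is Science in 3 paragraphs.", 'hf_xxxx')
# stream_chat("Qwen/QwQ-32B-Preview", input('user: '), 'hf_xxxx')
```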
---

# Using Gradio_client
The plain Inference API call does not expose additional generation parameters, which become useful when dealing with image generation.

So using `gradio_client` can come in handy.

The text used here:
```
headers = """Do you think LLM can THINK? Beyond the Hype lies the future of Generative AI.
Exploring the limitations of current Models and anticipating architectural shifts"""
bruteText = """Do you think LLM can THINK? Beyond the Hype lies the future of Generative AI.
Exploring the limitations of current Models and anticipating architectural shifts
FABIO
DEC 02, 2024
Share

Before you learn how to write well with AI, you need to know how to write well.

Before you learn how to add your own voice to an AI’s output, you need to know your own voice.

Before you learn how to check whether AI summarizes a text correctly, you need to know how to read.

Before you learn how to check AI for hallucinations, you need to know how to research.

Artificial Intelligence is evolving so fast that generative models, particularly Large Language Models (LLMs), have captured the imagination of tech enthusiasts and industry leaders alike... and most probably even yours!

However, beneath the surface of this excitement lies a complex reality that rings many bells! We need a closer examination of many warning signs.

This newsletter, part 1 of a series, gives an introduction to the current state of GenAI, highlighting its limitations and giving you a glimpse into the future.

what to read — by lexica.art

In times of turmoil and controversy, listen to the quiet ones; the rest are picking sides to build their identity — Alberto Romero

So, in case you are already bored, I will say the most important things first: start following the quiet voices, people who think differently, who are able to speak up and disprove the hype. Here is a short list of the ones I trust:

The Algorithmic Bridge | Gary Marcus | The Kaitchup – AI on a Budget | Planet Earth & Beyond

The Hype vs. Reality

The prevailing narrative around GenAI is one of boundless potential, with promises of revolutionizing industries from content creation to customer service. Yet, a closer look reveals that while these models are impressive, they are not without flaws.

A recent survey by CNBC indicated that while 79% of respondents have tried Microsoft's Copilot, only 25% found it worthwhile. This discrepancy suggests that the expectations set by the hype may not align with the current capabilities of these tools.

It is a consolidated fact now, even though many have tried to throw smoke screens, trying to keep the hype and secure funds, blindly bowing to the scaling law god as a sort of religion.

LLMs are not near AGI, not even close. We are living the beginning, an experimental phase of what it can become. But considering the huge amount of money invested in it, the so-called gurus of AI don’t want to admit it.

But you don’t need to be a Machine Learning guru to see this coming!

What you need… is to look where modern prophets are pointing, where outliers meet.

Coming out of the metaphor: for months (now years) Big Tech has blindly moved capital and resources, promising the World the panacea of all evils. Generative AI has been pointed out as the new savior, and every bug and issue was washed away with a faithful (and blind) statement:

we simply need more GPUs and more training data

The reality is quite different. The Transformers architecture is an amazing achievement, it works wonders… but it has fundamental limits. It is a stepping stone, but it cannot be the final destination.

Fundamental Limitations

Current LLMs, predominantly based on the Transformer architecture, excel in many areas but struggle with several key issues:

Sequential Data Handling: These models have difficulty managing time-dependent sequences, which is crucial for tasks involving causality and temporal relationships.

Observability: There's a lack of transparency in how these models arrive at their outputs, making it hard to understand and trust their decision-making processes.

Computational Complexity: The quadratic increase in computational requirements with longer sequences poses significant scalability challenges.

Scaling Law Illusion: The belief that larger models and more data inevitably lead to better performance is being questioned, as returns diminish with increased scale.

The Need for Architectural Innovation

Given these limitations, there is a growing consensus that a radical shift in architectural design is necessary.

Pioneering efforts are already underway, with startups like Liquid.ai exploring non-Transformer architectures and academic institutions re-investigating older models like Recurrent Neural Networks (RNNs).

One notable example is the RWKV model, which aims to combine the strengths of Transformers and RNNs, offering efficient parallel training and linear scaling during inference.

There are many other attempts across the AI community: another innovation came from NVIDIA. They presented, exactly 1 week ago (with a paper published on November 20th), a new small hybrid model: Hymba is a 1.5B hybrid Mamba x Attention model that outperforms other small LLMs like Llama-3.2 or SmolLM v2 while being trained on only 1.5T tokens. An efficient and innovative blend that uses a new hybrid architecture with Mamba and Attention heads running in parallel, with additional meta tokens, to improve the efficacy of the model… and it can be used commercially (this model is released under the NVIDIA Open Model License Agreement).

Such innovations signal a move towards more efficient and capable AI architectures.

Conclusion

While current GenAI tools are remarkable, they represent just the beginning of a longer journey towards more sophisticated and reliable AI systems.

But we need to be intellectually honest, and acknowledge their limitations. Once this first step is done, it becomes much easier to invest our time and resources in architectural innovation.

I want to be part of the next generation of AI that overcomes today's challenges and unlocks new possibilities: won’t you?

Stay tuned for our next newsletter, where we'll dive deeper into the latest research and developments in AI architecture.

GIFT of the day: an article showing that there are plenty of amazing people sharing their amazing results to solve daily real-world problems. And this is not something new!
"""

bruteText = bruteText.replace('\n\n','\n')
```
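The requirements above install `tiktoken`, which is handy for a quick sanity check on how large `bruteText` is before sending it to a model. A minimal sketch, assuming the `cl100k_base` encoding as a rough proxy (each HF model has its own tokenizer, so the real count will differ):

```python
import tiktoken

# cl100k_base is an OpenAI encoding, used here only as a rough proxy:
# the Qwen and Mixtral tokenizers will produce somewhat different counts.
enc = tiktoken.get_encoding("cl100k_base")
print(f"bruteText is roughly {len(enc.encode(bruteText))} tokens")
```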
`bruteText` after the replacement:
```
"""
Do you think LLM can THINK? Beyond the Hype lies the future of Generative AI.
Exploring the limitations of current Models and anticipating architectural shifts
FABIO
DEC 02, 2024
Share
Before you learn how to write well with AI, you need to know how to write well.
Before you learn how to add your own voice to an AI’s output, you need to know your own voice.
Before you learn how to check whether AI summarizes a text correctly, you need to know how to read.
Before you learn how to check AI for hallucinations, you need to know how to research.
Artificial Intelligence is evolving so fast that generative models, particularly Large Language Models (LLMs), have captured the imagination of tech enthusiasts and industry leaders alike... and most probably even yours!
However, beneath the surface of this excitement lies a complex reality that rings many bells! We need a closer examination of many warning signs.
This newsletter, part 1 of a series, gives an introduction to the current state of GenAI, highlighting its limitations and giving you a glimpse into the future.
what to read — by lexica.art
In times of turmoil and controversy, listen to the quiet ones; the rest are picking sides to build their identity — Alberto Romero
So, in case you are already bored, I will say the most important things first: start following the quiet voices, people who think differently, who are able to speak up and disprove the hype. Here is a short list of the ones I trust:
The Algorithmic Bridge | Gary Marcus | The Kaitchup – AI on a Budget | Planet Earth & Beyond
The Hype vs. Reality
The prevailing narrative around GenAI is one of boundless potential, with promises of revolutionizing industries from content creation to customer service. Yet, a closer look reveals that while these models are impressive, they are not without flaws.
A recent survey by CNBC indicated that while 79% of respondents have tried Microsoft's Copilot, only 25% found it worthwhile. This discrepancy suggests that the expectations set by the hype may not align with the current capabilities of these tools.
It is a consolidated fact now, even though many have tried to throw smoke screens, trying to keep the hype and secure funds, blindly bowing to the scaling law god as a sort of religion.
LLMs are not near AGI, not even close. We are living the beginning, an experimental phase of what it can become. But considering the huge amount of money invested in it, the so-called gurus of AI don’t want to admit it.
But you don’t need to be a Machine Learning guru to see this coming!
What you need… is to look where modern prophets are pointing, where outliers meet.
Coming out of the metaphor: for months (now years) Big Tech has blindly moved capital and resources, promising the World the panacea of all evils. Generative AI has been pointed out as the new savior, and every bug and issue was washed away with a faithful (and blind) statement:
we simply need more GPUs and more training data
The reality is quite different. The Transformers architecture is an amazing achievement, it works wonders… but it has fundamental limits. It is a stepping stone, but it cannot be the final destination.
Fundamental Limitations
Current LLMs, predominantly based on the Transformer architecture, excel in many areas but struggle with several key issues:
Sequential Data Handling: These models have difficulty managing time-dependent sequences, which is crucial for tasks involving causality and temporal relationships.
Observability: There's a lack of transparency in how these models arrive at their outputs, making it hard to understand and trust their decision-making processes.
Computational Complexity: The quadratic increase in computational requirements with longer sequences poses significant scalability challenges.
Scaling Law Illusion: The belief that larger models and more data inevitably lead to better performance is being questioned, as returns diminish with increased scale.
The Need for Architectural Innovation
Given these limitations, there is a growing consensus that a radical shift in architectural design is necessary.
Pioneering efforts are already underway, with startups like Liquid.ai exploring non-Transformer architectures and academic institutions re-investigating older models like Recurrent Neural Networks (RNNs).
One notable example is the RWKV model, which aims to combine the strengths of Transformers and RNNs, offering efficient parallel training and linear scaling during inference.
There are many other attempts across the AI community: another innovation came from NVIDIA. They presented, exactly 1 week ago (with a paper published on November 20th), a new small hybrid model: Hymba is a 1.5B hybrid Mamba x Attention model that outperforms other small LLMs like Llama-3.2 or SmolLM v2 while being trained on only 1.5T tokens. An efficient and innovative blend that uses a new hybrid architecture with Mamba and Attention heads running in parallel, with additional meta tokens, to improve the efficacy of the model… and it can be used commercially (this model is released under the NVIDIA Open Model License Agreement).
Such innovations signal a move towards more efficient and capable AI architectures.
Conclusion
While current GenAI tools are remarkable, they represent just the beginning of a longer journey towards more sophisticated and reliable AI systems.
But we need to be intellectually honest, and acknowledge their limitations. Once this first step is done, it becomes much easier to invest our time and resources in architectural innovation.
I want to be part of the next generation of AI that overcomes today's challenges and unlocks new possibilities: won’t you?
Stay tuned for our next newsletter, where we'll dive deeper into the latest research and developments in AI architecture.
GIFT of the day: an article showing that there are plenty of amazing people sharing their amazing results to solve daily real-world problems. And this is not something new!
"""
```

### Create the prompt for Stable Diffusion
```python
SD_prompt = f'Create a prompt for Stable Diffusion based on the information below. Return only the prompt.\n---\n{headers}\n\nPROMPT:'
client = InferenceClient(token=hftoken)
messages = [{"role": "user", "content": SD_prompt}]
completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=messages,
    max_tokens=500
)
ImageGEN_prompt = completion.choices[0].message.content
```
Then generate the image directly with `InferenceClient`:
```python
from PIL import Image
client = InferenceClient("strangerzonehf/Flux-Midjourney-Mix2-LoRA", token=hftoken)
# output is a PIL.Image object
image = client.text_to_image(ImageGEN_prompt)
image.save("twittpost002.png")
image.show()
```

As you can see, this way we have no control over the generation parameters. With `gradio_client` we can set all of them:
```python
from gradio_client import Client
from PIL import Image

client = Client("stabilityai/stable-diffusion-3.5-large")
result = client.predict(
    prompt=ImageGEN_prompt,
    negative_prompt='blur',
    seed=0,
    randomize_seed=True,
    width=1360,
    height=768,
    guidance_scale=4.5,
    num_inference_steps=30,
    api_name="/infer"
)
temp = result[0]  # path of the generated image file on disk
image = Image.open(temp)
image.save("output_image.png")
image.show()
```
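The parameter names passed to `predict` must match what the Space actually exposes. If you are unsure, `gradio_client` can print the endpoint signature for you; a quick check, assuming the Space is up and reachable:

```python
from gradio_client import Client

client = Client("stabilityai/stable-diffusion-3.5-large")
client.view_api()  # prints the named endpoints (e.g. /infer) and their parameters
```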
### Generate 3 Tweets from the newsletter
```
Tweet_prompt = f"""Read the following newsletter. Rewrite it into 3 Twitter posts in English, in progression.
---
{bruteText}"""
```
Now we call the same `Qwen/Qwen2.5-72B-Instruct` endpoint and split the output into the three tweets:
```python
from huggingface_hub import InferenceClient
from rich.console import Console

client = InferenceClient(token=hftoken)
messages = [{"role": "user", "content": Tweet_prompt}]
completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=messages,
    max_tokens=500
)

console = Console(width=80)
# The model numbers the tweets ("Tweet 1:", "Tweet 2:", ...), so split on those markers
tweet1 = completion.choices[0].message.content.split('1:')[1].split('\n\n')[0]
tweet2 = completion.choices[0].message.content.split('2:')[1].split('\n\n')[0]
tweet3 = completion.choices[0].message.content.split('3:')[1]
console.print(tweet1)
console.rule()
console.print(tweet2)
console.rule()
console.print(tweet3)
console.rule()
```
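Splitting on `'1:'` works for this particular output format, but it breaks if the model numbers the tweets differently. A slightly more defensive sketch, using a regex on the `### Tweet N:` headers (the helper `split_tweets` is hypothetical; adjust the pattern to whatever format your model returns):

```python
import re

def split_tweets(text: str) -> list[str]:
    # Split on headers like "### Tweet 1:" and drop the preamble chunk.
    parts = re.split(r"#+\s*Tweet\s*\d+\s*:", text)
    return [p.strip() for p in parts[1:] if p.strip()]

tweets = split_tweets(completion.choices[0].message.content)
```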
The raw model output looks like this:
```
ORIGINAL RESULT...

### Tweet 1:
Before diving into AI writing, know your own writing voice. Before trusting AI summaries, learn to read critically. Before dismissing AI hallucinations, master research. AI is evolving, but it starts with you. #GenAI #LLMs #TechTrends

### Tweet 2:
The hype around Generative AI is real, but it's crucial to see beyond the surface. LLMs, while impressive, have significant limitations in handling sequential data, transparency, and computational complexity. The scaling law isn’t a magic solution. #AIReality #TechDebate

### Tweet 3:
Innovation in AI architecture is on the horizon. Startups and researchers are exploring non-Transformer models like RWKV and Hymba, which offer efficient parallel training and linear scaling. The future of AI is about more than just size—it's about smarter design. Stay tuned for more! #AIFuture #TechInnovation
```
And the final tweets, after splitting and printing with `rich`:
```
Before diving into AI writing, know your own writing voice. Before trusting AI
summaries, learn to read critically. Before dismissing AI hallucinations, master
research. AI is evolving, but it starts with you. #GenAI #LLMs #TechTrends
────────────────────────────────────────────────────────────────────────────────
The hype around Generative AI is real, but it's crucial to see beyond the
surface. LLMs, while impressive, have significant limitations in handling
sequential data, transparency, and computational complexity. The scaling law
isn’t a magic solution. #AIReality #TechDebate
────────────────────────────────────────────────────────────────────────────────
Innovation in AI architecture is on the horizon. Startups and researchers are
exploring non-Transformer models like RWKV and Hymba, which offer efficient
parallel training and linear scaling. The future of AI is about more than just
size—it's about smarter design. Stay tuned for more! #AIFuture #TechInnovation
```

--------------------------------------------------------------------------------