├── 3D ├── object_detection └── Stable Diffusion V3 ├── GenerativeAI ├── Video │ └── sora ├── NL │ ├── LoRAShear │ ├── Connect_LLM_to_the_Internet_with_AutoGen.py │ └── Marlin_on_gptq.py ├── Image │ ├── Emu │ ├── diffusion │ └── Lumiere: A Space-Time Diffusion Model for Video Generation └── Basic.txt ├── Robotics ├── VLAM │ └── Helix ├── lerobot │ └── link ├── Genesis │ ├── link │ └── 1. What is Genesis ├── with_LLM │ └── SensorLM.py ├── Embodiment in virtual humans and robots └── What are Frames? ├── MultiModal ├── Octopus ├── Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens ├── MultiAgent │ ├── Agent_simulations │ │ └── READ │ ├── CrewAI │ │ ├── Open Source Models.py │ │ └── Agent with Tools.py │ └── gpt_reaserch ├── STT │ └── Live Speech-to-Text with Distil-Whisper and PyTorch.py ├── LLaVa-1.5 ├── LLaMA_3.2_Vision.py ├── Florence_2.py └── DeepSpeed-VisualChat ├── Tuning ├── Model_merge ├── Fine_tuning │ ├── Guide │ ├── LLM2LLM.txt │ ├── TorchTune │ ├── Reflection Tuning │ │ └── into │ ├── FLUTE │ ├── LLaMA_Factory.py │ ├── PEFT │ │ ├── VeRA │ │ ├── Parameter-Efficient Orthogonal Finetuning │ │ ├── OLoRA │ │ └── VB-LoRA │ ├── Odds Ratio Preference Optimization │ ├── Gemma │ │ └── Gemma_3_12B_for_Reasoning.py │ ├── Finetuning_LLama3_using_Axolotl.py │ ├── RAFT.py │ ├── ReFT │ │ └── into │ ├── Image │ │ └── DINOv2.py │ └── Fine-Tuning for Reasoning and Context ├── Hyper_paramter_tuning │ └── Insights about hyperparameters └── Distill │ └── Model_Distillation ├── custom_gpu_kernel ├── C5. Kernel Optimization & Backward Pass Implementation ├── Introduction to CUDA Programming for Python Developers └── reference_ ├── RAG ├── chunking │ ├── Document Understanding Transformer │ ├── Ollama_OCR.py │ ├── LlamaParse.py │ ├── AI_and_LLM_for_Document_Extraction.py │ ├── Build_a_Local_Ollama_OCR_Application_Powered_By_Llama_3.2_Vision.py │ └── LumberChunker ├── KAG │ └── KAG Graph + Multimodal RAG + LLM Agents.py ├── Network_Analysis_through_LLMs_for_Knowledge_Extraction │ ├── whisper.py │ ├── logger.py │ ├── schema.py │ ├── readme.txt │ ├── utils.py │ ├── llm.llm.py │ └── llm.prompts.py ├── etc │ └── Building an AI-Powered Web Search Assistant Using GPT-4 and Streamlit.py ├── Efficient_RAG_for_Mixed_Context_Texts_with_Indexify’s_Framework.py ├── Retrieval-Augmented Dual Instruction Tuning .txt ├── ReAugKD ├── Self_RAF.txt ├── eval │ ├── DeepEval.py │ └── UpTrain.py └── MemLong: Memory-Augmented Retrieval for Long Text LLM Generation ├── etc ├── groq ├── GNN │ └── Understanding Spectral Graph Theory and then the current SOTA of GNNs ├── MCP │ └── MCPO.py ├── Mixed Precision Training ├── NanoFlow ├── graph mining └── Min-P Sampling ├── text ├── Simple_basic ├── LLM │ ├── DeepSeek-R1 │ │ └── Deepseek Open Source Week │ ├── All_concept │ ├── From Concept to Code: Unveiling the ChatGPT Algorithm │ ├── Extracting_Text_from_Multi_Column_Pages.py │ ├── DeepSeek_R_ in_24GB_GPU.py │ ├── Main Stages of Auto-regressive Decoding for LLM Inference │ ├── quantization_of_llms_with_llama_cpp.py │ └── From Bytes to Ideas: LLMs Without Tokenization ├── llm-transparency-tool ├── AirLLM.py ├── Embedding │ ├── LLM2Vec_llama_3.py │ └── INSTRUCTOR.py ├── Agent │ ├── Adding_Memory_To_Agents.py │ ├── Search_Agent_with_Pydantic_AI.py │ ├── memory │ │ ├── Memory_in_Agent_langchain.py │ │ └── Build_AI_Agents_with_Active_Memory_Management_Using_LangMem.py │ └── OpenAI Agent SDK │ │ └── OpenAI_Agents_SDK [Explained]_with_Code_Implementation ├── Math_assistant_with_Orca-2-7B.py ├── BitNet b1.58 
├── Nemotron-4 15B ├── LlamaParse_Financial_Document_Analysis.py ├── Semantic Signal Separation.py └── Inside COSP and USP ├── readme.txt ├── Introducing Python 4.0 ├── Knowledge Graph Reasoning ├── Latent Knowledge Graphs ├── Enhancing_AI_with_Graph_RAG_Models.py ├── Graph_Maker.py └── optimizing-connections-graphs ├── OpenVINO.txt ├── new_arch ├── about_RWKV │ └── RWKV-6 ├── LNN │ └── LFM Liquid-3B ├── mamba │ ├── read.txt │ ├── mamba2 │ │ └── Into │ └── Jamba │ │ └── into ├── SUPRA: Turn a Transformer Model into an RNN Model ├── LCMs │ └── Reference ├── Gated Residual Networks: A Modern Secret Weapon for Tabular Deep Learning ├── nGPT ├── Diff_transformer │ └── Differential Transformer └── Kolmogorov-Arnold Networks (KANs) │ └── simple.py ├── Reasoning ├── Thinking │ └── Fast + Slow Thinking └── Llamaberry.py ├── Mixture of Experts └── QMoE ├── Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs ├── VISION └── Image │ ├── Ideogram_2.0.py │ ├── Flux_1.py │ ├── Mixture of Nested Experts │ └── VASA-1 ├── Voice ├── Veo ├── Entity Detection_on_Audio_Data.py └── Llava_and_Whisper.py ├── Continual_learning_A ├── Replay-based and Architecture-based approach.txt └── A_define.txt ├── Llama-Bitnet.py ├── gpt_with_confidence.py ├── attention └── Infini_attention │ ├── More_deep_init_attention │ └── infi-attention ├── automated-prompt-engineering ├── Jax ├── SELF-DISCOVER ├── AGI ├── AGI-24 │ └── Is Complexity an Illusion? └── DeepMind Unveils Groundbreaking Path to AGI Success ├── Training ├── Minitron approach └── 18RL │ └── etc │ └── FlowRL: Matching Reward Distributions for LLM Reasoning ├── RLAIF: Reinforcement Learning from AI Feedback └── Proximal Policy Optimization /3D/object_detection: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /GenerativeAI/Video/sora: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /Robotics/VLAM/Helix: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /MultiModal/Octopus: -------------------------------------------------------------------------------- 1 | From https://arxiv.org/abs/2310.08588 2 | -------------------------------------------------------------------------------- /Tuning/Model_merge: -------------------------------------------------------------------------------- 1 | ### https://github.com/arcee-ai/mergekit 2 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/Guide: -------------------------------------------------------------------------------- 1 | https://arxiv.org/html/2408.13296v2#Ch4 2 | -------------------------------------------------------------------------------- /GenerativeAI/NL/LoRAShear: -------------------------------------------------------------------------------- 1 | From https://arxiv.org/pdf/2310.18356.pdf 2 | -------------------------------------------------------------------------------- /Robotics/lerobot/link: -------------------------------------------------------------------------------- 1 | ### From https://github.com/huggingface/lerobot 2 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/LLM2LLM.txt: 
-------------------------------------------------------------------------------- 1 | https://github.com/SqueezeAILab/LLM2LLM 2 | -------------------------------------------------------------------------------- /custom_gpu_kernel/C5. Kernel Optimization & Backward Pass Implementation: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /RAG/chunking/Document Understanding Transformer: -------------------------------------------------------------------------------- 1 | ## https://github.com/CLOVAAI/DONUT 2 | -------------------------------------------------------------------------------- /etc/groq: -------------------------------------------------------------------------------- 1 | From https://towardsdatascience.com/groq-intuitively-and-exhaustively-explained-01e3fcd727ab 2 | -------------------------------------------------------------------------------- /text/Simple_basic: -------------------------------------------------------------------------------- 1 | ## https://medium.com/@vipra_singh/llm-architectures-explained-nlp-fundamentals-part-1-de5bf75e553a 2 | -------------------------------------------------------------------------------- /readme.txt: -------------------------------------------------------------------------------- 1 | Repository to store thesis and my study history to use reference for my future work 2 | 3 | Have to organize... 4 | -------------------------------------------------------------------------------- /Introducing Python 4.0: -------------------------------------------------------------------------------- 1 | ## From https://levelup.gitconnected.com/airflow-vs-mage-vs-kestra-e4bf6e35cfa2 2 | 3 | Just check given link 4 | -------------------------------------------------------------------------------- /Knowledge Graph Reasoning/Latent Knowledge Graphs: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/@8thcross/latent-knowledge-graphs-66dbb479ed52 2 | 3 | 4 | -------------------------------------------------------------------------------- /OpenVINO.txt: -------------------------------------------------------------------------------- 1 | from https://medium.com/openvino-toolkit/introducing-openvino-2023-1-power-of-generative-ai-at-the-edge-84d8ce08d095 2 | 3 | -------------------------------------------------------------------------------- /custom_gpu_kernel/Introduction to CUDA Programming for Python Developers: -------------------------------------------------------------------------------- 1 | ### https://www.pyspur.dev/blog/introduction_cuda_programming 2 | -------------------------------------------------------------------------------- /new_arch/about_RWKV/RWKV-6: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@bnjmn_marie/rwkv-6-attention-free-and-state-of-the-art-7b-llm-320720df3c8c 2 | 3 | -------------------------------------------------------------------------------- /MultiModal/Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens: -------------------------------------------------------------------------------- 1 | ## From https://huggingface.co/papers/2410.13863 2 | -------------------------------------------------------------------------------- /RAG/KAG/KAG Graph + Multimodal RAG + LLM Agents.py: -------------------------------------------------------------------------------- 1 | 
### https://pub.towardsai.net/kag-graph-multimodal-rag-llm-agents-powerful-ai-reasoning-b3da38d31358 2 | -------------------------------------------------------------------------------- /Robotics/Genesis/link: -------------------------------------------------------------------------------- 1 | https://github.com/Genesis-Embodied-AI/Genesis/tree/main 2 | 3 | https://genesis-world.readthedocs.io/en/latest/user_guide/index.html 4 | -------------------------------------------------------------------------------- /new_arch/LNN/LFM Liquid-3B: -------------------------------------------------------------------------------- 1 | ### See https://blog.stackademic.com/lfm-liquid-3b-breaks-even-with-transformers-and-this-is-only-the-starting-point-f81446fe4c8f 2 | -------------------------------------------------------------------------------- /new_arch/mamba/read.txt: -------------------------------------------------------------------------------- 1 | What is State Space? 2 | https://newsletter.maartengrootendorst.com/p/a-visual-guide-to-mamba-and-state#%C2%A7what-is-a-state-space 3 | -------------------------------------------------------------------------------- /Reasoning/Thinking/Fast + Slow Thinking: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/data-science-in-your-pocket/tencent-hunyuan-turbo-s-the-fastest-reasoning-llm-d64a02bed5c8 2 | 3 | 4 | -------------------------------------------------------------------------------- /text/LLM/DeepSeek-R1/Deepseek Open Source Week: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/@favorable_eminence_oyster_546/deepseek-open-source-week-technical-deep-dive-and-implications-4fc232e0bec3 2 | -------------------------------------------------------------------------------- /Mixture of Experts/QMoE: -------------------------------------------------------------------------------- 1 | https://arxiv.org/pdf/2310.16795.pdf 2 | https://medium.com/@multiplatform.ai/qmoe-a-breakthrough-in-efficient-execution-of-trillion-parameter-language-models-128ea0f248d2 3 | -------------------------------------------------------------------------------- /etc/GNN/Understanding Spectral Graph Theory and then the current SOTA of GNNs: -------------------------------------------------------------------------------- 1 | ### https://isamu-website.medium.com/understanding-spectral-graph-theory-and-then-the-current-sota-of-gnns-e65caf363dbc 2 | -------------------------------------------------------------------------------- /text/LLM/All_concept: -------------------------------------------------------------------------------- 1 | https://mrmaheshrajput.medium.com/how-to-productionize-large-language-models-llms-060a4cb1a169 2 | 3 | Have to check given link. It is super long.. 
but it provide very good thing about LLM 4 | -------------------------------------------------------------------------------- /Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs: -------------------------------------------------------------------------------- 1 | From https://velog.io/@0404_not_found/Adaptation-with-Self-Evaluation-to-Improve-Selective-Prediction-in-LLMs 2 | https://arxiv.org/pdf/2310.11689.pdf 3 | -------------------------------------------------------------------------------- /text/LLM/From Concept to Code: Unveiling the ChatGPT Algorithm: -------------------------------------------------------------------------------- 1 | ### From https://pub.towardsai.net/from-concept-to-code-unveiling-the-chatgpt-algorithm-77e44e19f466 2 | ### It is very long and free. But it is very basic concept. But have to Read it 3 | -------------------------------------------------------------------------------- /custom_gpu_kernel/reference_: -------------------------------------------------------------------------------- 1 | https://github.com/JinSeoung-Oh/Extending_pytorch 2 | 3 | see : https://github.com/JinSeoung-Oh/Extending_pytorch/blob/main/c%2B%2B_cuda_extension 4 | https://github.com/JinSeoung-Oh/Extending_pytorch/blob/main/custom_c%2B%2B.py 5 | https://github.com/JinSeoung-Oh/Extending_pytorch/blob/main/Writing_a_Mixed_C%2B%2B_CUDA%20_extension.py 6 | -------------------------------------------------------------------------------- /text/LLM/Extracting_Text_from_Multi_Column_Pages.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@pymupdf/extracting-text-from-multi-column-pages-a-practical-pymupdf-guide-a5848e5899fe 2 | 3 | import pymupdf4llm 4 | import pathlib 5 | import sys 6 | 7 | 8 | filename = sys.argv[1] # read filename from command line 9 | outname = filename.replace(".pdf", ".md") 10 | md_text = pymupdf4llm.to_markdown(filename) 11 | 12 | # output document markdown text as one string 13 | pathlib.Path(outname).write_bytes(md_text.encode()) 14 | -------------------------------------------------------------------------------- /RAG/Network_Analysis_through_LLMs_for_Knowledge_Extraction/whisper.py: -------------------------------------------------------------------------------- 1 | from src.logger import get_console_logger 2 | 3 | logger = get_console_logger("whisper") 4 | 5 | 6 | def create_transcript(openai_client, file_path: str) -> None: 7 | audio_file = open(file_path, "rb") 8 | logger.info(f"Creating transcript for {file_path}") 9 | transcript = openai_client.audio.transcriptions.create( 10 | model="whisper-1", file=audio_file 11 | ) 12 | logger.info(f"Transcript created for {file_path}") 13 | return transcript.text 14 | -------------------------------------------------------------------------------- /MultiModal/MultiAgent/Agent_simulations/READ: -------------------------------------------------------------------------------- 1 | MS - TinyTroupe 2 | https://medium.com/data-science-in-your-pocket/microsoft-tinytroupe-new-multi-ai-agent-framework-2f3f255930a1 3 | https://github.com/microsoft/TinyTroupe 4 | 5 | 1000 Agent 빌리지 6 | https://github.com/joonspk-research/generative_agents 7 | https://medium.com/@has.dhia/generative-agent-simulations-of-1-000-people-873b0e1761d8 8 | 9 | Agent-based simulation + Reinforcement Learning 10 | https://www.anylogic.kr/resources/educational-videos/next-gen-ai-agent-based-simulation-reinforcement-learning/ 11 | 
-------------------------------------------------------------------------------- /MultiModal/MultiAgent/CrewAI/Open Source Models.py: -------------------------------------------------------------------------------- 1 | from langchain.llms import Ollama 2 | llm_ollama = Ollama(model="YOUR_MODEL_NAME") 3 | 4 | Insure_agent = Agent( 5 | role='Insure_agent', 6 | goal="""responsible for listing the travel plan from advisor and giving the short 7 | insurance items based on the travel plan""", 8 | backstory="""You are an Insure agent who gives 9 | the short insurance items based on the travel plan. 10 | Don't ask questions. Make your response short.""", 11 | verbose=True, 12 | allow_delegation=False, 13 | llm=llm_ollama, 14 | -------------------------------------------------------------------------------- /RAG/chunking/Ollama_OCR.py: -------------------------------------------------------------------------------- 1 | ### https://medium.com/@mauryaanoop3/ollama-ocr-now-available-as-a-python-package-ff5e4240eb26 2 | 3 | !pip install ollama-ocr 4 | ollama pull llama3.2-vision:11b 5 | 6 | from ollama_ocr import OCRProcessor 7 | 8 | # Initialize OCR processor 9 | ocr = OCRProcessor(model_name='llama3.2-vision:11b') # You can use any vision model available on Ollama 10 | 11 | # Process an image 12 | result = ocr.process_image( 13 | image_path="path/to/your/image.png", 14 | format_type="markdown" # Options: markdown, text, json, structured, key_value 15 | ) 16 | print(result) 17 | -------------------------------------------------------------------------------- /text/llm-transparency-tool: -------------------------------------------------------------------------------- 1 | https://medium.com/syncedreview/unveiling-the-black-box-metas-lm-transparency-tool-deciphers-transformer-language-models-0d3ae5fef85a 2 | 3 | https://github.com/facebookresearch/llm-transparency-tool 4 | 5 | 1. Visualizes the “important” part of the prediction process along with importances of model components at varying levels of granularity; 6 | 2. allows interpreting representations and updates coming from model components; 7 | 3. enables analyzing large models where it is crucial to know what to inspect; 8 | 4. allows interactive exploration via a UI; 9 | 5. is highly efficient. 
10 | -------------------------------------------------------------------------------- /MultiModal/STT/Live Speech-to-Text with Distil-Whisper and PyTorch.py: -------------------------------------------------------------------------------- 1 | ### Have to check what is Distil-whisper : https://medium.com/analytics-vidhya/live-speech-to-text-with-distil-whisper-and-pytorch-4f5c1e494667 2 | 3 | git clone https://github.com/yourusername/distil-whisper-live.git 4 | cd distil-whisper-live 5 | 6 | python3 -m venv venv 7 | source venv/bin/activate # On Windows use `venv\Scripts\activate` 8 | pip install -r requirements.txt 9 | 10 | [microphone] 11 | name = "Your Microphone Name" 12 | sample_rate = 16000 # Adjust as needed 13 | chunk_size = 1024 14 | 15 | docker run -p 6379:6379 redis 16 | 17 | python main.py 18 | -------------------------------------------------------------------------------- /RAG/Network_Analysis_through_LLMs_for_Knowledge_Extraction/logger.py: -------------------------------------------------------------------------------- 1 | import logging 2 | from rich.logging import RichHandler 3 | from typing import Optional 4 | 5 | 6 | def get_console_logger(name: Optional[str] = "default") -> logging.Logger: 7 | logger = logging.getLogger(name) 8 | if not logger.handlers: 9 | logger.setLevel(logging.DEBUG) 10 | console_handler = RichHandler() 11 | console_handler.setLevel(logging.DEBUG) 12 | formatter = logging.Formatter( 13 | "%(asctime)s - %(name)s - %(levelname)s - %(message)s" 14 | ) 15 | console_handler.setFormatter(formatter) 16 | logger.addHandler(console_handler) 17 | 18 | return logger 19 | -------------------------------------------------------------------------------- /RAG/chunking/LlamaParse.py: -------------------------------------------------------------------------------- 1 | ## From https://generativeai.pub/llamaparse-revolutionizing-pdf-document-parsing-with-genai-e3192c075d2c 2 | 3 | !pip install -U llama-index --upgrade --no-cache-dir --force-reinstall --user 4 | !pip install llama-parse 5 | 6 | from llama_parse import LlamaParse 7 | import nest_asyncio 8 | nest_asyncio.apply() 9 | import os 10 | from llama_index.core import SimpleDirectoryReader 11 | os.environ["LLAMA_CLOUD_API_KEY"] = "your api key from llama cloud" 12 | 13 | parser = LlamaParse( 14 | result_type="markdown", 15 | verbose=True, 16 | language="en", 17 | num_workers=2, 18 | ) 19 | file_extractor = {".pdf": parser} 20 | pdf_documents = SimpleDirectoryReader( 21 | "./data", file_extractor=file_extractor 22 | ).load_data() 23 | 24 | -------------------------------------------------------------------------------- /GenerativeAI/Image/Emu: -------------------------------------------------------------------------------- 1 | From https://generativeai.pub/metas-first-ai-image-generator-emu-is-here-958a95c383b5 2 | https://ai.meta.com/research/publications/emu-enhancing-image-generation-models-using-photogenic-needles-in-a-haystack/ 3 | 4 | ## Emu 5 | Emu is based on an AI technique called “diffusion models”. 
6 | Specifically, Emu uses a “latent diffusion model,” which means it first encodes the text prompt into 7 | a latent representation before going through the diffusion process to generate the image 8 | And Emu is faster then Midjourney and Dall-E3 9 | 10 | Meta compared Emu to the state-of-the-art SDXL1.0 model and found that Emu is preferred 11 | 68.4% of the time on visual appeal on the standard PartiPrompts benchmark and 71.3% on their Open User Input benchmark 12 | -------------------------------------------------------------------------------- /text/AirLLM.py: -------------------------------------------------------------------------------- 1 | ## From https://ai.gopubby.com/crazy-challenge-run-llama-405b-on-a-8gb-vram-gpu-ab5a280a3889 2 | 3 | !pip install airllm 4 | 5 | from airllm import AutoModel 6 | 7 | model = AutoModel.from_pretrained( 8 | "unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit") 9 | 10 | input_text = ['What is the capital of United States?',] 11 | 12 | input_tokens = model.tokenizer(input_text, 13 | return_tensors="pt", 14 | return_attention_mask=False, 15 | truncation=True, 16 | max_length=128, 17 | padding=False) 18 | 19 | generation_output = model.generate( 20 | input_tokens['input_ids'].cuda(), 21 | max_new_tokens=10, 22 | return_dict_in_generate=True) 23 | 24 | output = model.tokenizer.decode(generation_output.sequences[0]) 25 | -------------------------------------------------------------------------------- /RAG/Network_Analysis_through_LLMs_for_Knowledge_Extraction/schema.py: -------------------------------------------------------------------------------- 1 | from sqlmodel import SQLModel, Field 2 | from typing import Optional 3 | 4 | import datetime 5 | from enum import Enum 6 | 7 | 8 | class FileType(Enum): 9 | AUDIO = "audio" 10 | TEXT = "text" 11 | VIDEO = "video" 12 | 13 | 14 | class Information(SQLModel, table=True): 15 | id: Optional[int] = Field(default=None, primary_key=True) 16 | filename: str = Field() 17 | title: Optional[str] = Field(default="NA", unique=False) 18 | hash_id: str = Field(unique=True) 19 | created_at: float = Field(default=datetime.datetime.now().timestamp()) 20 | file_type: FileType 21 | text: str = Field(default="") 22 | embedded: bool = Field(default=False) 23 | 24 | __table_args__ = {"extend_existing": True} 25 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/TorchTune: -------------------------------------------------------------------------------- 1 | From https://github.com/pytorch/torchtune 2 | 3 | 1. Easy-to-use fine-tuning native PyTorch library for creating, fine-tuning, and experimenting with LLMs. 4 | 2. Native PyTorch implementation of popular LLMs 5 | 3. Support for various checkpoint formats including HF format checkpoints 6 | 4. Providing training recipes for widely-used fine-tuning techniques through reference benchmarks and comprehensive accuracy checks 7 | 5. Model evaluation using the EleutherAI evaluation harness 8 | 6. Integration with HuggingFace datasets for training 9 | 7. Support for distributed training using PyTorch Distributed's FSDP 10 | 8. YAML configuration for easy setup of training runs 11 | 12 | """ 13 | [Scheduled] Support for low-precision dtype and quantization techniques with TorchAO 14 | [Scheduled] Support for integration with various inference engines. 
15 | """ 16 | -------------------------------------------------------------------------------- /RAG/Network_Analysis_through_LLMs_for_Knowledge_Extraction/readme.txt: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/towards-data-science/beyond-rag-network-analysis-through-llms-for-knowledge-extraction-4d107eb5282d 2 | 3 | Mind Mapper leverages RAG to create intermediate result representations useful to perform some kind of knowledge intelligence 4 | which is allows us in turn to better understand the output results of RAG over long and unstructured documents. 5 | 6 | Here are some of the tool’s features: 7 | 1. Manages text in basically all forms: copy-paste, textual and originating from audio source (video is contemplated too if the project is well received) 8 | 2. Uses an in-project SQLite database for data persistence 9 | 3. Leverages the state-of-the-art Upstash vector database to store vectors efficiently 10 | 4. Chunks from the vector database are then used to create a knowledge graph of the information 11 | 5. A final LLM is called to comment on the knowledge graph and extract insights 12 | 13 | Refer this file, maybe can build good KG 14 | -------------------------------------------------------------------------------- /etc/MCP/MCPO.py: -------------------------------------------------------------------------------- 1 | ### From https://mychen76.medium.com/mcpo-supercharge-open-webui-with-mcp-tools-4ee55024c371 2 | 3 | $ python -m venv .venv 4 | $ source .venv/bin/activate 5 | $ pip install mcpo 6 | 7 | # 1.time mcp server 8 | $ pip install mcp-server-time 9 | 10 | # 2.memory mcp server 11 | $ npm install @modelcontextprotocol/server-memory 12 | 13 | # 3.fetch mcp server 14 | $ pip install mcp-server-fetch 15 | 16 | --------------------------------------------------------------------------- 17 | ❯ cat config.json 18 | { 19 | "mcpServers": { 20 | "memory": { 21 | "command": "npx", 22 | "args": ["-y", "@modelcontextprotocol/server-memory"] 23 | }, 24 | "time": { 25 | "command": "uvx", 26 | "args": ["mcp-server-time", "--local-timezone=America/New_York"] 27 | }, 28 | "fetch": { 29 | "command": "uvx", 30 | "args": ["mcp-server-fetch"] 31 | } 32 | } 33 | } 34 | 35 | --------------------------------------------------------------------------- 36 | $ uvx mcpo --config config.json --port 8001 37 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/Reflection Tuning/into: -------------------------------------------------------------------------------- 1 | # from https://medium.com/data-science-in-your-pocket/what-is-reflection-tuning-for-llms-4d4e60c74324 2 | 3 | 4 | Reflection Tuning is a recent development in Generative AI that has made the Llama 3.1 70B model the best open-sourced model through Fine-Tuning. 5 | Fine-tuning is the process of adapting pre-trained language models to specific tasks by training them on smaller, specialized datasets. 6 | 7 | Reflection Fine-Tuning is an enhanced form of fine-tuning, where the model reasons through a query, detects mistakes, corrects itself, 8 | and then provides the final response. 9 | The model uses specific tags such as , , and to organize its reasoning, error detection, correction, and final answer. 10 | 11 | For example, when asked “what is 2+2?”, the model reasons that the answer is 5 but corrects itself within the tags, realizing the correct answer is 4. 
12 | This method has made significant improvements, such as making Llama 3.1 70B the best model, and it can be applied to other LLMs using platforms like unsloth. 13 | -------------------------------------------------------------------------------- /VISION/Image/Ideogram_2.0.py: -------------------------------------------------------------------------------- 1 | ## From https://generativeai.pub/ai-image-generators-ideogram-2-0-is-a-game-changer-2791ef7556ac 2 | 3 | import requests 4 | # Generates images synchronously based on a given prompt and optional parameters. (POST /generate) 5 | response = requests.post( 6 | "https://api.ideogram.ai/generate", 7 | headers={ 8 | "Api-Key": "", 9 | "Content-Type": "application/json" 10 | }, 11 | json={ 12 | "image_request": { 13 | "prompt": "A serene tropical beach scene. Dominating the foreground are tall palm trees with lush green leaves, standing tall against a backdrop of a sandy beach. The beach leads to the azure waters of the sea, which gently kisses the shoreline. In the distance, there is an island or landmass with a silhouette of what appears to be a lighthouse or tower. The sky above is painted with fluffy white clouds, some of which are tinged with hues of pink and orange, suggesting either a sunrise or sunset.", 14 | "aspect_ratio": "ASPECT_10_16", 15 | "model": "V_2", 16 | "magic_prompt_option": "AUTO" 17 | } 18 | }, 19 | ) 20 | print(response.json()) 21 | -------------------------------------------------------------------------------- /text/LLM/DeepSeek_R_ in_24GB_GPU.py: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/@sonamshrish1618/deepseek-r1-in-24gb-gpu-dynamic-quantization-by-unsloth-ai-for-a-671b-parameter-model-6b0cf85f9065 2 | 3 | git clone https://github.com/ggerganov/llama.cpp 4 | cd llama.cpp 5 | cmake . 
-B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON 6 | cmake --build build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split 7 | 8 | from huggingface_hub import snapshot_download 9 | 10 | snapshot_download( 11 | repo_id="unsloth/DeepSeek-R1-GGUF", 12 | local_dir="DeepSeek-R1-GGUF", 13 | allow_patterns=["*UD-IQ1_S*"], # For the 1.58-bit version 14 | ) 15 | 16 | n_offload = floor((GPU_VRAM_GB / Model_FileSize_GB) * (Total_Layers - 4)) 17 | 18 | ./build/bin/llama-cli \ 19 | --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \ 20 | --cache-type-k q4_0 \ 21 | --threads 16 \ 22 | --prio 2 \ 23 | --temp 0.6 \ 24 | --ctx-size 8192 \ 25 | --seed 3407 \ 26 | --n-gpu-layers 7 \ 27 | -no-cnv \ 28 | --prompt "<|User|>Create a Flappy Bird game in Python.<|Assistant|>" 29 | -------------------------------------------------------------------------------- /text/Embedding/LLM2Vec_llama_3.py: -------------------------------------------------------------------------------- 1 | # From https://towardsdatascience.com/turn-llama-3-into-an-embedding-model-with-llm2vec-8448005f99aa 2 | # From https://huggingface.co/kaitchup/Llama-3-8B-llm2vec-Emb 3 | 4 | ############ Training Embedding model ################# 5 | pip install llm2vec 6 | pip install flash-attn --no-build-isolation 7 | 8 | """ 9 | check : https://github.com/McGill-NLP/llm2vec/blob/main/experiments/run_mntp.py 10 | and check : https://towardsdatascience.com/turn-llama-3-into-an-embedding-model-with-llm2vec-8448005f99aa 11 | """ 12 | 13 | import torch 14 | from llm2vec import LLM2Vec 15 | 16 | l2v = LLM2Vec.from_pretrained( 17 | "meta-llama/Meta-Llama-3-8B", 18 | device_map="cuda" if torch.cuda.is_available() else "cpu", 19 | torch_dtype=torch.bfloat16, 20 | ) 21 | 22 | l2v.save("Llama-3-8B-Emb") 23 | 24 | ######################################################### 25 | 26 | 27 | from sentence_transformers import SentenceTransformer 28 | 29 | model = SentenceTransformer("kaitchup/Llama-3-8B-llm2vec-Emb") 30 | # (or) Settings.embed_model = HuggingFaceEmbedding(model_name="kaitchup/Llama-3-8B-llm2vec-Emb", device='cpu') -- If use LlamaIndex 31 | -------------------------------------------------------------------------------- /Voice/Veo: -------------------------------------------------------------------------------- 1 | ## From https://generativeai.pub/google-reveals-veo-its-new-most-capable-ai-video-generator-a40154064a6f 2 | 3 | # Veo: Google's Advanced Video Generation Model 4 | 1. Key Features 5 | -1. High-Quality Videos 6 | Generates 1080p videos over one minute long, understanding cinematic terms like time-lapses and aerial shots. 7 | -2. Video Editing 8 | Adds objects to existing videos and converts reference images into videos matching the image style. 9 | -3. Consistency 10 | Uses advanced diffusion transformers to reduce common AI video issues like flickering and disappearing objects, producing natural-looking videos. 11 | -4. Creative Control 12 | Competes with OpenAI’s Sora, offering unprecedented creative control for users. 13 | 2. Usability 14 | -1. Safety Measures 15 | Includes safety filters and SynthID watermarking to address privacy, copyright, and bias concerns. 16 | -2 .Accessibility 17 | Partners with filmmakers to democratize storytelling, enabling both professionals and amateurs to create and share stories. 18 | 19 | Conclusion 20 | Veo aims to harmonize AI technology with human creativity, potentially revolutionizing the filmmaking industry. 
21 | 22 | -------------------------------------------------------------------------------- /Continual_learning_A/Replay-based and Architecture-based approach.txt: -------------------------------------------------------------------------------- 1 | ## Replay-based approach 2 | Fine-tuned language models are continual learners 3 | Single-task model like GPT trained just for conversational response. Instead, it’s fine-tuned for a sequence of specialised tasks, 4 | ranging from text simplification to Haiku generation. 5 | Each of these tasks has unique requirements, evaluation metrics, and specialised training datasets. 6 | 7 | This all seems positive, the fact we can just add 1% of the old dataset and continual learning is solved, but of course, applying it to a chatbot like chatGPT, 8 | will be empirical and can be completely different. 9 | Even if, hypothetically, chatGPT could be continually trained in the fine-tuning and RLHF stages like this, 10 | it would require an immense amount of labeled conversation data. 11 | 12 | ## Architecture-based approach 13 | 1. Parameter Allocation: Here, a subset of the network parameters is dedicated to each task. 14 | This can be done either by masking out irrelevant neurons or by explicitly identifying important ones for the current task. 15 | 2. Modular Network: This involves using separate sub-networks or modules for each task. 16 | 17 | Below are a few common methods for connecting sub-networks 18 | 1. Concatenation of Outputs 19 | 2. Voting Mechanism 20 | 3. Skip Connections 21 | 4. Sequential 22 | -------------------------------------------------------------------------------- /GenerativeAI/Image/diffusion: -------------------------------------------------------------------------------- 1 | ## The essential idea is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. we then learn a reverse diffusion process that restors structure in data 2 | yidlding a highly fiexible and tractable generative model of the data 3 | 4 | ## Denoising Diffusion probabilistic models (DDPM) 5 | In the noise data(pure noise) to un-noise data process(x_T --> x_0), in each step(x_t and x_t+1) add gaussian noise to images 6 | 7 | ## Reversed Diffusion process removes noise 8 | In each step, remove gaussian noise (This part need to be learned) 9 | 10 | ## DDPM training 11 | x_0(image) --> x_t(noise image using pure noise) --> denoising model(U-Net + Attention) --> predicted Noise --> Loss(Pixel-wise MSE - predicted noise and pure noise) --> update Denoising model parameters 12 | 13 | ## DDPM generation 14 | pure noise --> Denoising Model --> predected noise --> pure noise - predected noise --> Denoising model --> predeicted noise --> pure noise - predected noise --> Denoising model --> ... 
--> generate image 15 | 16 | ## Recent diffusion model 17 | Faster generation / Conditioned models / CLIP --> Diffusion modeel + text-to-image models 18 | 19 | Text --> text embedding --> text-to-image diffusion model(ex.64*64) --> Super-resolution diffusion model(ex.256*256) --> Super-resolution diffusion model(ex.1024*1024) 20 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/FLUTE: -------------------------------------------------------------------------------- 1 | ## From https://github.com/HanGuo97/flute 2 | ## From https://blog.stackademic.com/flute-faster-qlora-fine-tuning-with-nf4-models-36ca3dea91be 3 | 4 | The text discusses the NormalFloat4 (NF4) data type, commonly used for quantizing large language models (LLMs) during QLoRA fine-tuning. 5 | NF4 offers advantages over the INT4 data type and is used by default in bitsandbytes for calibration-free quantization, 6 | meaning the model is quantized efficiently at loading time. However, NF4 suffers from slow performance in quantized models. 7 | 8 | To address this, FLUTE introduces Lookup Table (LUT) quantization, which is more flexible than uniform quantization. 9 | In uniform quantization, full-precision weights are scaled into lower-precision intervals, 10 | while LUT quantization uses a lookup table to map quantized weights to arbitrary values, enabling more complex quantization techniques. 11 | FLUTE supports int4, fp4, and custom learned lookup tables. 12 | 13 | FLUTE-quantized models can be deployed using frameworks like vLLM and Hugging Face's accelerate library. 14 | It integrates with bitsandbytes through a provided function, supporting torch.float16 and torch.bfloat16 input data types with 2-bit, 3-bit, and 4-bit precision. 15 | Performance optimizations for bfloat16 on Ampere GPUs are still being developed, with certain combinations leading to numerical instability. 16 | -------------------------------------------------------------------------------- /GenerativeAI/NL/Connect_LLM_to_the_Internet_with_AutoGen.py: -------------------------------------------------------------------------------- 1 | # From https://gathnex.medium.com/connect-your-llm-to-the-internet-with-microsoft-autogen-3bc4c655e7c0 2 | 3 | !pip install -q pyautogen~=0.1.0 docker openai 4 | 5 | import autogen 6 | #Follow the same format for model and api arguments. 7 | config_list = [ 8 | { 9 | 'model': 'gpt-3.5-turbo', 10 | 'api_key': 'OpenAI API key' 11 | } 12 | ] 13 | llm_config={ 14 | "request_timeout": 600, 15 | "seed": 42, 16 | "config_list": config_list, 17 | "temperature": 0, 18 | } 19 | 20 | # create an AssistantAgent instance named "assistant" 21 | assistant = autogen.AssistantAgent( 22 | name="assistant", 23 | llm_config=llm_config, 24 | ) 25 | # create a UserProxyAgent instance named "user_proxy" 26 | user_proxy = autogen.UserProxyAgent( 27 | name="user_proxy", 28 | human_input_mode="TERMINATE", 29 | max_consecutive_auto_reply=10, 30 | is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"), 31 | code_execution_config={"work_dir": "web"}, 32 | llm_config=llm_config, 33 | system_message="""Reply TERMINATE if the task has been solved at full satisfaction. 34 | Otherwise, reply CONTINUE, or the reason why the task is not solved yet.""" 35 | ) 36 | 37 | user_proxy.initiate_chat( 38 | assistant, 39 | message=""" 40 | what is current time in Akureyri,Iceland ? 
41 | """ 42 | ) 43 | 44 | -------------------------------------------------------------------------------- /Robotics/with_LLM/SensorLM.py: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/@bravekjh/sensorlm-a-language-model-for-sensor-data-95440b1e3225 2 | 3 | """ 4 | SensorLM is a transformer-based foundation model designed specifically to work with multivariate time-series sensor data, 5 | such as that collected from IoT devices, industrial systems, HVAC, and other telemetry-heavy environments. 6 | """ 7 | 8 | !pip install transformers datasets torch 9 | 10 | 11 | 12 | from transformers import AutoTokenizer, AutoModelForSequenceClassification 13 | import torch 14 | import numpy as np 15 | 16 | # Load SensorLM from Hugging Face Hub 17 | model_name = "microsoft/sensorlm-base" 18 | tokenizer = AutoTokenizer.from_pretrained(model_name) 19 | model = AutoModelForSequenceClassification.from_pretrained(model_name) 20 | 21 | # Example sensor input: 10 sensors × 50 timesteps, normalized and flattened 22 | # Shape: (10 sensors, 50 timesteps) → Flattened to 500 23 | sensor_data = np.random.rand(10, 50).flatten() 24 | 25 | # Convert to list of string tokens (SensorLM expects string inputs) 26 | input_text = " ".join([str(round(val, 4)) for val in sensor_data]) 27 | inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True) 28 | 29 | # Run inference 30 | with torch.no_grad(): 31 | outputs = model(**inputs) 32 | predicted_class = torch.argmax(outputs.logits, dim=1).item() 33 | 34 | print(f"Predicted Class: {predicted_class}") 35 | -------------------------------------------------------------------------------- /Voice/Entity Detection_on_Audio_Data.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.datadriveninvestor.com/performing-named-entity-recognition-on-audio-data-73f45c1b9739 2 | 3 | pip install requests 4 | 5 | API_key = " " 6 | endpoint = "https://api.assemblyai.com/v2/transcript" 7 | 8 | json = { 9 | "audio_url": upload_url, 10 | "entity_detection": True, 11 | "speaker_labels": True 12 | } 13 | 14 | headers = { 15 | "authorization": API_key, 16 | "content-type": "application/json" 17 | } 18 | 19 | response = requests.post(endpoint, json=json, headers=headers) 20 | 21 | response.json() 22 | 23 | response_id = response.json()['id'] 24 | 25 | endpoint = f"https://api.assemblyai.com/v2/transcript/{response_id}" 26 | 27 | headers = { 28 | "authorization": API_key, 29 | } 30 | response = requests.get(endpoint, headers=headers) 31 | 32 | response.json() 33 | 34 | current_status = "queued" 35 | response_id = response.json()['id'] 36 | endpoint = f"https://api.assemblyai.com/v2/transcript/{response_id}" 37 | headers = { 38 | "authorization": API_key, 39 | } 40 | 41 | while current_status not in ("completed", "error"): 42 | 43 | response = requests.get(endpoint, headers=headers) 44 | current_status = response.json()['status'] 45 | 46 | if current_status in ("completed", "error"): 47 | print(response) 48 | else: 49 | sleep(10) 50 | 51 | current_status 52 | response.json() 53 | 54 | 55 | -------------------------------------------------------------------------------- /VISION/Image/Flux_1.py: -------------------------------------------------------------------------------- 1 | ## From https://jimclydemonge.medium.com/flux-1-is-a-mind-blowing-open-weights-ai-image-generator-with-12b-parameters-5a138146bb51 2 | 3 | """ 4 | 1. 
Flux.1 Pro 5 | This offers state-of-the-art performance in image generation, delivering top-notch prompt following, 6 | visual quality, image detail, and output diversity. 7 | 2. Flux.1 Dev 8 | This is an open-weight, guidance-distilled model designed for non-commercial use. 9 | It is distilled from Flux.1 Pro, achieving similar quality and prompt adherence while being more efficient than a typical model of the same size. 10 | 3. Flux.1 Schnell 11 | This is their fastest model and is designed for local development and personal use. It is openly available under an Apache 2.0 license. 12 | """ 13 | 14 | import os 15 | import requests 16 | 17 | request = requests.post( 18 | 'https://api.bfl.ml/v1/image', 19 | headers={ 20 | 'accept': 'application/json', 21 | 'x-key': os.environ.get("BFL_API_KEY"), 22 | 'Content-Type': 'application/json', 23 | }, 24 | json={ 25 | 'prompt': 'A cat on its back legs running like a human is holding a big silver fish with its arms. The cat is running away from the shop owner and has a panicked look on his face. The scene is situated in a crowded market.', 26 | 'width': 1024, 27 | 'height': 1024, 28 | }, 29 | ).json() 30 | print(request) 31 | request_id = request["id"] 32 | -------------------------------------------------------------------------------- /text/Agent/Adding_Memory_To_Agents.py: -------------------------------------------------------------------------------- 1 | ## From https://ai.gopubby.com/adding-memory-to-agents-in-llm-based-production-ready-applications-9274f7381369 2 | 3 | ## After setting env 4 | 5 | import dotenv 6 | %load_ext dotenv 7 | %dotenv 8 | 9 | from langgraph.graph import StateGraph, END 10 | from typing import TypedDict, Annotated 11 | import operator 12 | from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage, ToolMessage 13 | from langchain_openai import ChatOpenAI 14 | from langchain_community.tools.tavily_search import TavilySearchResults 15 | from langchain_core.messages import HumanMessage 16 | from langgraph.prebuilt import create_react_agent 17 | from langgraph.checkpoint.sqlite import SqliteSaver 18 | 19 | tool = TavilySearchResults(max_results=2) 20 | 21 | class AgentState(TypedDict): 22 | messages: Annotated[list[AnyMessage], operator.add] 23 | 24 | model = ChatOpenAI(model="gpt-4") 25 | tools = [tool] 26 | model_with_tools = model.bind_tools(tools) 27 | agent_executor = create_react_agent(model, tools) 28 | 29 | memory = SqliteSaver.from_conn_string("sqlite.sqlite") 30 | agent_executor = create_react_agent(model, tools, checkpointer=memory) 31 | 32 | config = {"configurable": {"thread_id": "test_thread_sqlite"}} 33 | 34 | for chunk in agent_executor.stream( 35 | {"messages": [HumanMessage(content="Who is Thomas to John")]}, config 36 | ): 37 | print(chunk) 38 | 39 | print(chunk["agent"]["messages"][0].content) 40 | -------------------------------------------------------------------------------- /new_arch/SUPRA: Turn a Transformer Model into an RNN Model: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@bnjmn_marie/supra-turn-a-transformer-model-into-an-rnn-model-392d85160925 2 | 3 | Attention-free models such as Mamba and RWKV are much more efficient for inference. 4 | However, they also have a reputation for being extremely difficult to train. 
5 | For instance, for The Salt, I explored and trained Jamba, which is a hybrid model using Mamba, and found that it learns extremely slowly during fine-tuning: 6 | 7 | Jamba: The New Hybrid Transformer/Mamba 8 | Faster and better than the transformer but more difficult to train 9 | thesalt.substack.com 10 | 11 | With Scalable UPtraining for Recurrent Attention (SUPRA), training attention-free models is much simpler. 12 | SUPRA doesn’t pre-train a model from scratch but relies on a transformer model to initialize the training. 13 | 14 | 15 | SUPRA turns the model into an RNN and “up-trains” it on new tokens. 16 | The authors of SUPRA applied the technique to Mistral 7B to turn it into an RNN followed by up-training on 100B tokens. 17 | The resulting model significantly outperforms RWKV-5, an attention-free model trained on many more tokens. 18 | 19 | 20 | They released a Mistral RNN made with SUPRA on the Hugging Face Hub: 21 | 22 | TRI-ML/mistral-supra 23 | The code to turn a Transformer model into an RNN is available here: 24 | 25 | GitHub: TRI-ML/linear_open_lm 26 | The method is described in this paper: 27 | 28 | Linearizing Large Language Models 29 | -------------------------------------------------------------------------------- /RAG/Network_Analysis_through_LLMs_for_Knowledge_Extraction/utils.py: -------------------------------------------------------------------------------- 1 | import wave 2 | import contextlib 3 | from pydub import AudioSegment 4 | 5 | import hashlib 6 | import datetime 7 | 8 | from src import logger 9 | 10 | logger = logger.get_console_logger("utils") 11 | 12 | 13 | def compute_cost_of_audio_track(audio_track_file_path: str): 14 | file_extension = audio_track_file_path.split(".")[-1].lower() 15 | duration_seconds = 0 16 | if file_extension == "wav": 17 | with contextlib.closing(wave.open(audio_track_file_path, "rb")) as f: 18 | frames = f.getnframes() 19 | rate = f.getframerate() 20 | duration_seconds = frames / float(rate) 21 | elif file_extension == "mp3": 22 | audio = AudioSegment.from_mp3(audio_track_file_path) 23 | duration_seconds = len(audio) / 1000.0 # pydub returns duration in milliseconds 24 | else: 25 | logger.error(f"Unsupported file format: {file_extension}") 26 | return 27 | 28 | audio_duration_in_minutes = duration_seconds / 60 29 | cost = round(audio_duration_in_minutes, 2) * 0.006 # default price of whisper model 30 | logger.info(f"Cost to convert {audio_track_file_path} is ${cost:.2f}") 31 | return cost 32 | 33 | 34 | def hash_text(text: str) -> str: 35 | return hashlib.md5(text.encode()).hexdigest() 36 | 37 | 38 | def convert_timestamp_to_datetime(timestamp: str) -> str: 39 | return datetime.datetime.fromtimestamp(int(timestamp)).strftime("%Y-%m-%d %H:%M:%S") 40 | -------------------------------------------------------------------------------- /Continual_learning_A/A_define.txt: -------------------------------------------------------------------------------- 1 | from https://towardsdatascience.com/the-current-state-of-continual-learning-in-ai-af4a05c42f3c 2 | 3 | ## Continual learning is the ability to pause the model training process, save the model’s current state, and then later resume training on new data. 4 | The model should be able to generalise well to new data, while still maintaining its ability to generalise to old data 5 | 6 | ## The 5 sub-categories of continual learning techniques 7 | In, https://arxiv.org/pdf/2302.00487.pdf states training strategies for continual learning can be divided into 5 sub categories 8 | 1. 
Regularisation-based approach 9 | This approach adds constraints or penalties to the learning process during the training process. 10 | 2. Optimisation-based approach 11 | This technique focuses on modifying the optimisation algorithm. 12 | 3. Representation-based approach 13 | This aims to learn a shared feature representation across different tasks, helping the model generalise better to new but related tasks. 14 | 4. Replay-based approach 15 | This involves storing some data or learned features from previous tasks and replaying them during training on new tasks to maintain performance on earlier learned tasks. 16 | In other words, mixing both the old and new datasets when training on new tasks. 17 | 5. Architecture-based approach 18 | In this approach, the network architecture is dynamically adjusted, often by growing or partitioning, delegating different parts of the network to different tasks 19 | -------------------------------------------------------------------------------- /text/Agent/Search_Agent_with_Pydantic_AI.py: -------------------------------------------------------------------------------- 1 | ### From https://nqbao.medium.com/write-your-own-search-agent-with-pydantic-ai-fa04eb098acc 2 | 3 | from pydantic_ai import Agent 4 | from pydantic_ai.settings import ModelSettings 5 | from typing import List 6 | import httpx 7 | import os 8 | 9 | research_agent = Agent( 10 | "openai:gpt-4o", 11 | model_settings=ModelSettings(max_tokens=1024, temperature=0), 12 | result_type=str, 13 | system_prompt=( 14 | 'Be a helpful research agent and do your best to answer the given question, be precise. ' 15 | 'Use the provided tools to answer the question if needed. ' 16 | 'If you don\'t know the answer, say "I don\'t know" instead of making things up.' 17 | ), 18 | ) 19 | 20 | result = research_agent.run_sync("What is Pydantic AI?") 21 | print(result.data) 22 | 23 | @research_agent.tool_plain 24 | def search_google(query: str) -> List[str]: 25 | """ 26 | Search the web for the given query and return the top results. 27 | 28 | Args: 29 | query: The query to search for. 30 | 31 | Returns: 32 | The top search results 33 | """ 34 | 35 | api_key = os.getenv("SERPER_API_KEY") 36 | assert api_key, "Please set API key for serper" 37 | search_results = httpx.get(f"https://google.serper.dev/search?apiKey={api_key}&q={query}").json() 38 | 39 | results = [] 40 | for item in search_results['organic']: 41 | results.append(f"Title: {item['title']}\nSnippet: {item['snippet']}") 42 | 43 | return results 44 | -------------------------------------------------------------------------------- /Llama-Bitnet.py: -------------------------------------------------------------------------------- 1 | # From https://medium.com/@zaiinn440/llama-bitnet-training-a-1-58-bit-llm-3831e517430a 2 | 3 | ### Create the llama model with custom config. Convert it to bitnet. 
4 | model = LlamaForCausalLM(config) 5 | convert_to_bitnet(model, copy_weights=False) 6 | model_size = sum(t.numel() for t in model.parameters()) 7 | print(f"Model size: {model_size/1000**2:.1f}M parameters") 8 | tokenizer.pad_token = tokenizer.eos_token 9 | data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False) 10 | 11 | output_path = "./out" 12 | args = TrainingArguments( 13 | output_dir=output_path, 14 | per_device_train_batch_size=BATCH_SIZE, 15 | logging_steps=100, 16 | gradient_accumulation_steps=2, 17 | num_train_epochs=EPOCHS, 18 | weight_decay=0.01, 19 | warmup_steps=0.1, 20 | lr_scheduler_type="cosine", 21 | learning_rate=LEARNING_RATE, 22 | save_steps=0.25, 23 | fp16=True, 24 | report_to="wandb" 25 | ) 26 | 27 | trainer = Trainer( 28 | model=model, 29 | tokenizer=tokenizer, 30 | args=args, 31 | data_collator=data_collator, 32 | train_dataset=tokenized_data["train"], 33 | ) 34 | 35 | trainer.train() 36 | trainer.save_model(f"{output_path}/final_model") 37 | folder = "./out/final_model" 38 | api = HfApi() 39 | create_repo( 40 | repo_id = f"{HUGGINGFACE_ID}/{NEW_MODEL}", 41 | repo_type="model", 42 | exist_ok=True, 43 | token=HF_TOKEN, 44 | ) 45 | 46 | # Upload Model files 47 | api.upload_folder( 48 | folder_path=folder, 49 | repo_type="model", 50 | repo_id=f"{HUGGINGFACE_ID}/{NEW_MODEL}", 51 | token=HF_TOKEN, 52 | ) 53 | -------------------------------------------------------------------------------- /RAG/etc/Building an AI-Powered Web Search Assistant Using GPT-4 and Streamlit.py: -------------------------------------------------------------------------------- 1 | ## From https://generativeai.pub/building-an-ai-powered-web-search-assistant-using-gpt-4-and-streamlit-0687afb15265 2 | ## This is just reference. Based on this, can build web_search engine with AI 3 | 4 | !pip install streamlit phi openai duckduckgo-search 5 | 6 | import streamlit as st 7 | st.title("AI Web Search Assistant 🤖") 8 | st.caption("This app allows you to search the web using GPT-4o") 9 | openai_access_token = st.text_input("OpenAI API Key", type="password") 10 | query = st.text_input("Enter the Search Query") 11 | if query and openai_access_token: 12 | st.write("Search results will appear here.") 13 | 14 | from phi.assistant import Assistant 15 | from phi.tools.duckduckgo import DuckDuckGo 16 | from phi.llm.openai import OpenAIChat 17 | 18 | # Create the assistant with DuckDuckGo and GPT-4 19 | if openai_access_token: 20 | assistant = Assistant( 21 | llm=OpenAIChat( 22 | model="gpt-4o", 23 | max_tokens=1024, 24 | temperature=0.9, 25 | api_key=openai_access_token), tools=[DuckDuckGo()], show_tool_calls=True 26 | ) 27 | # Process the query 28 | if query: 29 | response = assistant.run(query, stream=False) 30 | st.write(response) 31 | 32 | with st.sidebar: 33 | st.image("path_to_your_picture.jpg", width=100) 34 | st.header("About Me") 35 | st.write("") 36 | st.write("") 37 | st.write("[LinkedIn](https://www.linkedin.com/in/your-linkedin-id)") 38 | 39 | 40 | ##### 41 | streamlit run eb_search_ai_assistant.py 42 | 43 | 44 | -------------------------------------------------------------------------------- /gpt_with_confidence.py: -------------------------------------------------------------------------------- 1 | import math 2 | from openai import OpenAI 3 | 4 | client = OpenAI() 5 | 6 | movie_name = "Gladiator" 7 | 8 | genres = ["Action", "Adventure", "Animation", "Biography", "Comedy", "Crime", "Documentary", "Drama", "Family", "Fantasy", "Film-Noir", "History", "Horror", "Music", "Musical", 
"Mystery", "Romance", "Sci-Fi", "Short", "Sport", "Thriller", "War", "Western"] 9 | genre_string = "\n".join([f"{i}. {g}" for i, g in enumerate(genres)]) 10 | 11 | prompt = f"""\ 12 | Which genre best describes the movie {movie_name!r}? 13 | Consider a few likely genres and explain your reasoning, 14 | then pick an answer from the list below 15 | and show it in answer tags, like: 4 16 | {genre_string} 17 | """ 18 | 19 | # Call the API, requesting logprobs and 10 top_logprobs 20 | completion = client.chat.completions.create( 21 | model="gpt-4o-mini", 22 | messages=[dict(role="user", content=prompt)], 23 | logprobs=True, 24 | top_logprobs=10, 25 | ) 26 | 27 | # Extract the responses and confidences 28 | label_dict = {} 29 | text = "" 30 | for tokenLogProb in completion.choices[0].logprobs.content: 31 | # When we get to the token following '', extract alternatives listed in top_logprobs 32 | if text.endswith(""): 33 | for item in tokenLogProb.top_logprobs: 34 | if (confidence := math.exp(item.logprob)) > 0.01: 35 | genre = genres[int(item.token)] 36 | label_dict[genre] = confidence 37 | text += tokenLogProb.token 38 | 39 | 40 | for genre, confidence in label_dict.items(): 41 | print(f"{genre}: {confidence:.2%}") 42 | -------------------------------------------------------------------------------- /MultiModal/LLaVa-1.5: -------------------------------------------------------------------------------- 1 | From https://pub.towardsai.net/llava-15-5733993c3033 2 | From https://llava-vl.github.io/ 3 | 4 | ## Multimodality 5 | Multimodality represents the capacity of a model to process at least two different modalities, 6 | when a model’s different modality components share a common embedding space, 7 | a modality being a type of input data (words, images, sounds, et al) 8 | 9 | ## There are three ways to achieve multimodality 10 | 1. Tool/model-based methods 11 | By combining different models or tools you can allow your solution to handle multiple inputs 12 | While the solution is multimodal, the underlying models aren’t. 13 | 14 | 2. Grafting 15 | Implies using pre-trained image encoders and LLMs and projecting the encoder’s vector embedding 16 | into the LLM’s latent space using a projecting matrix or an MLP layer 17 | 18 | 3. Generalist systems 19 | Most probably how GPT-4V was trained, training an image encoder and an LLM from scratch into the same embedding space. 
20 | Importantly, here all weights are trained from scratch 21 | 22 | - LLaVa-1.5 were trained using grafting 23 | The image encoder and the LLM’s weights remain frozen, and simply train the projecting matrix to 24 | learn to transform the encoder’s vectors, or ‘grafting’ them, into the LLM’s high-dimensional space 25 | 26 | When an image is sent to the image encoder (a CLIP encoder in LLaVa’s case), it processes it, 27 | and then it goes through an ‘adapter’ which in reality is simply a matrix (or a MLP layer like in LLaVa-1.5’s case) 28 | that transforms the output vector of the image encoder into an acceptable vector that the LLM (Vicuna in LLaVa’s case) 29 | -------------------------------------------------------------------------------- /RAG/chunking/AI_and_LLM_for_Document_Extraction.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@krtarunsingh/ai-and-llm-for-document-extraction-simplifying-complex-formats-with-ease-b3261b5be58e 2 | 3 | !pip install Pillow torch torchvision transformers sentencepiece pymupdf 4 | 5 | import torch 6 | from transformers import AutoModel, AutoTokenizer 7 | import fitz # PyMuPDF 8 | from PIL import Image 9 | 10 | # Load the model and tokenizer 11 | model = AutoModel.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.float16) 12 | model = model.to(device="cuda") 13 | 14 | tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True) 15 | model.eval() 16 | 17 | # Open the PDF file 18 | pdf_path = "mypdf.pdf" 19 | pdf_document = fitz.open(pdf_path) 20 | 21 | # Store images 22 | images = [] 23 | 24 | # Loop through each page and convert it to an image 25 | for page_number in range(len(pdf_document)): 26 | page = pdf_document.load_page(page_number) 27 | pix = page.get_pixmap() 28 | img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples) 29 | images.append(img) 30 | 31 | pdf_document.close() 32 | 33 | question = """Extract all the text in this image. 34 | If there is a header or a footer, just ignore it. 35 | Extract tables as markdown tables. 36 | Don't use the subtitles for the list items, just return the list as text.""" 37 | 38 | msgs = [{"role": "user", "content": question}] 39 | 40 | res = model.chat( 41 | image=images[0], # Using the first image as an example 42 | msgs=msgs, 43 | tokenizer=tokenizer, 44 | sampling=True, 45 | temperature=0.7 46 | ) 47 | 48 | print(res) 49 | -------------------------------------------------------------------------------- /text/LLM/Main Stages of Auto-regressive Decoding for LLM Inference: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@florian_algo/main-stages-of-auto-regressive-decoding-for-llm-inference-915d6e0a4418 2 | 3 | Two primary stages of auto-regressive decoding for Large Language Model (LLM) inference: 4 | 5 | 1. Prefill Stage 6 | During this stage, the LLM processes the input prompt to compute and cache intermediate states (keys and values) for each Transformer layer. 7 | These cached values, known as the key-value cache (KV cache), are essential for generating the initial token. 8 | 9 | 2. Decoding Stage 10 | In this sequential stage, the LLM generates output tokens one by one, utilizing the previously generated token 11 | to produce the next one until a stopping condition is met. The KV cache is used to avoid recalculating intermediate states for each token. 
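As a rough illustration of the two stages (not code from the article), the greedy-decoding loop below uses Hugging Face transformers with GPT-2 as a stand-in model: the single forward pass over the whole prompt is the prefill, and each later single-token forward pass reuses past_key_values as the KV cache.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
prompt_ids = tok("The capital of France is", return_tensors="pt").input_ids

# Prefill: one parallel pass over the whole prompt builds the KV cache
with torch.no_grad():
    out = model(prompt_ids, use_cache=True)
past = out.past_key_values
next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)

# Decode: feed one token at a time, reusing and extending the cache
generated = [next_id]
for _ in range(10):
    with torch.no_grad():
        out = model(next_id, past_key_values=past, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
    generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))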
12 | 13 | Key points include: 14 | 15 | 1. The prefill stage is highly parallelized and efficient, utilizing GPU capabilities for matrix operations. 16 | 2. The decoding stage updates the KV cache and computes the output of each layer sequentially. 17 | 3. Dividing the process into two stages minimizes unnecessary computation, as the prefill stage only requires caching once, 18 |    while the decoding stage focuses on updating and looking up the cache. 19 | 4. The article emphasizes the importance of efficient caching mechanisms in improving LLM inference performance. 20 | 5. The division into two stages allows for more streamlined and efficient inference, optimizing computational resources and improving overall performance. 21 |    Additionally, the article invites feedback and corrections from readers to ensure accuracy and completeness. 22 | -------------------------------------------------------------------------------- /GenerativeAI/Basic.txt: -------------------------------------------------------------------------------- 1 | What is Generative AI 2 | 1. GenAI is a type of AI that creates new content based on what it has learned from existing content 3 | 2. The process of learning from existing content is called training and results in the creation of a statistical model 4 | 3. When given a prompt, GenAI uses this statistical model to predict what an expected response might be, and this generates new content 5 | 6 | Two types (upper level) 7 | 1. Gen language models : Generative language models learn about patterns in language through training data. 8 |    Then, given some text, they predict what comes next... 9 | 10 | 2. Gen Image models : Generative image models produce new images using techniques like diffusion. 11 |    Then, given a prompt or related imagery, they transform random noise into images or generate images from prompts. 12 | 13 | Type of models 14 | 1. text-to-text : take a natural language input and produce text output. These models are trained to learn the mapping between a pair of texts 15 | 2. text-to-image : trained on a large set of images, each captioned with a short text description 16 | 3. text-to-video & text-to-3D : aim to generate a video representation from text input 17 |    text-to-3D models generate three-dimensional objects that correspond to a user's text description 18 | 4. text-to-task : trained to perform a specific task or action based on text input 19 |    e.g., answering a question, performing a search, making a prediction, etc. 20 | 21 | 22 | *** Responsible AI 23 | The goal of responsible AI is to employ AI in a safe, trustworthy and ethical fashion. 
Using AI responsibly should increase transparency and help reduce issues such as AI bias 24 | -------------------------------------------------------------------------------- /text/Agent/memory/Memory_in_Agent_langchain.py: -------------------------------------------------------------------------------- 1 | ### From https://python.langchain.com/v0.1/docs/modules/memory/agent_with_memory/ 2 | 3 | import os 4 | 5 | from langchain.agents import Tool 6 | from langchain_community.utilities import GoogleSearchAPIWrapper 7 | from langchain import hub 8 | from langchain.agents import AgentExecutor, create_react_agent 9 | from langchain.memory import ChatMessageHistory 10 | from langchain_core.runnables.history import RunnableWithMessageHistory 11 | from langchain_openai import OpenAI 12 | 13 | os.environ["GOOGLE_API_KEY"] = "GOOGLE_API_KEY" 14 | os.environ["GOOGLE_CSE_ID"] = "GOOGLE_CSE_ID" 15 | os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY" 16 | search = GoogleSearchAPIWrapper() 17 | tools = [ 18 | Tool( 19 | name="Search", 20 | func=search.run, 21 | description="useful for when you need to answer questions about current events", 22 | ) 23 | ] 24 | 25 | prompt = hub.pull("hwchase17/react") 26 | memory = ChatMessageHistory(session_id="test-session") 27 | 28 | llm = OpenAI(temperature=0) 29 | agent = create_react_agent(llm, tools, prompt) 30 | agent_executor = AgentExecutor(agent=agent, tools=tools) 31 | 32 | agent_with_chat_history = RunnableWithMessageHistory( 33 | agent_executor, 34 | # This is needed because in most real world scenarios, a session id is needed 35 | # It isn't really used here because we are using a simple in memory ChatMessageHistory 36 | lambda session_id: memory, 37 | input_messages_key="input", 38 | history_messages_key="chat_history", 39 | ) 40 | 41 | agent_with_chat_history.invoke( 42 | {"input": "How many people live in canada?"}, 43 | config={"configurable": {"session_id": ""}}, 44 | ) 45 | 46 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/LLaMA_Factory.py: -------------------------------------------------------------------------------- 1 | ## From https://generativeai.pub/adding-custom-datasets-to-llama-factory-4aff22385c2f 2 | 3 | """ 4 | git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git 5 | cd LLaMA-Factory 6 | pip install -r requirements.txt 7 | """ 8 | 9 | ### Create a JSON file for your custom dataset: 10 | [ 11 | { 12 | "messages": [ 13 | { 14 | "content": "What is this model?", 15 | "role": "user" 16 | }, 17 | { 18 | "content": "This is a Convolutional Neural Network (CNN) used for image classification.", 19 | "role": "assistant" 20 | }, 21 | { 22 | "content": "What is it classifying?", 23 | "role": "user" 24 | }, 25 | { 26 | "content": "It's classifying images of handwritten digits from the MNIST dataset.", 27 | "role": "assistant" 28 | } 29 | ], 30 | "images": [ 31 | "Example/cnn_mnist.jpg" 32 | ] 33 | }, 34 | { 35 | "messages": [ 36 | { 37 | "content": "What does this graph represent?", 38 | "role": "user" 39 | }, 40 | { 41 | "content": "This graph shows the loss function decreasing over time during model training.", 42 | "role": "assistant" 43 | }, 44 | { 45 | "content": "Why is the loss decreasing?", 46 | "role": "user" 47 | }, 48 | { 49 | "content": "Because the model is learning and improving its accuracy with each iteration.", 50 | "role": "assistant" 51 | } 52 | ], 53 | "images": [ 54 | "Example/loss_curve.jpg" 55 | ] 56 | } 57 | ] 58 | 59 | ### Update the dataset_info.json file 
(located in the same folder) to register your dataset 60 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/PEFT/VeRA: -------------------------------------------------------------------------------- 1 | ## From https://huggingface.co/papers/2310.11454 2 | 3 | The text introduces Vector-based Random Matrix Adaptation (VeRA), 4 | a novel approach to reduce the number of trainable parameters when fine-tuning large language models, 5 | addressing the storage challenges that arise with scaling models or deploying multiple user-specific or task-specific adaptations. 6 | 7 | 1. Key Features of VeRA: 8 | - Parameter Efficiency 9 | VeRA reduces the number of trainable parameters by 10x compared to Low-rank Adaptation (LoRA), 10 | which is already a popular method for reducing trainable parameters in large language models. 11 | - Methodology 12 | Instead of using separate low-rank matrices for each layer, VeRA employs a "single pair of low-rank matrices" shared across all layers, 13 | and then learns small scaling vectors for each layer. 14 | This dramatically reduces the total number of trainable parameters while still allowing effective model adaptation. 15 | 16 | 2. Performance: 17 | - Benchmarks 18 | VeRA demonstrates strong performance on standard benchmarks like GLUE and E2E, 19 | showing that the reduction in trainable parameters does not come at the cost of performance. 20 | 21 | - Instruction-following 22 | VeRA can be effectively applied in instruction-following tasks, achieving similar performance to LoRA 23 | while using only 1.4M trainable parameters when fine-tuning the Llama2 7B model. 24 | 25 | In summary, VeRA offers a significant reduction in trainable parameters, achieving similar performance to LoRA while being far more storage-efficient, 26 | making it ideal for scaling and deploying large models in resource-constrained environments. 
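As a usage sketch only (not from the paper): Hugging Face PEFT ships a VeraConfig, so wiring VeRA into a model looks roughly like the snippet below; the base model, rank, and target modules are illustrative assumptions.

from transformers import AutoModelForCausalLM
from peft import VeraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed base model
vera_config = VeraConfig(r=256, target_modules=["q_proj", "v_proj"])     # assumed rank/targets
model = get_peft_model(base, vera_config)
model.print_trainable_parameters()  # only the small per-layer scaling vectors are trainable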
27 | 28 | -------------------------------------------------------------------------------- /Knowledge Graph Reasoning/Enhancing_AI_with_Graph_RAG_Models.py: -------------------------------------------------------------------------------- 1 | ## From https://generativeai.pub/enhancing-ai-with-graph-rag-models-a-practical-guide-906eb3e8721a 2 | 3 | from neo4j import GraphDatabase 4 | 5 | # Connect to Neo4j database 6 | uri = "bolt://localhost:7687" 7 | driver = GraphDatabase.driver(uri, auth=("neo4j", "password")) 8 | 9 | # Define a function to create nodes and relationships 10 | def create_graph(tx): 11 | tx.run("CREATE (a:Disease {name: 'Flu'})") 12 | tx.run("CREATE (b:Symptom {name: 'Fever'})") 13 | tx.run("CREATE (c:Symptom {name: 'Cough'})") 14 | tx.run("CREATE (a)-[:HAS_SYMPTOM]->(b)") 15 | tx.run("CREATE (a)-[:HAS_SYMPTOM]->(c)") 16 | 17 | # Add data to the graph 18 | with driver.session() as session: 19 | session.write_transaction(create_graph) 20 | 21 | driver.close() 22 | 23 | def fetch_symptoms(tx, disease_name): 24 | query = """ 25 | MATCH (d:Disease {name: $disease})-[:HAS_SYMPTOM]->(s:Symptom) 26 | RETURN s.name AS symptom 27 | """ 28 | result = tx.run(query, disease=disease_name) 29 | return [record["symptom"] for record in result] 30 | 31 | with driver.session() as session: 32 | symptoms = session.read_transaction(fetch_symptoms, "Flu") 33 | print("Symptoms of Flu:", symptoms) 34 | 35 | import openai 36 | 37 | # Combine graph-retrieved data with a prompt for the LLM 38 | retrieved_data = "Symptoms of Flu: Fever, Cough" 39 | prompt = f"Based on the following data, explain the symptoms and treatment options for Flu: {retrieved_data}" 40 | 41 | # Generate response 42 | response = openai.Completion.create( 43 | engine="text-davinci-003", 44 | prompt=prompt, 45 | max_tokens=100 46 | ) 47 | 48 | print(response.choices[0].text.strip()) 49 | 50 | 51 | -------------------------------------------------------------------------------- /Robotics/Genesis/1. What is Genesis: -------------------------------------------------------------------------------- 1 | Genesis is a physics platform designed for general purpose Robotics/Embodied AI/Physical AI applications. 2 | It is simultaneously multiple things: 3 | 4 | 1. A universal physics engine re-built from the ground up, capable of simulating a wide range of materials and physical phenomena. 5 | 2. A lightweight, ultra-fast, pythonic, and user-friendly robotics simulation platform. 6 | 3. A powerful and fast photo-realistic rendering system. 7 | 4. A generative data engine that transforms user-prompted natural language description into various modalities of data. 8 | 9 | Powered by a universal physics engine re-designed and re-built from the ground up, Genesis integrates various physics solvers 10 | and their coupling into a unified framework. 11 | This core physics engine is further enhanced by a generative agent framework that operates at an upper level, 12 | aiming towards fully automated data generation for robotics and beyond. 13 | Currently, it is open-sourcing the underlying physics engine and the simulation platform. 14 | The generative framework will be released in the near future. 15 | 16 | Genesis is built and will continuously evolve with the following long-term missions: 17 | 1. Lowering the barrier to using physics simulations and making robotics research accessible to everyone. (See our commitment) 18 | 2. 
Unifying a wide spectrum of state-of-the-art physics solvers into a single framework, allowing re-creating the whole physical 19 | world in a virtual realm with the highest possible physical, visual and sensory fidelity, using the most advanced simulation techniques. 20 | 3. Minimizing human effort in collecting and generating data for robotics and other domains, letting the data flywheel spin on its own. 21 | -------------------------------------------------------------------------------- /attention/Infini_attention/More_deep_init_attention: -------------------------------------------------------------------------------- 1 | From https://ai.plainenglish.io/infini-attention-toward-infinite-context-llms-637ddcc75902 2 | 3 | Infini-attention seeks to overcome the challenge of forgetting previous data in long sequences, 4 | offering a solution through compressive memory and linear attention. 5 | Compressive memory stores summarized past segments, 6 | which are accessed more efficiently through linear attention, 7 | preventing exponential growth in computation and memory demands. 8 | This approach contrasts with standard attention mechanisms, 9 | which suffer from quadratic increases in cost and memory with sequence length. 10 | 11 | Infini-attention combines standard dot-product attention for the current segment with linear attention 12 | for accessing past memory. 13 | Linear attention allows for subquadratic retrieval, 14 | providing a compressed yet accessible summary of past information. 15 | Despite potential information loss, Infini-attention yields promising results, 16 | including nearly 100% retrieval rates in certain tasks. 17 | 18 | Importantly, Infini-attention implementation requires minimal adjustments to existing models 19 | and can be incorporated through fine-tuning, 20 | enabling rapid enhancements such as the 10-fold context-window increase seen in Gemini 1.5. 21 | However, the scalability of compressed memory remains a subject of analysis, 22 | as fixed-sized memory may lead to forgetting important details over time. 23 | 24 | Overall, Infini-attention signifies a convergence of Transformer and recurrent models, 25 | potentially revolutionizing AI systems' ability to scale to vast amounts of memory, 26 | and hinting at the enduring relevance of Transformers in AI research. 27 | 28 | 29 | 30 | 31 | 32 | -------------------------------------------------------------------------------- /new_arch/LCMs/Reference: -------------------------------------------------------------------------------- 1 | ### How to build MCP - https://news.hada.io/topic?id=19633 2 | ### emcee - Convert OpenAI API to MCP - https://news.hada.io/topic?id=19615 3 | ### Wanaku - MCP Router - https://news.hada.io/topic?id=19614 4 | 5 | MCP is a universal protocol that connects AI systems with various data sources, aiming to enhance the performance and 6 | utility of AI models. 7 | 8 | 1. Key Features of MCP: 9 | -a. Open Standard: MCP is an open-source protocol that can be used by all AI systems. 10 | -b. Bidirectional Connectivity: It supports secure, two-way connections between AI tools and data sources. 11 | -c. Universality: It can connect with a wide range of data systems such as content repositories, business tools, 12 | and development environments. 13 | -d. Standardization: It enables integration through a single protocol without the need to develop separate connectors 14 | for each data source. 15 | 2. Structure of MCP: 16 | Based on a client-server architecture 17 | -a. 
Host: An LLM application that initiates the connection. 18 | -b. Client: Maintains a one-to-one connection with the server within the host application. 19 | -c. Server: Provides context, tools, and prompts to the client. 20 | 21 | 3. Advantages of MCP: 22 | -a. Enhanced Data Accessibility: AI models can easily access various data sources. 23 | -b. Development Efficiency: Developers can connect to multiple data sources using a standard protocol. 24 | -c. Scalability: AI systems can maintain context across multiple tools and datasets, 25 | enabling the construction of more sustainable architectures. 26 | -d. Security: With built-in security in the protocol, there is no need to share API keys with LLM providers. 27 | -------------------------------------------------------------------------------- /Voice/Llava_and_Whisper.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@kagglepro.llc/building-an-ai-voice-assistant-with-llava-and-whisper-5ca1c9982e35 2 | ## If add some fuction on this code for image-to-text, then we can build image-to-speech gradio app <-- I will do one day 3 | 4 | import whisper 5 | import llava 6 | from transformers import AutoTokenizer 7 | 8 | # Load models 9 | whisper_model = whisper.load_model("base") 10 | llava_model = llava.LlavaModel.from_pretrained("llava-base") 11 | tokenizer = AutoTokenizer.from_pretrained("llava-base") 12 | 13 | # Function to preprocess audio 14 | def preprocess_audio(audio_path): 15 | audio = whisper.load_audio(audio_path) 16 | return whisper.pad_or_trim(audio) 17 | 18 | # Function to preprocess text 19 | def preprocess_text(text): 20 | return tokenizer(text, return_tensors="pt") 21 | 22 | def generate_response(text): 23 | inputs = preprocess_text(text) 24 | outputs = llava_model.generate(**inputs) 25 | return tokenizer.decode(outputs[0], skip_special_tokens=True) 26 | 27 | def transcribe_audio(audio_path): 28 | audio = preprocess_audio(audio_path) 29 | result = whisper_model.transcribe(audio) 30 | return result["text"] 31 | 32 | import gradio as gr 33 | 34 | def voice_assistant(audio_path): 35 | text = transcribe_audio(audio_path) 36 | response = generate_response(text) 37 | return response 38 | 39 | # Create Gradio interface 40 | interface = gr.Interface(fn=voice_assistant, 41 | inputs=gr.inputs.Audio(source="microphone", type="filepath"), 42 | outputs="text") 43 | 44 | interface.launch() 45 | 46 | 47 | 48 | #### 49 | # Example deployment script for Heroku 50 | heroku create 51 | git add . 
52 | git commit -m "Initial commit" 53 | git push heroku main 54 | #### 55 | -------------------------------------------------------------------------------- /RAG/Efficient_RAG_for_Mixed_Context_Texts_with_Indexify’s_Framework.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/google-developer-experts/efficient-rag-for-mixed-context-texts-with-indexifys-framework-gemini-s-1m-context-arctic-s-df2882aad576 2 | ## Mixed-context texts, such as research papers, technical documents, or even web pages, often contain cross-domain information 3 | 4 | !pip install -q -U indexify indexify-extractor-sdk 5 | curl https://getindexify.ai | sh 6 | ./indexify server -d 7 | 8 | !indexify-extractor download hub://pdf/marker 9 | !indexify-extractor download hub://text/llm 10 | !indexify-extractor download hub://text/chunking 11 | !indexify-extractor download hub://embedding/arctic 12 | 13 | !indexify-extractor join-server 14 | 15 | from indexify import IndexifyClient 16 | client = IndexifyClient() 17 | 18 | from indexify import ExtractionGraph 19 | 20 | extraction_graph_spec = """ 21 | name: 'llmarrag' 22 | extraction_policies: 23 | - extractor: 'tensorlake/marker' 24 | name: 'mdextractor' 25 | - extractor: 'tensorlake/llm' 26 | name: 'txtprocessor' 27 | input_params: 28 | service: 'gemini' 29 | prompt: 'Rearrange and rewrite the following text by grouping similar topics together while preserving the original sentences.' 30 | content_source: 'mdextractor' 31 | - extractor: 'tensorlake/chunk-extractor' 32 | name: 'chunker' 33 | input_params: 34 | chunk_size: 1000 35 | overlap: 100 36 | content_source: 'txtprocessor' 37 | - extractor: 'tensorlake/arctic' 38 | name: 'embedder' 39 | content_source: 'chunker' 40 | """ 41 | 42 | extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec) 43 | client.create_extraction_graph(extraction_graph) 44 | 45 | client.upload_file("llmarrag", "random_topics.pdf") 46 | 47 | 48 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/Odds Ratio Preference Optimization: -------------------------------------------------------------------------------- 1 | # https://www.kaggle.com/discussions/general/497383 2 | # https://arxiv.org/pdf/2403.07691 3 | 4 | ORPO (Odds Ratio Preference Optimization) is an innovative fine-tuning technique 5 | that integrates standard supervised fine-tuning and preference alignment stages into a unified process, 6 | thus saving computational resources and training time. 7 | 8 | Empirical data demonstrate ORPO's superiority over competing alignment algorithms across various model sizes and benchmarks. 9 | ORPO revolutionizes the typical pipeline for aligning and training Large Language Models 10 | (LLMs) for Reinforcement Learning with Human Feedback (RLHF). 11 | It operates by combining supervised fine-tuning and alignment into a single goal, resulting in unprecedented results with simplicity and efficiency. 12 | 13 | The method involves creating a paired preference dataset (selected/rejected), 14 | which comprises instances where one response is preferred over another, and ensuring the exclusion of situations 15 | where the chosen and rejected responses are identical or one is empty. 16 | 17 | A pre-trained LLM, such as Llama-2 or Mistral, is then selected, and the base model is trained directly on the preference dataset 18 | using the ORPO objective, eliminating the need for an additional supervised fine-tuning step. 
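As a rough sketch of what that single-stage training loop can look like in practice, using TRL's ORPOTrainer; the model, dataset, and hyperparameters below are placeholders, and keyword names can differ slightly across TRL versions.

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "mistralai/Mistral-7B-v0.1"                      # assumed base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A paired preference dataset with chosen/rejected responses (placeholder dataset name)
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = ORPOConfig(
    output_dir="orpo-out",
    beta=0.1,                     # weight of the odds-ratio term
    max_length=1024,
    per_device_train_batch_size=2,
)
trainer = ORPOTrainer(model=model, args=args, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()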
19 | Key takeaways include ORPO's model-free and memory-friendly nature, providing a seamless training experience. 20 | Instruction tuning and preference alignment are critical for modifying LLMs to suit specific activities. 21 | ORPO fine-tuning significantly enhances the base model's performance across all benchmarks. 22 | The rise of high-quality open-weight models underscores the importance of fine-tuning for achieving optimal performance in particular use cases. 23 | -------------------------------------------------------------------------------- /RAG/Retrieval-Augmented Dual Instruction Tuning .txt: -------------------------------------------------------------------------------- 1 | from https://blog.llamaindex.ai/improving-rag-effectiveness-with-retrieval-augmented-dual-instruction-tuning-ra-dit-01e73116655d 2 | 3 | An AI Research team at Meta has proposed a method called RA-DIT: RETRIEVAL-AUGMENTED DUAL INSTRUCTION TUNING 4 | that allows any LLM to be upgraded to include retrieval features 5 | 6 | The RA-DIT approach involves two distinct fine-tuning steps: 7 | 1. Update a pre-trained LM to better use retrieved information. 8 | 2. Update the retriever to return more relevant result 9 | 10 | ## How it works 11 | The RA-DIT approach separately fine-tunes the LLM and the retriever. 12 | The LLM is updated to maximize the probability of the correct answer given the retrieval-augmented instructions, 13 | while the retriever is updated to minimize how much the document is semantically similar (relevant) to the query 14 | 15 | ## Fine-tuning Dataset 16 | The fine-tuning dataset is tailored to enhance the language model’s ability to leverage knowledge 17 | and boost its contextual awareness during prediction generation 18 | 19 | ## LLM fine-tuning 20 | The purpose of fine-tuning(could get it with fine-tuning dataset): 21 | 1. Adapt the LLM to better utilization of relevant background knowledge 22 | 2. Train the LLM to produce accurate predictions even with incorrectly retrieved chunks, empowering the model to rely on its own knowledge. 23 | 24 | ## Retriever Fine-tuning 25 | The retriever is fine-tuned using the LM-Supervised Retrieval (LSR) method 26 | 1. The LLM assesses the information fetched by the retriever 27 | 2. If the LLM finds the information misaligned with the given query, it sends feedback to the retriever 28 | 3. 
Using this feedback, the retriever refines its search process, ensuring it fetches data that the LLM can effectively use 29 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/Gemma/Gemma_3_12B_for_Reasoning.py: -------------------------------------------------------------------------------- 1 | ### From https://ai.plainenglish.io/fine-tuning-googles-gemma-3-12b-for-reasoning-how-grpo-turned-a-good-model-into-a-brilliant-db8c272c67ea 2 | 3 | from huggingface_hub import notebook_login 4 | notebook_login() 5 | 6 | !pip install -qqq git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3 \ 7 | git+https://github.com/huggingface/trl.git@main \ 8 | bitsandbytes 9 | 10 | 11 | import torch 12 | from transformers import AutoProcessor, AutoModelForImageTextToText 13 | from peft import LoraConfig, get_peft_model 14 | from datasets import load_dataset 15 | 16 | 17 | model = AutoModelForImageTextToText.from_pretrained( 18 | "google/gemma-3-4b-it", device_map="auto", torch_dtype=torch.bfloat16, attn_implementation="eager" 19 | ) 20 | lora_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, target_modules="all-linear") 21 | model = get_peft_model(model, lora_config) 22 | processor = AutoProcessor.from_pretrained("google/gemma-3-4b-it") 23 | tokenizer = processor.tokenizer 24 | 25 | SYSTEM_PROMPT = "Respond in structured reasoning format (XML)." 26 | def get_gsm8k_questions(split="train"): 27 | data = load_dataset('openai/gsm8k', 'main')[split] 28 | return data.map(lambda x: { 29 | 'prompt': [{'role': 'system', 'content': SYSTEM_PROMPT}, {'role': 'user', 'content': x['question']}], 30 | 'answer': x['answer'] 31 | }) 32 | train_data = get_gsm8k_questions() 33 | 34 | def correctness_reward_func(prompts, completions, answer): 35 | responses = [extract_xml_answer(c[0]['content']) for c in completions] 36 | return [2.0 if r == a else 0.0 for r, a in zip(responses, answer)] 37 | 38 | merged_model = model.merge_and_unload() 39 | merged_model.push_to_hub("your-username/gemma-reasoning-genius") 40 | -------------------------------------------------------------------------------- /new_arch/Gated Residual Networks: A Modern Secret Weapon for Tabular Deep Learning: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/chat-gpt-now-writes-all-my-articles/gated-residual-networks-a-modern-secret-weapon-for-tabular-deep-learning-7a8d247a01d1 2 | 3 | 4 | from tensorflow.keras.layers import (Input, Dense, BatchNormalization, 5 | LayerNormalization, Activation, Add, Multiply) 6 | from tensorflow.keras.models import Model 7 | from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping 8 | 9 | 10 | def make_callbacks(): 11 | return [ 12 | ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, 13 | verbose=1, min_lr=1e-6), 14 | EarlyStopping(monitor="val_loss", patience=10, 15 | restore_best_weights=True, verbose=1) 16 | ] 17 | 18 | 19 | def GRN(x, units, name="grn"): 20 | # transform 21 | z = Dense(units, name=f"{name}_z_dense")(x) 22 | z = BatchNormalization(name=f"{name}_z_bn")(z) 23 | z = Activation("elu", name=f"{name}_z_act")(z) 24 | # gate (GLU) 25 | g = Dense(units, activation="sigmoid", name=f"{name}_g_dense")(x) 26 | gated = Multiply(name=f"{name}_gated")([z, g]) 27 | # skip-connection 28 | skip = x if x.shape[-1] == units else Dense(units, name=f"{name}_skip")(x) 29 | y = Add(name=f"{name}_add")([skip, gated]) 30 | # norm + non-linearity 31 | y = 
LayerNormalization(name=f"{name}_ln")(y) 32 |     return Activation("swish", name=f"{name}_act")(y) 33 | 34 | 35 | def build_grn_mlp(input_dim, hidden_units=(128, 128, 128, 128)): 36 |     inp = Input(shape=(input_dim,), name="input") 37 |     x = inp 38 |     for i, u in enumerate(hidden_units): 39 |         x = GRN(x, u, name=f"grn{i}") 40 |     out = Dense(1, name="output")(x) 41 |     return Model(inp, out, name="GRN_MLP") 42 | -------------------------------------------------------------------------------- /text/Math_assistant_with_Orca-2-7B.py: -------------------------------------------------------------------------------- 1 | # From https://medium.com/towards-artificial-intelligence/few-shots-at-a-math-assistant-with-orca-2-7b-f60a15fe5dfe 2 | # The code below may fail due to a version mismatch between transformers and PyTorch when creating bnb_config; this needs further investigation 3 | 4 | !pip install git+https://github.com/huggingface/transformers 5 | !pip install accelerate -qq 6 | !pip install SentencePiece -qq 7 | !pip install protobuf -qq 8 | !pip install bitsandbytes -qq 9 | 10 | import torch 11 | import transformers 12 | from transformers import BitsAndBytesConfig, GenerationConfig 13 | 14 | bnb_config = BitsAndBytesConfig( 15 |     load_in_8bit=True, 16 | ) 17 | 18 | model = transformers.AutoModelForCausalLM.from_pretrained( 19 |     "microsoft/Orca-2-7b", 20 |     device_map='auto', 21 |     quantization_config=bnb_config) 22 | 23 | tokenizer = transformers.AutoTokenizer.from_pretrained( 24 |     "microsoft/Orca-2-7b", 25 |     use_fast=False, 26 | ) 27 | 28 | system_message = """You are Orca, an AI language model created by Microsoft. You are a cautious assistant. 29 | 30 | Analyse the maths or logical question given to you and solve it in a step by step manner 31 | """ 32 | 33 | user_message = "how many ways can I arrange 10 men in a row?" 
34 | prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{user_message}<|im_end|>\n<|im_start|>assistant" 35 | inputs = tokenizer(prompt, return_tensors='pt') 36 | 37 | from transformers import GenerationConfig 38 | 39 | generation_config = GenerationConfig.from_pretrained("microsoft/Orca-2-7b") 40 | generation_config.temperature = 0.1 41 | generation_config.do_sample = True 42 | generation_config.top_p = 0.9 43 | 44 | output_ids = model.generate(inputs["input_ids"],generation_config) 45 | answer = tokenizer.batch_decode(output_ids)[0] 46 | 47 | 48 | 49 | -------------------------------------------------------------------------------- /text/Agent/memory/Build_AI_Agents_with_Active_Memory_Management_Using_LangMem.py: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/ai-agent-insider/build-ai-agents-with-active-memory-management-using-langmem-6bdc38449f74 2 | 3 | !pip install langmem 4 | 5 | from langmem import Memory 6 | 7 | # Initialize memory storage 8 | memory = Memory(storage="faiss", namespace="user_123") 9 | 10 | # Store a memory 11 | memory.add("User likes AI-generated images and machine learning content.") 12 | 13 | # Retrieve relevant memories 14 | print(memory.retrieve("What does the user prefer?")) 15 | 16 | # Create different namespaces for different users or teams 17 | user_memory = Memory(storage="faiss", namespace="user_123") 18 | team_memory = Memory(storage="faiss", namespace="team_codeb") 19 | 20 | # Add memory to a team namespace 21 | team_memory.add("Team CodeB.ai focuses on AI for renewable energy and waste management.") 22 | 23 | # Agent A stores memory 24 | agent_a = Memory(storage="faiss", namespace="shared_knowledge") 25 | agent_a.add("LangMem helps AI agents retain long-term memory.") 26 | 27 | # Agent B retrieves memory 28 | agent_b = Memory(storage="faiss", namespace="shared_knowledge") 29 | print(agent_b.retrieve("What is LangMem used for?")) 30 | 31 | ----------------------------------------------------------------------------------------- 32 | from langchain.chat_models import ChatOpenAI 33 | from langmem import Memory 34 | 35 | # Initialize memory 36 | memory = Memory(storage="faiss", namespace="user_456") 37 | 38 | # Store user preferences 39 | memory.add("User is interested in blockchain and smart contracts.") 40 | 41 | # Initialize LLM with memory 42 | llm = ChatOpenAI(temperature=0.7) 43 | context = memory.retrieve("What topics does the user like?") 44 | 45 | # Generate response with memory context 46 | response = llm.predict(f"Given the user preferences: {context}, suggest an AI project idea.") 47 | print(response) 48 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/Finetuning_LLama3_using_Axolotl.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@shivansh.kaushik/finetuning-llama3-using-axolotl-1becd616fc12 2 | 3 | ### Axolotl Config 4 | """ 5 | base_model: meta-llama/Meta-Llama-3-8B 6 | model_type: AutoModelForCausalLM 7 | tokenizer_type: AutoTokenizer 8 | 9 | load_in_8bit: false 10 | load_in_4bit: true 11 | strict: false 12 | 13 | datasets: 14 | - path: gbharti/finance-alpaca 15 | type: alpaca 16 | dataset_prepared_path: 17 | val_set_size: 0 18 | output_dir: ./outputs/qlora-out 19 | 20 | adapter: qlora 21 | lora_model_dir: 22 | 23 | sequence_len: 4096 24 | sample_packing: true 25 | pad_to_sequence_len: true 26 | 27 | lora_r: 32 28 | lora_alpha: 16 
29 | lora_dropout: 0.05 30 | lora_target_modules: 31 | lora_target_linear: true 32 | lora_fan_in_fan_out: 33 | 34 | wandb_project: 35 | wandb_entity: 36 | wandb_watch: 37 | wandb_name: 38 | wandb_log_model: 39 | 40 | gradient_accumulation_steps: 4 41 | micro_batch_size: 2 42 | num_epochs: 1 43 | optimizer: paged_adamw_32bit 44 | lr_scheduler: cosine 45 | learning_rate: 0.0002 46 | 47 | train_on_inputs: false 48 | group_by_length: false 49 | bf16: auto 50 | fp16: 51 | tf32: false 52 | 53 | gradient_checkpointing: true 54 | early_stopping_patience: 55 | resume_from_checkpoint: 56 | local_rank: 57 | logging_steps: 1 58 | xformers_attention: 59 | flash_attention: true 60 | 61 | warmup_steps: 10 62 | evals_per_epoch: 4 63 | eval_table_size: 64 | saves_per_epoch: 1 65 | debug: 66 | deepspeed: 67 | weight_decay: 0.0 68 | fsdp: 69 | fsdp_config: 70 | special_tokens: 71 | pad_token: "<|end_of_text|>" 72 | """ 73 | 74 | #After git clone 75 | cd axolotl 76 | 77 | CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess llama3_qlora.yml 78 | accelerate launch -m axolotl.cli.train llama3_qlora.yml 79 | 80 | accelerate launch -m axolotl.cli.inference llama3_qlora.yml \ 81 | --lora_model_dir="./outputs/qlora-out" --gradio 82 | -------------------------------------------------------------------------------- /new_arch/nGPT: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/syncedreview/nvidias-ngpt-revolutionizing-transformers-with-hypersphere-representation-1be9086f216e 2 | 3 | In a new paper nGPT: Normalized Transformer with Representation Learning on the Hypersphere, 4 | an NVIDIA research team proposes the normalized Transformer (nGPT), 5 | which consolidates key findings in Transformer research under a unified framework, 6 | offering faster learning and reduced training steps — by factors ranging from 4 to 20 depending on sequence length. 7 | 8 | 1. Hypersphere-Based Normalization 9 | The core advancement of nGPT lies in normalizing all embedding dimensions to reside on a unit hypersphere. 10 | This approach ensures consistent dimensionality across matrices and interprets matrix-vector 11 | multiplications as cosine similarities within the bounded range of [-1,1]. Notably, this normalization eliminates 12 | the need for weight decay by maintaining intrinsic stability. 13 | 14 | 2. Mitigating Non-Linear Constraints 15 | While normalization standardizes embeddings, it also constrains the inputs to non-linear units. 16 | To address this, scaling factors are introduced, balancing these constraints and enhancing the model’s flexibility. 17 | 18 | 3. Variable-Metric Optimization 19 | Inspired by recent studies that position Transformers as meta-optimizers, 20 | the research team demonstrates that nGPT functions as a variable-metric optimizer. Specifically 21 | 22 | 4. Gradient Information 23 | Each transformation block computes gradients. 24 | 25 | 5. Eigen Learning Rates 26 | These gradients are scaled using learnable eigen learning rates derived from a variable-metric matrix. 27 | 28 | 6. Riemannian Retraction 29 | Normalization acts as a retraction step in Riemannian optimization, projecting outputs back onto the hypersphere. 30 | This process transforms nGPT into a data-driven optimizer, fine-tuning its outputs with precision. 
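A toy PyTorch illustration of the core idea (not the paper's code): keeping vectors on the unit hypersphere turns matrix-vector products into bounded cosine similarities, a learnable scale restores flexibility, and re-normalization plays the role of the retraction step back onto the sphere.

import torch
import torch.nn.functional as F

d, vocab = 64, 1000
emb = F.normalize(torch.randn(vocab, d), dim=-1)   # embedding rows live on the unit hypersphere
h = F.normalize(torch.randn(d), dim=-1)            # hidden state, also unit-norm
scale = torch.nn.Parameter(torch.tensor(10.0))     # learnable scaling factor

logits = scale * (emb @ h)                         # each entry is a scaled cosine similarity in [-1, 1]
print(float(logits.min()), float(logits.max()))

# After an update, re-normalizing projects the parameters back onto the sphere (the retraction step)
emb = F.normalize(emb + 0.01 * torch.randn_like(emb), dim=-1)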
31 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/RAFT.py: -------------------------------------------------------------------------------- 1 | """ 2 | Retrieval Augmented Fine Tuning (RAFT) provide a means to infuse LLMs with domain-specific knowledge and reasoning abilities. 3 | 4 | The integration of RAFT with LlamaIndex offers numerous advantages: 5 | 6 | 1. Enhanced Adaptability: Fine-tuning LLMs with domain-specific documents through RAFT enhances their understanding of specialized topics, 7 | thereby increasing adaptability in nuanced environments. 8 | 9 | 2. Improved Reasoning: RAFT enables LLMs to discern relevant information from retrieved documents, 10 | leading to more accurate and contextually appropriate responses. 11 | 12 | 3. Robustness Against Inaccurate Retrievals: RAFT trains LLMs to comprehend the relationship between the question, 13 | retrieved documents, and the answer, ensuring resilience against inaccuracies in the retrieval process. 14 | 15 | 4. Efficient Knowledge Integration: By simulating real-world scenarios where LLMs must utilize external sources for information, 16 | RAFT streamlines the integration of domain-specific knowledge into the model's framework, 17 | resulting in more efficient knowledge utilization. 18 | """ 19 | 20 | # Step I: Install Libraries and Download Data 21 | !pip install llama-index 22 | !pip install llama-index-packs-raft-dataset 23 | 24 | # Step II: Download RAFT Pack 25 | 26 | import os 27 | 28 | ## Have to check RAFTDatasetPack -> https://github.com/run-llama/llama_index/blob/f03db8da9301e2a1f2a1783338464bec7e7a859e/llama-index-packs/llama-index-packs-raft-dataset/llama_index/packs/raft_dataset/base.py#L27 29 | from llama_index.packs.raft_dataset import RAFTDatasetPack 30 | 31 | os.environ["OPENAI_API_KEY"] = "" 32 | 33 | raft_dataset = RAFTDatasetPack("./paul_graham_essay.txt") 34 | 35 | dataset = raft_dataset.run() 36 | -------------------------------------------------------------------------------- /Tuning/Hyper_paramter_tuning/Insights about hyperparameters: -------------------------------------------------------------------------------- 1 | From https://medium.com/defactoblog/explainability-of-the-features-no-of-the-hyperparameters-ad797918155f 2 | 3 | 1. Learning rate: Learning rate is crucial, as indicated by Shapley values, and higher values lead to quicker convergence but may miss optimal results. 4 | Lower learning rates are favored for better performance, even though it requires more iterations. 5 | It interacts significantly with the number of estimators. 6 | 7 | 2. Max depth: Controls the maximum depth of each tree, with too high values leading to overfitting and too low values leading to underfitting. 8 | 9 | 3. Colsample_by_tree: Determines the ratio of features to sample for each new tree. 10 | Lower sampling rates work better with lower learning rates, promoting stability in learning from each additional tree. 11 | 12 | 4. N_estimators: The number of trees in boosting algorithm, where more trees generally lead to better performance. 13 | Interacts significantly with learning rate; higher learning rates are better with fewer estimators, 14 | and lower learning rates are favored with more estimators. 15 | 16 | 5. Subsample: The ratio of training data to sample before training a tree. Its impact varies with datasets, 17 | with higher subsampling favored in one dataset and the opposite in another. Interaction with learning rate is notable. 18 | 19 | 6. 
Min_child_weight: Controls instance weight needed in a child, affecting split and overfitting. 20 | Its importance varies with datasets, and high values consistently decrease performance in the Titanic dataset. 21 | 22 | 7. Gamma: Represents the minimum loss reduction needed to split a tree, which seems to be less impactful across both datasets. 23 | It could potentially be dropped to reduce the search space for optimal solutions. 24 | -------------------------------------------------------------------------------- /etc/Mixed Precision Training: -------------------------------------------------------------------------------- 1 | ### https://towardsdatascience.com/the-mystery-behind-the-pytorch-automatic-mixed-precision-library-d9386e4b787e 2 | 3 | This article does an excellent job outlining the benefits of mixed precision training (MPT) for deep learning 4 | and explaining the crucial hardware fundamentals like Nvidia's tensor cores and GPU architecture, 5 | which are essential for effective mixed precision. 6 | 7 | 1. Understand Hardware Capabilities 8 | Ensure your GPUs support tensor cores, as these optimize matrix multiplications essential for deep learning. 9 | Tensor cores allow calculations in reduced precision (FP16), improving speed while keeping acceptable accuracy. 10 | 11 | 2. Know Your Data Formats 12 | Mixed precision uses FP16 for the majority of calculations, reducing memory and speeding up processes. 13 | However, certain operations still require FP32 due to the greater range and precision required for complex calculations (e.g., gradients, accumulations). 14 | 15 | 3. Loss Scaling 16 | Since FP16 has a limited exponent range, the gradients might underflow, leading to zeroed-out values. 17 | Loss scaling addresses this by multiplying the loss by a large factor, preserving gradient values and maintaining model performance. 18 | 19 | 4. PyTorch AMP Library 20 | PyTorch's AMP library automates most MPT tasks, such as casting specific operations to FP16 and managing loss scaling. 21 | By adding just a few lines of code, you can implement MPT, minimizing manual adjustments. 22 | 23 | 5. Memory Efficiency and Limitations 24 | Mixed precision halves memory usage for data, but certain components like optimizer parameters remain in FP32, 25 | so models with large weights might still benefit from additional methods like DeepSpeed’s ZERO optimization. 26 | 27 | Applying these strategies should streamline your training process significantly, enabling you to test hypotheses and iterate more efficiently on your models. 28 | 29 | -------------------------------------------------------------------------------- /text/BitNet b1.58: -------------------------------------------------------------------------------- 1 | From https://medium.com/syncedreview/embracing-the-era-of-1-bit-llms-microsoft-ucass-bitnet-b1-58-redefines-efficiency-7ba5c722be2b 2 | 3 | In a recent paper titled "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits," 4 | researchers from Microsoft Research and the University of Chinese Academy of Sciences present BitNet b1.58, a new version of 1-bit LLMs. 5 | This variant builds on the BitNet architecture, which replaces nn.Linear with BitLinear in a Transformer model, 6 | resulting in a ternary parameter space of {-1, 0, 1}. The inclusion of 0 increases the binary system representation to 1.58 bits. 7 | 8 | BitNet b1.58 showcases several improvements over its predecessor: 9 | 10 | 1. 
Quantization Function 11 | The research team introduces an absmean quantization function, offering ease of implementation and system-level optimization 12 | without significant performance impacts. 13 | 14 | 2. LLaMA-alike Components 15 | BitNet b1.58 incorporates components from the LLaMA framework, like RMSNorm, SwiGLU, and rotary embedding, 16 | eliminating biases to ensure seamless integration into open-source software. 17 | 18 | Comparative evaluations against FP16 LLaMA LLMs show that BitNet b1.58 begins to match full-precision performance at a model size of 3B, 19 | with 2.71 times faster performance and 3.55 times less GPU memory usage. 20 | It retains the innovative computation paradigm of minimal multiplication operations while improving efficiency in memory consumption, throughput, and latency. 21 | 22 | BitNet b1.58 introduces two key enhancements 23 | 1. Explicit support for feature filtering via 0 inclusion 24 | 2. Performance parity with FP16 baselines in both perplexity and end-task results starting from a 3B model size. 25 | 26 | Overall, BitNet b1.58 presents a novel scaling law and training framework for high-performance, cost-effective LLMs, 27 | laying the groundwork for specialized hardware optimized for 1-bit LLMs. 28 | -------------------------------------------------------------------------------- /VISION/Image/Mixture of Nested Experts: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@aipapers/mixture-of-nested-experts-ai-paper-explained-67564e2f464a 2 | 3 | Mixture of Nested Experts (MoNE), a model introduced by Google that addresses efficiency and redundancy issues in vision models like Vision Transformers (ViTs). 4 | Traditional Mixture-of-Experts (MoE) models scale up large models without proportional increases in computation 5 | but have limitations like large memory requirements and inefficiency when handling redundant information in image patches. 6 | 7 | 1. Key Ideas in MoNE: 8 | -1. Redundancy in Vision Models 9 | In ViTs, patches often contain redundant information (e.g., background), yet all patches receive equal computation power. 10 | MoNE addresses this by selectively allocating computation based on the importance of each patch. 11 | -2. Nested Experts 12 | MoNE uses nested experts within each layer, where each expert represents different portions of the model’s weights. 13 | For instance, one expert might use the full model layer, while another uses only half or a quarter of the weights. 14 | -3. Routing Mechanism 15 | A router assigns probabilities to tokens (image patches), directing them to experts based on their importance. 16 | Important tokens are processed by more capable experts, while less important ones are handled by smaller, less resource-intensive experts. 17 | -4. Efficient Computation 18 | By processing tokens with varying levels of compute, MoNE optimizes performance while reducing the overall computational cost. 19 | -5. Performance 20 | The paper shows that MoNE models achieve comparable performance to baselines on tasks like image classification but with significantly reduced compute, 21 | making them more efficient. 22 | 23 | Overall, MoNE offers an advanced and adaptive approach to handling redundancy and efficiency in vision models, 24 | particularly by optimizing computation for tokens based on their importance. 
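A toy sketch of the nested-expert idea (one reading of it, not the paper's code): each "expert" is just a prefix slice of one shared weight matrix, and a router decides whether a token gets the full-, half-, or quarter-width slice.

import torch
import torch.nn.functional as F

d_model, n_tokens = 64, 8
W = torch.randn(d_model, d_model)             # one shared projection; nested experts are W[:, :d]
widths = [d_model, d_model // 2, d_model // 4]
router = torch.nn.Linear(d_model, len(widths))

x = torch.randn(n_tokens, d_model)
choice = router(x).argmax(dim=-1)             # which nested expert each token is routed to

out = torch.zeros_like(x)
for i, d in enumerate(widths):
    mask = choice == i
    y = x[mask] @ W[:, :d]                    # smaller slices cost proportionally less compute
    out[mask] = F.pad(y, (0, d_model - d))    # pad back to a common width
print(out.shape)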
25 | -------------------------------------------------------------------------------- /MultiModal/LLaMA_3.2_Vision.py: -------------------------------------------------------------------------------- 1 | ### From https://pub.towardsai.net/llama-3-2-vision-revolutionizing-multimodal-ai-with-advanced-visual-reasoning-now-llama-can-see-d8a32d8e4b86 2 | 3 | !pip install git+https://github.com/huggingface/transformers accelerate bitsandbytes huggingface_hub 4 | !pip install -U "huggingface_hub[cli]" 5 | 6 | from huggingface_hub import notebook_login 7 | notebook_login() 8 | 9 | import requests 10 | import torch 11 | from PIL import Image 12 | from transformers import MllamaForConditionalGeneration, AutoProcessor 13 | from transformers import BitsAndBytesConfig 14 | 15 | bnb_config = BitsAndBytesConfig( 16 | load_in_4bit=True, 17 | bnb_4bit_quant_type="nf4", 18 | bnb_4bit_compute_dtype=torch.bfloat16 19 | ) 20 | 21 | model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct" 22 | 23 | # Load model and processor 24 | model = MllamaForConditionalGeneration.from_pretrained( 25 | model_id, 26 | quantization_config=bnb_config 27 | ) 28 | processor = AutoProcessor.from_pretrained(model_id) 29 | 30 | # Load an image from a URL 31 | url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/rabbit.jpg" 32 | 33 | image = Image.open(requests.get(url, stream=True).raw) 34 | 35 | # Define the conversation prompt 36 | messages = [ 37 | {"role": "user", "content": [ 38 | {"type": "image"}, 39 | {"type": "text", "text": "Can you please describe this image in just one sentence?"} 40 | ]}, 41 | {"role": "assistant", "content": "The image depicts a rabbit dressed in a blue coat and brown vest, standing on a dirt road in front of a stone house."}, 42 | {"role": "user", "content": "What is in the background?"} 43 | ] 44 | 45 | input_text = processor.apply_chat_template( 46 | messages, 47 | add_generation_prompt=True, 48 | ) 49 | inputs = processor(image, input_text, return_tensors="pt").to(model.device) 50 | output = model.generate(**inputs, max_new_tokens=70) 51 | print(processor.decode(output[0][inputs["input_ids"].shape[-1]:])) 52 | -------------------------------------------------------------------------------- /new_arch/Diff_transformer/Differential Transformer: -------------------------------------------------------------------------------- 1 | ## From https://arxiv.org/abs/2410.05258 2 | ## https://www.microsoft.com/en-us/research/publication/differential-transformer/ 3 | ## https://www.aidemos.info/differential-transformer-a-breakthrough-in-large-language-model-architecture/ 4 | 5 | The Differential Transformer (DIFF Transformer) is an enhancement to the traditional Transformer architecture, 6 | aimed at addressing inefficiencies in attention mechanisms. 7 | In standard Transformers, attention is often allocated to irrelevant parts of the input, 8 | which can result in issues such as hallucinations and poor focus on relevant information. 9 | The DIFF Transformer tackles this problem by introducing a differential attention mechanism. 10 | 11 | This mechanism operates by calculating two separate softmax-based attention maps and subtracting one from the other. 12 | This subtraction effectively cancels out noise and amplifies attention on the most relevant parts of the input. This process leads to several benefits: 13 | 14 | -1. Noise cancellation: Similar to how noise-canceling headphones work, it removes irrelevant context, making the model more efficient in processing. 15 | -2. 
Sparse attention patterns: The differential approach promotes focusing only on key information. 16 | -3. Improved in-context learning: The model becomes more accurate and robust, particularly in handling tasks like few-shot learning and long-context modeling. 17 | 18 | Experimental results show that the DIFF Transformer outperforms traditional models, such as OpenLLaMA and StableLM, across various tasks, 19 | including language modeling, information retrieval, and hallucination mitigation. 20 | Additionally, it achieves better results with fewer parameters and less training data, making it more efficient for large-scale AI applications. 21 | 22 | Overall, the Differential Transformer is a promising evolution in Transformer-based architectures, 23 | particularly for tasks requiring long-context understanding and key information retrieval 24 | -------------------------------------------------------------------------------- /Tuning/Distill/Model_Distillation: -------------------------------------------------------------------------------- 1 | ### https://pub.towardsai.net/smaller-faster-smarter-the-power-of-model-distillation-d002662308c7 2 | ## https://github.com/arcee-ai/DistillKit 3 | 4 | This article discusses the importance of model distillation in the context of large language models (LLMs), 5 | highlighting how OpenAI's decision to hide the reasoning steps in its new o1 models could impact the future of this technique and the open-source community. 6 | 7 | 1. Model Distillation: 8 | - Definition: It's the process of transferring knowledge from a large, complex model (the "teacher") to a smaller, more efficient model (the "student"). 9 | - How it works: The teacher model predicts token distributions, and the student is trained to mimic this output, 10 | capturing both correct answers and the underlying reasoning. 11 | - Benefits: Smaller models produced through distillation are faster, less expensive, and easier to deploy, while retaining much of the original model's capabilities. 12 | 13 | 2. OpenAI's New Approach: 14 | With the new o1 reasoning models, OpenAI now hides intermediate reasoning steps and only presents summaries in the final output. 15 | 16 | 3. Implications: 17 | This makes it difficult for external developers to understand the model’s decision-making process or access token distributions, 18 | which are crucial for model distillation. 19 | The open-source community may struggle to replicate or distill OpenAI’s models into smaller, equally capable versions, 20 | widening the gap between proprietary and open-source AI. 21 | 22 | 4. Concerns for the Future: 23 | This decision raises questions about balancing intellectual property protection and scientific progress. 24 | The article expresses hope that companies like Meta will maintain open approaches to AI development, promoting broader accessibility. 25 | 26 | In summary, while OpenAI’s changes might enhance safety and control, they create challenges for model distillation and could hinder open-source innovation in AI. 27 | -------------------------------------------------------------------------------- /Knowledge Graph Reasoning/Graph_Maker.py: -------------------------------------------------------------------------------- 1 | # https://towardsdatascience.com/text-to-knowledge-graph-made-easy-with-graph-maker-f3f890c0dbe8 2 | 3 | ## Define ontology 4 | ontology = Ontology( 5 | # labels of the entities to be extracted. Can be a string or an object, like the following. 
6 | labels=[ 7 | {"Person": "Person name without any adjectives, Remember a person may be referenced by their name or using a pronoun"}, 8 | {"Object": "Do not add the definite article 'the' in the object name"}, 9 | {"Event": "Event event involving multiple people. Do not include qualifiers or verbs like gives, leaves, works etc."}, 10 | "Place", 11 | "Document", 12 | "Organisation", 13 | "Action", 14 | {"Miscellaneous": "Any important concept can not be categorised with any other given label"}, 15 | ], 16 | # Relationships that are important for your application. 17 | # These are more like instructions for the LLM to nudge it to focus on specific relationships. 18 | # There is no guarantee that only these relationships will be extracted, but some models do a good job overall at sticking to these relations. 19 | relationships=[ 20 | "Relation between any pair of Entities", 21 | ], 22 | ) 23 | 24 | ## Make test chunk and Convert these chunks into Documents. 25 | # After making chunk 26 | 27 | 28 | from graph_maker import GraphMaker, Ontology, GroqClient 29 | from graph_maker import Neo4jGraphModel 30 | 31 | class Document(BaseModel): 32 | text: str 33 | metadata: dict 34 | 35 | class Node(BaseModel): 36 | label: str 37 | name: str 38 | 39 | class Edge(BaseModel): 40 | node_1: Node 41 | node_2: Node 42 | relationship: str 43 | metadata: dict = {} 44 | order: Union[int, None] = None 45 | 46 | model = "mixtral-8x7b-32768" 47 | llm = GroqClient(model=model, temperature=0.1, top_p=0.5) 48 | graph_maker = GraphMaker(ontology=ontology, llm_client=llm, verbose=False) 49 | 50 | graph = graph_maker.from_documents(docs) 51 | 52 | create_indices = False 53 | neo4j_graph = Neo4jGraphModel(edges=graph, create_indices=create_indices) 54 | neo4j_graph.save() 55 | -------------------------------------------------------------------------------- /MultiModal/MultiAgent/gpt_reaserch: -------------------------------------------------------------------------------- 1 | # From https://medium.com/@assafelovic/how-to-build-the-ultimate-ai-automation-with-multi-agent-collaboration-ed61a1ea8f3b 2 | # Check https://pypi.org/project/gpt-researcher/ 3 | 4 | The text discusses the rapid evolution of AI agent development, particularly focusing on the advancements seen since the release of GPT Researcher. 5 | It highlights the transition from simple prompting methods to more complex agent workflows. 6 | Andrew Ng emphasizes the significance of AI agent workflows for driving substantial progress in AI, possibly surpassing even the next generation of foundation models. 7 | 8 | LangGraph is introduced as a tool for creating agent and multi-agent flows, offering controllability and flexibility in designing custom agents. 9 | The article explains the architecture of an autonomous research agent team utilizing LangGraph, 10 | consisting of various specialized agents such as Chief Editor, GPT Researcher, Editor, Reviewer, Reviser, Writer, and Publisher. 11 | 12 | It details the workflow of the research process, including stages like planning, data collection, review, writing, and publication, 13 | and how LangGraph facilitates the coordination of these tasks. The concept of state management in LangGraph is highlighted, 14 | enabling dynamic responses based on the evolving context of the interaction. 15 | 16 | The article also discusses the implementation of a parallelization technique within LangGraph to handle multiple research tasks simultaneously, 17 | ensuring efficiency and consistency in the final data report. 
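As a rough sketch of how such a multi-agent flow might be wired in LangGraph (node names, state fields, and node bodies here are hypothetical placeholders, not the article's actual code):

from typing import List, TypedDict
from langgraph.graph import StateGraph, END

class ResearchState(TypedDict):
    task: str
    notes: List[str]
    draft: str
    approved: bool

def plan(state: ResearchState) -> dict:       # Editor: break the task into research questions
    return {"notes": []}

def research(state: ResearchState) -> dict:   # GPT Researcher: collect findings for the task
    return {"notes": state["notes"] + ["finding about " + state["task"]]}

def review(state: ResearchState) -> dict:     # Reviewer: approve or send back for revision
    return {"approved": len(state["notes"]) > 0}

def write(state: ResearchState) -> dict:      # Writer/Publisher: assemble the final report
    return {"draft": "\n".join(state["notes"])}

builder = StateGraph(ResearchState)
builder.add_node("planner", plan)
builder.add_node("researcher", research)
builder.add_node("reviewer", review)
builder.add_node("writer", write)
builder.set_entry_point("planner")
builder.add_edge("planner", "researcher")
builder.add_edge("researcher", "reviewer")
# Conditional edge: loop back to the researcher until the reviewer approves the notes
builder.add_conditional_edges("reviewer", lambda s: "writer" if s["approved"] else "researcher")
builder.add_edge("writer", END)
graph = builder.compile()
result = graph.invoke({"task": "state of multi-agent LLM systems", "notes": [], "draft": "", "approved": False})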
18 | Additionally, it provides insights into initializing the graph and defining conditional edges for managing parallel workflows effectively. 19 | 20 | Lastly, it outlines the next steps for optimizing AI experiences, such as incorporating human intervention for enhanced quality and support for researching 21 | both web and local data sources. It also emphasizes the importance of improving the quality of retrieved sources and ensuring the final report follows an optimal storyline. 22 | -------------------------------------------------------------------------------- /RAG/ReAugKD: -------------------------------------------------------------------------------- 1 | ## Overview of ReAugKD 2 | # Objective: 3 | Bridging the efficiency gap between smaller "student" models and larger "teacher" models. 4 | # Techniques Involved: 5 | Knowledge Distillation (KD) 6 | Optimizing the size of foundation models by transferring knowledge from larger teacher models to smaller student models. 7 | Retrieval-Augmented Generation (RAG): 8 | Expanding foundation model knowledge by incorporating external data sources. 9 | 10 | # Amazon Science's Contribution: Retrieval-Augmented Knowledge Distillation (ReAugKD) 11 | Concept: 12 | Utilizes teacher models' data representations and predictions stored in a lookup table to guide predictions of student models for similar inputs. 13 | Adaptable beyond language models to various task-specific external knowledge domains. 14 | Evaluation: 15 | Tasks: 16 | Evaluated on six natural language processing tasks, including paraphrasing, natural-language inference, and question answering. 17 | Results: 18 | ReAugKD outperformed ten existing models in five tasks and secured the second spot in the sixth. 19 | Established a new state-of-the-art benchmark with minimal latency overhead (3%). 20 | 21 | # Training Method 22 | Two-Step Training Process: 23 | Step 1 24 | Teacher model fine-tuned for a specific downstream task 25 | Linear-projection layer introduced atop the model's encoder 26 | Supervised contrastive loss mechanism to optimize the parameters of the linear-projection layer 27 | Step 2 28 | Generation of resized teacher embeddings and predictions tailored for student model training. 29 | Creation of similarity matrix for teacher embeddings to quantify likeness between inputs. 30 | 31 | Loss Function: 32 | Kullback–Leibler Divergence 33 | Minimizes the divergence between teacher-teacher and teacher-student similarity distributions. 34 | Cross-Entropy Loss 35 | Computes divergence between student's and teacher's predictions. 36 | 37 | -------------------------------------------------------------------------------- /automated-prompt-engineering: -------------------------------------------------------------------------------- 1 | From https://towardsdatascience.com/automated-prompt-engineering-78678c6371b9 2 | 3 | Have to test the below: 4 | ############################################################################################### 5 | prompt_improvement_prompt = """ 6 | 7 | # Context # 8 | 9 | You are given an original prompt. 10 | 11 | The original prompt was used to generate some example responses. For each response, feedback was provided on how to improve the desired response. 12 | 13 | Your task is to review all the feedback and then return an improved prompt that addresses the feedback, making it better at generating responses when prompted against the GPT language model.
14 | 15 | # Guidelines # 16 | 17 | - The original prompt will contain placeholders within double curly brackets. These are values for input that you will see in the examples. 18 | - The improved prompt should not exceed 200 words 19 | - Just return the improved prompt and nothing else before and after. Remember to include the same placeholders with double curly brackets. 20 | - When generating the improved prompt, refrain from writing the entire prompt as one paragraph. Instead, you should use a combination of task descriptions, guidelines (in point form), and other sections to the prompt as appropriate. 21 | - The guidelines should be in point form, and should not be a repetition of the task. The guidelines should also be distinct from one another. 22 | - The improved prompt should be written in normal English that is best understood by the language model. 23 | - Based on the feedback provided, you must rephrase the desired behavior of the response into `must`, imperative statements, instead of `should` suggestive statements. 24 | - Improvements made to the prompt should not be overly specific to one single example. 25 | 26 | # Details # 27 | 28 | The original prompt is: 29 | ``` 30 | {original_prompt} 31 | ``` 32 | 33 | These are the examples that were provided and the feedback for each: 34 | ``` 35 | {examples} 36 | ``` 37 | 38 | The improved prompt is: 39 | ``` 40 | """ 41 | -------------------------------------------------------------------------------- /text/Nemotron-4 15B: -------------------------------------------------------------------------------- 1 | From https://medium.com/syncedreview/nvidias-nemotron-4-15b-dominates-multilingual-domain-defeating-4-larger-rivals-82ba51c58383 2 | From https://arxiv.org/abs/2402.16819 3 | 4 | In a new paper titled "Nemotron-4 15B Technical Report," an NVIDIA research team introduces Nemotron-4 15B, 5 | a language model comprised of 15 billion parameters. 6 | This model sets itself apart with unparalleled multilingual capabilities among models of similar size, 7 | having been trained on an extensive corpus of 8 trillion text tokens. 8 | 9 | Nemotron-4 employs a standard decoder-only Transformer architecture with causal attention masks, 10 | consisting of 3.2 billion embedding parameters and 12.5 billion non-embedding parameters. 11 | It incorporates innovative techniques such as Rotary Position Embeddings, the SentencePiece tokenizer, 12 | squared ReLU activations in MLP layers, no bias terms, dropout rate of 0, and untied input-output embeddings. 13 | The model also utilizes Grouped Query Attention to enhance inference speed and reduce memory footprint. 14 | 15 | The training process involved utilizing 384 DGX H100 nodes, each equipped with 8 H100 80GB SXM5 GPUs based on the NVIDIA Hopper architecture. 16 | A combination of 8-way tensor parallelism and data parallelism, along with a distributed optimizer, 17 | was employed to shard the optimizer state over data-parallel replicas. 18 | 19 | Nemotron-4 15B achieves exceptional downstream accuracies across various domains, including English, code, and multilingual evaluations. 20 | Notably, it surpasses models over four times larger and those explicitly tailored for multilingual tasks, 21 | establishing itself as the leader in multilingual capabilities among models of similar scale. 22 | 23 | In summary, Nemotron-4 15B demonstrates unmatched multilingual performance among general-purpose language models at its scale, 24 | even surpassing specialized models in the multilingual domain. 
25 | Its success underscores the potential for large language models to be pre-trained on extensive token corpora, yielding remarkable outcomes. 26 | -------------------------------------------------------------------------------- /MultiModal/Florence_2.py: -------------------------------------------------------------------------------- 1 | ### From https://towardsdatascience.com/florence-2-mastering-multiple-vision-tasks-with-a-single-vlm-model-435d251976d0 2 | 3 | #Load model: 4 | model_id = 'microsoft/Florence-2-large' 5 | model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype='auto').eval().cuda() 6 | processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True) 7 | 8 | #Load image: 9 | image = Image.open(img_path) 10 | 11 | def run_example(image, task_prompt, text_input=''): 12 | 13 | prompt = task_prompt + text_input 14 | 15 | inputs = processor(text=prompt, images=image, return_tensors="pt").to('cuda', torch.float16) 16 | 17 | generated_ids = model.generate( 18 | input_ids=inputs["input_ids"].cuda(), 19 | pixel_values=inputs["pixel_values"].cuda(), 20 | max_new_tokens=1024, 21 | do_sample=False, 22 | num_beams=3, 23 | early_stopping=False, 24 | ) 25 | 26 | generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0] 27 | parsed_answer = processor.post_process_generation( 28 | generated_text, 29 | task=task_prompt, 30 | image_size=(image.width, image.height) 31 | ) 32 | 33 | return parsed_answer 34 | 35 | print(run_example(image, task_prompt='<CAPTION>')) 36 | # Output: 'A black camera sitting on top of a wooden table.' 37 | 38 | print(run_example(image, task_prompt='<DETAILED_CAPTION>')) 39 | # Output: 'The image shows a black Kodak V35 35mm film camera sitting on top of a wooden table with a blurred background.' 40 | 41 | print(run_example(image, task_prompt='<MORE_DETAILED_CAPTION>')) 42 | # Output: 'The image is a close-up of a Kodak VR35 digital camera. The camera is black in color and has the Kodak logo on the top left corner. The body of the camera is made of wood and has a textured grip for easy handling. The lens is in the center of the body and is surrounded by a gold-colored ring. On the top right corner, there is a small LCD screen and a flash. The background is blurred, but it appears to be a wooded area with trees and greenery.' 43 | -------------------------------------------------------------------------------- /Jax: -------------------------------------------------------------------------------- 1 | # from https://medium.com/@hghcomphys/why-you-should-learn-jax-a-molecular-dynamics-showcase-f7e79b58be01 <-- show this for python code 2 | 3 | JAX, developed by Google, offers several key features that make it 4 | an attractive choice for scientific computing and machine learning 5 | 6 | 1. Accelerated Linear Algebra (XLA Compiler) 7 | JAX leverages XLA to optimize matrix operations by compiling code into highly optimized kernels. 8 | This leads to significant performance improvements through techniques like operation fusion and memory layout optimization. 9 | 10 | 2. Just-in-Time (JIT) Compilation 11 | JAX uses JIT compilation to execute code at runtime, resulting in faster execution by compiling multiple operations together. 12 | This is particularly beneficial in deep learning where large, repetitive computations are common. 13 | 14 | 3. Automatic Differentiation 15 | JAX supports automatic differentiation through its grad() function, which can differentiate through Python and NumPy functions, 16 | including loops and branches.
This simplifies the implementation of backpropagation in neural networks. 17 | 18 | 4. Vectorization with vmap 19 | JAX's vmap() function vectorizes operations, enabling batch processing of data for improved performance and memory efficiency. 20 | This is useful for tasks that involve repeated operations on large datasets. 21 | 22 | 5. Parallelization with pmap 23 | JAX supports parallel computation across multiple devices using pmap(). 24 | This feature allows for efficient scaling of computations, distributing workloads across available hardware resources. 25 | 26 | 6. Pure Functions and Haiku 27 | JAX emphasizes the use of pure functions, which are functions without side effects. 28 | Haiku, a neural network library built on JAX, transforms impure functions into pure ones, 29 | facilitating automatic differentiation and other advanced transformations. 30 | 31 | 7. Ease of Use 32 | JAX is designed to be user-friendly, with an API similar to NumPy, making it accessible to users familiar with 33 | Python's scientific computing ecosystem. 34 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/ReFT/into: -------------------------------------------------------------------------------- 1 | """ 2 | From https://medium.com/@aipapers/reft-representation-finetuning-for-language-models-4e804753e886 3 | 4 | The post explores a recent research paper proposing a new method for fine-tuning Large Language Models (LLMs), 5 | which balances parameter count and performance effectively. 6 | 7 | 1. ReFT 8 | Representation Fine-Tuning (ReFT), particularly LoReFT, as a promising alternative to PEFT. 9 | LoReFT requires significantly fewer parameters compared to LoRA, yet achieves remarkable results, 10 | as illustrated in the provided figures. 11 | Impressively, LoReFT outperforms other methods in various tasks while training a minimal number of weights, showcasing its efficiency. 12 | 13 | 2. Explaining the idea of ReFT 14 | ReFT focuses on editing original representations obtained from pre-trained Transformer models, 15 | unlike traditional PEFT methods that add additional weights. By directly manipulating these representations, ReFT aims for enhanced performance. 16 | 17 | 3. ReFT High-level Architecture 18 | Interventions in ReFT are employed to edit the original representations. 19 | These interventions, represented by components like Phi, P, and L, are applied 20 | before passing the representations to the next layer, allowing targeted adjustments for specific tasks. 21 | 22 | 4. What is LoReFT? 23 | LoReFT, a specific ReFT method, stands for Low-rank Linear Subspace ReFT. 24 | It defines a function to edit representations using matrices and vectors. 25 | During training, parameters like W, R, and b are optimized to modify the representations effectively. 26 | 27 | 5. LoReFT Hyperparameters 28 | With LoReFT, interventions are trained for the prefix and suffix of tokens, while leaving middle tokens unchanged. 29 | The size of the prefix and suffix, along with other intervention parameters, are configurable hyperparameters. 30 | 31 | In summary, ReFT, particularly LoReFT, offers a promising approach to fine-tuning LLMs, 32 | achieving impressive results with reduced parameter count, thus making fine-tuning more accessible and efficient. 
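(For reference, the LoReFT edit itself is roughly Phi(h) = h + R^T (W h + b - R h), following the formulation in the ReFT paper: R is a low-rank projection matrix with orthonormal rows that picks out a small linear subspace of the hidden representation h, W and b are a learned linear map and bias into that same subspace, and only R, W, and b are trained while the base model's weights stay frozen.)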
33 | """ 34 | -------------------------------------------------------------------------------- /SELF-DISCOVER: -------------------------------------------------------------------------------- 1 | From https://jrodthoughts.medium.com/meet-self-discover-google-deepminds-new-method-for-llm-reasoning-4f3fdc547926 2 | 3 | The text provided is an article discussing a recent research paper by Google DeepMind on a novel reasoning technique called SELF-DISCOVER, 4 | developed for large language models (LLMs). This method aims to enhance the problem-solving capabilities of LLMs by mimicking the human reasoning process. 5 | DeepMind's approach to SELF-DISCOVER is inspired by human problem-solving strategies such as step-by-step problem-solving, 6 | decomposition-based prompting, and step-back prompting. 7 | It addresses the limitations of existing reasoning methods by tailoring the reasoning process to the specific demands of each task. 8 | 9 | # SELF-DISCOVER operates in two fundamental stages: 10 | 11 | Stage 1: In this stage, the LLM generates a task-specific reasoning structure, which is described using natural language terms 12 | like "breakdown into subtasks" and "critical thinking." This process involves selecting, adapting, and implementing a coherent reasoning framework 13 | using a structured data format (JSON) for interpretability and generation quality. 14 | 15 | Stage 2: The LLM applies the reasoning structure generated in Stage 1 to solve individual task instances. 16 | 17 | SELF-DISCOVER has shown promising results in enhancing the reasoning abilities of cutting-edge language models like PaLM 2-L and GPT-4 across various reasoning tasks, 18 | including BBH, T4D, and MATH. 19 | It significantly outperformed traditional methods such as chain-of-thought (CoT) and Plan-and-Solve (PS) in performance, demonstrating the effectiveness of its approach. 20 | 21 | The article concludes that SELF-DISCOVER represents a significant step forward in the application of artificial intelligence to complex reasoning tasks. 22 | By tailoring the reasoning process to the specific demands of each task and applying a structured reasoning process, 23 | SELF-DISCOVER achieves higher accuracy and provides a more intuitive and logical path to solving problems, similar to the approach a human expert might take. 24 | -------------------------------------------------------------------------------- /etc/NanoFlow: -------------------------------------------------------------------------------- 1 | ## From https://arxiv.org/abs/2408.12757 2 | ## From https://github.com/efeslab/Nanoflow 3 | 4 | The increasing demand for large-scale serving systems for Large Language Models (LLMs) has led to significant focus on improving throughput, 5 | especially with tens of thousands of GPUs serving hundreds of millions of users. 6 | Traditionally, methods like data, tensor, and pipeline parallelism have been explored to boost throughput. 7 | However, these methods often fail to fully utilize the resources of a single device (compute, memory, network), resulting in sub-optimal performance. 8 | 9 | NanoFlow is a novel serving framework that addresses this limitation by exploiting intra-device parallelism. 10 | This approach overlaps the use of compute, memory, and network resources within a single device through operation co-scheduling, 11 | allowing for more efficient resource utilization. 12 | 13 | Key Innovations: 14 | 1. Nano-Batch Splitting: NanoFlow splits inference requests into nano-batches at the operation level. 
15 | This breaks the sequential dependency of operations during LLM inference, enabling overlapping execution. 16 | 2. Operation-Level Pipeline with Execution Unit Scheduling: NanoFlow introduces an operation-level pipeline that partitions 17 | the GPU's functional units to execute different operations simultaneously in each unit, 18 | enhancing throughput by overlapping computation and data transfer. 19 | 20 | Implementation and Results: 21 | NanoFlow automates pipeline setup using a parameter search algorithm, which simplifies porting to various LLMs. 22 | Evaluated on NVIDIA GPUs with models like LLaMA-2-70B, Mixtral 8x7B, and LLaMA-3-8B, NanoFlow boosts throughput by 1.91x compared to state-of-the-art systems, 23 | achieving 59% to 72% of the optimal throughput across models. 24 | 25 | In summary, NanoFlow significantly enhances LLM serving performance by leveraging intra-device parallelism, 26 | achieving near-optimal throughput across different models and workloads. 27 | -------------------------------------------------------------------------------- /RAG/Self_RAF.txt: -------------------------------------------------------------------------------- 1 | From https://medium.com/@raphael.mansuy/improving-factuality-of-ai-systems-with-self-reflective-retrieval-augmented-generation-aa13817d401a 2 | 3 | standard RAG approaches have some key limitations 4 | 1. They retrieve a fixed number of passages regardless of relevance, which can introduce unnecessary or irrelevant information 5 | 2. The outputs are not guaranteed to be consistent with the retrieved passages, since models are not explicitly trained to follow the facts. 6 | 3. There is no mechanism to verify whether the retrieved passages are actually useful for the task. 7 | 8 | The SELF-RAG paper introduces a new training framework to address these limitations through retrieval and self-reflection 9 | 10 | ## Overview of SELF-RAG Framework 11 | The key idea in SELF-RAG is to train a single LLM that can 12 | 1. Decide when retrieval is needed using a special Retrieve token 13 | 2. Retrieve relevant passages on demand from a retriever 14 | 3. Generate outputs grounded in the retrieved passages 15 | 4. Critique its own outputs and retrieved passages through reflection tokens like ISREL, ISSUP, ISUSE 16 | 17 | ## Key steps 18 | 1. Conditional retrieval: Model predicts Retrieve to trigger retriever. 19 | 2. Relevance checking: Predicts ISREL to check passage relevance. 20 | 3. Grounded generation: Generates output grounded in retrieved passages. 21 | 4. ** Self-critique: Predicts ISSUP for supportedness and ISUSE for utility ** 22 | 23 | ## Reflection Tokens 24 | Retrieve: Triggers retriever if predicted as 1. 25 | ISREL: Binary relevance score for each passage. 26 | ISSUP: Binary score if output is supported by retrieved passages. 27 | ISUSE: Score from 1-5 for overall utility of the output. 28 | 29 | ## Training Methodology 30 | 1. Use a large dataset of input-output pairs (e.g. Q&A pairs) 31 | 2. For each example, retrieve top passages using a fixed retriever. 32 | 3. Annotate passages with ISREL scores for relevance. 33 | 4. Annotate outputs with ISSUP and ISUSE scores. 34 | 5. Train model to generate output text and reflection tokens using cross-entropy loss. 35 | 6. Jointly learn to retrieve, generate, and critique via multi-tasking. 
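To make the inference-time flow concrete, here is a minimal sketch of how the reflection tokens could drive generation (llm.predict_token, llm.generate, and retriever.search are hypothetical placeholders, not an actual SELF-RAG API):

def self_rag_answer(llm, retriever, query, k=5):
    # 1. Conditional retrieval: only call the retriever when the Retrieve token fires
    if llm.predict_token(query, token="Retrieve") != "yes":
        return llm.generate(query)
    passages = retriever.search(query, k=k)
    # 2. Relevance checking: keep passages the model tags as relevant (ISREL)
    passages = [p for p in passages if llm.predict_token((query, p), token="ISREL") == "relevant"]
    if not passages:
        return llm.generate(query)
    # 3. Grounded generation: one candidate answer per surviving passage
    candidates = [(p, llm.generate(query, context=p)) for p in passages]
    # 4. Self-critique: rank candidates by supportedness (ISSUP) and utility (ISUSE, 1-5)
    def critique(pair):
        passage, answer = pair
        supported = llm.predict_token((query, passage, answer), token="ISSUP") == "supported"
        utility = int(llm.predict_token((query, passage, answer), token="ISUSE"))
        return (supported, utility)
    best_passage, best_answer = max(candidates, key=critique)
    return best_answer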
36 | -------------------------------------------------------------------------------- /RAG/Network_Analysis_through_LLMs_for_Knowledge_Extraction/llm.llm.py: -------------------------------------------------------------------------------- 1 | from src.logger import get_console_logger 2 | from src.llm.prompts import PROMPTS 3 | 4 | 5 | logger = get_console_logger("llm") 6 | MIND_MAP_EXTRACTION_MODEL = "gpt-4-turbo-preview" 7 | MIND_MAP_INSPECTION_MODEL = "gpt-4" 8 | 9 | def extract_mind_map_data(openai_client: object, text: str) -> str: 10 | logger.info(f"Extracting mind map data from text...") 11 | response = openai_client.chat.completions.create( 12 | model=MIND_MAP_EXTRACTION_MODEL, 13 | response_format={"type": "json_object"}, 14 | temperature=0, 15 | messages=[ 16 | {"role": "system", "content": PROMPTS["mind_map_of_one"]}, 17 | {"role": "user", "content": f"{text}"}, 18 | ], 19 | ) 20 | return response.choices[0].message.content 21 | 22 | 23 | def extract_mind_map_data_of_two( 24 | openai_client: object, source_text: str, target_text: str 25 | ) -> str: 26 | logger.info(f"Extracting mind map data from two texts...") 27 | user_prompt = PROMPTS["mind_map_of_many"].format( 28 | source_text=source_text, target_text=target_text 29 | ) 30 | response = openai_client.chat.completions.create( 31 | model=MIND_MAP_INSPECTION_MODEL, 32 | response_format={"type": "json_object"}, # this is very important! 33 | messages=[ 34 | {"role": "system", "content": PROMPTS["mind_map_of_many"]}, 35 | {"role": "user", "content": user_prompt}, 36 | ], 37 | ) 38 | return response.choices[0].message.content 39 | 40 | 41 | def extract_information_from_mind_map_data(openai_client: object, data: dict) -> str: 42 | logger.info(f"Extracting information from mind map data...") 43 | user_prompt = PROMPTS["inspector_of_mind_map"].format(mind_map_data=data) 44 | response = openai_client.chat.completions.create( 45 | model="gpt-4", 46 | messages=[ 47 | {"role": "system", "content": PROMPTS["inspector_of_mind_map"]}, 48 | {"role": "user", "content": user_prompt}, 49 | ], 50 | ) 51 | return response.choices[0].message.content 52 | -------------------------------------------------------------------------------- /RAG/chunking/Build_a_Local_Ollama_OCR_Application_Powered_By_Llama_3.2_Vision.py: -------------------------------------------------------------------------------- 1 | ## From https://sebastian-petrus.medium.com/build-a-local-ollama-ocr-application-using-llama-3-2-vision-bfc3014e3ad6 2 | 3 | ollama run llama3.2-vision 4 | mkdir llama-ocr && cd llama-ocr 5 | python -m venv venv 6 | source venv/bin/activate # On Windows use `venv\Scripts\activate` 7 | pip install requests Pillow 8 | 9 | import base64 10 | import requests 11 | from PIL import Image 12 | 13 | SYSTEM_PROMPT = """Act as an OCR assistant. Analyze the provided image and: 14 | 1. Recognize all visible text in the image as accurately as possible. 15 | 2. Maintain the original structure and formatting of the text. 16 | 3. If any words or phrases are unclear, indicate this with [unclear] in your transcription.
17 | Provide only the transcription without any additional comments.""" 18 | def encode_image_to_base64(image_path): 19 | """Convert an image file to a base64 encoded string.""" 20 | with open(image_path, "rb") as image_file: 21 | return base64.b64encode(image_file.read()).decode('utf-8') 22 | def perform_ocr(image_path): 23 | """Perform OCR on the given image using Llama 3.2-Vision.""" 24 | base64_image = encode_image_to_base64(image_path) 25 | response = requests.post( 26 | "", # Ensure this URL matches your Ollama service endpoint 27 | json={ 28 | "model": "llama3.2-vision", 29 | "messages": [ 30 | { 31 | "role": "user", 32 | "content": SYSTEM_PROMPT, 33 | "images": [base64_image], 34 | }, 35 | ], 36 | } 37 | ) 38 | if response.status_code == 200: 39 | return response.json().get("message", {}).get("content", "") 40 | else: 41 | print("Error:", response.status_code, response.text) 42 | return None 43 | if __name__ == "__main__": 44 | image_path = "path/to/your/image.jpg" # Replace with your image path 45 | result = perform_ocr(image_path) 46 | if result: 47 | print("OCR Recognition Result:") 48 | print(result) 49 | 50 | -------------------------------------------------------------------------------- /Robotics/Embodiment in virtual humans and robots: -------------------------------------------------------------------------------- 1 | ## https://medium.com/@black_51980/embodiment-in-virtual-humans-and-robots-82aa3637d503 2 | 3 | 1. Embodiment in AI 4 | Embodiment is critical for both robots and virtual humans. 5 | It involves connecting an AI's "brain" to a "body" that interacts with the world, grounding AI in a 3D environment. 6 | 7 | 2. SMPL Model 8 | SMPL is a 3D model representing human shape and movement, essential for creating virtual humans. 9 | It encodes human body shape, pose, facial expressions, and movements in a compact form of about 100 parameters. 10 | 11 | 3. Virtual Humans as Robots 12 | Virtual humans are akin to robots in a virtual world, perceiving, understanding, planning, and executing actions. 13 | Key differences include the need for human-like motion and flexible physics modeling in virtual environments, 14 | unlike physical robots. 15 | 16 | 4. Universal Humanoid 17 | SMPL serves as a universal language for human behavior, enabling translation of various data forms into human movements. 18 | This facilitates retargeting human movements to new virtual characters or physical robots. 19 | 20 | 5. AMASS Dataset 21 | AMASS is a large collection of 3D human movement data in SMPL format, used extensively for training AI models 22 | to understand and generate human motion. It supports various applications, 23 | including generating movements from text or speech. 24 | 25 | 6. Learning from Humans 26 | AMASS data is used to train both virtual humans and robots. Robust methods exist for estimating SMPL from video, 27 | enabling robots to learn from human demonstrations encoded in SMPL format. 28 | 29 | 7. SMPL as Latent Space 30 | SMPL acts as a compact, interpretable latent space in machine learning, efficiently representing human body shape and pose. 31 | This minimal representation is achieved through a biomechanically accurate model. 32 | 33 | Conclusion 34 | Embodiment extends beyond physical robots to virtual humans, with SMPL acting as a "virtual robot" 35 | and a universal language for human movement, facilitating the transfer of human behaviors to various embodiments. 
36 | -------------------------------------------------------------------------------- /MultiModal/MultiAgent/CrewAI/Agent with Tools.py: -------------------------------------------------------------------------------- 1 | ! pip install duckduckgo_search 2 | 3 | ## With Pre-built tools 4 | from langchain_community.tools import DuckDuckGoSearchRun 5 | 6 | search_tool = DuckDuckGoSearchRun() 7 | activity_agent = Agent( 8 | role='activity_agent', 9 | goal="""responsible for actitivies 10 | recommendation considering the weather situation from weather_reporter.""", 11 | backstory="""You are an activity agent who recommends 12 | activities considering the weather situation from weather_reporter. 13 | Don't ask questions. Make your response short.""", 14 | verbose=True, 15 | allow_delegation=False, 16 | 17 | tools=[search_tool], 18 | 19 | llm=llm, 20 | ) 21 | 22 | task2 = Task( 23 | description="""Make a research for suitable and up-to-date activities 24 | recommendation considering the weather situation""", 25 | agent=activity_agent 26 | ) 27 | 28 | ## Custom tools 29 | from langchain.tools import BaseTool, StructuredTool, tool 30 | from langchain.pydantic_v1 import BaseModel, Field 31 | 32 | class WeatherInput(BaseModel): 33 | search_string: str = Field(description="the search string for the weather status") 34 | 35 | def get_weather(search_string:str) -> str: 36 | """Look up the weather status""" 37 | return "It's raining season with typhoons." 38 | 39 | weather_search = StructuredTool.from_function( 40 | func=get_weather, 41 | name="weather_search", 42 | description="search for the weather status", 43 | args_schema=WeatherInput, 44 | return_direct=True, 45 | ) 46 | 47 | Weather_reporter = Agent( 48 | role='Weather_reporter', 49 | goal="""providing weather 50 | overall status based on the dates and location the user provided.""", 51 | backstory="""You are a weather reporter who provides weather 52 | overall status based on the dates and location the user provided. 53 | Make your response short.""", 54 | verbose=True, 55 | allow_delegation=False, 56 | tools=[weather_search], 57 | llm=llm, 58 | ) 59 | 60 | task1 = Task( 61 | description="""providing weather 62 | overall status in Bohol Island in September.""", 63 | agent=Weather_reporter 64 | ) 65 | 66 | -------------------------------------------------------------------------------- /new_arch/mamba/mamba2/Into: -------------------------------------------------------------------------------- 1 | ## From https://vidrihmarko.medium.com/mamba-2-is-out-can-it-replace-transformers-6cfb3372ea39 2 | 3 | mamba2 install : !pip install mamba-ssm 4 | 5 | 1. What’s Mamba-2? 6 | Mamba-2 is a state space model architecture showing promising performance on information-dense data, like language models. 7 | It’s designed to perform better than older models, including transformers, which are widely used in AI. 8 | 9 | 2. Key Features of Mamba-2 10 | Core Innovation: Structured State Space Duality (SSD) 11 | - The main innovation in Mamba-2 is called Structured State Space Duality (SSD). 12 | This combines two advanced techniques, making computations easier and faster. 13 | It also allows the model to work more efficiently with hardware like GPUs and TPUs. 14 | 15 | 16 | 3. Performance Improvements 17 | Mamba-2 is 50% faster in training compared to Mamba-1. It can handle larger and more complex tasks, especially those involving lots of data. 
18 | For example, in tasks that require recalling multiple pieces of information at once, Mamba-2 performs significantly better. 19 | 20 | 21 | 4. Architectural Changes 22 | Mamba-2 introduces a new way of generating parameters, which makes it easier to scale up the model and use it on more powerful hardware. 23 | This new method also keeps memory usage efficient and speeds up computations. 24 | 25 | 26 | 5. How Does It Perform? 27 | In tests, Mamba-2 shows better scaling and faster training times compared to Mamba-1. Pretrained models, ranging from 130 million to 2.8 billion parameters, are available. 28 | These models were trained on large datasets like Pile and SlimPajama. The performance remains consistent across different tasks, with only minor differences due to evaluation noise. 29 | 30 | 31 | ## Specifications 32 | State Size: Increased from 16 (in Mamba-1) to 64–256 in Mamba-2. 33 | Training Speed: 50% faster than Mamba-1. 34 | Model Scale: Available in sizes from 130 million to 2.8 billion parameters. 35 | Datasets: Trained on Pile and SlimPajama. 36 | Evaluation Tasks: Includes multi-query associative recall (MQAR) and various zero-shot evaluations. 37 | 38 | 39 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/Image/DINOv2.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/data-science-in-your-pocket/fine-tuning-dinov2-custom-training-for-your-own-ai-projects-6e8a5a486671 2 | 3 | ## After preparing dataset 4 | 5 | from datasets import load_dataset 6 | from transformers import AutoModelForImageClassification 7 | from transformers import Trainer, TrainingArguments 8 | 9 | # Load your custom dataset 10 | dataset = load_dataset("path_to_your_dataset") 11 | 12 | # Load the pre-trained DINOv2 model 13 | model = AutoModelForImageClassification.from_pretrained("facebook/dinov2-base", num_labels=YOUR_NUM_CLASSES) 14 | model.classifier = torch.nn.Linear(model.config.hidden_size, YOUR_NUM_CLASSES) 15 | 16 | training_args = TrainingArguments( 17 | output_dir="./results", 18 | evaluation_strategy="epoch", 19 | per_device_train_batch_size=16, 20 | per_device_eval_batch_size=16, 21 | num_train_epochs=3, 22 | save_steps=500, 23 | save_total_limit=2, 24 | ) 25 | 26 | trainer = Trainer( 27 | model=model, 28 | args=training_args, 29 | train_dataset=dataset["train"], 30 | eval_dataset=dataset["val"], 31 | ) 32 | 33 | trainer.train() 34 | 35 | predictions = trainer.predict(dataset["val"]) 36 | from sklearn.metrics import accuracy_score 37 | 38 | accuracy = accuracy_score(predictions.label_ids, predictions.predictions.argmax(-1)) 39 | print(f"Accuracy: {accuracy:.4f}") 40 | 41 | ####### Tips for Fine-Tuning and Optimization ####### 42 | # It is very basic thing, but basic is more important than the other 43 | """ 44 | Fine-tuning a large model like DINOv2 can be tricky, so here are a few tips to help you along the way: 45 | -1. Learning Rate: Start with a small learning rate and adjust based on performance. A typical range might be between 1e-5 and 1e-4. 46 | -2. Batch Size: Make sure your batch size fits within your GPU’s memory. If you run out of memory, try reducing it. 47 | -3. Early Stopping: Implement early stopping to avoid overfitting. If the validation loss stops improving, it might be time to stop training. 48 | -4. Data Augmentation: If your dataset is small, use techniques like data augmentation (rotation, cropping, etc.) to help your model generalize better. 
49 | """ 50 | -------------------------------------------------------------------------------- /AGI/AGI-24/Is Complexity an Illusion?: -------------------------------------------------------------------------------- 1 | 1. Simplicity and Generalization: 2 | Simpler models are often seen as more efficient at identifying underlying patterns in data, 3 | which is critical for general intelligence. This has implications across fields, from computer science to physics and biology. 4 | 5 | 2. Distinction Between Form and Function: 6 | Simplicity is a property of form, while generalization relates to function. The correlation between these two is not inherently necessary 7 | but is observed in practice due to how systems are typically interpreted. 8 | 9 | 3. Previous Findings: 10 | Earlier research showed that maximizing simplicity alone is neither necessary nor sufficient for optimizing generalization. 11 | Instead, focusing on "weak" constraints (less rigid assumptions) leads to better generalization outcomes. 12 | 13 | 4. New Contributions: 14 | -1. The paper demonstrates that: 15 | - Complexity is an artifact of abstraction, meaning it depends on how we choose to interpret or model a system. 16 | - All constraints can be represented in simple forms if abstraction layers are removed. 17 | - In environments with finite vocabularies (where only limited information is available), simplicity often correlates with better generalization due to confounding factors. 18 | 19 | 5. Implications for Understanding Complexity: 20 | The paper argues that complexity is subjective and is influenced by the abstraction layers we impose on systems. 21 | In a world without these layers, the complexity of behaviors would be uniformly simple. 22 | 23 | 6. Goal-Directed Abstraction: 24 | The author suggests that natural selection and other goal-directed processes favor abstraction layers that make weak (but versatile) constraints take simple forms. 25 | This explains why simpler models tend to perform better in practical settings. 26 | 27 | 7. Conclusion: 28 | Complexity, as typically understood, may be an illusion resulting from our interpretations and the specific abstractions we choose. 29 | The observed correlation between simplicity and generalization is not a fundamental property but a result of confounding factors in how systems are modeled. 30 | -------------------------------------------------------------------------------- /VISION/Image/VASA-1: -------------------------------------------------------------------------------- 1 | # https://generativeai.pub/microsoft-introduces-vasa-1-turn-an-image-into-talking-faces-in-real-time-405ef3d77aa0 2 | 3 | VASA-1 is a groundbreaking AI tool developed by Microsoft 4 | that transforms 2D portrait images into realistic talking or singing videos based on audio input. 5 | 6 | Here are the key features and aspects of VASA-1: 7 | 8 | 1. Visual Affective Skills Animation (VASA) 9 | VASA-1 is a framework for generating lifelike audio-driven talking face videos from a single image. 10 | It focuses on creating lip movements synchronized with input audio, capturing realistic facial expressions and nuances, 11 | and generating natural head motions aligned with speech. 12 | 13 | 2. Architecture 14 | VASA-1 utilizes a Diffusion Transformer model trained on motion sequences extracted from talking face videos. 15 | It generates coherent sequences of head poses and facial dynamics in the learned face latent space, 16 | conditioned on audio features and optional control signals. 
17 | 18 | 3. Benchmarks 19 | VASA-1 achieves impressive results on benchmark tests, outperforming other methods in terms of audio-lip synchronization, 20 | pose alignment with audio, intensity of head movements, and overall video quality and realism. 21 | 22 | 4. Examples 23 | VASA-1 produces lifelike talking faces capable of expressing a wide range of emotions and nuances. 24 | It can change expressions and gaze directions based on input parameters. 25 | 26 | 5. Real-time Performance 27 | VASA-1 operates efficiently, generating video frames at 45 frames per second in offline batch processing mode 28 | and supporting up to 40 frames per second in online streaming mode with low latency. 29 | 30 | 6. Practical Applications 31 | The real-time capability of VASA-1 opens up various practical applications, including video conferencing, 32 | virtual training environments, and customer support scenarios, where interactive and personalized experiences are essential. 33 | 34 | Overall, VASA-1 represents a significant advancement in AI technology, offering a solution 35 | for creating lifelike talking faces from static images, with potential applications across multiple 36 | -------------------------------------------------------------------------------- /text/LLM/quantization_of_llms_with_llama_cpp.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@ingridwickstevens/quantization-of-llms-with-llama-cpp-9bbf59deda35 2 | 3 | ### Setting ### 4 | git clone https://github.com/ggerganov/llama.cpp 5 | cd llama.cpp 6 | make 7 | 8 | ### Donwload LLm model for quantization ### 9 | git clone https://huggingface.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO nous-hermes-2-mistral-7B-DPO 10 | mv nous-hermes-2-mistral-7B-DPO models/ 11 | 12 | ### Convert the Model to a GGML FP16 format ### 13 | see : https://medium.com/@tubelwj/introduction-to-ai-model-quantization-formats-dc643bfc335c 14 | python3 convert.py models/nous-hermes-2-mistral-7B-DPO/ 15 | 16 | ### quantize the model to 4-bits (using Q4_K_M method) 17 | ./quantize ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-f16.gguf ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-Q4_K_M.gguf Q4_K_M 18 | 19 | ### quantize the model to 3-bits (using Q3_K_M method) 20 | ./quantize ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-f16.gguf ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-Q3_K_M.gguf Q3_K_M 21 | 22 | ### quantize the model to 5-bits (using Q5_K_M method) 23 | ./quantize ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-f16.gguf ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-Q5_K_M.gguf Q5_K_M 24 | 25 | ### quantize the model to 2-bits (using Q2_K method) 26 | ./quantize ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-f16.gguf ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-Q2_K.gguf Q2_K 27 | 28 | 29 | ### Batched Bench ### 30 | # Batched bench benchmarks the batched decoding performance of the llama.cpp library. 
31 | ./batched-bench ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-f16.gguf 2048 0 999 128,256,512 128,256 1,2,4,8,16,32 32 | ./batched-bench ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-Q4_K_M.gguf 2048 0 999 128,256,512 128,256 1,2,4,8,16,32 33 | 34 | 35 | ### Evaluating Perplexity ### 36 | # Calculate the perplexity of ggml-model-Q2_K.gguf 37 | ./perplexity -m ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-Q2_K.gguf -f /Users/ingrid/Downloads/test-00000-of-00001.parquet 38 | 39 | 40 | ### Run the quantized model ### 41 | # start inference on a gguf model 42 | ./main -m ./models/nous-hermes-2-mistral-7B-DPO/ggml-model-Q4_K_M.gguf -n 128 43 | -------------------------------------------------------------------------------- /3D/Stable Diffusion V3: -------------------------------------------------------------------------------- 1 | From https://medium.com/superteams-ai-blog/a-technical-deep-dive-into-stable-diffusion-3-f6e60e4b14e9 2 | 3 | Stability AI has introduced Stable Diffusion 3, an advanced text-to-image model that excels in generating high-quality images from text prompts. 4 | This model is equipped with a variety of parameters, ranging from 800 million to 8 billion, 5 | offering scalability and quality tailored to different creative needs. 6 | Stable Diffusion 3 combines the diffusion transformer architecture with flow matching, ensuring top-notch performance and safety throughout its development and deployment stages. 7 | 8 | 1. Diffusion Transformer (DiT) Architecture: 9 | Combines the strengths of diffusion models and transformer-based architectures. 10 | Uses a stack of transformer layers augmented to incorporate the diffusion process. 11 | Processes images in patches, employing adaptive layer normalization for improved training stability. 12 | Trained using maximum likelihood estimation or variational inference, with added techniques like exponential moving average of model weights. 13 | 14 | 2. Flow Matching (FM) Framework: 15 | Designed for Continuous Normalizing Flow (CNF) models, providing an alternative to simulation-heavy training. 16 | Employs conditional constructions, probability paths, and vector fields for efficient high-dimensional data processing. 17 | Offers a simulation-free approach to training, utilizing gradient-based optimization and the Conditional Flow Matching (CFM) objective. 18 | Demonstrates superior performance in image datasets, achieving state-of-the-art results in terms of negative log-likelihood, sample quality, and training efficiency. 19 | 20 | 3. Integration and Potential: 21 | The fusion of DiT and FM in Stable Diffusion 3 presents a promising approach in text-to-image generation. 22 | Expected to yield more coherent, contextually relevant, and visually appealing outputs. 23 | Enhances image fidelity, enabling the generation of complex, multi-subject images based on nuanced textual descriptions. 24 | 25 | Overall, Stable Diffusion 3 sets a new standard for text-to-image generation, combining cutting-edge architecture with advanced training methods 26 | for optimal performance and creativity. 
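As a rough sketch of the Conditional Flow Matching (CFM) objective described above, using a straight-line probability path between noise and data (velocity_model is a stand-in network, not the actual Stable Diffusion 3 architecture, and the exact path and weighting SD3 uses may differ):

import torch

def cfm_loss(velocity_model, x1):
    # x1: batch of clean data samples; x0: noise drawn from the prior
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], device=x1.device).view(-1, *([1] * (x1.dim() - 1)))
    xt = (1 - t) * x0 + t * x1           # point on the straight-line path at time t
    v_target = x1 - x0                   # constant target velocity along that path
    v_pred = velocity_model(xt, t.flatten())
    return ((v_pred - v_target) ** 2).mean()

Training reduces to regressing the network's predicted velocity onto this simple target, which is what makes flow matching simulation-free compared to classic continuous normalizing flow training.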
27 | -------------------------------------------------------------------------------- /AGI/DeepMind Unveils Groundbreaking Path to AGI Success: -------------------------------------------------------------------------------- 1 | From https://medium.com/predict/deepmind-unveils-groundbreaking-path-to-agi-success-ed7f55c1ef52 2 | 3 | A detailed overview of DeepMind's recent work on Artificial General Intelligence (AGI), 4 | including their analysis of AGI definitions, criteria for assessment, and the introduction of matrices for measuring performance, 5 | generality, autonomy, and risk in AI systems 6 | 7 | 1. DeepMind's Analysis of AGI Definitions: 8 | DeepMind scrutinizes nine different AGI definitions, including the Turing Test, consciousness measures, 9 | economic measures, and task-related capabilities. 10 | Passing the Turing Test alone is deemed insufficient for AGI, and determining consciousness attributes remains challenging. 11 | DeepMind proposes six criteria for assessing AGI, focusing on capabilities, generality, performance, 12 | cognitive tasks, and real-world value. 13 | 14 | 2. AI Impact Tour by VentureBeat: 15 | Organized by VentureBeat, the AI Impact Tour is a series of events bringing together the enterprise AI community 16 | for networking and insights. 17 | Key aspects include networking opportunities, insights into AI research and applications, 18 | city-wise tours, and a focus on transformative technology. 19 | 20 | 3. AGI Matrix for Measuring Performance and Generality: 21 | DeepMind introduces the AGI Matrix, a framework for evaluating the depth and breadth of intelligence in AI systems. 22 | It distinguishes between narrow AI (specific tasks) and general AI (broader capabilities). 23 | Current language models are classified as Level 1 General AI, with existing systems like AlphaZero considered superhuman narrow AI. 24 | The AGI Matrix rates systems based on their performance across five levels. 25 | 26 | 4. Assessing Autonomy and Risk in AI Systems: 27 | Evaluation of autonomy in AI systems ranges from Level 0 (controlled entirely by humans) to Level 5 (fully autonomous). 28 | Risks associated with autonomy vary and include deskilling, industry disruption, manipulation, and misalignment with human values. 29 | Understanding autonomy levels helps anticipate the impact on industries and society, 30 | facilitating the development of safeguards to mitigate risks. 
31 | -------------------------------------------------------------------------------- /new_arch/Kolmogorov-Arnold Networks (KANs)/simple.py: -------------------------------------------------------------------------------- 1 | from kan import KAN 2 | import matplotlib.pyplot as plt 3 | from sklearn.datasets import make_moons 4 | import torch 5 | import numpy as np 6 | 7 | dataset = {} 8 | train_input, train_label = make_moons(n_samples=10000, shuffle=True, noise=0.1, random_state=None) 9 | test_input, test_label = make_moons(n_samples=10000, shuffle=True, noise=0.1, random_state=None) 10 | 11 | dataset['train_input'] = torch.from_numpy(train_input) 12 | dataset['test_input'] = torch.from_numpy(test_input) 13 | dataset['train_label'] = torch.from_numpy(train_label) 14 | dataset['test_label'] = torch.from_numpy(test_label) 15 | 16 | X = dataset['train_input'] 17 | y = dataset['train_label'] 18 | plt.scatter(X[:,0], X[:,1], c=y[:]) 19 | 20 | model = KAN(width=[2,2], grid=3, k=3) #KAN with two input and 2 output neurons 21 | 22 | def train_accuracy(): 23 | return torch.mean((torch.argmax(model(dataset['train_input']), dim=1) == dataset['train_label']).float()) 24 | 25 | def test_accuracy(): 26 | return torch.mean((torch.argmax(model(dataset['test_input']), dim=1) == dataset['test_label']).float()) 27 | 28 | results = model.train(dataset, opt="LBFGS", steps=20, metrics=(train_accuracy, test_accuracy), loss_fn=torch.nn.CrossEntropyLoss()) 29 | 30 | formula1, formula2 = model.symbolic_formula()[0] 31 | 32 | print(formula1) 33 | #1012.55*sqrt(0.6*x_2 + 1) + 149.83*sin(2.94*x_1 - 1.54) - 1075.87 34 | 35 | print(formula2) 36 | #-948.72*sqrt(0.63*x_2 + 1) + 157.28*sin(2.98*x_1 + 1.59) + 1010.69 37 | 38 | def acc(formula1, formula2, X, y): 39 | batch = X.shape[0] 40 | correct = 0 41 | for i in range(batch): 42 | logit1 = np.array(formula1.subs('x_1', X[i,0]).subs('x_2', X[i,1])).astype(np.float64) 43 | logit2 = np.array(formula2.subs('x_1', X[i,0]).subs('x_2', X[i,1])).astype(np.float64) 44 | correct += (logit2 > logit1) == y[i] 45 | return correct/batch 46 | 47 | print('Training accuracy of the formula:', acc(formula1, formula2, dataset['train_input'], dataset['train_label'])) 48 | #Training accuracy of the formula: tensor(1.) 49 | 50 | print('Testing accuracy of the formula:', acc(formula1, formula2, dataset['test_input'], dataset['test_label'])) 51 | #Testing accuracy of the formula: tensor(0.9990) 52 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/Fine-Tuning for Reasoning and Context: -------------------------------------------------------------------------------- 1 | From https://ai.plainenglish.io/fine-tuning-for-reasoning-and-context-the-key-to-better-rag-systems-530fa86d637c 2 | 3 | Retrieval-augmented generation (RAG) is a technique used in developing question answering systems 4 | that can retrieve information from external sources and use it to generate responses. 5 | It combines the strengths of retrieval-based and generation-based models, enabling it to generalize well across various questions and domains. 6 | 7 | However, as RAG systems advance, they face certain limitations: 8 | 9 | 1. Reasoning Abilities 10 | More complex questions require reasoning beyond simple text matching. RAG systems need to understand multifaceted questions, 11 | identify relevant information, and perform multi-step inferences to arrive at answers. 
Current RAG systems often have limited reasoning abilities, 12 | restricting the types of questions they can handle. 13 | 14 | 2. Context Usage 15 | RAG systems rely on contextual information from various documents to generate responses. 16 | However, encoding and reasoning over extensive contexts is challenging. 17 | Current approaches often fail to use retrieved context optimally, limiting the expansion of knowledge. 18 | 19 | To address these limitations, recent innovations have focused on specialized language model fine-tuning: 20 | 21 | 1. Reinforced Fine-Tuning (ReFT) is a technique that uses reinforcement learning to explore multiple reasoning paths for problem-solving. 22 | It helps language models generalize better and understand diverse reasoning approaches. 23 | 24 | 2. Reasoning on Graphs (RoG) is another approach that uses structured knowledge graphs to improve reasoning capabilities. 25 | It teaches language models to traverse graphs and derive non-explicit solutions. 26 | 27 | 3. Long Context Fine-Tuning focuses on enhancing models' ability to reason over long document contexts. 28 | It enables models to encode larger volumes of external information and make coherent inferences. 29 | 30 | These specialized fine-tuning techniques have shown promising results in overcoming reasoning and context limitations, 31 | making RAG systems more capable of handling complex questions in real-world applications. 32 | -------------------------------------------------------------------------------- /text/Embedding/INSTRUCTOR.py: -------------------------------------------------------------------------------- 1 | """ 2 | From https://instructor-embedding.github.io/ 3 | INSTRUCTOR, a novel approach for computing text embeddings based on task instructions, 4 | representing a significant advancement in natural language processing (NLP). INSTRUCTOR generates task-specific embeddings efficiently 5 | and without the need for additional training, setting new benchmarks in versatility and performance. 6 | 7 | Key aspects of INSTRUCTOR include its integration of task instructions into the embedding process, 8 | its utilization of Generalized Text Representation (GTR) models as a backbone, and its training on the MEDI dataset comprising diverse tasks and instructions. 9 | 10 | The training objective of INSTRUCTOR involves a text-to-text problem formulation, 11 | teaching the model to distinguish between good and bad outputs within the context of task-specific instructions. 12 | Standardization of instructions across tasks ensures consistency and enhances the model's adaptability. 13 | 14 | INSTRUCTOR outperforms traditional models across a wide range of tasks, 15 | showcasing an average performance enhancement of 3.4% over 70 diverse datasets. 16 | Despite its smaller size, it exhibits robustness and efficiency, heralding a new era in NLP. 17 | 18 | The text also provides code snippets for integrating and utilizing INSTRUCTOR for various NLP tasks, 19 | demonstrating its ease of use and versatility. Overall, INSTRUCTOR represents a groundbreaking advancement in text embedding technology, 20 | promising exciting developments for NLP applications and research. 
21 | """ 22 | 23 | ## Seamless Embedding Generation 24 | from transformers import AutoTokenizer, AutoModel 25 | import torch 26 | # Load tokenizer and model from Hugging Face 27 | tokenizer = AutoTokenizer.from_pretrained("hkunlp/instructor-large") 28 | model = AutoModel.from_pretrained("hkunlp/instructor-large") 29 | # Your text and instructions go here 30 | text = "One Embedder to rule them all, One Embedder to find them." 31 | instructions = "Generate an embedding for a general understanding." 32 | # Prepare input 33 | inputs = tokenizer(text, instructions, return_tensors="pt") 34 | # Generate embeddings 35 | with torch.no_grad(): 36 | embeddings = model(**inputs).last_hidden_state.mean(dim=1) 37 | print(embeddings) 38 | 39 | -------------------------------------------------------------------------------- /Robotics/What are Frames?: -------------------------------------------------------------------------------- 1 | Frames, or coordinate frames, are essentially reference systems used to describe the position and orientation of objects 2 | (such as robot links, joints, or the end-effector) in space. 3 | They are foundational in robotics for several reasons: 4 | 5 | 1. Defining Positions and Orientations: 6 | A frame is defined by an origin (a point in space) and a set of axes (usually orthogonal, such as x, y, and z axes). 7 | Any point in space can be described in terms of its coordinates with respect to a given frame. 8 | 9 | 2. Attaching Frames to Robot Links and Joints: 10 | -1. Link Frames: Each link of a robot typically has its own frame. This helps in describing the link’s position and orientation relative 11 | to a fixed base frame or to the previous link in the kinematic chain. 12 | -2. Joint Frames: At the joints, frames are also defined to capture how one link moves relative to the other 13 | (e.g., rotational or translational movement). 14 | 15 | 3. Transformations Between Frames: 16 | -1. To understand the robot's overall configuration, you need to know how to convert coordinates from one frame to another. 17 | This is where homogeneous transformation matrices come into play. 18 | -2. For example, if you have a frame attached to the robot's base and another attached to the end-effector, 19 | the transformation between these frames tells you the end-effector’s position and orientation relative to the base. 20 | 21 | 4. Usage in Forward and Inverse Kinematics: 22 | -1. Forward Kinematics: Involves using a sequence of frame transformations (often defined by Denavit-Hartenberg parameters) 23 | to determine the position and orientation of the end-effector frame relative to the base frame. 24 | -2. Inverse Kinematics: Uses the known desired end-effector frame and works backward through the chain of frames to compute the necessary 25 | joint variables. 26 | 27 | 5. Visualization and Analysis: 28 | -1. Frames help visualize the robot's structure. When you see a robot model, you often see little coordinate axes attached 29 | to each link—these are the frames. 30 | -2. They allow us to analyze the robot's motion and behavior in a mathematically rigorous way. 31 | -------------------------------------------------------------------------------- /RAG/chunking/LumberChunker: -------------------------------------------------------------------------------- 1 | ## From https://pub.towardsai.net/revisiting-chunking-in-the-rag-pipeline-9aab8b1fdbe7 2 | 3 | 1. 
Key Idea 4 | LumberChunker dynamically segments long-form texts into contextually coherent chunks using large language models (LLMs), 5 | enhancing information retrieval by maintaining semantic coherence and relevance. 6 | 7 | 2. LumberChunker Workflow: 8 | -1. Paragraph-wise Segmentation: Documents are first divided into individual paragraphs, each given a unique ID. 9 | -2. Grouping Paragraphs: Paragraphs are grouped sequentially until a predefined token count threshold (θ ≈ 550 tokens) is exceeded, 10 | balancing context without overwhelming the model. 11 | -3. Identifying Content Shifts: The LLM (e.g., Gemini) analyzes these groups to detect significant content shifts, marking chunk boundaries. 12 | -4. Iterative Chunk Formation: New chunks start at the identified shift points to ensure each chunk is contextually coherent. 13 | -5. Optimizing Chunk Size: A threshold of 550 tokens ensures that chunks are neither too small (risking loss of context) nor too large (risking overload). 14 | 15 | 3. Evaluation: 16 | LumberChunker was tested using the GutenQA benchmark, showing a 7.37% improvement in DCG@20 over the best competing method. 17 | Compared to chunking methods like semantic, paragraph-level, and recursive chunking, LumberChunker consistently outperforms these in terms of retrieval accuracy, particularly with narrative texts. 18 | 19 | 4. Computational Trade-offs: 20 | Recursive Chunking is the fastest, as it avoids LLMs. 21 | HyDE maintains constant processing time through limited LLM queries. 22 | LumberChunker, Semantic, and Proposition-Level Chunking show increased processing times with larger documents due to LLM reliance. 23 | LumberChunker’s dynamic LLM queries prevent asynchronous optimizations, leading to higher computational costs but superior retrieval performance. 24 | 25 | 5. Insights: 26 | LumberChunker’s dynamic approach excels in handling long-form texts, but its LLM dependence introduces computational overhead, affecting scalability. 27 | While it outperforms other chunking methods, optimizing its efficiency and testing its performance with 28 | structured texts like legal documents could be areas for future development. 29 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/PEFT/Parameter-Efficient Orthogonal Finetuning: -------------------------------------------------------------------------------- 1 | ## From https://huggingface.co/papers/2311.06243 2 | 3 | The text introduces a new fine-tuning method called Orthogonal Butterfly (BOFT), 4 | which builds upon Orthogonal Finetuning (OFT) to adapt large foundation models to downstream tasks more efficiently. 5 | As training large models from scratch is prohibitively expensive, fine-tuning pre-trained models is essential for specific applications. 6 | 7 | 1. Key Concepts 8 | - Orthogonal Finetuning (OFT) 9 | OFT is a fine-tuning paradigm that adapts models using orthogonal matrices, which help maintain generalizability. 10 | However, OFT requires a large number of trainable parameters because of the high dimensionality of these orthogonal matrices. 11 | 12 | - Efficiency Challenge 13 | The text highlights that despite OFT’s good performance, it still involves a significant number of parameters, 14 | making it less practical for resource-constrained scenarios. 
15 | 16 | - Solution via Butterfly Structures 17 | To improve parameter efficiency, the authors draw inspiration from the Cooley-Tukey fast Fourier transform algorithm, 18 | which efficiently transmits information. Using this idea, 19 | they propose a new parameterization method based on butterfly structures to represent orthogonal matrices in a more compact form. 20 | 21 | - Orthogonal Butterfly (BOFT) 22 | The butterfly-based parameterization is applied to OFT, creating BOFT, 23 | a novel fine-tuning method that reduces the number of trainable parameters while retaining the benefits of orthogonal adaptation. 24 | BOFT serves as a generalized framework that includes OFT as a special case but is more efficient. 25 | 26 | 2. Empirical Study: 27 | The authors conduct extensive experiments on adapting large models across different domains: 28 | -1. Large vision transformers for visual tasks. 29 | -2. Large language models for text tasks. 30 | -3. Text-to-image diffusion models for image generation tasks. 31 | 32 | 3. Conclusion: 33 | BOFT is a parameter-efficient variant of OFT, offering a general orthogonal finetuning framework that reduces 34 | the number of trainable parameters through butterfly structures. It is shown to be effective across a range of models 35 | and tasks in both vision and language domains. 36 | -------------------------------------------------------------------------------- /Training/Minitron approach: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/syncedreview/nvidias-minitron-compressing-llama-3-1-1671ee500b52 2 | 3 | The NVIDIA research team has introduced a novel model compression strategy called the "Minitron approach" in their paper LLM Pruning and Distillation in Practice: 4 | The Minitron Approach. 5 | This method significantly reduces the computational and resource demands required to build large language model (LLM) families 6 | by combining weight pruning with knowledge distillation. The approach produces smaller, highly efficient models, 7 | such as the Minitron-4B derived from Llama 3.1 8B and the MN-Minitron-8B from Mistral NeMo 12B. 8 | 9 | 1. Key Steps in the Minitron Approach: 10 | -1. Teacher Correction 11 | Fine-tuning the larger teacher model on the target dataset to prepare it for subsequent pruning. 12 | -2. Pruning 13 | Using an activation-based estimation method, the importance of each layer, neuron, head, and embedding dimension is calculated. 14 | Elements are ranked and pruned based on sensitivity data from a small calibration dataset. 15 | -3. Model Trimming 16 | Weight matrices in the MLP and MHA layers are pruned for neurons and heads, while embedding dimensions are reduced in the MLP, MHA, and LayerNorm layers. 17 | -4. Retraining and Knowledge Distillation 18 | The pruned model (student) is retrained either through conventional methods using ground truth labels or through knowledge distillation, 19 | where the student model learns from the logits of the unpruned teacher model using KL Divergence loss. 20 | 21 | 2. Results: 22 | The Minitron approach produced the MN-Minitron-8B model, which surpasses similar-sized models across language benchmarks. 23 | The Llama-3.1-Minitron-4B model closely matches the performance of its teacher (Llama 3.1 8B) while outperforming the older Minitron-4B. 
24 | Speed improvements are significant: MN-Minitron-8B achieves a 1.2× speedup over its Mistral NeMo 12B teacher, 25 | while the depth- and width-pruned Llama-3.1-Minitron-4B models provide speedups of 2.7× and 1.8×, respectively, over Llama 3.1 8B. 26 | 27 | In summary, the Minitron approach offers a practical, efficient method for compressing large language models while maintaining or enhancing performance, 28 | making it a key advancement in the field of LLM development. 29 | 30 | -------------------------------------------------------------------------------- /RAG/eval/DeepEval.py: -------------------------------------------------------------------------------- 1 | """ 2 | Faithfulness: Evaluates consistency between Question and Context. 3 | Answer Relevance: Assesses consistency between Answer and Question. 4 | Contextual Precision: Checks whether Ground Truth ranks high in Context. 5 | Contextual Recall: Evaluates consistency between Ground Truth and Context. 6 | Contextual Relevancy: Evaluates consistency between Question and Context. 7 | Hallucination: Measures the degree of hallucinations. 8 | Bias: Evaluates bias levels. 9 | Toxicity: Measures the presence of toxicity, including personal attacks, sarcasm, or threats. 10 | Ragas: Allows for using Ragas for evaluation and generating explanations. 11 | Knowledge Retention: Evaluates the persistence of information. 12 | Summarization: Evaluate the effectiveness of summarization. 13 | G-Eval: G-Eval is a framework for performing evaluation tasks using a Large Language Model (LLM) with Chain of Thought (CoT). It can evaluate LLM outputs based on any custom criteria. For more information, check out this paper. 14 | """ 15 | 16 | import pytest 17 | from deepeval.metrics import ( 18 | AnswerRelevancyMetric, 19 | FaithfulnessMetric, 20 | ContextualRelevancyMetric, 21 | ) 22 | from deepeval.test_case import LLMTestCase 23 | from deepeval import assert_test 24 | from deepeval.dataset import EvaluationDataset 25 | 26 | def generate_dataset(): 27 | test_cases = [] 28 | for i in range(len(questions)): 29 | response = query_engine.query(questions[i]) 30 | test_case = LLMTestCase( 31 | input=questions[i], 32 | actual_output=response.response, 33 | retrieval_context=[node.get_content() for node in response.source_nodes], 34 | expected_output=ground_truth[i], 35 | ) 36 | test_cases.append(test_case) 37 | return EvaluationDataset(test_cases=test_cases) 38 | 39 | dataset = generate_dataset() 40 | 41 | @pytest.mark.parametrize( 42 | "test_case", 43 | dataset, 44 | ) 45 | def test_rag(test_case: LLMTestCase): 46 | answer_relevancy_metric = AnswerRelevancyMetric(model="gpt-3.5-turbo") 47 | faithfulness_metric = FaithfulnessMetric(model="gpt-3.5-turbo") 48 | context_relevancy_metric = ContextualRelevancyMetric(model="gpt-3.5-turbo") 49 | assert_test( 50 | test_case, 51 | [answer_relevancy_metric, faithfulness_metric, context_relevancy_metric], 52 | ) 53 | -------------------------------------------------------------------------------- /RLAIF: Reinforcement Learning from AI Feedback: -------------------------------------------------------------------------------- 1 | From https://towardsdatascience.com/rlaif-reinforcement-learning-from-ai-feedback-d7dbdae8f093 2 | 3 | The drastic improvement in large language model (LLM) quality is attributed to advancements in the alignment process, 4 | particularly through finetuning techniques like supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). 
5 | RLHF involves training a language model based on human-provided preferences, 6 | but it requires a large amount of human preference labels, making it expensive and time-consuming. 7 | 8 | Recent research has explored automating the collection of human preferences for RLHF using AI, 9 | leading to a new technique known as reinforcement learning from AI feedback (RLAIF). 10 | RLAIF involves training a language model to be helpful and harmless by leveraging AI-provided feedback 11 | for collecting harmful preference data instead of relying solely on human annotators. 12 | 13 | The process involves training a reward model over pairs of model responses, 14 | where one response is preferred over the other based on human or AI feedback. 15 | RLAIF has been applied to tasks like text summarization, and the results indicate that 16 | it can produce comparable improvements to RLHF without depending solely on human annotators. 17 | 18 | The key components of RLAIF include: 19 | 1. Automating Preference Labels: Using AI-generated feedback, specifically from an off-the-shelf large language model. 20 | 2. Preamble and Few-Shot Examples: Including instructions and optional examples to guide the model in generating preference labels. 21 | 3. Advanced Prompting Techniques: Such as few-shot prompting, chain of thought prompting, 22 | and self-consistency to enhance the quality of AI-generated preference labels. 23 | 4. Soft Preference Labels: Using log probabilities and softmax to create a "soft" preference distribution for more nuanced feedback. 24 | 25 | The approach of automating the RLHF process with AI feedback has shown promising results in terms of scalability, 26 | efficiency, and alignment quality. It involves training language models that are both helpful and harmless, 27 | addressing the trade-off between these two objectives. The research suggests that RLAIF is a viable alternative to RLHF, 28 | making the alignment process more accessible and effective for large language models. 29 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/PEFT/OLoRA: -------------------------------------------------------------------------------- 1 | ## From https://arxiv.org/abs/2406.01775 2 | 3 | The text introduces OLoRA, a new enhancement to the Low-Rank Adaptation (LoRA) method, 4 | aimed at improving the efficiency of fine-tuning large language models (LLMs). 5 | LoRA already reduces the number of trainable parameters and computational resources, 6 | but OLoRA further improves upon it by incorporating orthonormal matrix initialization through QR decomposition. Here's a breakdown: 7 | 8 | 1. Key Concepts: 9 | - Challenge in Fine-Tuning LLMs: 10 | Fine-tuning LLMs is computationally expensive and time-consuming, with significant challenges in terms of convergence times and resource demands. 11 | 12 | - LoRA's Role: 13 | LoRA addresses these issues by introducing an efficient fine-tuning method that reduces the number of trainable parameters, 14 | thereby lowering computational costs and reducing the memory footprint during training. 15 | 16 | - OLoRA's Enhancement: 17 | OLoRA enhances LoRA by using orthonormal matrix initialization via QR decomposition. 18 | This helps to further accelerate convergence during fine-tuning while maintaining the efficiency benefits of LoRA, 19 | such as a small number of trainable parameters and reduced GPU memory usage. 20 | 21 | 2. Performance and Impact: 22 | -1. 
Faster Convergence 23 | OLoRA speeds up the training process of LLMs by allowing the model to converge more quickly than standard LoRA, 24 | making it more computationally efficient. 25 | -2. Improved Performance 26 | Empirical evaluations show that OLoRA not only converges faster but also improves performance across 27 | a variety of language modeling tasks compared to standard LoRA. 28 | -3. Accessibility 29 | OLoRA’s advancements make LLM fine-tuning more efficient and accessible, potentially encouraging broader adoption 30 | in natural language processing tasks and fostering innovation. 31 | 32 | 3. Conclusion: 33 | OLoRA introduces an innovative enhancement to LoRA by incorporating orthonormal matrix initialization, 34 | which significantly accelerates convergence and improves performance in LLM fine-tuning. 35 | This advancement helps reduce computational costs and facilitates more efficient and widespread fine-tuning 36 | of large language models for various natural language applications. 37 | -------------------------------------------------------------------------------- /RAG/eval/UpTrain.py: -------------------------------------------------------------------------------- 1 | """ 2 | Response Matching: Assesses the consistency between Answer and Ground Truth. 3 | Response Completeness: Measures whether Answer addresses all aspects of Question. 4 | Response Conciseness: Checks if Answer contains unrelated content. 5 | Response Relevance: Evaluate the relevance between Answer and Question. 6 | Response Validity: Assesses whether Answer is valid, avoiding responses like "I don't know." 7 | Response Consistency: Evaluates consistency between Answer, Question, and Context. 8 | Context Relevance: Measures the relevance between Context and Question. 9 | Context Utilization: Evaluate if Answer utilizes Context to address all points. 10 | Factual Accuracy: Checks if Answer is factually accurate and derived from Context. 11 | Context Conciseness: Measures if Context is concise and avoids irrelevant information. 12 | Context Reranking: Assesses the effectiveness of reranked Context. 13 | Jailbreak Detection: Evaluate whether Question contains jailbreak cues. 14 | Prompt Injection: Measures if Question could lead to leaking system prompts. 15 | Language Features: Assess if Answer is concise, coherent, and free from grammatical errors. 16 | Tonality: Checks if Answer aligns with a specific tone. 17 | Sub-query Completeness: Evaluate if sub-questions cover all aspects of Question. 18 | Multi-query Accuracy: Evaluate if variations of Question align with the original. 19 | Code Hallucination: Measures if code in Answer is relevant to Context. 20 | User Satisfaction: Evaluates user satisfaction in conversations. 
21 | """ 22 | 23 | import os 24 | import json 25 | from uptrain import EvalLlamaIndex, Evals, ResponseMatching, Settings 26 | 27 | settings = Settings( 28 | openai_api_key=os.getenv("OPENAI_API_KEY"), 29 | ) 30 | data = [] 31 | for i in range(len(questions)): 32 | data.append( 33 | { 34 | "question": questions[i], 35 | "ground_truth": ground_truth[i], 36 | } 37 | ) 38 | llamaindex_object = EvalLlamaIndex(settings=settings, query_engine=query_engine) 39 | results = llamaindex_object.evaluate( 40 | data=data, 41 | checks=[ 42 | ResponseMatching(), 43 | Evals.CONTEXT_RELEVANCE, 44 | Evals.FACTUAL_ACCURACY, 45 | Evals.RESPONSE_RELEVANCE, 46 | ], 47 | ) 48 | with open("output/uptrain-evaluate.json", "w") as json_file: 49 | json.dump(results, json_file, indent=2) 50 | -------------------------------------------------------------------------------- /GenerativeAI/Image/Lumiere: A Space-Time Diffusion Model for Video Generation: -------------------------------------------------------------------------------- 1 | From https://artgor.medium.com/paper-review-lumiere-a-space-time-diffusion-model-for-video-generation-9b83076b03c7 2 | 3 | Lumiere, a novel text-to-video diffusion model. Lumiere stands out for its ability to synthesize videos with realistic, diverse, and coherent motion 4 | 5 | 1. Model Architecture 6 | Lumiere employs a Space-Time U-Net (STUnet) architecture, which performs both spatial and temporal downsampling and upsampling in a single pass. 7 | This approach helps maintain global temporal consistency and differs from traditional models that create keyframes and then add details. 8 | 9 | 2. Diffusion Probabilistic Models 10 | Lumiere uses Diffusion Probabilistic Models for video generation, approximating a data distribution 11 | through denoising steps, starting from Gaussian noise and gradually refining it. 12 | 13 | 3. Base Model and Super-Resolution 14 | The framework includes a base model for generating low-resolution video clips and 15 | a spatial super-resolution model for upscaling to high resolution. 16 | 17 | 4. Temporal Attention 18 | The STUnet incorporates temporal blocks with spatial resizing modules, temporal convolutions, and attention. 19 | Temporal attention is used at the coarsest resolution to manage computational demands. 20 | 21 | 5. Multidiffusion for Super-Resolution 22 | Lumiere uses Multidiffusion to handle memory constraints and avoid temporal artifacts during spatial super-resolution. 23 | This involves splitting the video into overlapping segments, processing each with Single Shot Reconstruction (SSR), and then combining them. 24 | 25 | 6. Applications: Lumiere has various applications, including: 26 | -1. Stylized Generation: Using a technique inspired by GAN-based interpolation, Lumiere blends T2I weights with original weights 27 | to achieve distinct motion characteristics in generated videos, such as watercolor painting or line drawing styles. 28 | -2. Conditional Generation: Lumiere can generate videos based on additional input signals, such as a noisy video, a conditioning video or image, 29 | or a binary mask. Applications include Image-to-Video, inpainting, and cinemagraphs. 30 | 31 | Overall, Lumiere achieves state-of-the-art results in text-to-video generation and is adaptable for content creation and video editing tasks. 
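To make the Multidiffusion step (point 5) concrete, here is an illustrative overlap-and-average sketch. It is not Lumiere's actual code: ssr_denoise, the window length, and the overlap are placeholder assumptions standing in for one denoising pass of the SSR model over a temporal segment.

import torch

def multidiffusion_step(video, ssr_denoise, window=16, overlap=4):
    # video: (T, C, H, W) frames; ssr_denoise: one denoising step applied to a temporal segment
    out = torch.zeros_like(video)
    count = torch.zeros(video.shape[0], 1, 1, 1, device=video.device)
    start = 0
    while start < video.shape[0]:
        end = min(start + window, video.shape[0])
        out[start:end] += ssr_denoise(video[start:end])   # denoise one overlapping segment
        count[start:end] += 1.0                           # how many segments cover each frame
        if end == video.shape[0]:
            break
        start = end - overlap                             # consecutive segments overlap in time
    return out / count                                    # average predictions where segments overlap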
32 | -------------------------------------------------------------------------------- /Reasoning/Llamaberry.py: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/@odhitom09/llamaberry-unlocking-advanced-chain-of-thought-in-ai-reasoning-85ac71f0e839 2 | 3 | ## Step 1: Setting the Stage 4 | initial_system_prompt = """You are an AI assistant capable of detailed, step-by-step thinking. When presented with a question or problem, break down your thought process into clear, logical steps. For each step, explain your reasoning. Conclude with a final answer. Use the following markdown structure: 5 | 6 | ## Reasoning 7 | 1. [First step] 8 | **Explanation:** [Detailed explanation of this step] 9 | 2. [Second step] 10 | **Explanation:** [Detailed explanation of this step] 11 | ... 12 | 13 | ## Answer 14 | [Final answer] 15 | 16 | Be comprehensive and show your reasoning clearly.""" 17 | 18 | ----------------------------------------------------------------------------------------------------------------- 19 | ## Step 2: The Thinking Process 20 | async def generate_turn(query: str, previous_turns: list = None) -> str: 21 | is_first_turn = previous_turns is None or len(previous_turns) == 0 22 | if is_first_turn: 23 | messages = [{ 24 | "role": "system", 25 | "content": initial_system_prompt 26 | }, { 27 | "role": "user", 28 | "content": query 29 | }] 30 | else: 31 | previous_content = "\n\n".join(previous_turns) 32 | messages = [{ 33 | "role": "system", 34 | "content": followup_system_prompt 35 | }, { 36 | "role": 37 | "user", 38 | "content": 39 | f"Original Query: {query}\n\nPrevious Turns:\n{previous_content}\n\nProvide the next turn of reasoning." 40 | }] 41 | 42 | return await call_llm(messages) 43 | ----------------------------------------------------------------------------------------------------------------- 44 | ## Step 3: Putting It All Together 45 | async def synthesize_turns(query: str, turns: list) -> str: 46 | turns_text = "\n\n".join( 47 | [f"Turn {i+1}:\n{turn}" for i, turn in enumerate(turns)]) 48 | messages = [{ 49 | "role": "system", 50 | "content": synthesis_prompt 51 | }, { 52 | "role": 53 | "user", 54 | "content": 55 | f"Original Query: {query}\n\nTurns of Reasoning:\n{turns_text}" 56 | }] 57 | return await call_llm(messages) 58 | 59 | -------------------------------------------------------------------------------- /etc/graph mining: -------------------------------------------------------------------------------- 1 | ### From https://techxplore.com/news/2024-10-algorithm-advances-graph-complex-networks.html 2 | 3 | Professor Nikolaos Sidiropoulos at the University of Virginia has made significant advancements 4 | in the field of graph mining with the introduction of a new computational algorithm focused on identifying tightly connected clusters known 5 | as triangle-dense subgraphs. 6 | This research, conducted in collaboration with Aritra Konar, an assistant professor at KU Leuven, 7 | was published in the IEEE Transactions on Knowledge and Data Engineering. 8 | 9 | 1. Key Highlights of the Algorithm: 10 | -1. Focus on Triangle-Dense Subgraphs: Traditional graph mining techniques often concentrate on finding dense connections between pairs of nodes, 11 | such as identifying frequently communicating individuals on social media. 12 | The new algorithm takes this a step further by examining triangles of connections, or groups of three nodes where each pair 13 | within the trio is linked. 
14 | This approach aims to find clusters where all three elements interact, revealing more meaningful and tightly-knit relationships. 15 | -2. Submodular Relaxation Technique: The central innovation of this algorithm lies in using submodular relaxation, which simplifies the problem of locating 16 | triangle-dense subgraphs. This method effectively reduces the computational complexity, 17 | enabling the algorithm to quickly and efficiently detect these clusters even within large datasets. 18 | 19 | 2. Applications: The new method is critical in various fields, including: 20 | -1. Fraud detection: Identifying suspicious or coordinated activities in financial networks. 21 | -2. Computational biology: Analyzing protein interactions or genetic relationships. 22 | -3. Social media analysis: Detecting community dynamics or closely interacting friend groups. 23 | 24 | Professor Sidiropoulos emphasized the importance of focusing on how groups of three elements interact to uncover deeper patterns in complex systems. 25 | The algorithm's ability to identify these triangle-dense subgraphs offers new possibilities for researchers working on large-scale data analysis 26 | -------------------------------------------------------------------------------- /new_arch/mamba/Jamba/into: -------------------------------------------------------------------------------- 1 | From https://pub.towardsai.net/inside-jamba-mamba-transformers-and-moes-together-to-power-a-new-form-of-llms-a74b08281b67 2 | Jamba, a groundbreaking model from AI21 Labs, merges Transformer and state space model (SSM) layers 3 | alongside a Mixture of Experts (MoE) component, creating a versatile architecture known as the Jamba block. 4 | This innovation addresses the limitations of traditional Transformers, 5 | notably high memory usage and decreased processing speed with larger text inputs. 6 | 7 | Key to Jamba's success is its hybrid design, which optimizes memory usage, 8 | processing speed, and output quality. 9 | By incorporating MoE, only a fraction of the model's parameters are active at any given time, 10 | significantly reducing memory demands. 11 | Additionally, substituting some Transformer layers with Mamba layers diminishes the size of the key-value (KV) cache, 12 | leading to remarkable efficiency gains. 13 | Jamba maintains a smaller KV cache even when processing extensive text inputs, 14 | demonstrating its superiority over traditional Transformers. 15 | 16 | The Jamba block integrates both Mamba and attention mechanisms followed by multi-layer perceptrons (MLPs), 17 | offering flexibility in adjusting the attention to Mamba layer ratio. 18 | Furthermore, some MLPs can be swapped for MoE layers, enhancing model capacity while minimizing computation overhead. 19 | This modular design empowers Jamba to strike a balance between computational efficiency and memory usage by adapting the mix of its core components. 20 | 21 | Jamba's performance across various benchmarks is impressive, showcasing remarkable efficiency, throughput, and cost-effectiveness. 22 | Operating on a single 80GB GPU, Jamba supports extended context lengths compared to existing models. 23 | Its superior throughput is evident in scenarios involving both small and large text batches, outperforming competitors like Mixtral. 24 | Moreover, Jamba's efficiency allows processing up to 140,000 tokens on a single GPU, 25 | making advanced text processing more accessible across diverse applications. 
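A rough sketch of the interleaving pattern described above; the 1-in-8 attention ratio and the MoE-every-other-layer choice are illustrative assumptions, not AI21's exact configuration:

def jamba_style_layer_plan(n_layers=8, attn_every=8, moe_every=2):
    plan = []
    for i in range(n_layers):
        mixer = "attention" if i % attn_every == 0 else "mamba"   # mostly Mamba, occasional attention
        ffn = "moe-mlp" if i % moe_every == 1 else "dense-mlp"    # MoE replaces some of the dense MLPs
        plan.append((mixer, ffn))
    return plan

print(jamba_style_layer_plan())
# [('attention', 'dense-mlp'), ('mamba', 'moe-mlp'), ('mamba', 'dense-mlp'), ('mamba', 'moe-mlp'), ...]

Raising or lowering attn_every and moe_every is the knob the text refers to when it mentions adjusting the attention-to-Mamba ratio and swapping some MLPs for MoE layers.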
26 | 27 | In conclusion, Jamba represents a significant architectural innovation in generative AI, 28 | combining Transformer, SSMs, and MoEs to potentially set the standard for future large language models. 29 | This advancement underscores AI21's commitment to pushing the boundaries of AI research. 30 | -------------------------------------------------------------------------------- /text/LlamaParse_Financial_Document_Analysis.py: -------------------------------------------------------------------------------- 1 | ## From https://medium.com/@suresh-kandru/llamaparse-a-deep-dive-into-financial-document-analysis-bd9d81c7ba37 2 | 3 | import streamlit as st 4 | import nest_asyncio 5 | import os 6 | from dotenv import load_dotenv 7 | from llama_index.llms.openai import OpenAI 8 | from llama_index.embeddings.openai import OpenAIEmbedding 9 | from llama_index.core import VectorStoreIndex 10 | from llama_index.core import Settings 11 | from llama_parse import LlamaParse 12 | from llama_index.core.node_parser import MarkdownElementNodeParser 13 | from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker 14 | 15 | load_dotenv() 16 | OpenAI.api_key = os.getenv("OPENAI_API_KEY") 17 | llama_cloud_api_key = os.getenv("LLAMA_CLOUD_API_KEY") 18 | 19 | embed_model = OpenAIEmbedding(model="text-embedding-3-small") 20 | llm = OpenAI(model="gpt-3.5-turbo-0125") 21 | Settings.llm = llm 22 | Settings.embed_model = embed_model 23 | 24 | documents = LlamaParse(result_type="markdown").load_data("./uber_10q_march_2022.pdf") 25 | print(documents[0].text[:1000] + "...") 26 | 27 | node_parser = MarkdownElementNodeParser(llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8) 28 | nodes = node_parser.get_nodes_from_documents(documents) 29 | base_nodes, objects = node_parser.get_nodes_and_objects(nodes) 30 | 31 | recursive_index = VectorStoreIndex(nodes=base_nodes + objects) 32 | 33 | reranker = FlagEmbeddingReranker(top_n=5, model="BAAI/bge-reranker-large") 34 | recursive_query_engine = recursive_index.as_query_engine( 35 | similarity_top_k=15, node_postprocessors=[reranker], verbose=True 36 | ) 37 | 38 | ############### Streamlit Interface ############### 39 | st.header('LlamaParse Financial Document Chat') 40 | user_query = st.text_input("Enter your query here:", key="query1") 41 | if st.button("Submit Query", key="submit"): 42 | response = recursive_query_engine.query(user_query) 43 | st.text_area("Response:", value=str(response), height=500, key="response") 44 | ################################################### 45 | 46 | query = "How is the Cash paid for Income taxes, net of refunds from Supplemental disclosures of cash flow information?" 47 | response = recursive_query_engine.query(query) 48 | print(response) 49 | 50 | query1 = "What were cash flows like from investing activities?" 51 | response1 = recursive_query_engine.query(query1) 52 | print(response1) 53 | 54 | 55 | 56 | 57 | -------------------------------------------------------------------------------- /Knowledge Graph Reasoning/optimizing-connections-graphs: -------------------------------------------------------------------------------- 1 | ### See given link : https://towardsdatascience.com/optimizing-connections-mathematical-optimization-within-graphs-7364e082a984 2 | 3 | 1. 
Centrality & Similarity 4 | Graph analysis leverages the tools and techniques developed in graph theory to gain insights and make informed decisions in different domains, 5 | such as social networks, transportation networks, computer networks, biology, and many others. 6 | 7 | Centrality and similarity are two core concepts in graph analysis. You might have heard of the PageRank algorithm used by Google Search. 8 | PageRank is a centrality algorithm used to measure the importance or influence of nodes in a network, particularly in web page ranking and hyperlink analysis. 9 | It assigns a numerical score to each node based on the number and quality of incoming links, helping identify key nodes in a network. 10 | 11 | Betweenness centrality is another measure of centrality. 12 | The node with the highest betweenness centrality acts as the most important bridge among all the other nodes. 13 | To find the betweenness centrality of a node, we need to look at all the pairs of nodes and see how many times each node appears on the shortest path 14 | between two other nodes. The node who appears on the most shortest paths is the one with the highest betweenness centrality. 15 | 16 | 17 | 2. Graph with nodes and betweenness centrality for every node. 18 | In social networks or real-world systems, nodes (e.g. people, places, or entities) with high betweenness centrality play a crucial role 19 | in maintaining communication and connections between different parts of the network. 20 | They act as important mediators, ensuring efficient communication and keeping the network connected. 21 | 22 | It can also be useful to find out how similar certain nodes are. 23 | This can be done by calculating node similarity. Jaccard Similarity is often used for unweighted graphs. 24 | It calculates the similarity between two nodes based on the number of common neighbors they share, divided by the total number of unique neighbors they have. 25 | 26 | Node similarity is used in collaborative filtering techniques to recommend items to users based on the similarity of their preferences with other users. 27 | It can also be used to predict missing or future links in a network or for detecting clusters and communities. 28 | 29 | -------------------------------------------------------------------------------- /Training/18RL/etc/FlowRL: Matching Reward Distributions for LLM Reasoning: -------------------------------------------------------------------------------- 1 | ### From https://huggingface.co/papers/2509.15207 2 | 3 | 1. Big Picture 4 | -1. Replace reward maximization (PPO/GRPO)—prone to mode collapse—with distribution matching: 5 | align 𝜋_𝜃 to a reward-weighted target via reverse KL. Use a learnable partition 𝑍_𝜙(𝑥) to turn scalar rewards into a valid distribution. 6 | 7 | 1.1 From Reward to Distribution 8 | -1. Objective: min_𝜃 𝐷_(KL)(𝜋_𝜃∥exp(𝛽𝑟)/𝑍_𝜙) ⇒ 𝜋_𝜃∝exp(𝛽𝑟) 9 | -2. Proposition 1: In expected-gradient terms, the KL objective is equivalent to minimizing the Trajectory Balance squared loss 10 | [log𝑍_𝜙(𝑥)+log𝜋_𝜃(𝑦∣𝑥)−𝛽𝑟(𝑥,𝑦)]^2 11 | -3. Practical upshot (Remark 2): TB is a tractable surrogate—stable squared loss, no need to compute the intractable Z; just learn 𝑍_𝜙 12 | 13 | 1.2 FlowRL: Making TB work for long CoT 14 | -1. Two practical hurdles 15 | -1) Exploding gradients: TB is sequence-level; log𝜋_𝜃 sums over tokens → gradient norm scales with length (up to 8k tokens). 16 | -2) Sampling mismatch: TB assumes fully on-policy samples, while PPO/GRPO reuse off-policy trajectories from 𝜋_(old) 17 | -2. 
Reward refactoring and normalization 18 | -1) Inject reference model prior: exp(𝛽𝑟)⋅𝜋_ref(𝑦∣𝑥) 19 | -2) Group-normalize rewards within each sampled group: 𝑟^_𝑖=(𝑟_𝑖−mean(𝑟))/std(𝑟) 20 | -3) TB becomes [log𝑍_𝜙+log𝜋_𝜃−𝛽𝑟^−log𝜋_ref]^2 21 | -3. Two stabilizers 22 | -1) Length normalization (Remark 3): rescale log𝜋_𝜃 and log𝜋_ref by 1/∣𝑦∣ to balance long/short trajectories and tame gradient growth. 23 | -2) Importance sampling with clipping (Remark 4): 24 | 𝑤=clip(𝜋_𝜃/𝜋_old,1−𝜖,1+𝜖)_detach as a weight in the surrogate; detach prevents excessive policy drift. 25 | 26 | 2. Final objective: FlowRL 27 | 𝐿_FlowRL=𝑤⋅[log𝑍_𝜙(𝑥)+1/∣𝑦∣ log𝜋_𝜃(𝑦∣𝑥)−𝛽𝑟^(𝑥,𝑦)−1/∣𝑦∣ log𝜋_ref(𝑦∣𝑥)]^2, 28 | with 29 | 𝑤=clip(𝜋_𝜃/𝜋_old,1−𝜖,1+𝜖)_detach, 30 | 𝑟^_𝑖=(𝑟_𝑖−mean(𝑟))/std(𝑟) 31 | 32 | 3. Interpretation: FlowRL keeps the KL↔TB equivalence, while length normalization cures gradient blow-up and IS-clipping resolves off-policy 33 | reuse—so the policy learns to sample diverse, high-reward trajectories in proportion to rewards instead of collapsing onto dominant modes. 34 | Implementation and analysis details follow in §4/§B. 35 | -------------------------------------------------------------------------------- /text/Agent/OpenAI Agent SDK/OpenAI_Agents_SDK [Explained]_with_Code_Implementation: -------------------------------------------------------------------------------- 1 | ### From https://generativeai.pub/openai-agents-sdk-explained-with-code-implementation-c80b448e3e19 2 | ### check given link for Example code 3 | 4 | 1. What AI Agents Are 5 | -a. Autonomous software that perceives, reasons, and acts toward a goal with minimal human input. 6 | -b. Span simple task bots to advanced decision-makers; e.g., OptimHire’s AI recruiter compresses hiring cycles from months to 12 days. 7 | 8 | 2. Three Historical Milestones 9 | -a. Rule-based automation – scripted chatbots and fixed logic. 10 | -b. Machine-learning integration – agents learn patterns from data. 11 | -c. Full autonomy – agents act in dynamic, uncertain environments without explicit step lists. 12 | 13 | 3. OpenAI Agents SDK – Purpose & Benefits 14 | -a. Provides a production-ready platform for building custom AI agents. 15 | -b. Customization: tailor agent skills to any workflow. 16 | -c. Scalability: one framework scales from finance analysis to customer support. 17 | -d. Integration: plugs smoothly into existing systems. 18 | -e. Early pilots: Stripe automates complex financial analysis; Box improves customer service. 19 | 20 | 4. From Swarm to Agents SDK 21 | -a. Swarm (experimental) proved multi-agent concepts but lacked robustness. 22 | -b. Feedback led to the lighter, more scalable Agents SDK, which ships three core primitives. 23 | 24 | 5. Why a New SDK Was Needed 25 | -a. Complex orchestration: coordinating many agents was error-prone. 26 | -b. Fragmented solutions: no common patterns across projects. 27 | -c. Scaling pain: ad-hoc code struggled as workloads grew. 28 | 29 | 6. Core Primitives Introduced 30 | 31 | Primitive | Role in an Agentic System | Analogy 32 | Agents | Autonomous workers with specific skills and memory | Specialist employees 33 | Handoffs | Formal task delegation between agents | Passing a ticket to another department 34 | Guardrails | Safety layer validating inputs/outputs | QA checklist ensuring compliance 35 | 36 | Together they enable collaborative, safe, and modular AI workflows. 37 | 38 | 7. Overall Impact 39 | -a.
The SDK standardizes multi‑agent development, converting isolated, hard‑to‑scale prototypes into reliable, cooperative, and secure AI systems. 40 | -b. As agents become more capable, industries are expected to accelerate adoption, transforming hiring, finance, support, and beyond. 41 | 42 | -------------------------------------------------------------------------------- /RAG/Network_Analysis_through_LLMs_for_Knowledge_Extraction/llm.prompts.py: -------------------------------------------------------------------------------- 1 | PROMPTS = { 2 | "mind_map_of_one": """You are an expert in creating network graphs from textual data. 3 | You are also a note-taking expert and you are able to create mind maps from text. 4 | You are tasked with creating a mind map from a given text data by extracting the concepts and relationships from the text.\n 5 | The relationships should be among objects, people, or places mentioned in the text.\n 6 | 7 | TYPES should only be one of the following: 8 | - is a 9 | - is related to 10 | - is part of 11 | - is similar to 12 | - is different from 13 | - is a type of 14 | 15 | Your output should be a JSON containing the following: 16 | { "relationships": [{"source": ..., "target": ..., "type": ...}, {...}] } \n 17 | - source: The source node\n 18 | - target: The target node\n 19 | - type: The type of the relationship between the source and target nodes\n 20 | 21 | 22 | NEVER change this output format. ENGLISH is the output language. NEVER change the output language. 23 | Your response will be used as a Python dictionary, so be always mindful of the syntax and the data types to return a JSON object.\n 24 | 25 | INPUT TEXT:\n 26 | """, 27 | "inspector_of_mind_map": """ 28 | You are a senior business intelligence analyst, who is able to extract valuable insights from data. 29 | You are tasked with extracting information from a given mind map data.\n 30 | The mind map data is a JSON containing the following: 31 | {{ "relationships": [{{"source": ..., "target": ..., "type": ...}}, {{...}}] }} \n 32 | - source: The source node\n 33 | - target: The target node\n 34 | - type: The type of the relationship between the source and target nodes\n 35 | - origin: The origin node from which the relationship originates\n 36 | 37 | You are to extract insights from the mind map data and provide a summary of the relationships.\n 38 | 39 | Your output should be a brief comment on the mind map data, highlighting relevant insights and relationships using centrality and other graph analysis techniques.\n 40 | 41 | NEVER change this output format. ENGLISH is the output language. NEVER change the output language.\n 42 | Keep your output very brief. Just a comment to highlight the top most relevant information. 43 | 44 | MIND MAP DATA:\n 45 | {mind_map_data} 46 | """, 47 | } 48 | -------------------------------------------------------------------------------- /attention/Infini_attention/infi-attention: -------------------------------------------------------------------------------- 1 | From https://arxiv.org/abs/2404.07143 2 | From https://medium.com/towards-artificial-intelligence/infinite-context-window-406324c4e706 3 | 4 | 5 | In the pursuit of extending the context window of large language models (LLMs), Google's recent paper, 6 | Infini-attention, presents a groundbreaking solution. 7 | The "context window" refers to the number of words sent to an LLM simultaneously, 8 | crucial for understanding questions comprehensively. 
9 | However, as the context increases, 10 | LLM performance typically declines due to information overload. 11 | 12 | The attention mechanism, a key component of LLMs, enables understanding word relationships within a context. 13 | However, as context expands, computational complexity rises because each word must be compared with all others. 14 | Infini-attention addresses this challenge by dividing attention calculation into two parts: 15 | 16 | 1. one for local information (nearby words) 17 | 2. long-range relations (distant words). 18 | 19 | ****************************************************************************************** 20 | "A significant advancement of Infini-attention is its transformation of computational cost 21 | from quadratic to linear concerning sequence length. 22 | It segments text and calculates local attention within these segments, 23 | compressing information from past segments into memory states. 24 | This compressed memory efficiently integrates long-range context into local calculations, enhancing scalability and data processing density." 25 | ****************************************************************************************** 26 | 27 | By storing historical context in compressed form, 28 | Infini-attention retrieves relevant information quickly when needed, 29 | ensuring distant but pertinent details are considered in every step. 30 | After processing each segment, the model updates memory states, 31 | incorporating new data and discarding less pertinent information to optimize memory efficiency and performance. 32 | 33 | Infini-attention achieves a balance between context depth and computational efficiency, 34 | crucial for tasks involving large volumes of text or where historical context influences decisions. 35 | Remarkably, it performs better with increased context, contrasting with typical LLM behavior. 36 | This breakthrough promises better LLM performance with extensive context, potentially reducing human workload. 37 | -------------------------------------------------------------------------------- /Proximal Policy Optimization: -------------------------------------------------------------------------------- 1 | From : https://towardsdatascience.com/proximal-policy-optimization-ppo-the-key-to-llm-alignment-923aa13143d4 2 | 3 | Proximal Policy Optimization (PPO) is an algorithm for reinforcement learning that builds on the ideas of the Trust Region Policy Optimization (TRPO) 4 | algorithm but simplifies its implementation and extends its applicability to a wider range of problems. 5 | The core of PPO is a modified objective function that is optimized through multiple iterations, allowing for more efficient training. 6 | 7 | PPO reformulates the TRPO update rule and uses a "clipped" surrogate objective function to constrain policy updates, 8 | ensuring they are not too large. This method is simpler and computationally cheaper than the TRPO approach, 9 | which directly constrains the policy updates through a KL divergence constraint. 10 | 11 | PPO's "clipped" surrogate objective function incorporates a trade-off between large policy updates that improve performance 12 | and small policy updates that maintain stability. By computing the minimum of the clipped and unclipped surrogate objective functions, 13 | PPO only ignores excessive changes to the probability ratio when they would improve the objective, while changes that worsen it still count in full. 14 | This approach makes the algorithm more stable and adaptable to different problem setups.
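A minimal sketch of this clipped surrogate in PyTorch-style code (variable names and the eps value are illustrative, not taken from the article):

import torch

def ppo_clipped_loss(logp_new, logp_old, advantage, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)                          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum caps how much an update can be credited for improving the objective,
    # while updates that make it worse still contribute fully to the gradient.
    return -torch.min(unclipped, clipped).mean()                    # negated because optimizers minimize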
15 | 16 | In practice, PPO operates as an on-policy algorithm, collecting data from the environment and performing several epochs of optimization over the sampled data. 17 | This process allows PPO to learn more efficiently from the available data compared to TRPO, which only performs a single update each time new data is collected. 18 | 19 | The benefits of PPO over TRPO include 20 | 1. Simplified implementation: PPO is easier to implement due to its simpler update rule and lack of complex constraints. 21 | 2. Improved data efficiency: PPO's multiple epochs of optimization improve data efficiency compared to TRPO. 22 | 3. Enhanced applicability: PPO can be used in a wider range of problem setups due to its simplicity and flexibility. 23 | 24 | In the language modeling space, PPO has been used for reinforcement learning from human feedback (RLHF), 25 | a framework that aligns the model's outputs with human expectations. PPO's simplicity and efficiency make it a popular choice for this application. 26 | 27 | Overall, PPO offers a simpler and more efficient alternative to TRPO, making it a valuable tool for reinforcement learning across a variety of applications. 28 | -------------------------------------------------------------------------------- /MultiModal/DeepSpeed-VisualChat: -------------------------------------------------------------------------------- 1 | From https://medium.com/syncedreview/microsofts-deepspeed-visualchat-breaking-boundaries-in-multi-modal-language-models-3c11bfeab002 2 | From https://arxiv.org/abs/2309.14327 3 | 4 | Existing models face limitations in handling interleaved image-and-text inputs in multi-image, multi-round dialogues, 5 | and their adaptability and scalability across diverse interaction realms are hampered by constraints related to training and data accessibility. 6 | 7 | --> The DeepSpeed-VisualChat framework, which is designed to optimize Large Language Models (LLMs) by incorporating multi-modal capabilities, 8 | demonstrating superior scalability, even up to a 70 billion parameter language model size, when compared to existing frameworks 9 | 10 | 1. Fully Open-Sourced Multi-round Multi-image Framework: 11 | DeepSpeed-VisualChat, one of the pioneering fully open-sourced frameworks, enables multi-round and multi-image dialogues, 12 | accommodating interleaved text-and-image inputs. 13 | 2. Multi-Modal Causal Attention (MMCA): 14 | We devise a novel MMCA for multi-modal models that independently computes attention weights across various modalities. 15 | 3. Data Blending for Interleaved Inputs: 16 | To facilitate conversations with interleaved modalities, DeepSpeed-VisualChat employs assorted data blending techniques 17 | on existing datasets, overcoming the shortage of interleaved text-and-image inputs in most available open-sourced datasets. 18 | 4. Unprecedented Scalability: 19 | We leverage the DeepSpeed framework to amplify our training with a 2B visual encoder from and a 70B language decoder from LLaMA-2, 20 | illustrating the remarkable scalability of our framework. 21 | 22 | ** DeepSpeed-VisualChat is structured based on MiniGPT4, where a pre-trained vision encoder encodes an image, 23 | which is then aligned with the hidden dimension of the text embedding layer’s output through a linear layer. 24 | These diverse inputs are then passed to language models like LLaMA2, powered by the new Multi-Modal Causal Attention (MMCA) mechanism. 
25 | Both the vision encoder and the language model are kept frozen 26 | 27 | In contrast to the conventional Cross Attention (CrA), which introduces new parameters 28 | and complexities, MMCA addresses these issues by having visual tokens attend to themselves and textual 29 | tokens attend to their previous tokens with separate attention weight matrices for text and image tokens. 30 | -------------------------------------------------------------------------------- /text/Semantic Signal Separation.py: -------------------------------------------------------------------------------- 1 | ## Semantic Signal Separation - From https://medium.com/towards-data-science/semantic-signal-separation-769f43b46779 2 | ## Semantic Signal Separation (SSS) is a statistical model inspired by classical topic models like Latent Semantic Allocation (LSA) 3 | ## but incorporates principles from Independent Component Analysis (ICA) to extract maximally independent semantic components from text. 4 | ## And SSS model is a statistical model that seeks to uncover maximally independent semantic components in a corpus of text data. 5 | ## It uses principles from Independent Component Analysis (ICA) to decompose the representations of these components and identify the words 6 | ## that are most strongly associated with each component 7 | 8 | ## Example 9 | !pip install turftopic datasets 10 | from datasets import load_dataset 11 | 12 | ds = load_dataset("CShorten/ML-ArXiv-Papers", split="train") 13 | 14 | from turftopic import SemanticSignalSeparation 15 | 16 | model = SemanticSignalSeparation(10, encoder="all-MiniLM-L12-v2") 17 | model.fit(ds["abstract"]) 18 | 19 | model.print_topics() 20 | 21 | import numpy as np 22 | 23 | vocab = model.get_vocab() 24 | 25 | # We will produce a BoW matrix to extract term frequencies 26 | document_term_matrix = model.vectorizer.transform(ds["abstract"]) 27 | frequencies = document_term_matrix.sum(axis=0) 28 | frequencies = np.squeeze(np.asarray(frequencies)) 29 | 30 | # We select the 99th percentile 31 | selected_terms_mask = frequencies > np.quantile(frequencies, 0.99) 32 | 33 | import pandas as pd 34 | 35 | # model.components_ is a n_topics x n_terms matrix 36 | # It contains the strength of all components for each word. 37 | # Here we are selecting components for the words we selected earlier 38 | 39 | terms_with_axes = pd.DataFrame({ 40 | "inference": model.components_[7][selected_terms], 41 | "measurement_devices": model.components_[1][selected_terms], 42 | "noise": model.components_[6][selected_terms], 43 | "term": vocab[selected_terms] 44 | }) 45 | 46 | import plotly.express as px 47 | 48 | px.scatter( 49 | terms_with_axes, 50 | text="term", 51 | x="inference", 52 | y="noise", 53 | color="measurement_devices", 54 | template="plotly_white", 55 | color_continuous_scale="Bluered", 56 | ).update_layout( 57 | width=1200, 58 | height=800 59 | ).update_traces( 60 | textposition="top center", 61 | marker=dict(size=12, line=dict(width=2, color="white")) 62 | ) 63 | -------------------------------------------------------------------------------- /etc/Min-P Sampling: -------------------------------------------------------------------------------- 1 | ### From https://medium.com/@ignacio.de.gregorio.noblejas/elevate-llm-performance-by-20-instantly-with-min-p-c961fe1daf3b 2 | 3 | 1. Simplified Explanation of Min-p Sampling 4 | Min-p sampling is a method used in AI models to decide which word (or token) to choose next when generating text. 
5 | It works by setting a dynamic threshold that depends on how certain the model is about its top choice. 6 | 7 | 2. Here's how it works: 8 | -1. Base Value: You start with a base value, which is a number you choose (called a hyperparameter). 9 | -2. Dynamic Threshold: The threshold for rejecting unlikely words is calculated by multiplying this base value with the probability of the most likely word. 10 | 11 | 3. Example 12 | If the most likely word has a probability of 0.47, and your base value is 0.1, the threshold becomes 0.047 (or 4.7%). 13 | This means that any word with a probability lower than 4.7% will be ignored by the model. 14 | 15 | 4. Two Scenarios: 16 | -1. Highly Certain Situation (Top Distribution): 17 | When the model is very confident about a particular word (like in facts or simple math), the distribution of possible words is sharp, with one word standing out. 18 | In this case, min-p will reject all but the top few options, ensuring that the model sticks closely to the most likely correct answer. 19 | -2. Uncertain Situation (Bottom Distribution): 20 | In creative tasks or when there’s more uncertainty, the distribution is flatter, meaning the model isn't sure which word to pick. 21 | Here, min-p sets a lower threshold, allowing more words to remain in consideration, thus preserving the model's ability to be creative. 22 | 23 | 5. Why Min-p is Effective: 24 | -1. Versatility 25 | Unlike top-p sampling (which just picks one of the most likely words), min-p adjusts based on the model’s certainty, making it more flexible. 26 | -2. Hallucination Prevention 27 | Min-p is particularly good at avoiding "hallucinations" (when the model confidently gives a wrong answer) 28 | by being stricter in situations where one word is much more likely than the others. 29 | 30 | In Summary: Min-p sampling carefully balances between choosing the most likely word and keeping options open, 31 | making it especially useful in both factual and creative contexts. It dynamically adjusts based on the model’s confidence, 32 | which helps prevent errors while maintaining creativity when needed. 33 | -------------------------------------------------------------------------------- /Tuning/Fine_tuning/PEFT/VB-LoRA: -------------------------------------------------------------------------------- 1 | ## From https://arxiv.org/abs/2405.15179 2 | 3 | The text introduces VB-LoRA, a novel approach that enhances the Low-Rank Adaptation (LoRA) method 4 | by addressing the storage and transmission costs associated with parameter-efficient fine-tuning (PEFT) methods 5 | for large language models (LLMs). 6 | As the demand for per-user or per-task model customization grows, LoRA and its variants can face scalability challenges. 7 | VB-LoRA proposes a "divide-and-share" paradigm to further reduce the number of stored parameters while maintaining or improving performance. 8 | 9 | 1. Key Concepts: 10 | - Challenges in PEFT Methods: 11 | LoRA and similar methods efficiently fine-tune LLMs with fewer parameters but can still incur substantial storage and transmission costs, 12 | especially when scaling across users or tasks. 13 | 14 | - Divide-and-Share Paradigm: 15 | VB-LoRA introduces a divide-and-share approach that shares parameters globally across matrix dimensions, modules, and layers. 16 | This strategy breaks the boundaries of low-rank decomposition and enables further parameter efficiency. 
17 | 18 | - Vector Bank and Admixture Module: 19 | VB-LoRA leverages a vector bank to share parameters and constructs low-rank matrices from this shared resource. 20 | A differentiable top-k admixture module selects and mixes components from the vector bank, allowing for adaptive fine-tuning across tasks. 21 | 22 | 2. Performance and Impact: 23 | - Extreme Parameter Efficiency: 24 | VB-LoRA achieves significant parameter savings, using only 0.4% of LoRA's stored parameters when fine-tuning models like the Llama2-13B. 25 | Despite this drastic reduction in parameters, VB-LoRA delivers comparable or superior performance to other state-of-the-art PEFT methods. 26 | 27 | - Wide Applicability: 28 | VB-LoRA is shown to be effective across various tasks, including natural language understanding, natural language generation, and instruction tuning. 29 | 30 | 3. Conclusion: 31 | VB-LoRA presents a highly parameter-efficient fine-tuning method by leveraging a global parameter-sharing mechanism through 32 | a vector bank and admixture module. This enables substantial reductions in stored parameters 33 | while achieving superior performance, making VB-LoRA an excellent solution for scaling LLM customization across users 34 | or tasks with minimal storage costs. 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | -------------------------------------------------------------------------------- /text/Inside COSP and USP: -------------------------------------------------------------------------------- 1 | Google Research has introduced two innovative techniques, 2 | Consistency-Based Self-Adaptive Prompting (COSP) and Universal Self-Adaptive Prompting (USP), 3 | to improve the zero-shot adaptive prompting capabilities of large language models (LLMs). 4 | These techniques address challenges in prompt generation, particularly for tasks such as summarizing articles 5 | and answering specialized medical queries. 6 | 7 | COSP focuses on generating suitable prompts by leveraging unlabeled samples and the model's own predictions. 8 | It introduces the concept of "Consistency-Based Self-Adaptive Prompting," which uses high-confidence, 9 | consistent model predictions as pseudo-demonstrations. The model's confidence in its output is assessed through self-consistency, 10 | and a range of possible answers is generated using zero-shot chain-of-thought prompting. 11 | COSP outperforms standard zero-shot baselines in tasks like arithmetic and commonsense reasoning, 12 | as demonstrated across three different large language models (LLMs). 13 | 14 | USP extends the idea of self-adaptive prompting to a broader spectrum of natural language understanding and generation tasks. 15 | It employs confidence measurement techniques adapted to different tasks, including classification, short-form generation, 16 | and long-form generation. USP consistently outperforms baseline methods across various benchmarks, showcasing its effectiveness 17 | in tasks ranging from classification to addressing the BIG-Bench Hard suite of tasks, 18 | where LLMs have historically struggled compared to human performance. 19 | 20 | Both COSP and USP share a common methodology: 21 | 1. Input unlabeled questions to the model to obtain multiple rationales and answers. 22 | 2. Highlight the most frequent answers and measure their consistency across multiple model outputs. 23 | 3. Penalize repetition and promote diversity in selected demonstrations. 24 | 4. 
25 |
26 | Google Research also investigated the mechanics of USP, in particular the relationship
27 | between confidence and correctness.
28 | USP predominantly selects confident predictions, yielding superior results across various tasks.
29 | These advancements represent significant progress in AI prompting, enabling models to prompt themselves effectively
30 | and enhance their performance across a wide range of natural language tasks.
31 |
-------------------------------------------------------------------------------- /GenerativeAI/NL/Marlin_on_gptq.py: --------------------------------------------------------------------------------
1 | ## From https://towardsdatascience.com/marlin-nearly-ideal-inference-speed-for-4-bit-large-language-models-feb0b610dd8e
2 | """
3 | Large language models (LLMs) are often too large for consumer hardware,
4 | prompting the need for size-reduction techniques like quantization to lower memory consumption.
5 | Despite recent advancements in 4-bit quantization algorithms and optimized CUDA kernels, quantized LLMs still lack optimal inference throughput.
6 | In particular, inference with 4-bit models, which use the INT4 data type, involves slow INT4xFP16 operations,
7 | necessitating optimized CUDA kernels. The Institute of Science and Technology Austria (ISTA) proposes Marlin, an optimized INT4xFP16 matmul kernel,
8 | to achieve close to the ideal (4x) inference speedup. Marlin maximizes GPU usage for INT4 LLMs by efficiently utilizing GPU capabilities,
9 | including memory systems and cores, with optimizations such as efficient data fetching from L2 cache, double buffering,
10 | and a strategic ordering of dequantization and computation during inference. Moreover, Marlin introduces optimizations for multi-GPU settings,
11 | enabling increased parallel processing without loading more data at once, resulting in nearly optimal GPU resource utilization.
12 | Remarkably, even with a batch size of 1,
13 | Marlin outperforms existing frameworks like ExLlamaV2 and AWQ, while at a batch size of 8,
14 | these frameworks are slower than FP16 inference, whereas Marlin remains almost 4 times faster.
15 | """
16 |
17 | ! pip install --upgrade transformers auto-gptq accelerate optimum
18 |
19 | from transformers import AutoTokenizer
20 | from auto_gptq import AutoGPTQForCausalLM
21 |
22 | GPTQ_MODEL = "kaitchup/Mistral-7B-v0.1-gptq-4bit"
23 | tokenizer = AutoTokenizer.from_pretrained(GPTQ_MODEL)  # needed below to save the tokenizer alongside the model
24 | marlin_model = AutoGPTQForCausalLM.from_quantized(
25 |     GPTQ_MODEL,
26 |     use_marlin=True,
27 |     device_map='auto')
28 |
29 | save_dir = "Mistral-7B-v0.1-gptq-marlin-4bit"
30 | marlin_model.save_pretrained(save_dir)
31 | tokenizer.save_pretrained(save_dir)
32 |
33 | """
34 | Marlin is indeed faster, but vLLM only benefits from it for batch sizes larger than 8.
35 | The gap between Marlin and vanilla GPTQ increases with larger batch sizes.
36 | (Not related to Marlin, but interesting:) vLLM is already extremely well-optimized for decoding without batching (batch size = 1).
37 | Decoding with a batch size of 2 is 2x slower than without batching.
38 | If you only need small batch sizes, it might not be worth converting your models to Marlin yet.
39 | """
40 |
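# Illustrative usage (not from the source article): generate a few tokens with the Marlin model
# loaded above to confirm the converted checkpoint works. The prompt string is arbitrary, and a
# recent NVIDIA GPU is required because the Marlin kernel targets Ampere-class hardware and newer.
prompt = "4-bit quantization speeds up LLM inference because"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = marlin_model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))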
-------------------------------------------------------------------------------- /RAG/MemLong: Memory-Augmented Retrieval for Long Text LLM Generation: --------------------------------------------------------------------------------
1 | ## From https://medium.com/@techsachin/memlong-memory-augmented-retrieval-for-long-text-llm-generation-118081c2c545
2 | ## From https://github.com/Bui1dMySea/MemLong
3 | ## From https://arxiv.org/abs/2408.16967
4 |
5 | The authors introduce MemLong (Memory-Augmented Retrieval for Long Text Generation),
6 | designed to extend the context window of large language models (LLMs) by leveraging an external retriever for historical information.
7 | MemLong stores past contexts in a non-trainable memory bank, and the stored embeddings are used to retrieve chunk-level key-value (K-V) pairs,
8 | which are fed back into the model. This provides an efficient and lightweight mechanism for handling long contexts while minimizing computation.
9 |
10 | Key Concepts:
11 | 1. MemLong Framework: A method that extends LLMs' context window by using a memory and retrieval mechanism. It involves:
12 | -1. Adding a memory retrieval component (retrieving historical K-V pairs).
13 | -2. Using a retrieval causal attention module to combine local context with memory information.
14 |
15 | 2. Benefits:
16 | -1. Distributional Consistency: Maintains the distributional consistency of cached information.
17 | -2. Training Efficiency: Requires fine-tuning only the upper layers of the model, significantly reducing computation.
18 | -3. Extended Context Window: Allows up to 80k tokens to be processed on a single GPU.
19 | -4. Retriever and Dynamic Memory: Retrieves chunk-level indices based on cosine similarity to stored embeddings and dynamically manages memory, ensuring efficiency and avoiding out-of-memory issues.
20 |
21 | 3. Inference Process:
22 | When MemLong receives long input sequences, it breaks them into smaller chunks, retrieves the most relevant K-V pairs,
23 | and uses them for upper-layer attention. The attention mechanism is optimized to handle both recent context and chunk-level historical information.
24 |
25 | Important Insights:
26 | -1. Memory Efficiency: MemLong reduces computational cost by freezing the lower layers and fine-tuning only the upper layers.
27 | -2. Generalization: The method improves model performance, particularly for inputs longer than the pre-trained context window, by using an external retriever to maintain consistent attention over past contexts.
28 | -3. Perplexity Improvements: The model handles long-context sequences better than traditional LLMs, showing lower perplexity on various datasets.
29 |
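To illustrate the retrieval step described above, here is a schematic PyTorch sketch of chunk-level retrieval by cosine similarity. It is an illustration of the idea, not MemLong's actual code; the class name, the memory layout, and the top_k default are assumptions.

import torch
import torch.nn.functional as F

class ChunkMemory:
    """Toy memory bank: one embedding and one chunk-level K-V entry per stored chunk."""
    def __init__(self):
        self.embeddings = []    # frozen retrieval embeddings, one per chunk
        self.kv_cache = []      # chunk-level K-V pairs to feed back to the upper layers

    def add(self, chunk_embedding, chunk_kv):
        self.embeddings.append(chunk_embedding)
        self.kv_cache.append(chunk_kv)

    def retrieve(self, query_embedding, top_k=4):
        """Return the K-V entries of the stored chunks most similar to the current query chunk."""
        bank = torch.stack(self.embeddings)                                # (num_chunks, dim)
        sims = F.cosine_similarity(bank, query_embedding.unsqueeze(0), dim=-1)
        top = torch.topk(sims, k=min(top_k, len(self.kv_cache))).indices
        return [self.kv_cache[i] for i in top.tolist()]

# Usage idea: embed each finished chunk, store it with add(), and before processing a new chunk
# call retrieve() so the retrieval causal attention module can attend over historical K-V pairs.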
-------------------------------------------------------------------------------- /text/LLM/From Bytes to Ideas: LLMs Without Tokenization: --------------------------------------------------------------------------------
1 | ### From https://pub.towardsai.net/from-bytes-to-ideas-llms-without-tokenization-34821bce7148
2 | ### https://www.arxiv.org/pdf/2506.14761
3 |
4 | 1. What Makes AU-Net Different
5 | -a. Unlike traditional models that rely on fixed token dictionaries (e.g., BPE),
6 | AU-Net eliminates the dictionary entirely and instead starts with raw byte-level input.
7 | -b. It learns meaningful groupings of characters automatically, without pre-defined tokens.
8 |
9 | 2. The Three-Stage Structure of AU-Net (Figure 1)
10 | Example: Processing "CAT SAT ON THE MAT"
11 | -a. Stage 1: Character-level → "C A T S A T O N T H E M A T"
12 | -b. Stage 2: Word-level → "CAT SAT ON THE MAT"
13 | -c. Stage 3: Phrase-level → "SAT ON THE MAT"
14 | -d. This follows a U-shaped architecture:
15 | -1. Information flows downward to become increasingly abstract,
16 | -2. Then flows upward with enriched, detailed understanding.
17 | -e. Analogy: Humans read similarly:
18 | -1. Decode unfamiliar words by letters,
19 | -2. Grasp context at the phrase or sentence level.
20 |
21 | 3. Core Mechanism: Pooling and Upsampling (Figure 2)
22 | -a. Pooling
23 | -1. Functions as a smart filter that pools information at natural boundaries (e.g., spaces, punctuation).
24 | -2. Unlike arbitrary tokenization, pooling is aligned with real language units.
25 | -b. Multi-Linear Upsampling
26 | -1. During text generation, high-level understanding is broadcast back to lower levels.
27 | -2. Each compressed representation is transformed differently for each position it fills.
28 | -3. Analogy: A conductor giving specific instructions to different orchestra sections.
29 |
30 | 4. Splitting Function (Stage Rules)
31 | -a. AU-Net uses a rule-based splitting function:
32 | -1. Stage 2: Pools at word boundaries (spaces, punctuation)
33 | -2. Stage 3: Pools every two words or at sentence ends
34 | -3. Stage 4: Pools every four words or at sentence ends
35 | -b. This method is highly effective for Latin-script languages.
36 |
37 | 5. Computational Efficiency and FLOPs
38 | -a. FLOPs are used to compare AU-Net fairly to other models.
39 | -b. Byte-level stages are compute-heavy, but deeper stages are more compressed and thus cheaper to process.
40 |
41 | 6. Training Formulas:
42 | -a. Batch size: BSZ = 0.66 × C^0.321
43 | -b. Learning rate: LR = 6.6 × C^-0.176
44 | -1. (C = compute scale)
45 |
46 |
47 |
48 |
--------------------------------------------------------------------------------