├── README.md
└── colab_llm.ipynb

/README.md:
--------------------------------------------------------------------------------
# 🧠 colab-llm

Run **local LLMs on Google Colab** and access them remotely via API — ideal for lightweight, cost-effective development and testing using [Ollama](https://ollama.com/) and [Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/).

> ✅ Access your Colab-hosted LLM API from anywhere — even inside VS Code using the [ROO Code](https://marketplace.visualstudio.com/items?itemName=RooVeterinaryInc.roo-cline) extension!

---

## 🧩 Features

- 🔥 Run advanced LLMs (like Qwen, LLaMA3, Mistral, DeepSeek) in Colab using [Ollama](https://ollama.com/)
- 🌐 Expose the model via a secure public URL using `cloudflared`
- 🧑‍💻 Integrate with [ROO Code](https://roo.dev) in VS Code for seamless coding assistance
- ✅ Automatically detects and waits for Ollama to be ready before tunneling
- 💡 Simple, professional, and reusable setup

---

## 🛠️ Requirements

- A Google Colab account
- A GPU runtime (preferably **T4 High-RAM** or better)
- No installation or cloud account needed for Cloudflare tunneling

---

## 📝 How It Works

1. Installs and launches **Ollama** in the background
2. Pulls the selected model (e.g., `maryasov/qwen2.5-coder-cline:7b-instruct-q8_0`)
3. Waits until Ollama is running and responsive
4. Starts a **Cloudflare tunnel** to expose `http://localhost:11434`
5. Prints a public `.trycloudflare.com` URL — ready to use

---

### ▶️ Usage Instructions

Follow these steps to get your local LLM running in Colab and accessible via a public API:

1. **Import the `.ipynb` notebook into Google Colab**
   - Open [colab.research.google.com](https://colab.research.google.com) and upload the notebook.

2. **Set the runtime to `T4 GPU`**
   - Go to `Runtime > Change runtime type` → select:
     - Hardware accelerator: **GPU**
     - GPU type: **T4**
   - **Note: Colab GPU sessions last up to ~3 hours before disconnecting; when that happens, simply restart the runtime and re-run the notebook.**

3. **Run all cells**
   - Click `Runtime > Run all`
   - Wait for the cells to complete. The model download can take a few minutes.

4. **Verify the API is working in Step 7**
   - You'll see a generated public `trycloudflare.com` URL
   - The cell also sends a test request to the API

5. **Click the public link**
   - You should see the message: **“Ollama is running”**
   - This confirms the API is live and ready to be used from tools like **curl** or **ROO Code in VS Code** (see the example request at the end of this README)

---

## 💡 Use with ROO Code (VS Code Extension)

1. Install the [ROO Code extension](https://marketplace.visualstudio.com/items?itemName=RooVeterinaryInc.roo-cline)
2. Open the extension settings
3. Set the API Provider to **Ollama**
4. Paste the public URL from Colab (e.g. `https://bold-sky-1234.trycloudflare.com`) **(do not include a trailing `/`)**
5. Choose your model
6. Done! You can now prompt your Colab-hosted model from your local VS Code 💬

---

## 🤝 Contributions

Feel free to open issues, suggest improvements, or submit pull requests. Let's make local model hosting accessible for everyone!
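
---

## 🔌 Example: query the API with curl

Once the notebook prints your tunnel URL, you can call the Ollama API from any machine. This is a minimal sketch of the same request the notebook's last cell sends via Python `requests`; the URL below is a placeholder (substitute the one your notebook prints, with no trailing slash) and the model name should match whatever you set in `MODEL_NAME`.

```bash
# Placeholder: replace with the .trycloudflare.com URL printed by the notebook (no trailing slash)
OLLAMA_URL="https://bold-sky-1234.trycloudflare.com"

# Generate a completion from the model pulled in the notebook
curl -s "$OLLAMA_URL/api/generate" -d '{
  "model": "maryasov/qwen2.5-coder-cline:7b-instruct-q8_0",
  "prompt": "Question: What is the capital of Japan?\nAnswer:",
  "stream": false
}'
```

The same endpoint works from Python, ROO Code, or any other Ollama-compatible client, since the tunnel simply forwards requests to `http://localhost:11434` inside the Colab VM.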

--------------------------------------------------------------------------------
/colab_llm.ipynb:
--------------------------------------------------------------------------------
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "code",
      "source": [
        "# Model selection\n",
        "MODEL_NAME = \"maryasov/qwen2.5-coder-cline:7b-instruct-q8_0\"\n",
        "%env OLLAMA_CONTEXT_LENGTH=16384\n",
        "%env OLLAMA_HOST=0.0.0.0\n",
        "%env OLLAMA_KEEP_ALIVE=-1\n"
      ],
      "metadata": {
        "id": "WG0UDdk86dxb"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!apt-get install -y lshw pciutils\n",
        "!nvcc --version\n",
        "!nvidia-smi\n",
        "\n",
        "from psutil import virtual_memory\n",
        "ram_gb = virtual_memory().total / 1e9\n",
        "print(f\"\\n🧠 Available RAM: {ram_gb:.1f} GB\")\n",
        "print(\"✅ High-RAM runtime!\" if ram_gb >= 20 else \"❌ Not a high-RAM runtime.\")\n"
      ],
      "metadata": {
        "id": "VskPhA1M6h8j"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!curl -fsSL https://ollama.com/install.sh | sh\n"
      ],
      "metadata": {
        "id": "J83WxdLL6k75"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import subprocess\n",
        "import time\n",
        "import requests\n",
        "import threading\n",
        "\n",
        "# Start ollama serve in a background thread\n",
        "def start_ollama():\n",
        "    subprocess.call(['ollama', 'serve'])\n",
        "\n",
        "ollama_thread = threading.Thread(target=start_ollama)\n",
        "ollama_thread.daemon = True\n",
        "ollama_thread.start()\n",
        "\n",
        "# Pull model (this also verifies Ollama CLI is ready)\n",
        "!ollama pull {MODEL_NAME}\n",
        "\n",
        "# Wait for Ollama HTTP API to be ready\n",
        "def wait_for_ollama(timeout=60):\n",
        "    for i in range(timeout):\n",
        "        try:\n",
        "            r = requests.get(\"http://localhost:11434\")\n",
        "            if r.status_code in [200, 404]:\n",
        "                print(f\"✅ Ollama is up (after {i+1}s).\")\n",
        "                return\n",
        "        except requests.exceptions.ConnectionError:\n",
        "            pass\n",
        "        print(f\"⏳ Waiting for Ollama to start... {i+1}s\")\n",
        "        time.sleep(1)\n",
        "    raise RuntimeError(\"❌ Ollama did not start in time.\")\n",
        "\n",
        "wait_for_ollama()\n"
      ],
      "metadata": {
        "id": "4uR5FDlu6nav"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -O cloudflared\n",
        "!chmod +x cloudflared\n"
      ],
      "metadata": {
        "id": "_XUWhodx6pTh"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import re\n",
        "\n",
        "# Run cloudflared tunnel in background and get the public URL\n",
        "cloudflared_proc = subprocess.Popen(\n",
        "    ['./cloudflared', 'tunnel', '--url', 'http://localhost:11434', '--no-autoupdate'],\n",
        "    stdout=subprocess.PIPE,\n",
        "    stderr=subprocess.STDOUT,\n",
        "    text=True\n",
        ")\n",
        "\n",
        "public_url = None\n",
        "for line in cloudflared_proc.stdout:\n",
        "    print(line.strip())\n",
        "    match = re.search(r'(https://.*\\.trycloudflare\\.com)', line)\n",
        "    if match:\n",
        "        public_url = match.group(1)\n",
        "        break\n",
        "\n",
        "if public_url:\n",
        "    print(f\"\\n✅ Public URL for Ollama:\\n{public_url}\")\n",
        "else:\n",
        "    raise RuntimeError(\"❌ Could not find public Cloudflare URL.\")\n"
      ],
      "metadata": {
        "id": "eVmbIF6b6qsk"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import json\n",
        "\n",
        "data = {\n",
        "    \"model\": MODEL_NAME,\n",
        "    \"prompt\": \"Question: What is the capital of Japan?\\nAnswer:\",\n",
        "    \"stream\": False\n",
        "}\n",
        "\n",
        "response = requests.post(f\"{public_url}/api/generate\", json=data)\n",
        "print(response.json())\n"
      ],
      "metadata": {
        "id": "b4oAfEAG6sfb"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}
--------------------------------------------------------------------------------