├── README.md
└── colab_llm.ipynb

/README.md:
--------------------------------------------------------------------------------
# 🧠 colab-llm

Run **local LLMs on Google Colab** and access them remotely via API — ideal for lightweight, cost-effective development and testing using [Ollama](https://ollama.com/) and [Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/).

> ✅ Access your Colab-hosted LLM API from anywhere — even inside VS Code using the [ROO Code](https://marketplace.visualstudio.com/items?itemName=RooVeterinaryInc.roo-cline) extension!

---

## 🧩 Features

- 🔥 Run advanced LLMs (like Qwen, LLaMA3, Mistral, DeepSeek) in Colab using [Ollama](https://ollama.com/)
- 🌐 Expose the model via a secure public URL using `cloudflared`
- 🧑‍💻 Integrate with [ROO Code](https://roo.dev) in VS Code for seamless coding assistance
- ✅ Automatically detects and waits for Ollama to be ready before tunneling
- 💡 Simple, professional, and reusable setup

---

## 🛠️ Requirements

- A Google Colab account
- A GPU runtime (preferably **T4 High-RAM** or better)
- No installation or cloud account needed for Cloudflare tunneling

---

## 📝 How It Works

1. Installs and launches **Ollama** in the background
2. Pulls the selected model (e.g., `maryasov/qwen2.5-coder-cline:7b-instruct-q8_0`)
3. Waits until Ollama is running and responsive
4. Starts a **Cloudflare tunnel** to expose `http://localhost:11434`
5. Prints a public `.trycloudflare.com` URL — ready to use

---

### ▶️ Usage Instructions

Follow these steps to get your local LLM running in Colab and accessible via a public API:

1. **Import the `.ipynb` notebook into Google Colab**
   - Open [colab.research.google.com](https://colab.research.google.com) and upload the notebook.

2. **Set the runtime to `T4 GPU`**
   - Go to `Runtime > Change runtime type` → select:
     - Hardware accelerator: **GPU**
     - GPU type: **T4**
   - **Note: Colab GPU sessions last up to ~3 hours before disconnecting; when that happens, simply restart the runtime and re-run the notebook.**

3. **Run all cells**
   - Click `Runtime > Run all`
   - Wait for the cells to complete. The model download can take a few minutes.

4. **Verify the API is working in Step 7**
   - You'll see a generated public `trycloudflare.com` URL
   - The cell also sends a test request to the API

5. **Click the public link**
   - You should see the message: **“Ollama is running”**
   - This confirms the API is live and ready to be used from tools like **curl** or **ROO Code in VS Code** (see the example request at the end of this README)

---

## 💡 Use with ROO Code (VS Code Extension)

1. Install the [ROO Code extension](https://marketplace.visualstudio.com/items?itemName=RooVeterinaryInc.roo-cline)
2. Open the extension settings
3. Set the API Provider to **Ollama**
4. Paste the public URL from Colab (e.g. `https://bold-sky-1234.trycloudflare.com`) **(do not include a trailing `/`)**
5. Choose your model
6. Done! You can now prompt your Colab-hosted model from your local VS Code 💬

---

## 🤝 Contributions

Feel free to open issues, suggest improvements, or submit pull requests. Let's make local model hosting accessible for everyone!
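
---

## 🔌 Example: query the API with curl

Once the notebook prints your tunnel URL, you can call the Ollama API from any machine. This is a minimal sketch of the same request the notebook's last cell sends via Python `requests`; the URL below is a placeholder (substitute the one your notebook prints, with no trailing slash) and the model name should match whatever you set in `MODEL_NAME`.

```bash
# Placeholder: replace with the .trycloudflare.com URL printed by the notebook (no trailing slash)
OLLAMA_URL="https://bold-sky-1234.trycloudflare.com"

# Generate a completion from the model pulled in the notebook
curl -s "$OLLAMA_URL/api/generate" -d '{
  "model": "maryasov/qwen2.5-coder-cline:7b-instruct-q8_0",
  "prompt": "Question: What is the capital of Japan?\nAnswer:",
  "stream": false
}'
```

The same endpoint works from Python, ROO Code, or any other Ollama-compatible client, since the tunnel simply forwards requests to `http://localhost:11434` inside the Colab VM.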

--------------------------------------------------------------------------------
/colab_llm.ipynb:
--------------------------------------------------------------------------------
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "gpuType": "T4"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "code",
      "source": [
        "# Model selection\n",
        "MODEL_NAME = \"maryasov/qwen2.5-coder-cline:7b-instruct-q8_0\"\n",
        "%env OLLAMA_CONTEXT_LENGTH=16384\n",
        "%env OLLAMA_HOST=0.0.0.0\n",
        "%env OLLAMA_KEEP_ALIVE=-1\n"
      ],
      "metadata": {
        "id": "WG0UDdk86dxb"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!apt-get install -y lshw pciutils\n",
        "!nvcc --version\n",
        "!nvidia-smi\n",
        "\n",
        "from psutil import virtual_memory\n",
        "ram_gb = virtual_memory().total / 1e9\n",
        "print(f\"\\n🧠 Available RAM: {ram_gb:.1f} GB\")\n",
        "print(\"✅ High-RAM runtime!\" if ram_gb >= 20 else \"❌ Not a high-RAM runtime.\")\n"
      ],
      "metadata": {
        "id": "VskPhA1M6h8j"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!curl -fsSL https://ollama.com/install.sh | sh\n"
      ],
      "metadata": {
        "id": "J83WxdLL6k75"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import subprocess\n",
        "import time\n",
        "import requests\n",
        "import threading\n",
        "\n",
        "# Start ollama serve in a background thread\n",
        "def start_ollama():\n",
        "    subprocess.call(['ollama', 'serve'])\n",
        "\n",
        "ollama_thread = threading.Thread(target=start_ollama)\n",
        "ollama_thread.daemon = True\n",
        "ollama_thread.start()\n",
        "\n",
        "# Pull model (this also verifies Ollama CLI is ready)\n",
        "!ollama pull {MODEL_NAME}\n",
        "\n",
        "# Wait for Ollama HTTP API to be ready\n",
        "def wait_for_ollama(timeout=60):\n",
        "    for i in range(timeout):\n",
        "        try:\n",
        "            r = requests.get(\"http://localhost:11434\")\n",
        "            if r.status_code in [200, 404]:\n",
        "                print(f\"✅ Ollama is up (after {i+1}s).\")\n",
        "                return\n",
        "        except requests.exceptions.ConnectionError:\n",
        "            pass\n",
        "        print(f\"⏳ Waiting for Ollama to start... {i+1}s\")\n",
        "        time.sleep(1)\n",
        "    raise RuntimeError(\"❌ Ollama did not start in time.\")\n",
        "\n",
        "wait_for_ollama()\n"
      ],
      "metadata": {
        "id": "4uR5FDlu6nav"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -O cloudflared\n",
        "!chmod +x cloudflared\n"
      ],
      "metadata": {
        "id": "_XUWhodx6pTh"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import re\n",
        "\n",
        "# Run cloudflared tunnel in background and get the public URL\n",
        "cloudflared_proc = subprocess.Popen(\n",
        "    ['./cloudflared', 'tunnel', '--url', 'http://localhost:11434', '--no-autoupdate'],\n",
        "    stdout=subprocess.PIPE,\n",
        "    stderr=subprocess.STDOUT,\n",
        "    text=True\n",
        ")\n",
        "\n",
        "public_url = None\n",
        "for line in cloudflared_proc.stdout:\n",
        "    print(line.strip())\n",
        "    match = re.search(r'(https://.*\\.trycloudflare\\.com)', line)\n",
        "    if match:\n",
        "        public_url = match.group(1)\n",
        "        break\n",
        "\n",
        "if public_url:\n",
        "    print(f\"\\n✅ Public URL for Ollama:\\n{public_url}\")\n",
        "else:\n",
        "    raise RuntimeError(\"❌ Could not find public Cloudflare URL.\")\n"
      ],
      "metadata": {
        "id": "eVmbIF6b6qsk"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "import json\n",
        "\n",
        "data = {\n",
        "    \"model\": MODEL_NAME,\n",
        "    \"prompt\": \"Question: What is the capital of Japan?\\nAnswer:\",\n",
        "    \"stream\": False\n",
        "}\n",
        "\n",
        "response = requests.post(f\"{public_url}/api/generate\", json=data)\n",
        "print(response.json())\n"
      ],
      "metadata": {
        "id": "b4oAfEAG6sfb"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}
--------------------------------------------------------------------------------