├── LICENSE ├── README.md ├── .gitignore └── demo.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Joshua Sundance Bailey 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 6 | 7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 10 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # llamacpp-langchain-neuralbeagle-demo 2 | 3 | A small demo repo showing how I got neuralbeagle14-7b running locally on my 8GB GPU 4 | 5 | __See [demo.ipynb](./demo.ipynb)__ 6 | 7 | # Summary 8 | 9 | This notebook demonstrates the process I went through to run `neuralbeagle14-7b` on my laptop's 8GB GPU in Windows, as seen in [my LinkedIn post from January 27th, 2024](https://www.linkedin.com/posts/jsundance_free-local-private-ai-on-my-laptop-thanks-activity-7157117360728862720-MWxn?utm_source=share&utm_medium=member_desktop). It pulls heavily from [this LangChain documentation](https://python.langchain.com/docs/integrations/llms/llamacpp). 10 | 11 | I was able to use `llama-cpp-python` _without_ my GPU, and it took me a couple of installs before it was really loading all of the layers onto the GPU. It was still fast without the GPU, but that's not the point. ;) 12 | 13 | I'm using an NVIDIA RTX A4000 laptop GPU. I will be compiling `llama-cpp-python` instead of using the "usual" `pip install` because I _think_ this is a more reliable method. I will be using the cuBLAS backend, but you can use other backends for AMD or Apple or whatever (more on this later). 14 | 15 | I will also describe an issue I had with a missing DLL, and offer troubleshooting advice for that.
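To give a sense of where this ends up, here is a minimal sketch (based on the notebook) of calling the model through LangChain once `llama-cpp-python` is compiled with GPU support. The model path is a placeholder; the full working version, with all of the surrounding checks, is in [demo.ipynb](./demo.ipynb).

```python
# Minimal sketch of the end result; see demo.ipynb for the full, tested version.
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path=r"C:\path\to\neuralbeagle14-7b.Q5_K_M.gguf",  # placeholder: point this at your GGUF download
    n_gpu_layers=40,  # layers to offload to the GPU; ~5.7GB VRAM on my system
    n_batch=512,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),  # token-wise streaming to stdout
    verbose=True,
)

template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""
chain = PromptTemplate(template=template, input_variables=["question"]) | llm

answer = chain.invoke({"question": "What are Scrub-Jays?"})
```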
16 | 17 | Joshua Bailey #LearningInPublic January 28, 2024 18 | 19 | 20 | # tldr for users with conda and cuda 11.8: 21 | 22 | ```cmd 23 | mkdir neuralbeagle && cd neuralbeagle 24 | conda create -n neuralbeagle python=3.11 25 | conda activate neuralbeagle 26 | 27 | python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 28 | 29 | set CMAKE_ARGS=-DLLAMA_CUBLAS=on 30 | set FORCE_CMAKE=1 31 | python -m pip install -v --upgrade --force-reinstall --no-cache-dir llama-cpp-python 32 | ``` 33 | 34 | # TODO 35 | - [ ] make a todo list 36 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/* 2 | __pycache__ 3 | Dockerfile_* 4 | # old .gitignore plus several others from https://github.com/github/gitignore 5 | # duplicates removed 6 | # https://raw.githubusercontent.com/github/gitignore/main/Global/NotepadPP.gitignore 7 | # https://raw.githubusercontent.com/github/gitignore/main/Global/PuTTY.gitignore 8 | # https://raw.githubusercontent.com/github/gitignore/main/Global/VisualStudioCode.gitignore 9 | # https://raw.githubusercontent.com/github/gitignore/main/Global/Windows.gitignore 10 | # https://raw.githubusercontent.com/github/gitignore/main/Global/VirtualEnv.gitignore 11 | # https://raw.githubusercontent.com/github/gitignore/main/Global/Vagrant.gitignore 12 | # https://raw.githubusercontent.com/github/gitignore/main/Global/MicrosoftOffice.gitignore 13 | # https://raw.githubusercontent.com/github/gitignore/main/Global/Linux.gitignore 14 | # https://raw.githubusercontent.com/github/gitignore/main/community/Python/JupyterNotebooks.gitignore 15 | # https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore 16 | # https://raw.githubusercontent.com/github/gitignore/main/VisualStudio.gitignore 17 | # https://raw.githubusercontent.com/github/gitignore/main/Global/JetBrains.gitignore 18 | # https://raw.githubusercontent.com/github/gitignore/main/CUDA.gitignore 19 | #docs/_build/doctrees 20 | docs/_build/html/* 21 | #docs/_static 22 | #docs/_templates 23 | #docs/*.* 24 | #docs/Makefile 25 | .env 26 | # Application specific files 27 | .DS_Store 28 | .vscode 29 | 30 | # Byte-compiled / optimized / DLL files 31 | __pycache__/ 32 | *.py[cod] 33 | *$py.class 34 | 35 | # C extensions 36 | *.so 37 | 38 | # Distribution / packaging 39 | .Python 40 | build/ 41 | develop-eggs/ 42 | dist/ 43 | downloads/ 44 | eggs/ 45 | .eggs/ 46 | lib/ 47 | lib64/ 48 | parts/ 49 | sdist/ 50 | var/ 51 | wheels/ 52 | *.egg-info/ 53 | .installed.cfg 54 | *.egg 55 | MANIFEST 56 | 57 | # PyInstaller 58 | # Usually these files are written by a python script from a template 59 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
60 | *.manifest 61 | *.spec 62 | 63 | # Installer logs 64 | pip-log.txt 65 | pip-delete-this-directory.txt 66 | 67 | # Unit test / coverage reports 68 | htmlcov/ 69 | .tox/ 70 | .coverage 71 | .coverage.* 72 | .cache 73 | nosetests.xml 74 | coverage.xml 75 | *.cover 76 | .hypothesis/ 77 | 78 | # Translations 79 | *.mo 80 | *.pot 81 | 82 | # Django stuff: 83 | *.log 84 | .static_storage/ 85 | .media/ 86 | local_settings.py 87 | 88 | # Flask stuff: 89 | instance/ 90 | .webassets-cache 91 | 92 | # Scrapy stuff: 93 | .scrapy 94 | 95 | # Sphinx documentation 96 | # docs/_build/ 97 | 98 | # PyBuilder 99 | target/ 100 | 101 | # Jupyter Notebook 102 | .ipynb_checkpoints 103 | 104 | # pyenv 105 | .python-version 106 | 107 | # celery beat schedule file 108 | celerybeat-schedule 109 | 110 | # SageMath parsed files 111 | *.sage.py 112 | 113 | # Environments 114 | .env 115 | .venv 116 | env/ 117 | venv/ 118 | ENV/ 119 | env.bak/ 120 | venv.bak/ 121 | 122 | # Spyder project settings 123 | .spyderproject 124 | .spyproject 125 | 126 | # Rope project settings 127 | .ropeproject 128 | 129 | # mkdocs documentation 130 | /site 131 | 132 | # mypy 133 | .mypy_cache/ 134 | 135 | # ruff 136 | .ruff_cache/ 137 | 138 | # Pip 139 | Pipfile 140 | Pipfile.* 141 | 142 | # app 143 | settings.py 144 | 145 | # https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 146 | # Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider 147 | # Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839 148 | 149 | # User-specific stuff 150 | .idea/**/workspace.xml 151 | .idea/**/tasks.xml 152 | .idea/**/usage.statistics.xml 153 | .idea/**/dictionaries 154 | .idea/**/shelf 155 | 156 | # AWS User-specific 157 | .idea/**/aws.xml 158 | 159 | # Generated files 160 | .idea/**/contentModel.xml 161 | 162 | # Sensitive or high-churn files 163 | .idea/**/dataSources/ 164 | .idea/**/dataSources.ids 165 | .idea/**/dataSources.local.xml 166 | .idea/**/sqlDataSources.xml 167 | .idea/**/dynamic.xml 168 | .idea/**/uiDesigner.xml 169 | .idea/**/dbnavigator.xml 170 | 171 | # Gradle 172 | .idea/**/gradle.xml 173 | .idea/**/libraries 174 | 175 | # Gradle and Maven with auto-import 176 | # When using Gradle or Maven with auto-import, you should exclude module files, 177 | # since they will be recreated, and may cause churn. Uncomment if using 178 | # auto-import. 
179 | # .idea/artifacts 180 | # .idea/compiler.xml 181 | # .idea/jarRepositories.xml 182 | # .idea/modules.xml 183 | # .idea/*.iml 184 | # .idea/modules 185 | # *.iml 186 | # *.ipr 187 | 188 | # CMake 189 | cmake-build-*/ 190 | 191 | # Mongo Explorer plugin 192 | .idea/**/mongoSettings.xml 193 | 194 | # File-based project format 195 | *.iws 196 | 197 | # IntelliJ 198 | out/ 199 | 200 | # mpeltonen/sbt-idea plugin 201 | .idea_modules/ 202 | 203 | # JIRA plugin 204 | atlassian-ide-plugin.xml 205 | 206 | # Cursive Clojure plugin 207 | .idea/replstate.xml 208 | 209 | # SonarLint plugin 210 | .idea/sonarlint/ 211 | 212 | # Crashlytics plugin (for Android Studio and IntelliJ) 213 | com_crashlytics_export_strings.xml 214 | crashlytics.properties 215 | crashlytics-build.properties 216 | fabric.properties 217 | 218 | # Editor-based Rest Client 219 | .idea/httpRequests 220 | 221 | # Android studio 3.1+ serialized cache file 222 | .idea/caches/build_file_checksums.ser 223 | 224 | # https://github.com/github/gitignore/blob/main/Python.gitignore 225 | # Byte-compiled / optimized / DLL files 226 | 227 | # C extensions 228 | 229 | # Distribution / packaging 230 | share/python-wheels/ 231 | 232 | # PyInstaller 233 | # Usually these files are written by a python script from a template 234 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 235 | 236 | # Installer logs 237 | 238 | # Unit test / coverage reports 239 | .nox/ 240 | *.py,cover 241 | .pytest_cache/ 242 | cover/ 243 | 244 | # Translations 245 | 246 | # Django stuff: 247 | db.sqlite3 248 | db.sqlite3-journal 249 | 250 | # Flask stuff: 251 | 252 | # Scrapy stuff: 253 | 254 | # Sphinx documentation 255 | 256 | # PyBuilder 257 | .pybuilder/ 258 | 259 | # Jupyter Notebook 260 | 261 | # IPython 262 | profile_default/ 263 | ipython_config.py 264 | 265 | # pyenv 266 | # For a library or package, you might want to ignore these files since the backend is 267 | # intended to run in multiple environments; otherwise, check them in: 268 | # .python-version 269 | 270 | # pipenv 271 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 272 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 273 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 274 | # install all needed dependencies. 275 | #Pipfile.lock 276 | 277 | # poetry 278 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 279 | # This is especially recommended for binary packages to ensure reproducibility, and is more 280 | # commonly ignored for libraries. 281 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 282 | #poetry.lock 283 | 284 | # pdm 285 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 286 | #pdm.lock 287 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 288 | # in version control. 289 | # https://pdm.fming.dev/#use-with-ide 290 | .pdm.toml 291 | 292 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 293 | __pypackages__/ 294 | 295 | # Celery stuff 296 | celerybeat.pid 297 | 298 | # SageMath parsed files 299 | 300 | # Environments 301 | 302 | # Spyder project settings 303 | 304 | # Rope project settings 305 | 306 | # mkdocs documentation 307 | 308 | # mypy 309 | .dmypy.json 310 | dmypy.json 311 | 312 | # Pyre type checker 313 | .pyre/ 314 | 315 | # pytype static type analyzer 316 | .pytype/ 317 | 318 | # Cython debug symbols 319 | cython_debug/ 320 | 321 | # PyCharm 322 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 323 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 324 | # and can be added to the global gitignore or merged into this file. For a more nuclear 325 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 326 | #.idea/ 327 | 328 | # https://github.com/github/gitignore/blob/main/VisualStudio.gitignore 329 | ## Ignore Visual Studio temporary files, build results, and 330 | ## files generated by popular Visual Studio add-ons. 331 | ## 332 | ## Get latest from https://github.com/github/gitignore/blob/main/VisualStudio.gitignore 333 | 334 | # User-specific files 335 | *.rsuser 336 | *.suo 337 | *.user 338 | *.userosscache 339 | *.sln.docstates 340 | 341 | # User-specific files (MonoDevelop/Xamarin Studio) 342 | *.userprefs 343 | 344 | # Mono auto generated files 345 | mono_crash.* 346 | 347 | # Build results 348 | [Dd]ebug/ 349 | [Dd]ebugPublic/ 350 | [Rr]elease/ 351 | [Rr]eleases/ 352 | x64/ 353 | x86/ 354 | [Ww][Ii][Nn]32/ 355 | [Aa][Rr][Mm]/ 356 | [Aa][Rr][Mm]64/ 357 | bld/ 358 | [Bb]in/ 359 | [Oo]bj/ 360 | [Ll]og/ 361 | [Ll]ogs/ 362 | 363 | # Visual Studio 2015/2017 cache/options directory 364 | .vs/ 365 | # Uncomment if you have tasks that create the project's static files in wwwroot 366 | #wwwroot/ 367 | 368 | # Visual Studio 2017 auto generated files 369 | Generated\ Files/ 370 | 371 | # MSTest test Results 372 | [Tt]est[Rr]esult*/ 373 | [Bb]uild[Ll]og.* 374 | 375 | # NUnit 376 | *.VisualState.xml 377 | TestResult.xml 378 | nunit-*.xml 379 | 380 | # Build Results of an ATL Project 381 | [Dd]ebugPS/ 382 | [Rr]eleasePS/ 383 | dlldata.c 384 | 385 | # Benchmark Results 386 | BenchmarkDotNet.Artifacts/ 387 | 388 | # .NET Core 389 | project.lock.json 390 | project.fragment.lock.json 391 | artifacts/ 392 | 393 | # ASP.NET Scaffolding 394 | ScaffoldingReadMe.txt 395 | 396 | # StyleCop 397 | StyleCopReport.xml 398 | 399 | # Files built by Visual Studio 400 | *_i.c 401 | *_p.c 402 | *_h.h 403 | *.ilk 404 | *.meta 405 | *.obj 406 | *.iobj 407 | *.pch 408 | *.pdb 409 | *.ipdb 410 | *.pgc 411 | *.pgd 412 | *.rsp 413 | *.sbr 414 | *.tlb 415 | *.tli 416 | *.tlh 417 | *.tmp 418 | *.tmp_proj 419 | *_wpftmp.csproj 420 | *.tlog 421 | *.vspscc 422 | *.vssscc 423 | .builds 424 | *.pidb 425 | *.svclog 426 | *.scc 427 | 428 | # Chutzpah Test files 429 | _Chutzpah* 430 | 431 | # Visual C++ cache files 432 | ipch/ 433 | *.aps 434 | *.ncb 435 | *.opendb 436 | *.opensdf 437 | *.sdf 438 | *.cachefile 439 | *.VC.db 440 | *.VC.VC.opendb 441 | 442 | # Visual Studio profiler 443 | *.psess 444 | *.vsp 445 | *.vspx 446 | *.sap 447 | 448 | # Visual Studio Trace Files 449 | *.e2e 450 | 451 | # TFS 2012 Local Workspace 452 | $tf/ 453 | 454 | # Guidance Automation Toolkit 455 | *.gpState 456 | 457 | # ReSharper is a .NET coding add-in 458 | _ReSharper*/ 459 | *.[Rr]e[Ss]harper 460 | *.DotSettings.user 461 | 462 | # TeamCity is 
a build add-in 463 | _TeamCity* 464 | 465 | # DotCover is a Code Coverage Tool 466 | *.dotCover 467 | 468 | # AxoCover is a Code Coverage Tool 469 | .axoCover/* 470 | !.axoCover/settings.json 471 | 472 | # Coverlet is a free, cross platform Code Coverage Tool 473 | coverage*.json 474 | coverage*.xml 475 | coverage*.info 476 | 477 | # Visual Studio backend coverage results 478 | *.coverage 479 | *.coveragexml 480 | 481 | # NCrunch 482 | _NCrunch_* 483 | .*crunch*.local.xml 484 | nCrunchTemp_* 485 | 486 | # MightyMoose 487 | *.mm.* 488 | AutoTest.Net/ 489 | 490 | # Web workbench (sass) 491 | .sass-cache/ 492 | 493 | # Installshield output folder 494 | [Ee]xpress/ 495 | 496 | # DocProject is a documentation generator add-in 497 | DocProject/buildhelp/ 498 | DocProject/Help/*.HxT 499 | DocProject/Help/*.HxC 500 | DocProject/Help/*.hhc 501 | DocProject/Help/*.hhk 502 | DocProject/Help/*.hhp 503 | DocProject/Help/Html2 504 | DocProject/Help/html 505 | 506 | # Click-Once directory 507 | publish/ 508 | 509 | # Publish Web Output 510 | *.[Pp]ublish.xml 511 | *.azurePubxml 512 | # Note: Comment the next line if you want to checkin your web deploy settings, 513 | # but database connection strings (with potential passwords) will be unencrypted 514 | *.pubxml 515 | *.publishproj 516 | 517 | # Microsoft Azure Web App publish settings. Comment the next line if you want to 518 | # checkin your Azure Web App publish settings, but sensitive information contained 519 | # in these scripts will be unencrypted 520 | PublishScripts/ 521 | 522 | # NuGet Packages 523 | *.nupkg 524 | # NuGet Symbol Packages 525 | *.snupkg 526 | # The packages folder can be ignored because of Package Restore 527 | **/[Pp]ackages/* 528 | # except build/, which is used as an MSBuild target. 529 | !**/[Pp]ackages/build/ 530 | # Uncomment if necessary however generally it will be regenerated when needed 531 | #!**/[Pp]ackages/repositories.config 532 | # NuGet v3's project.json files produces more ignorable files 533 | *.nuget.props 534 | *.nuget.targets 535 | 536 | # Microsoft Azure Build Output 537 | csx/ 538 | *.build.csdef 539 | 540 | # Microsoft Azure Emulator 541 | ecf/ 542 | rcf/ 543 | 544 | # Windows Store app package directories and files 545 | AppPackages/ 546 | BundleArtifacts/ 547 | Package.StoreAssociation.xml 548 | _pkginfo.txt 549 | *.appx 550 | *.appxbundle 551 | *.appxupload 552 | 553 | # Visual Studio cache files 554 | # files ending in .cache can be ignored 555 | *.[Cc]ache 556 | # but keep track of directories ending in .cache 557 | !?*.[Cc]ache/ 558 | 559 | # Others 560 | ClientBin/ 561 | ~$* 562 | *~ 563 | *.dbmdl 564 | *.dbproj.schemaview 565 | *.jfm 566 | *.pfx 567 | *.publishsettings 568 | orleans.codegen.cs 569 | 570 | # Including strong name files can present a security risk 571 | # (https://github.com/github/gitignore/pull/2483#issue-259490424) 572 | #*.snk 573 | 574 | # Since there are multiple workflows, uncomment next line to ignore bower_components 575 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622) 576 | #bower_components/ 577 | 578 | # RIA/Silverlight projects 579 | Generated_Code/ 580 | 581 | # Backup & report files from converting an old project file 582 | # to a newer Visual Studio version. 
Backup files are not needed, 583 | # because we have git ;-) 584 | _UpgradeReport_Files/ 585 | Backup*/ 586 | UpgradeLog*.XML 587 | UpgradeLog*.htm 588 | ServiceFabricBackup/ 589 | *.rptproj.bak 590 | 591 | # SQL Server files 592 | *.mdf 593 | *.ldf 594 | *.ndf 595 | 596 | # Business Intelligence projects 597 | *.rdl.data 598 | *.bim.layout 599 | *.bim_*.settings 600 | *.rptproj.rsuser 601 | *- [Bb]ackup.rdl 602 | *- [Bb]ackup ([0-9]).rdl 603 | *- [Bb]ackup ([0-9][0-9]).rdl 604 | 605 | # Microsoft Fakes 606 | FakesAssemblies/ 607 | 608 | # GhostDoc plugin setting file 609 | *.GhostDoc.xml 610 | 611 | # Node.js Tools for Visual Studio 612 | .ntvs_analysis.dat 613 | node_modules/ 614 | 615 | # Visual Studio 6 build log 616 | *.plg 617 | 618 | # Visual Studio 6 workspace options file 619 | *.opt 620 | 621 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.) 622 | *.vbw 623 | 624 | # Visual Studio 6 auto-generated project file (contains which files were open etc.) 625 | *.vbp 626 | 627 | # Visual Studio 6 workspace and project file (working project files containing files to include in project) 628 | *.dsw 629 | *.dsp 630 | 631 | # Visual Studio 6 technical files 632 | 633 | # Visual Studio LightSwitch build output 634 | **/*.HTMLClient/GeneratedArtifacts 635 | **/*.DesktopClient/GeneratedArtifacts 636 | **/*.DesktopClient/ModelManifest.xml 637 | **/*.Server/GeneratedArtifacts 638 | **/*.Server/ModelManifest.xml 639 | _Pvt_Extensions 640 | 641 | # Paket dependency manager 642 | .paket/paket.exe 643 | paket-files/ 644 | 645 | # FAKE - F# Make 646 | .fake/ 647 | 648 | # CodeRush personal settings 649 | .cr/personal 650 | 651 | # Python Tools for Visual Studio (PTVS) 652 | *.pyc 653 | 654 | # Cake - Uncomment if you are using it 655 | # tools/** 656 | # !tools/packages.config 657 | 658 | # Tabs Studio 659 | *.tss 660 | 661 | # Telerik's JustMock configuration file 662 | *.jmconfig 663 | 664 | # BizTalk build output 665 | *.btp.cs 666 | *.btm.cs 667 | *.odx.cs 668 | *.xsd.cs 669 | 670 | # OpenCover UI analysis results 671 | OpenCover/ 672 | 673 | # Azure Stream Analytics local run output 674 | ASALocalRun/ 675 | 676 | # MSBuild Binary and Structured Log 677 | *.binlog 678 | 679 | # NVidia Nsight GPU debugger configuration file 680 | *.nvuser 681 | 682 | # MFractors (Xamarin productivity tool) working folder 683 | .mfractor/ 684 | 685 | # Local History for Visual Studio 686 | .localhistory/ 687 | 688 | # Visual Studio History (VSHistory) files 689 | .vshistory/ 690 | 691 | # BeatPulse healthcheck temp database 692 | healthchecksdb 693 | 694 | # Backup folder for Package Reference Convert tool in Visual Studio 2017 695 | MigrationBackup/ 696 | 697 | # Ionide (cross platform F# VS Code tools) working folder 698 | .ionide/ 699 | 700 | # Fody - auto-generated XML schema 701 | FodyWeavers.xsd 702 | 703 | # VS Code files for those working on multiple tools 704 | .vscode/* 705 | !.vscode/settings.json 706 | !.vscode/tasks.json 707 | !.vscode/launch.json 708 | !.vscode/extensions.json 709 | *.code-workspace 710 | 711 | # Local History for Visual Studio Code 712 | .history/ 713 | 714 | # Windows Installer files from build outputs 715 | *.cab 716 | *.msi 717 | *.msix 718 | *.msm 719 | *.msp 720 | 721 | # JetBrains Rider 722 | *.sln.iml 723 | 724 | 725 | !.vscode/*.code-snippets 726 | $RECYCLE.BIN/ 727 | *.bak 728 | *.cubin 729 | *.fatbin 730 | *.gpu 731 | *.i 732 | *.ii 733 | *.lnk 734 | *.ppk 735 | *.ptx 736 | *.stackdump 737 | *.vsix 738 | *.xlk 739 | *.~vsd* 740 | 
*/.ipynb_checkpoints/* 741 | .Trash-* 742 | .directory 743 | .fuse_hidden* 744 | .nfs* 745 | .vagrant/ 746 | Backup of *.doc* 747 | Thumbs.db 748 | Thumbs.db:encryptable 749 | [Bb]in 750 | [Dd]esktop.ini 751 | [Ii]nclude 752 | [Ll]ib 753 | [Ll]ib64 754 | [Ll]ocal 755 | [Ss]cripts 756 | ehthumbs.db 757 | ehthumbs_vista.db 758 | pip-selfcheck.json 759 | pyvenv.cfg 760 | ~$*.doc* 761 | ~$*.ppt* 762 | ~$*.xls* 763 | docs/_build/ 764 | coverage_html_report/ 765 | -------------------------------------------------------------------------------- /demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "9cff789b-a218-4d39-992e-c423b68bef8a", 6 | "metadata": {}, 7 | "source": [ 8 | "This notebook demonstrates the process I went through to run `neuralbeagle14-7b` on my laptop's 8GB GPU in Windows, as seen in [my LinkedIn post from January 27th, 2024](https://www.linkedin.com/posts/jsundance_free-local-private-ai-on-my-laptop-thanks-activity-7157117360728862720-MWxn?utm_source=share&utm_medium=member_desktop). It pulls heavily from [this LangChain documentation](https://python.langchain.com/docs/integrations/llms/llamacpp).\n", 9 | "\n", 10 | "I was able to use `llama-cpp-python` _without_ my GPU, and it took me a couple of installs before it was really loading all of the layers onto the GPU. It was still fast without the GPU, but that's not the point. ;)\n", 11 | "\n", 12 | "I'm using an NVIDIA RTX A4000 laptop GPU. I will be compiling `llama-cpp-python` instead of using the \"usual\" `pip install` because I _think_ this is a more reliable method. I will be using the cuBLAS backend, but you can use other backends for AMD or Apple or whatever (more on this later).\n", 13 | "\n", 14 | "I will also describe an issue I had with a missing DLL, and offer troubleshooting advice for that.\n", 15 | "\n", 16 | "Joshua Bailey #LearningInPublic January 28, 2024" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "id": "14eabf3c-27e7-43dd-8017-f3f8dd4f961a", 22 | "metadata": { 23 | "jp-MarkdownHeadingCollapsed": true 24 | }, 25 | "source": [ 26 | "# Prerequisites (and gotchas)" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "id": "491caa09-5dea-48c1-bb51-2ed2c1f5850f", 32 | "metadata": { 33 | "jp-MarkdownHeadingCollapsed": true 34 | }, 35 | "source": [ 36 | "## NVIDIA stuff" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "id": "c183c5bc-029c-4394-9bdb-783b21f9745f", 42 | "metadata": {}, 43 | "source": [ 44 | "- [NVIDIA driver](https://www.nvidia.com/download/index.aspx)\n", 45 | "- [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit)" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "id": "778907f6-ea8d-404f-83bc-b57784576c39", 51 | "metadata": { 52 | "jp-MarkdownHeadingCollapsed": true 53 | }, 54 | "source": [ 55 | "## Microsoft Visual Studio stuff" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "70d3c244-2c0c-4d48-8b96-25d8c729c471", 61 | "metadata": {}, 62 | "source": [ 63 | "From [the LangChain documentation](https://python.langchain.com/docs/integrations/llms/llamacpp):\n", 64 | "\n", 65 | "- Visual Studio Community (make sure you install this with the following settings)\n", 66 | " - Desktop development with C++\n", 67 | " - Python development\n", 68 | " - Linux embedded development with C++\n", 69 | "\n", 70 | "_side note_: I installed this stuff a while ago along with `cudnn` for ArcGIS deep learning, and I don't think I included the Linux embedded
development thing (maybe), which is probably why I had the dll trouble I'll describe later. ;)" 71 | ] 72 | }, 73 | { 74 | "attachments": {}, 75 | "cell_type": "markdown", 76 | "id": "7373ed17-dad5-4530-a1ca-2b95774e1e3f", 77 | "metadata": { 78 | "jp-MarkdownHeadingCollapsed": true 79 | }, 80 | "source": [ 81 | "## Check `nvidia-smi` and `nvcc --version`\n", 82 | "\n", 83 | "If either of these commands don't work, you'll have trouble.\n", 84 | "Install NVIDIA driver and CUDA toolkit." 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 1, 90 | "id": "779f4d54-fd62-4d5f-ae05-f3c8d2b778cc", 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "name": "stdout", 95 | "output_type": "stream", 96 | "text": [ 97 | "Sun Jan 28 17:29:46 2024 \n", 98 | "+---------------------------------------------------------------------------------------+\n", 99 | "| NVIDIA-SMI 537.79 Driver Version: 537.79 CUDA Version: 12.2 |\n", 100 | "|-----------------------------------------+----------------------+----------------------+\n", 101 | "| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", 102 | "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", 103 | "| | | MIG M. |\n", 104 | "|=========================================+======================+======================|\n", 105 | "| 0 NVIDIA RTX A4000 Laptop GPU WDDM | 00000000:01:00.0 Off | N/A |\n", 106 | "| N/A 55C P8 16W / 110W | 0MiB / 8192MiB | 0% Default |\n", 107 | "| | | N/A |\n", 108 | "+-----------------------------------------+----------------------+----------------------+\n", 109 | " \n", 110 | "+---------------------------------------------------------------------------------------+\n", 111 | "| Processes: |\n", 112 | "| GPU GI CI PID Type Process name GPU Memory |\n", 113 | "| ID ID Usage |\n", 114 | "|=======================================================================================|\n", 115 | "| No running processes found |\n", 116 | "+---------------------------------------------------------------------------------------+\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "!nvidia-smi" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 2, 127 | "id": "28c6657c-79c6-4335-868b-d00e64bf3a2a", 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "nvcc: NVIDIA (R) Cuda compiler driver\n", 135 | "Copyright (c) 2005-2022 NVIDIA Corporation\n", 136 | "Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022\n", 137 | "Cuda compilation tools, release 11.8, V11.8.89\n", 138 | "Build cuda_11.8.r11.8/compiler.31833905_0\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "!nvcc --version" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "id": "be89ed97-c1e3-4923-a7a6-16ebd9ff8f17", 149 | "metadata": { 150 | "jp-MarkdownHeadingCollapsed": true 151 | }, 152 | "source": [ 153 | "## Python stuff" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "id": "082553d6-a620-46f0-99a2-690684436feb", 159 | "metadata": {}, 160 | "source": [ 161 | "First of all, I highly recommend using an environment management tool like `conda` to manage your environments-- and never tinker around in the base environment. That way, when your package versions get messed up or whatever, you can just start fresh. ;)\n", 162 | "\n", 163 | "A common approach is to install [anaconda](https://anaconda.org/). \n", 164 | "\n", 165 | "Assuming you have `conda`, create a new Python environment. 
At the time of writing, version constraints meant that Python 3.12 was not supported, so:\n", 166 | "\n", 167 | "```\n", 168 | "conda create -n llama-cpp-python python=3.11\n", 169 | "conda activate llama-cpp-python\n", 170 | "```" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "id": "10c0b462-50b2-4803-8bd2-8f236531a4d2", 176 | "metadata": { 177 | "jp-MarkdownHeadingCollapsed": true 178 | }, 179 | "source": [ 180 | "### `torch`" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "id": "c5c21f5e-1787-4159-8bec-03e2e449b0a2", 186 | "metadata": { 187 | "jp-MarkdownHeadingCollapsed": true 188 | }, 189 | "source": [ 190 | "You have to install `torch` before installing `llama-cpp-python`. I think if you just `pip install torch` then you get the cpu-only version.\n", 191 | "\n", 192 | "So assuming you'll be using CUDA 11.8, based on [the pytorch documentation](https://pytorch.org/get-started/locally/), run:\n", 193 | "\n", 194 | "```\n", 195 | "pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n", 196 | "```\n", 197 | "\n", 198 | "(12.1 is available at `/whl/cu121` but 12.2 is apparently not supported yet)" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 3, 204 | "id": "0aa50379-f0a4-4a7d-b4a0-88714cd2d837", 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "import torch\n", 209 | "\n", 210 | "if not torch.cuda.is_available():\n", 211 | " raise RuntimeError()" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "id": "e9a32f5a-365d-4aa4-89b5-90c0a4cf7050", 217 | "metadata": { 218 | "jp-MarkdownHeadingCollapsed": true 219 | }, 220 | "source": [ 221 | "# Compiling and installing `llama-cpp-python`" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "id": "36306695-bbc3-4b4d-a1d2-a7973b5a3eb1", 227 | "metadata": {}, 228 | "source": [ 229 | "```markdown\n", 230 | "There are different options on how to install the llama-cpp package:\n", 231 | "\n", 232 | "- CPU usage\n", 233 | "- CPU + GPU (using one of many BLAS backends)\n", 234 | "- Metal GPU (MacOS with Apple Silicon Chip)\n", 235 | "```\n", 236 | "(from https://python.langchain.com/docs/integrations/llms/llamacpp)\n", 237 | "\n", 238 | "In Windows with CUBLAS:\n", 239 | "\n", 240 | "```\n", 241 | "set CMAKE_ARGS=-DLLAMA_CUBLAS=on\n", 242 | "set FORCE_CMAKE=1\n", 243 | "pip install -v llama-cpp-python\n", 244 | "# or if you've already installed it and need to try again (or you just wanna be extra careful I guess?):\n", 245 | "# pip install -v --upgrade --force-reinstall --no-cache-dir llama-cpp-python\n", 246 | "```" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "id": "b3c57974-3129-4c49-b06a-362ee34e153d", 252 | "metadata": { 253 | "jp-MarkdownHeadingCollapsed": true 254 | }, 255 | "source": [ 256 | "## `No CUDA toolset found`? 
Check for `Nvda.Build.CudaTasks.v*.*.dll`" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "id": "26fdb49b-775f-4451-8152-16d257aa8b06", 262 | "metadata": {}, 263 | "source": [ 264 | "If you try to compile `llama-cpp-python` and get an error message like\n", 265 | "```text\n", 266 | "[...]\n", 267 | "CMake Error at [...]\n", 268 | "No CUDA toolset found.\n", 269 | "[...]\n", 270 | "*** CMake configuration failed.\n", 271 | "[end of output]\n", 272 | "```\n", 273 | "\n", 274 | "Then take a look at [this GitHub issue comment](https://github.com/NVlabs/tiny-cuda-nn/issues/164#issuecomment-1280749170) and possibly use the following code to help find your problem.\n", 275 | "\n", 276 | "If everything's good, the code below will print stuff and not raise any errors." 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 4, 282 | "id": "82432156-34c1-4a6c-a8ec-cb3bdbe02603", 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "Found files:\n", 290 | "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.4\\extras\\visual_studio_integration\\MSBuildExtensions\\Nvda.Build.CudaTasks.v11.4.dll\n", 291 | "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8\\extras\\visual_studio_integration\\MSBuildExtensions\\Nvda.Build.CudaTasks.v11.8.dll \n", 292 | "\n", 293 | "Highest version: 11.8\n", 294 | "\n", 295 | "Build customizations dir: C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\n", 296 | "\n", 297 | "Checking for files:\n", 298 | "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\\CUDA 11.8.xml\n", 299 | "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\\CUDA 11.8.props\n", 300 | "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\\Nvda.Build.CudaTasks.v11.8.dll\n", 301 | "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\\CUDA 11.8.targets \n", 302 | "\n", 303 | "All good\n" 304 | ] 305 | } 306 | ], 307 | "source": [ 308 | "import os\n", 309 | "\n", 310 | "from glob import glob\n", 311 | "import re\n", 312 | "\n", 313 | "def version_files(version: str) -> set[str]:\n", 314 | " return {\n", 315 | " f\"CUDA {version}.props\",\n", 316 | " f\"CUDA {version}.targets\",\n", 317 | " f\"CUDA {version}.xml\",\n", 318 | " f\"Nvda.Build.CudaTasks.v{version}.dll\",\n", 319 | " }\n", 320 | "\n", 321 | "dll_pat = re.compile(r\"^Nvda.Build.CudaTasks.v(?P<major>\\d{2})\\.(?P<minor>\\d)\\.dll$\")\n", 322 | "\n", 323 | "nvidia_cuda_glob = f\"C:\\\\Program Files\\\\NVIDIA GPU Computing Toolkit\\\\CUDA\\\\**\\\\extras\\\\visual_studio_integration\\\\MSBuildExtensions\\\\Nvda.Build.CudaTasks.v*.*.dll\"\n", 324 | "nvidia_cuda_files = glob(nvidia_cuda_glob, recursive=True)\n", 325 | "print(f\"Found files:\")\n", 326 | "print(\"\\n\".join(nvidia_cuda_files), \"\\n\")\n", 327 | "\n", 328 | "if not nvidia_cuda_files:\n", 329 | " raise RuntimeError()\n", 330 | "\n", 331 | "basenames = (os.path.basename(f) for f in nvidia_cuda_files)\n", 332 | "matches = (dll_pat.match(bn) for bn in basenames)\n", 333 | "groups = (match.groupdict() for match in matches if match)\n", 334 | "sorted_versions = sorted(groups, key=lambda x: (int(x['major']), int(x['minor'])))\n", 335 | "highest_version
= sorted_versions[-1]\n", 336 | "highest_str = highest_version['major'] + '.' + highest_version['minor']\n", 337 | "highest_files = version_files(highest_str)\n", 338 | "\n", 339 | "print(f\"Highest version: {highest_str}\\n\")\n", 340 | "\n", 341 | "bc_dirs = glob(\"C:\\\\Program Files (x86)\\\\Microsoft Visual Studio\\\\*\\\\BuildTools\\\\MSBuild\\\\Microsoft\\\\VC\\\\v*\\\\BuildCustomizations\", recursive=True)\n", 342 | "if len(bc_dirs) != 1:\n", 343 | " print(\"Only expected to find one directory lol\")\n", 344 | " print(bc_dirs)\n", 345 | " raise RuntimeError()\n", 346 | "bc_dir = bc_dirs[0]\n", 347 | "\n", 348 | "print(f\"Build customizations dir: {bc_dir}\\n\")\n", 349 | "\n", 350 | "expected_files = [os.path.join(bc_dir, file) for file in highest_files]\n", 351 | "print(\"Checking for files:\")\n", 352 | "print(\"\\n\".join(expected_files), \"\\n\")\n", 353 | "\n", 354 | "for file in highest_files:\n", 355 | " expected_file = os.path.join(bc_dir, file)\n", 356 | " if not os.path.exists(expected_file):\n", 357 | " raise FileNotFoundError(expected_file)\n", 358 | "\n", 359 | "print(\"All good\")" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "id": "25efc524-b3a5-4884-ba9d-067e570c8046", 365 | "metadata": { 366 | "jp-MarkdownHeadingCollapsed": true 367 | }, 368 | "source": [ 369 | "# Using `neuralbeagle14-7b` in `langchain`" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "id": "5d6562ca-2771-4514-aab9-ed0079a6e99b", 375 | "metadata": { 376 | "jp-MarkdownHeadingCollapsed": true 377 | }, 378 | "source": [ 379 | "## Enable LangSmith logging (optional)" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "id": "a79df914-5ae5-439b-b73d-8e257b827330", 385 | "metadata": {}, 386 | "source": [ 387 | "`.env`:\n", 388 | "```\n", 389 | "LANGCHAIN_API_KEY=ls__...\n", 390 | "LANGCHAIN_ENDPOINT=https://api.smith.langchain.com\n", 391 | "LANGCHAIN_TRACING_V2=true\n", 392 | "LANGCHAIN_PROJECT=\"neuralbeagle-demo\"\n", 393 | "```" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 5, 399 | "id": "49da49db-f359-4703-99ae-909fd091f1dc", 400 | "metadata": {}, 401 | "outputs": [ 402 | { 403 | "data": { 404 | "text/plain": [ 405 | "['LANGCHAIN_API_KEY',\n", 406 | " 'LANGCHAIN_ENDPOINT',\n", 407 | " 'LANGCHAIN_TRACING_V2',\n", 408 | " 'LANGCHAIN_PROJECT']" 409 | ] 410 | }, 411 | "execution_count": 5, 412 | "metadata": {}, 413 | "output_type": "execute_result" 414 | } 415 | ], 416 | "source": [ 417 | "# for langsmith logging\n", 418 | "from dotenv import load_dotenv\n", 419 | "load_dotenv()\n", 420 | "\n", 421 | "[k for k in os.environ.keys() if 'langchain' in k.lower()]" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "id": "ddb544e4-de85-4dc4-8c07-01a4f26ffccc", 427 | "metadata": { 428 | "jp-MarkdownHeadingCollapsed": true 429 | }, 430 | "source": [ 431 | "## Call the model using `langchain_community.llms.LlamaCpp`" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": 6, 437 | "id": "d37ffd44-4e4c-4979-882b-3a1777fd702a", 438 | "metadata": {}, 439 | "outputs": [], 440 | "source": [ 441 | "from langchain.callbacks.manager import CallbackManager\n", 442 | "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n", 443 | "from langchain.chains import LLMChain\n", 444 | "from langchain.prompts import PromptTemplate\n", 445 | "from langchain_community.llms import LlamaCpp" 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 7, 451 | "id": 
"013cc096-20b9-40a9-a4a4-c18dbff8ced1", 452 | "metadata": {}, 453 | "outputs": [ 454 | { 455 | "name": "stdout", 456 | "output_type": "stream", 457 | "text": [ 458 | "CPU times: total: 2.8 s\n", 459 | "Wall time: 2.82 s\n" 460 | ] 461 | }, 462 | { 463 | "name": "stderr", 464 | "output_type": "stream", 465 | "text": [ 466 | "AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | \n", 467 | "Model metadata: {'general.name': 'mlabonne_neuralbeagle14-7b', 'general.architecture': 'llama', 'llama.context_length': '32768', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '2', 'general.file_type': '17', 'llama.attention.head_count_kv': '8', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.freq_base': '10000.000000', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.padding_token_id': '2', 'tokenizer.chat_template': \"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}\"}\n" 468 | ] 469 | } 470 | ], 471 | "source": [ 472 | "%%time\n", 473 | "\n", 474 | "model_path = r\"C:\\users\\joshua.bailey\\downloads\\neuralbeagle14-7b.Q5_K_M.gguf\"\n", 475 | "\n", 476 | "llm_kwargs = {\n", 477 | " \"temperature\": 0.75,\n", 478 | " \"max_tokens\": 5000,\n", 479 | " \"top_p\": 1,\n", 480 | " # The following settings are from https://python.langchain.com/docs/integrations/llms/llamacpp\n", 481 | " # These settings used about 5.7GB GPU RAM on my system\n", 482 | " \"n_gpu_layers\": 40, # Change this value based on your model and your GPU VRAM pool.\n", 483 | " \"n_batch\": 512, # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.\n", 484 | " # Callbacks support token-wise streaming\n", 485 | " \"callback_manager\": CallbackManager([StreamingStdOutCallbackHandler()]),\n", 486 | " # Verbose is required to pass to the callback manager\n", 487 | " \"verbose\": True,\n", 488 | "}\n", 489 | "\n", 490 | "if not os.path.exists(model_path):\n", 491 | " raise FileNotFoundError(model_path)\n", 492 | "\n", 493 | "llm = LlamaCpp(\n", 494 | " model_path=model_path,\n", 495 | " **llm_kwargs,\n", 496 | ")" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": 8, 502 | "id": "4a109420-cc1b-4187-bc36-ac69f82f7670", 503 | "metadata": {}, 504 | "outputs": [], 505 | "source": [ 506 | "template = \"\"\"Question: {question}\n", 507 | "\n", 508 | "Answer: Let's work this out in a step by step way to be sure we have the right answer.\"\"\"\n", 509 | "\n", 510 | "llm_chain = PromptTemplate(template=template, input_variables=[\"question\"]) | llm" 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": 9, 516 | "id": "bc0ce5dd-4f11-4239-91c8-d676978ef77b", 517 | "metadata": {}, 518 | "outputs": [], 519 | "source": [ 520 | "question = \"What are Scrub-Jays? 
Should a biologist expect to find them in Virginia?\"" 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": 10, 526 | "id": "43ae770c-a826-468a-8dcb-be9e6641217c", 527 | "metadata": {}, 528 | "outputs": [ 529 | { 530 | "name": "stdout", 531 | "output_type": "stream", 532 | "text": [ 533 | " First, let's consider what a Scrub-Jay is. There are two main types of Scrub-Jays found in North America: Western Scrub-Jay and the Eastern Scrub-Jay (also known as Blue Jay). The latter is also called the Florida Scrub-Jay because it has a limited range compared to the former, mostly restricted to peninsular Florida. This means that the Eastern or Florida Scrub-Jay would not be found in Virginia since its range does not extend there. On the other hand, the Western Scrub-Jay is widely distributed across the western United States and parts of Mexico. It's range includes California, Arizona, Nevada, Utah, Colorado, Oregon and Washington.\n", 534 | "\n", 535 | "However, Virginia is located on the east coast of the USA and is in close proximity to the Atlantic Ocean, which is outside the distribution range of Western Scrub-Jays. Nonetheless, there is a third kind of JAY that can be found in Virginia, and it's known as the Blue Jay (Cyanocitta cristata) - this is an eastern species and is quite distinct from the Eastern or Florida Scrub-Jay mentioned earlier.\n", 536 | "\n", 537 | "So, to summarize, a biologist should not expect to find Scrub-Jays (specifically, Western or Eastern Scrub-Jays) in Virginia. However, they would indeed find Blue Jays in that state.\n", 538 | "\n", 539 | "## Related Questions\n", 540 | "\n", 541 | "Below you may find additional questions related to the topic:\n", 542 | "\n", 543 | "### Is a scrub jay an endangered species?\n", 544 | "\n", 545 | "No, the Western Scrub-Jay is not considered an endangered species. However, as mentioned earlier, there is another type of Scrub-Jay referred to as the Eastern or Florida Scrub-Jay which has a more restricted distribution and is classified by IUCN Red List as Near Threatened due to its limited range and habitat loss.\n", 546 | "\n", 547 | "### Do scrub jays migrate?\n", 548 | "\n", 549 | "Both Western and Eastern Scrub-Jays are permanent residents in their respective ranges. 
The Western Scrub-Jay resides in the western parts of North America while the Eastern or Florida Scrub-CPU times: total: 13.9 s\n", 550 | "Wall time: 12.5 s\n" 551 | ] 552 | } 553 | ], 554 | "source": [ 555 | "%%time\n", 556 | "\n", 557 | "answer = llm_chain.invoke(dict(question=question))" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "id": "f2b59055-7276-4827-a437-90c320b152ee", 563 | "metadata": { 564 | "jp-MarkdownHeadingCollapsed": true 565 | }, 566 | "source": [ 567 | "## View LangSmith run" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "id": "7bf8d1e6-4a5a-407f-ba8b-1ff5645afa4b", 573 | "metadata": { 574 | "jp-MarkdownHeadingCollapsed": true 575 | }, 576 | "source": [ 577 | "I've shared the resulting run [here](https://smith.langchain.com/public/0451496b-78a6-4cf1-b2e0-e58d6997d0ad/r).\n", 578 | "\n", 579 | "Time to first token: 207 ms\n", 580 | "\n", 581 | "Total tokens: 457 tokens\n", 582 | "\n", 583 | "Latency: 12.45 seconds" 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "source": [ 589 | "## View logs in terminal" 590 | ], 591 | "metadata": { 592 | "collapsed": false 593 | }, 594 | "id": "7f9b3de7629f5768" 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "source": [ 599 | "```text\n", 600 | "ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no\n", 601 | "ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes\n", 602 | "ggml_init_cublas: found 1 CUDA devices:\n", 603 | " Device 0: NVIDIA RTX A4000 Laptop GPU, compute capability 8.6, VMM: yes\n", 604 | "llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from C:\\users\\joshua.bailey\\downloads\\neuralbeagle14-7b.Q5_K_M.gguf (version GGUF V3 (latest))\n", 605 | "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n", 606 | "llama_model_loader: - kv 0: general.architecture str = llama\n", 607 | "llama_model_loader: - kv 1: general.name str = mlabonne_neuralbeagle14-7b\n", 608 | "llama_model_loader: - kv 2: llama.context_length u32 = 32768\n", 609 | "llama_model_loader: - kv 3: llama.embedding_length u32 = 4096\n", 610 | "llama_model_loader: - kv 4: llama.block_count u32 = 32\n", 611 | "llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336\n", 612 | "llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128\n", 613 | "llama_model_loader: - kv 7: llama.attention.head_count u32 = 32\n", 614 | "llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8\n", 615 | "llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010\n", 616 | "llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000\n", 617 | "llama_model_loader: - kv 11: general.file_type u32 = 17\n", 618 | "llama_model_loader: - kv 12: tokenizer.ggml.model str = llama\n", 619 | "llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = [\"\", \"\", \"\", \"<0x00>\", \"<...\n", 620 | "llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...\n", 621 | "llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n", 622 | "llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1\n", 623 | "llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2\n", 624 | "llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0\n", 625 | "llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 2\n", 626 | "llama_model_loader: - kv 20: tokenizer.chat_template str = {% for message in 
messages %}{{'<|im_...\n", 627 | "llama_model_loader: - kv 21: general.quantization_version u32 = 2\n", 628 | "llama_model_loader: - type f32: 65 tensors\n", 629 | "llama_model_loader: - type q5_K: 193 tensors\n", 630 | "llama_model_loader: - type q6_K: 33 tensors\n", 631 | "llm_load_vocab: special tokens definition check successful ( 259/32000 ).\n", 632 | "llm_load_print_meta: format = GGUF V3 (latest)\n", 633 | "llm_load_print_meta: arch = llama\n", 634 | "llm_load_print_meta: vocab type = SPM\n", 635 | "llm_load_print_meta: n_vocab = 32000\n", 636 | "llm_load_print_meta: n_merges = 0\n", 637 | "llm_load_print_meta: n_ctx_train = 32768\n", 638 | "llm_load_print_meta: n_embd = 4096\n", 639 | "llm_load_print_meta: n_head = 32\n", 640 | "llm_load_print_meta: n_head_kv = 8\n", 641 | "llm_load_print_meta: n_layer = 32\n", 642 | "llm_load_print_meta: n_rot = 128\n", 643 | "llm_load_print_meta: n_embd_head_k = 128\n", 644 | "llm_load_print_meta: n_embd_head_v = 128\n", 645 | "llm_load_print_meta: n_gqa = 4\n", 646 | "llm_load_print_meta: n_embd_k_gqa = 1024\n", 647 | "llm_load_print_meta: n_embd_v_gqa = 1024\n", 648 | "llm_load_print_meta: f_norm_eps = 0.0e+00\n", 649 | "llm_load_print_meta: f_norm_rms_eps = 1.0e-05\n", 650 | "llm_load_print_meta: f_clamp_kqv = 0.0e+00\n", 651 | "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n", 652 | "llm_load_print_meta: n_ff = 14336\n", 653 | "llm_load_print_meta: n_expert = 0\n", 654 | "llm_load_print_meta: n_expert_used = 0\n", 655 | "llm_load_print_meta: rope scaling = linear\n", 656 | "llm_load_print_meta: freq_base_train = 10000.0\n", 657 | "llm_load_print_meta: freq_scale_train = 1\n", 658 | "llm_load_print_meta: n_yarn_orig_ctx = 32768\n", 659 | "llm_load_print_meta: rope_finetuned = unknown\n", 660 | "llm_load_print_meta: model type = 7B\n", 661 | "llm_load_print_meta: model ftype = Q5_K - Medium\n", 662 | "llm_load_print_meta: model params = 7.24 B\n", 663 | "llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)\n", 664 | "llm_load_print_meta: general.name = mlabonne_neuralbeagle14-7b\n", 665 | "llm_load_print_meta: BOS token = 1 ''\n", 666 | "llm_load_print_meta: EOS token = 2 ''\n", 667 | "llm_load_print_meta: UNK token = 0 ''\n", 668 | "llm_load_print_meta: PAD token = 2 ''\n", 669 | "llm_load_print_meta: LF token = 13 '<0x0A>'\n", 670 | "llm_load_tensors: ggml ctx size = 0.22 MiB\n", 671 | "llm_load_tensors: offloading 32 repeating layers to GPU\n", 672 | "llm_load_tensors: offloading non-repeating layers to GPU\n", 673 | "llm_load_tensors: offloaded 33/33 layers to GPU\n", 674 | "llm_load_tensors: CPU buffer size = 85.94 MiB\n", 675 | "llm_load_tensors: CUDA0 buffer size = 4807.05 MiB\n", 676 | "...................................................................................................\n", 677 | "llama_new_context_with_model: n_ctx = 512\n", 678 | "llama_new_context_with_model: freq_base = 10000.0\n", 679 | "llama_new_context_with_model: freq_scale = 1\n", 680 | "llama_kv_cache_init: CUDA0 KV buffer size = 64.00 MiB\n", 681 | "llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB\n", 682 | "llama_new_context_with_model: CUDA_Host input buffer size = 9.01 MiB\n", 683 | "llama_new_context_with_model: CUDA0 compute buffer size = 80.30 MiB\n", 684 | "llama_new_context_with_model: CUDA_Host compute buffer size = 8.80 MiB\n", 685 | "llama_new_context_with_model: graph splits (measure): 3\n", 686 | "\n", 687 | "llama_print_timings: load time = 126.88 ms\n", 688 | "llama_print_timings: sample time = 
60.32 ms / 464 runs ( 0.13 ms per token, 7692.56 tokens per second)\n", 689 | "llama_print_timings: prompt eval time = 126.79 ms / 48 tokens ( 2.64 ms per token, 378.57 tokens per second)\n", 690 | "llama_print_timings: eval time = 10946.64 ms / 463 runs ( 23.64 ms per token, 42.30 tokens per second)\n", 691 | "llama_print_timings: total time = 12363.03 ms / 511 tokens\n", 692 | "```" 693 | ], 694 | "metadata": { 695 | "collapsed": false 696 | }, 697 | "id": "567d575896b4b301" 698 | }, 699 | { 700 | "cell_type": "markdown", 701 | "source": [ 702 | "## Check `nvidia-smi` to see current GPU usage" 703 | ], 704 | "metadata": { 705 | "collapsed": false 706 | }, 707 | "id": "f3502d40d330f10b" 708 | }, 709 | { 710 | "cell_type": "code", 711 | "execution_count": 11, 712 | "id": "c9435585-9b61-434e-9c1b-4f9fc0e22184", 713 | "metadata": {}, 714 | "outputs": [ 715 | { 716 | "name": "stdout", 717 | "output_type": "stream", 718 | "text": [ 719 | "Sun Jan 28 17:30:04 2024 \n", 720 | "+---------------------------------------------------------------------------------------+\n", 721 | "| NVIDIA-SMI 537.79 Driver Version: 537.79 CUDA Version: 12.2 |\n", 722 | "|-----------------------------------------+----------------------+----------------------+\n", 723 | "| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", 724 | "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", 725 | "| | | MIG M. |\n", 726 | "|=========================================+======================+======================|\n", 727 | "| 0 NVIDIA RTX A4000 Laptop GPU WDDM | 00000000:01:00.0 Off | N/A |\n", 728 | "| N/A 66C P0 99W / 100W | 5705MiB / 8192MiB | 81% Default |\n", 729 | "| | | N/A |\n", 730 | "+-----------------------------------------+----------------------+----------------------+\n", 731 | " \n", 732 | "+---------------------------------------------------------------------------------------+\n", 733 | "| Processes: |\n", 734 | "| GPU GI CI PID Type Process name GPU Memory |\n", 735 | "| ID ID Usage |\n", 736 | "|=======================================================================================|\n", 737 | "| 0 N/A N/A 17084 C ...\\conda\\envs\\neuralbeagle\\python.exe N/A |\n", 738 | "+---------------------------------------------------------------------------------------+\n" 739 | ] 740 | } 741 | ], 742 | "source": [ 743 | "!nvidia-smi" 744 | ] 745 | } 746 | ], 747 | "metadata": { 748 | "kernelspec": { 749 | "display_name": "Python 3 (ipykernel)", 750 | "language": "python", 751 | "name": "python3" 752 | }, 753 | "language_info": { 754 | "codemirror_mode": { 755 | "name": "ipython", 756 | "version": 3 757 | }, 758 | "file_extension": ".py", 759 | "mimetype": "text/x-python", 760 | "name": "python", 761 | "nbconvert_exporter": "python", 762 | "pygments_lexer": "ipython3", 763 | "version": "3.11.7" 764 | } 765 | }, 766 | "nbformat": 4, 767 | "nbformat_minor": 5 768 | } 769 | --------------------------------------------------------------------------------