├── LICENSE ├── README.md ├── .gitignore └── demo.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Joshua Sundance Bailey 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 6 | 7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 10 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # llamacpp-langchain-neuralbeagle-demo 2 | 3 | A small demo repo showing how I got neuralbeagle14-7b running locally on my 8GB GPU 4 | 5 | __See [demo.ipynb](./demo.ipynb)__ 6 | 7 | # Summary 8 | 9 | This notebook demonstrates the process I went through to run `neuralbeagle14-7b` on my laptop's 8GB GPU in Windows, as seen in [my LinkedIn post from January 27th, 2024](https://www.linkedin.com/posts/jsundance_free-local-private-ai-on-my-laptop-thanks-activity-7157117360728862720-MWxn?utm_source=share&utm_medium=member_desktop). It pulls heavily from [this LangChain documentation](https://python.langchain.com/docs/integrations/llms/llamacpp). 10 | 11 | I was able to use `llama-cpp-python` _without_ my GPU, and it took me a couple of installs before it was really loading all of the layers onto the GPU. It was still fast without the GPU, but that's not the point. ;) 12 | 13 | I'm using an NVIDIA RTX A4000 laptop GPU. I will be compiling `llama-cpp-python` instead of using the "usual" `pip install` because I _think_ this is a more reliable method. I will be using the cuBLAS backend, but you can use other backends for AMD or Apple or whatever (more on this later). 14 | 15 | I will also describe an issue I had with a missing DLL, and offer troubleshooting advice for that.
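To give a sense of where this ends up, here is a minimal sketch (based on the notebook) of calling the model through LangChain once `llama-cpp-python` is compiled with GPU support. The model path is a placeholder; the full working version, with all of the surrounding checks, is in [demo.ipynb](./demo.ipynb).

```python
# Minimal sketch of the end result; see demo.ipynb for the full, tested version.
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path=r"C:\path\to\neuralbeagle14-7b.Q5_K_M.gguf",  # placeholder: point this at your GGUF download
    n_gpu_layers=40,  # layers to offload to the GPU; ~5.7GB VRAM on my system
    n_batch=512,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),  # token-wise streaming to stdout
    verbose=True,
)

template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""
chain = PromptTemplate(template=template, input_variables=["question"]) | llm

answer = chain.invoke({"question": "What are Scrub-Jays?"})
```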
16 | 17 | Joshua Bailey #LearningInPublic January 28, 2024 18 | 19 | 20 | # tldr for users with conda and cuda 11.8: 21 | 22 | ```cmd 23 | mkdir neuralbeagle && cd neuralbeagle 24 | conda create -n neuralbeagle python=3.11 25 | conda activate neuralbeagle 26 | 27 | python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 28 | 29 | set CMAKE_ARGS=-DLLAMA_CUBLAS=on 30 | set FORCE_CMAKE=1 31 | python -m pip install -v --upgrade --force-reinstall --no-cache-dir llama-cpp-python 32 | ``` 33 | 34 | # TODO 35 | - [ ] make a todo list 36 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .idea/* 2 | __pycache__ 3 | Dockerfile_* 4 | # old .gitignore plus several others from https://github.com/github/gitignore 5 | # duplicates removed 6 | # https://raw.githubusercontent.com/github/gitignore/main/Global/NotepadPP.gitignore 7 | # https://raw.githubusercontent.com/github/gitignore/main/Global/PuTTY.gitignore 8 | # https://raw.githubusercontent.com/github/gitignore/main/Global/VisualStudioCode.gitignore 9 | # https://raw.githubusercontent.com/github/gitignore/main/Global/Windows.gitignore 10 | # https://raw.githubusercontent.com/github/gitignore/main/Global/VirtualEnv.gitignore 11 | # https://raw.githubusercontent.com/github/gitignore/main/Global/Vagrant.gitignore 12 | # https://raw.githubusercontent.com/github/gitignore/main/Global/MicrosoftOffice.gitignore 13 | # https://raw.githubusercontent.com/github/gitignore/main/Global/Linux.gitignore 14 | # https://raw.githubusercontent.com/github/gitignore/main/community/Python/JupyterNotebooks.gitignore 15 | # https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore 16 | # https://raw.githubusercontent.com/github/gitignore/main/VisualStudio.gitignore 17 | # https://raw.githubusercontent.com/github/gitignore/main/Global/JetBrains.gitignore 18 | # https://raw.githubusercontent.com/github/gitignore/main/CUDA.gitignore 19 | #docs/_build/doctrees 20 | docs/_build/html/* 21 | #docs/_static 22 | #docs/_templates 23 | #docs/*.* 24 | #docs/Makefile 25 | .env 26 | # Application specific files 27 | .DS_Store 28 | .vscode 29 | 30 | # Byte-compiled / optimized / DLL files 31 | __pycache__/ 32 | *.py[cod] 33 | *$py.class 34 | 35 | # C extensions 36 | *.so 37 | 38 | # Distribution / packaging 39 | .Python 40 | build/ 41 | develop-eggs/ 42 | dist/ 43 | downloads/ 44 | eggs/ 45 | .eggs/ 46 | lib/ 47 | lib64/ 48 | parts/ 49 | sdist/ 50 | var/ 51 | wheels/ 52 | *.egg-info/ 53 | .installed.cfg 54 | *.egg 55 | MANIFEST 56 | 57 | # PyInstaller 58 | # Usually these files are written by a python script from a template 59 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
60 | *.manifest 61 | *.spec 62 | 63 | # Installer logs 64 | pip-log.txt 65 | pip-delete-this-directory.txt 66 | 67 | # Unit test / coverage reports 68 | htmlcov/ 69 | .tox/ 70 | .coverage 71 | .coverage.* 72 | .cache 73 | nosetests.xml 74 | coverage.xml 75 | *.cover 76 | .hypothesis/ 77 | 78 | # Translations 79 | *.mo 80 | *.pot 81 | 82 | # Django stuff: 83 | *.log 84 | .static_storage/ 85 | .media/ 86 | local_settings.py 87 | 88 | # Flask stuff: 89 | instance/ 90 | .webassets-cache 91 | 92 | # Scrapy stuff: 93 | .scrapy 94 | 95 | # Sphinx documentation 96 | # docs/_build/ 97 | 98 | # PyBuilder 99 | target/ 100 | 101 | # Jupyter Notebook 102 | .ipynb_checkpoints 103 | 104 | # pyenv 105 | .python-version 106 | 107 | # celery beat schedule file 108 | celerybeat-schedule 109 | 110 | # SageMath parsed files 111 | *.sage.py 112 | 113 | # Environments 114 | .env 115 | .venv 116 | env/ 117 | venv/ 118 | ENV/ 119 | env.bak/ 120 | venv.bak/ 121 | 122 | # Spyder project settings 123 | .spyderproject 124 | .spyproject 125 | 126 | # Rope project settings 127 | .ropeproject 128 | 129 | # mkdocs documentation 130 | /site 131 | 132 | # mypy 133 | .mypy_cache/ 134 | 135 | # ruff 136 | .ruff_cache/ 137 | 138 | # Pip 139 | Pipfile 140 | Pipfile.* 141 | 142 | # app 143 | settings.py 144 | 145 | # https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 146 | # Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio, WebStorm and Rider 147 | # Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839 148 | 149 | # User-specific stuff 150 | .idea/**/workspace.xml 151 | .idea/**/tasks.xml 152 | .idea/**/usage.statistics.xml 153 | .idea/**/dictionaries 154 | .idea/**/shelf 155 | 156 | # AWS User-specific 157 | .idea/**/aws.xml 158 | 159 | # Generated files 160 | .idea/**/contentModel.xml 161 | 162 | # Sensitive or high-churn files 163 | .idea/**/dataSources/ 164 | .idea/**/dataSources.ids 165 | .idea/**/dataSources.local.xml 166 | .idea/**/sqlDataSources.xml 167 | .idea/**/dynamic.xml 168 | .idea/**/uiDesigner.xml 169 | .idea/**/dbnavigator.xml 170 | 171 | # Gradle 172 | .idea/**/gradle.xml 173 | .idea/**/libraries 174 | 175 | # Gradle and Maven with auto-import 176 | # When using Gradle or Maven with auto-import, you should exclude module files, 177 | # since they will be recreated, and may cause churn. Uncomment if using 178 | # auto-import. 
179 | # .idea/artifacts 180 | # .idea/compiler.xml 181 | # .idea/jarRepositories.xml 182 | # .idea/modules.xml 183 | # .idea/*.iml 184 | # .idea/modules 185 | # *.iml 186 | # *.ipr 187 | 188 | # CMake 189 | cmake-build-*/ 190 | 191 | # Mongo Explorer plugin 192 | .idea/**/mongoSettings.xml 193 | 194 | # File-based project format 195 | *.iws 196 | 197 | # IntelliJ 198 | out/ 199 | 200 | # mpeltonen/sbt-idea plugin 201 | .idea_modules/ 202 | 203 | # JIRA plugin 204 | atlassian-ide-plugin.xml 205 | 206 | # Cursive Clojure plugin 207 | .idea/replstate.xml 208 | 209 | # SonarLint plugin 210 | .idea/sonarlint/ 211 | 212 | # Crashlytics plugin (for Android Studio and IntelliJ) 213 | com_crashlytics_export_strings.xml 214 | crashlytics.properties 215 | crashlytics-build.properties 216 | fabric.properties 217 | 218 | # Editor-based Rest Client 219 | .idea/httpRequests 220 | 221 | # Android studio 3.1+ serialized cache file 222 | .idea/caches/build_file_checksums.ser 223 | 224 | # https://github.com/github/gitignore/blob/main/Python.gitignore 225 | # Byte-compiled / optimized / DLL files 226 | 227 | # C extensions 228 | 229 | # Distribution / packaging 230 | share/python-wheels/ 231 | 232 | # PyInstaller 233 | # Usually these files are written by a python script from a template 234 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 235 | 236 | # Installer logs 237 | 238 | # Unit test / coverage reports 239 | .nox/ 240 | *.py,cover 241 | .pytest_cache/ 242 | cover/ 243 | 244 | # Translations 245 | 246 | # Django stuff: 247 | db.sqlite3 248 | db.sqlite3-journal 249 | 250 | # Flask stuff: 251 | 252 | # Scrapy stuff: 253 | 254 | # Sphinx documentation 255 | 256 | # PyBuilder 257 | .pybuilder/ 258 | 259 | # Jupyter Notebook 260 | 261 | # IPython 262 | profile_default/ 263 | ipython_config.py 264 | 265 | # pyenv 266 | # For a library or package, you might want to ignore these files since the backend is 267 | # intended to run in multiple environments; otherwise, check them in: 268 | # .python-version 269 | 270 | # pipenv 271 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 272 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 273 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 274 | # install all needed dependencies. 275 | #Pipfile.lock 276 | 277 | # poetry 278 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 279 | # This is especially recommended for binary packages to ensure reproducibility, and is more 280 | # commonly ignored for libraries. 281 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 282 | #poetry.lock 283 | 284 | # pdm 285 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 286 | #pdm.lock 287 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 288 | # in version control. 289 | # https://pdm.fming.dev/#use-with-ide 290 | .pdm.toml 291 | 292 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 293 | __pypackages__/ 294 | 295 | # Celery stuff 296 | celerybeat.pid 297 | 298 | # SageMath parsed files 299 | 300 | # Environments 301 | 302 | # Spyder project settings 303 | 304 | # Rope project settings 305 | 306 | # mkdocs documentation 307 | 308 | # mypy 309 | .dmypy.json 310 | dmypy.json 311 | 312 | # Pyre type checker 313 | .pyre/ 314 | 315 | # pytype static type analyzer 316 | .pytype/ 317 | 318 | # Cython debug symbols 319 | cython_debug/ 320 | 321 | # PyCharm 322 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 323 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 324 | # and can be added to the global gitignore or merged into this file. For a more nuclear 325 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 326 | #.idea/ 327 | 328 | # https://github.com/github/gitignore/blob/main/VisualStudio.gitignore 329 | ## Ignore Visual Studio temporary files, build results, and 330 | ## files generated by popular Visual Studio add-ons. 331 | ## 332 | ## Get latest from https://github.com/github/gitignore/blob/main/VisualStudio.gitignore 333 | 334 | # User-specific files 335 | *.rsuser 336 | *.suo 337 | *.user 338 | *.userosscache 339 | *.sln.docstates 340 | 341 | # User-specific files (MonoDevelop/Xamarin Studio) 342 | *.userprefs 343 | 344 | # Mono auto generated files 345 | mono_crash.* 346 | 347 | # Build results 348 | [Dd]ebug/ 349 | [Dd]ebugPublic/ 350 | [Rr]elease/ 351 | [Rr]eleases/ 352 | x64/ 353 | x86/ 354 | [Ww][Ii][Nn]32/ 355 | [Aa][Rr][Mm]/ 356 | [Aa][Rr][Mm]64/ 357 | bld/ 358 | [Bb]in/ 359 | [Oo]bj/ 360 | [Ll]og/ 361 | [Ll]ogs/ 362 | 363 | # Visual Studio 2015/2017 cache/options directory 364 | .vs/ 365 | # Uncomment if you have tasks that create the project's static files in wwwroot 366 | #wwwroot/ 367 | 368 | # Visual Studio 2017 auto generated files 369 | Generated\ Files/ 370 | 371 | # MSTest test Results 372 | [Tt]est[Rr]esult*/ 373 | [Bb]uild[Ll]og.* 374 | 375 | # NUnit 376 | *.VisualState.xml 377 | TestResult.xml 378 | nunit-*.xml 379 | 380 | # Build Results of an ATL Project 381 | [Dd]ebugPS/ 382 | [Rr]eleasePS/ 383 | dlldata.c 384 | 385 | # Benchmark Results 386 | BenchmarkDotNet.Artifacts/ 387 | 388 | # .NET Core 389 | project.lock.json 390 | project.fragment.lock.json 391 | artifacts/ 392 | 393 | # ASP.NET Scaffolding 394 | ScaffoldingReadMe.txt 395 | 396 | # StyleCop 397 | StyleCopReport.xml 398 | 399 | # Files built by Visual Studio 400 | *_i.c 401 | *_p.c 402 | *_h.h 403 | *.ilk 404 | *.meta 405 | *.obj 406 | *.iobj 407 | *.pch 408 | *.pdb 409 | *.ipdb 410 | *.pgc 411 | *.pgd 412 | *.rsp 413 | *.sbr 414 | *.tlb 415 | *.tli 416 | *.tlh 417 | *.tmp 418 | *.tmp_proj 419 | *_wpftmp.csproj 420 | *.tlog 421 | *.vspscc 422 | *.vssscc 423 | .builds 424 | *.pidb 425 | *.svclog 426 | *.scc 427 | 428 | # Chutzpah Test files 429 | _Chutzpah* 430 | 431 | # Visual C++ cache files 432 | ipch/ 433 | *.aps 434 | *.ncb 435 | *.opendb 436 | *.opensdf 437 | *.sdf 438 | *.cachefile 439 | *.VC.db 440 | *.VC.VC.opendb 441 | 442 | # Visual Studio profiler 443 | *.psess 444 | *.vsp 445 | *.vspx 446 | *.sap 447 | 448 | # Visual Studio Trace Files 449 | *.e2e 450 | 451 | # TFS 2012 Local Workspace 452 | $tf/ 453 | 454 | # Guidance Automation Toolkit 455 | *.gpState 456 | 457 | # ReSharper is a .NET coding add-in 458 | _ReSharper*/ 459 | *.[Rr]e[Ss]harper 460 | *.DotSettings.user 461 | 462 | # TeamCity is 
a build add-in 463 | _TeamCity* 464 | 465 | # DotCover is a Code Coverage Tool 466 | *.dotCover 467 | 468 | # AxoCover is a Code Coverage Tool 469 | .axoCover/* 470 | !.axoCover/settings.json 471 | 472 | # Coverlet is a free, cross platform Code Coverage Tool 473 | coverage*.json 474 | coverage*.xml 475 | coverage*.info 476 | 477 | # Visual Studio backend coverage results 478 | *.coverage 479 | *.coveragexml 480 | 481 | # NCrunch 482 | _NCrunch_* 483 | .*crunch*.local.xml 484 | nCrunchTemp_* 485 | 486 | # MightyMoose 487 | *.mm.* 488 | AutoTest.Net/ 489 | 490 | # Web workbench (sass) 491 | .sass-cache/ 492 | 493 | # Installshield output folder 494 | [Ee]xpress/ 495 | 496 | # DocProject is a documentation generator add-in 497 | DocProject/buildhelp/ 498 | DocProject/Help/*.HxT 499 | DocProject/Help/*.HxC 500 | DocProject/Help/*.hhc 501 | DocProject/Help/*.hhk 502 | DocProject/Help/*.hhp 503 | DocProject/Help/Html2 504 | DocProject/Help/html 505 | 506 | # Click-Once directory 507 | publish/ 508 | 509 | # Publish Web Output 510 | *.[Pp]ublish.xml 511 | *.azurePubxml 512 | # Note: Comment the next line if you want to checkin your web deploy settings, 513 | # but database connection strings (with potential passwords) will be unencrypted 514 | *.pubxml 515 | *.publishproj 516 | 517 | # Microsoft Azure Web App publish settings. Comment the next line if you want to 518 | # checkin your Azure Web App publish settings, but sensitive information contained 519 | # in these scripts will be unencrypted 520 | PublishScripts/ 521 | 522 | # NuGet Packages 523 | *.nupkg 524 | # NuGet Symbol Packages 525 | *.snupkg 526 | # The packages folder can be ignored because of Package Restore 527 | **/[Pp]ackages/* 528 | # except build/, which is used as an MSBuild target. 529 | !**/[Pp]ackages/build/ 530 | # Uncomment if necessary however generally it will be regenerated when needed 531 | #!**/[Pp]ackages/repositories.config 532 | # NuGet v3's project.json files produces more ignorable files 533 | *.nuget.props 534 | *.nuget.targets 535 | 536 | # Microsoft Azure Build Output 537 | csx/ 538 | *.build.csdef 539 | 540 | # Microsoft Azure Emulator 541 | ecf/ 542 | rcf/ 543 | 544 | # Windows Store app package directories and files 545 | AppPackages/ 546 | BundleArtifacts/ 547 | Package.StoreAssociation.xml 548 | _pkginfo.txt 549 | *.appx 550 | *.appxbundle 551 | *.appxupload 552 | 553 | # Visual Studio cache files 554 | # files ending in .cache can be ignored 555 | *.[Cc]ache 556 | # but keep track of directories ending in .cache 557 | !?*.[Cc]ache/ 558 | 559 | # Others 560 | ClientBin/ 561 | ~$* 562 | *~ 563 | *.dbmdl 564 | *.dbproj.schemaview 565 | *.jfm 566 | *.pfx 567 | *.publishsettings 568 | orleans.codegen.cs 569 | 570 | # Including strong name files can present a security risk 571 | # (https://github.com/github/gitignore/pull/2483#issue-259490424) 572 | #*.snk 573 | 574 | # Since there are multiple workflows, uncomment next line to ignore bower_components 575 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622) 576 | #bower_components/ 577 | 578 | # RIA/Silverlight projects 579 | Generated_Code/ 580 | 581 | # Backup & report files from converting an old project file 582 | # to a newer Visual Studio version. 
Backup files are not needed, 583 | # because we have git ;-) 584 | _UpgradeReport_Files/ 585 | Backup*/ 586 | UpgradeLog*.XML 587 | UpgradeLog*.htm 588 | ServiceFabricBackup/ 589 | *.rptproj.bak 590 | 591 | # SQL Server files 592 | *.mdf 593 | *.ldf 594 | *.ndf 595 | 596 | # Business Intelligence projects 597 | *.rdl.data 598 | *.bim.layout 599 | *.bim_*.settings 600 | *.rptproj.rsuser 601 | *- [Bb]ackup.rdl 602 | *- [Bb]ackup ([0-9]).rdl 603 | *- [Bb]ackup ([0-9][0-9]).rdl 604 | 605 | # Microsoft Fakes 606 | FakesAssemblies/ 607 | 608 | # GhostDoc plugin setting file 609 | *.GhostDoc.xml 610 | 611 | # Node.js Tools for Visual Studio 612 | .ntvs_analysis.dat 613 | node_modules/ 614 | 615 | # Visual Studio 6 build log 616 | *.plg 617 | 618 | # Visual Studio 6 workspace options file 619 | *.opt 620 | 621 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.) 622 | *.vbw 623 | 624 | # Visual Studio 6 auto-generated project file (contains which files were open etc.) 625 | *.vbp 626 | 627 | # Visual Studio 6 workspace and project file (working project files containing files to include in project) 628 | *.dsw 629 | *.dsp 630 | 631 | # Visual Studio 6 technical files 632 | 633 | # Visual Studio LightSwitch build output 634 | **/*.HTMLClient/GeneratedArtifacts 635 | **/*.DesktopClient/GeneratedArtifacts 636 | **/*.DesktopClient/ModelManifest.xml 637 | **/*.Server/GeneratedArtifacts 638 | **/*.Server/ModelManifest.xml 639 | _Pvt_Extensions 640 | 641 | # Paket dependency manager 642 | .paket/paket.exe 643 | paket-files/ 644 | 645 | # FAKE - F# Make 646 | .fake/ 647 | 648 | # CodeRush personal settings 649 | .cr/personal 650 | 651 | # Python Tools for Visual Studio (PTVS) 652 | *.pyc 653 | 654 | # Cake - Uncomment if you are using it 655 | # tools/** 656 | # !tools/packages.config 657 | 658 | # Tabs Studio 659 | *.tss 660 | 661 | # Telerik's JustMock configuration file 662 | *.jmconfig 663 | 664 | # BizTalk build output 665 | *.btp.cs 666 | *.btm.cs 667 | *.odx.cs 668 | *.xsd.cs 669 | 670 | # OpenCover UI analysis results 671 | OpenCover/ 672 | 673 | # Azure Stream Analytics local run output 674 | ASALocalRun/ 675 | 676 | # MSBuild Binary and Structured Log 677 | *.binlog 678 | 679 | # NVidia Nsight GPU debugger configuration file 680 | *.nvuser 681 | 682 | # MFractors (Xamarin productivity tool) working folder 683 | .mfractor/ 684 | 685 | # Local History for Visual Studio 686 | .localhistory/ 687 | 688 | # Visual Studio History (VSHistory) files 689 | .vshistory/ 690 | 691 | # BeatPulse healthcheck temp database 692 | healthchecksdb 693 | 694 | # Backup folder for Package Reference Convert tool in Visual Studio 2017 695 | MigrationBackup/ 696 | 697 | # Ionide (cross platform F# VS Code tools) working folder 698 | .ionide/ 699 | 700 | # Fody - auto-generated XML schema 701 | FodyWeavers.xsd 702 | 703 | # VS Code files for those working on multiple tools 704 | .vscode/* 705 | !.vscode/settings.json 706 | !.vscode/tasks.json 707 | !.vscode/launch.json 708 | !.vscode/extensions.json 709 | *.code-workspace 710 | 711 | # Local History for Visual Studio Code 712 | .history/ 713 | 714 | # Windows Installer files from build outputs 715 | *.cab 716 | *.msi 717 | *.msix 718 | *.msm 719 | *.msp 720 | 721 | # JetBrains Rider 722 | *.sln.iml 723 | 724 | 725 | !.vscode/*.code-snippets 726 | $RECYCLE.BIN/ 727 | *.bak 728 | *.cubin 729 | *.fatbin 730 | *.gpu 731 | *.i 732 | *.ii 733 | *.lnk 734 | *.ppk 735 | *.ptx 736 | *.stackdump 737 | *.vsix 738 | *.xlk 739 | *.~vsd* 740 | 
*/.ipynb_checkpoints/* 741 | .Trash-* 742 | .directory 743 | .fuse_hidden* 744 | .nfs* 745 | .vagrant/ 746 | Backup of *.doc* 747 | Thumbs.db 748 | Thumbs.db:encryptable 749 | [Bb]in 750 | [Dd]esktop.ini 751 | [Ii]nclude 752 | [Ll]ib 753 | [Ll]ib64 754 | [Ll]ocal 755 | [Ss]cripts 756 | ehthumbs.db 757 | ehthumbs_vista.db 758 | pip-selfcheck.json 759 | pyvenv.cfg 760 | ~$*.doc* 761 | ~$*.ppt* 762 | ~$*.xls* 763 | docs/_build/ 764 | coverage_html_report/ 765 | -------------------------------------------------------------------------------- /demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "9cff789b-a218-4d39-992e-c423b68bef8a", 6 | "metadata": {}, 7 | "source": [ 8 | "This notebook demonstrates the process I went through to run `neuralbeagle14-7b` on my laptop's 8GB GPU in Windows, as seen in [my LinkedIn post from January 27th, 2024](https://www.linkedin.com/posts/jsundance_free-local-private-ai-on-my-laptop-thanks-activity-7157117360728862720-MWxn?utm_source=share&utm_medium=member_desktop). It pulls heavily from [this LangChain documentation](https://python.langchain.com/docs/integrations/llms/llamacpp).\n", 9 | "\n", 10 | "I was able to use `llama-cpp-python` _without_ my GPU, and it took me a couple of installs before it was really loading all of the layers onto the GPU. It was still fast without the GPU, but that's not the point. ;)\n", 11 | "\n", 12 | "I'm using an NVIDIA RTX A4000 laptop GPU. I will be compiling `llama-cpp-python` instead of using the \"usual\" `pip install` because I _think_ this is a more reliable method. I will be using the cuBLAS backend, but you can use other backends for AMD or Apple or whatever (more on this later).\n", 13 | "\n", 14 | "I will also describe an issue I had with a missing DLL, and offer troubleshooting advice for that.\n", 15 | "\n", 16 | "Joshua Bailey #LearningInPublic January 28, 2024" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "id": "14eabf3c-27e7-43dd-8017-f3f8dd4f961a", 22 | "metadata": { 23 | "jp-MarkdownHeadingCollapsed": true 24 | }, 25 | "source": [ 26 | "# Prerequisites (and gotchas)" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "id": "491caa09-5dea-48c1-bb51-2ed2c1f5850f", 32 | "metadata": { 33 | "jp-MarkdownHeadingCollapsed": true 34 | }, 35 | "source": [ 36 | "## NVIDIA stuff" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "id": "c183c5bc-029c-4394-9bdb-783b21f9745f", 42 | "metadata": {}, 43 | "source": [ 44 | "- [NVIDIA driver](https://www.nvidia.com/download/index.aspx)\n", 45 | "- [CUDA toolkit](https://developer.nvidia.com/cuda-toolkit)" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "id": "778907f6-ea8d-404f-83bc-b57784576c39", 51 | "metadata": { 52 | "jp-MarkdownHeadingCollapsed": true 53 | }, 54 | "source": [ 55 | "## Microsoft Visual Studio stuff" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "70d3c244-2c0c-4d48-8b96-25d8c729c471", 61 | "metadata": {}, 62 | "source": [ 63 | "From [the LangChain documentation](https://python.langchain.com/docs/integrations/llms/llamacpp):\n", 64 | "\n", 65 | "- Visual Studio Community (make sure you install this with the following settings)\n", 66 | " - Desktop development with C++\n", 67 | " - Python development\n", 68 | " - Linux embedded development with C++\n", 69 | "\n", 70 | "_side note_: I installed this stuff a while ago along with `cudnn` for ArcGIS deep learning, and I don't think I included the Linux embedded
development thing (maybe), which is probably why I had the dll trouble I'll describe later. ;)" 71 | ] 72 | }, 73 | { 74 | "attachments": {}, 75 | "cell_type": "markdown", 76 | "id": "7373ed17-dad5-4530-a1ca-2b95774e1e3f", 77 | "metadata": { 78 | "jp-MarkdownHeadingCollapsed": true 79 | }, 80 | "source": [ 81 | "## Check `nvidia-smi` and `nvcc --version`\n", 82 | "\n", 83 | "If either of these commands don't work, you'll have trouble.\n", 84 | "Install NVIDIA driver and CUDA toolkit." 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 1, 90 | "id": "779f4d54-fd62-4d5f-ae05-f3c8d2b778cc", 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "name": "stdout", 95 | "output_type": "stream", 96 | "text": [ 97 | "Sun Jan 28 17:29:46 2024 \n", 98 | "+---------------------------------------------------------------------------------------+\n", 99 | "| NVIDIA-SMI 537.79 Driver Version: 537.79 CUDA Version: 12.2 |\n", 100 | "|-----------------------------------------+----------------------+----------------------+\n", 101 | "| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", 102 | "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", 103 | "| | | MIG M. |\n", 104 | "|=========================================+======================+======================|\n", 105 | "| 0 NVIDIA RTX A4000 Laptop GPU WDDM | 00000000:01:00.0 Off | N/A |\n", 106 | "| N/A 55C P8 16W / 110W | 0MiB / 8192MiB | 0% Default |\n", 107 | "| | | N/A |\n", 108 | "+-----------------------------------------+----------------------+----------------------+\n", 109 | " \n", 110 | "+---------------------------------------------------------------------------------------+\n", 111 | "| Processes: |\n", 112 | "| GPU GI CI PID Type Process name GPU Memory |\n", 113 | "| ID ID Usage |\n", 114 | "|=======================================================================================|\n", 115 | "| No running processes found |\n", 116 | "+---------------------------------------------------------------------------------------+\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "!nvidia-smi" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 2, 127 | "id": "28c6657c-79c6-4335-868b-d00e64bf3a2a", 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "nvcc: NVIDIA (R) Cuda compiler driver\n", 135 | "Copyright (c) 2005-2022 NVIDIA Corporation\n", 136 | "Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022\n", 137 | "Cuda compilation tools, release 11.8, V11.8.89\n", 138 | "Build cuda_11.8.r11.8/compiler.31833905_0\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "!nvcc --version" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "id": "be89ed97-c1e3-4923-a7a6-16ebd9ff8f17", 149 | "metadata": { 150 | "jp-MarkdownHeadingCollapsed": true 151 | }, 152 | "source": [ 153 | "## Python stuff" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "id": "082553d6-a620-46f0-99a2-690684436feb", 159 | "metadata": {}, 160 | "source": [ 161 | "First of all, I highly recommend using an environment management tool like `conda` to manage your environments-- and never tinker around in the base environment. That way, when your package versions get messed up or whatever, you can just start fresh. ;)\n", 162 | "\n", 163 | "A common approach is to install [anaconda](https://anaconda.org/). \n", 164 | "\n", 165 | "Assuming you have `conda`, create a new Python environment. 
At the time of writing, version constraints meant that Python 3.12 was not supported, so:\n", 166 | "\n", 167 | "```\n", 168 | "conda create -n llama-cpp-python python=3.11\n", 169 | "conda activate llama-cpp-python\n", 170 | "```" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "id": "10c0b462-50b2-4803-8bd2-8f236531a4d2", 176 | "metadata": { 177 | "jp-MarkdownHeadingCollapsed": true 178 | }, 179 | "source": [ 180 | "### `torch`" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "id": "c5c21f5e-1787-4159-8bec-03e2e449b0a2", 186 | "metadata": { 187 | "jp-MarkdownHeadingCollapsed": true 188 | }, 189 | "source": [ 190 | "You have to install `torch` before installing `llama-cpp-python`. I think if you just `pip install torch` then you get the cpu-only version.\n", 191 | "\n", 192 | "So assuming you'll be using CUDA 11.8, based on [the pytorch documentation](https://pytorch.org/get-started/locally/), run:\n", 193 | "\n", 194 | "```\n", 195 | "pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n", 196 | "```\n", 197 | "\n", 198 | "(12.1 is available at `/whl/cu121` but 12.2 is apparently not supported yet)" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 3, 204 | "id": "0aa50379-f0a4-4a7d-b4a0-88714cd2d837", 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "import torch\n", 209 | "\n", 210 | "if not torch.cuda.is_available():\n", 211 | " raise RuntimeError()" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "id": "e9a32f5a-365d-4aa4-89b5-90c0a4cf7050", 217 | "metadata": { 218 | "jp-MarkdownHeadingCollapsed": true 219 | }, 220 | "source": [ 221 | "# Compiling and installing `llama-cpp-python`" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "id": "36306695-bbc3-4b4d-a1d2-a7973b5a3eb1", 227 | "metadata": {}, 228 | "source": [ 229 | "```markdown\n", 230 | "There are different options on how to install the llama-cpp package:\n", 231 | "\n", 232 | "- CPU usage\n", 233 | "- CPU + GPU (using one of many BLAS backends)\n", 234 | "- Metal GPU (MacOS with Apple Silicon Chip)\n", 235 | "```\n", 236 | "(from https://python.langchain.com/docs/integrations/llms/llamacpp)\n", 237 | "\n", 238 | "In Windows with CUBLAS:\n", 239 | "\n", 240 | "```\n", 241 | "set CMAKE_ARGS=-DLLAMA_CUBLAS=on\n", 242 | "set FORCE_CMAKE=1\n", 243 | "pip install -v llama-cpp-python\n", 244 | "# or if you've already installed it and need to try again (or you just wanna be extra careful I guess?):\n", 245 | "# pip install -v --upgrade --force-reinstall --no-cache-dir llama-cpp-python\n", 246 | "```" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "id": "b3c57974-3129-4c49-b06a-362ee34e153d", 252 | "metadata": { 253 | "jp-MarkdownHeadingCollapsed": true 254 | }, 255 | "source": [ 256 | "## `No CUDA toolset found`? 
Check for `Nvda.Build.CudaTasks.v*.*.dll`" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "id": "26fdb49b-775f-4451-8152-16d257aa8b06", 262 | "metadata": {}, 263 | "source": [ 264 | "If you try to compile `llama-cpp-python` and get an error message like\n", 265 | "```text\n", 266 | "[...]\n", 267 | "CMake Error at [...]\n", 268 | "No CUDA toolset found.\n", 269 | "[...]\n", 270 | "*** CMake configuration failed.\n", 271 | "[end of output]\n", 272 | "```\n", 273 | "\n", 274 | "Then take a look at [this GitHub issue comment](https://github.com/NVlabs/tiny-cuda-nn/issues/164#issuecomment-1280749170) and possibly use the following code to help find your problem.\n", 275 | "\n", 276 | "If everything's good, the code below will print stuff and not raise any errors." 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 4, 282 | "id": "82432156-34c1-4a6c-a8ec-cb3bdbe02603", 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "Found files:\n", 290 | "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.4\\extras\\visual_studio_integration\\MSBuildExtensions\\Nvda.Build.CudaTasks.v11.4.dll\n", 291 | "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8\\extras\\visual_studio_integration\\MSBuildExtensions\\Nvda.Build.CudaTasks.v11.8.dll \n", 292 | "\n", 293 | "Highest version: 11.8\n", 294 | "\n", 295 | "Build customizations dir: C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\n", 296 | "\n", 297 | "Checking for files:\n", 298 | "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\\CUDA 11.8.xml\n", 299 | "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\\CUDA 11.8.props\n", 300 | "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\\Nvda.Build.CudaTasks.v11.8.dll\n", 301 | "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\MSBuild\\Microsoft\\VC\\v170\\BuildCustomizations\\CUDA 11.8.targets \n", 302 | "\n", 303 | "All good\n" 304 | ] 305 | } 306 | ], 307 | "source": [ 308 | "import os\n", 309 | "\n", 310 | "from glob import glob\n", 311 | "import re\n", 312 | "\n", 313 | "def version_files(version: str) -> set[str]:\n", 314 | " return {\n", 315 | " f\"CUDA {version}.props\",\n", 316 | " f\"CUDA {version}.targets\",\n", 317 | " f\"CUDA {version}.xml\",\n", 318 | " f\"Nvda.Build.CudaTasks.v{version}.dll\",\n", 319 | " }\n", 320 | "\n", 321 | "dll_pat = re.compile(r\"^Nvda.Build.CudaTasks.v(?P<major>\\d{2})\\.(?P<minor>\\d)\\.dll$\")\n", 322 | "\n", 323 | "nvidia_cuda_glob = f\"C:\\\\Program Files\\\\NVIDIA GPU Computing Toolkit\\\\CUDA\\\\**\\\\extras\\\\visual_studio_integration\\\\MSBuildExtensions\\\\Nvda.Build.CudaTasks.v*.*.dll\"\n", 324 | "nvidia_cuda_files = glob(nvidia_cuda_glob, recursive=True)\n", 325 | "print(f\"Found files:\")\n", 326 | "print(\"\\n\".join(nvidia_cuda_files), \"\\n\")\n", 327 | "\n", 328 | "if not nvidia_cuda_files:\n", 329 | " raise RuntimeError()\n", 330 | "\n", 331 | "basenames = (os.path.basename(f) for f in nvidia_cuda_files)\n", 332 | "matches = (dll_pat.match(bn) for bn in basenames)\n", 333 | "groups = (match.groupdict() for match in matches if match)\n", 334 | "sorted_versions = sorted(groups, key=lambda x: (int(x['major']), int(x['minor'])))\n", 335 | "highest_version
= sorted_versions[-1]\n", 336 | "highest_str = highest_version['major'] + '.' + highest_version['minor']\n", 337 | "highest_files = version_files(highest_str)\n", 338 | "\n", 339 | "print(f\"Highest version: {highest_str}\\n\")\n", 340 | "\n", 341 | "bc_dirs = glob(\"C:\\\\Program Files (x86)\\\\Microsoft Visual Studio\\\\*\\\\BuildTools\\\\MSBuild\\\\Microsoft\\\\VC\\\\v*\\\\BuildCustomizations\", recursive=True)\n", 342 | "if len(bc_dirs) != 1:\n", 343 | " print(\"Only expected to find one directory lol\")\n", 344 | " print(bc_dirs)\n", 345 | " raise RuntimeError()\n", 346 | "bc_dir = bc_dirs[0]\n", 347 | "\n", 348 | "print(f\"Build customizations dir: {bc_dir}\\n\")\n", 349 | "\n", 350 | "expected_files = [os.path.join(bc_dir, file) for file in highest_files]\n", 351 | "print(\"Checking for files:\")\n", 352 | "print(\"\\n\".join(expected_files), \"\\n\")\n", 353 | "\n", 354 | "for file in highest_files:\n", 355 | " expected_file = os.path.join(bc_dir, file)\n", 356 | " if not os.path.exists(expected_file):\n", 357 | " raise FileNotFoundError(expected_file)\n", 358 | "\n", 359 | "print(\"All good\")" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "id": "25efc524-b3a5-4884-ba9d-067e570c8046", 365 | "metadata": { 366 | "jp-MarkdownHeadingCollapsed": true 367 | }, 368 | "source": [ 369 | "# Using `neuralbeagle14-7b` in `langchain`" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "id": "5d6562ca-2771-4514-aab9-ed0079a6e99b", 375 | "metadata": { 376 | "jp-MarkdownHeadingCollapsed": true 377 | }, 378 | "source": [ 379 | "## Enable LangSmith logging (optional)" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "id": "a79df914-5ae5-439b-b73d-8e257b827330", 385 | "metadata": {}, 386 | "source": [ 387 | "`.env`:\n", 388 | "```\n", 389 | "LANGCHAIN_API_KEY=ls__...\n", 390 | "LANGCHAIN_ENDPOINT=https://api.smith.langchain.com\n", 391 | "LANGCHAIN_TRACING_V2=true\n", 392 | "LANGCHAIN_PROJECT=\"neuralbeagle-demo\"\n", 393 | "```" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 5, 399 | "id": "49da49db-f359-4703-99ae-909fd091f1dc", 400 | "metadata": {}, 401 | "outputs": [ 402 | { 403 | "data": { 404 | "text/plain": [ 405 | "['LANGCHAIN_API_KEY',\n", 406 | " 'LANGCHAIN_ENDPOINT',\n", 407 | " 'LANGCHAIN_TRACING_V2',\n", 408 | " 'LANGCHAIN_PROJECT']" 409 | ] 410 | }, 411 | "execution_count": 5, 412 | "metadata": {}, 413 | "output_type": "execute_result" 414 | } 415 | ], 416 | "source": [ 417 | "# for langsmith logging\n", 418 | "from dotenv import load_dotenv\n", 419 | "load_dotenv()\n", 420 | "\n", 421 | "[k for k in os.environ.keys() if 'langchain' in k.lower()]" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "id": "ddb544e4-de85-4dc4-8c07-01a4f26ffccc", 427 | "metadata": { 428 | "jp-MarkdownHeadingCollapsed": true 429 | }, 430 | "source": [ 431 | "## Call the model using `langchain_community.llms.LlamaCpp`" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": 6, 437 | "id": "d37ffd44-4e4c-4979-882b-3a1777fd702a", 438 | "metadata": {}, 439 | "outputs": [], 440 | "source": [ 441 | "from langchain.callbacks.manager import CallbackManager\n", 442 | "from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler\n", 443 | "from langchain.chains import LLMChain\n", 444 | "from langchain.prompts import PromptTemplate\n", 445 | "from langchain_community.llms import LlamaCpp" 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 7, 451 | "id": 
"013cc096-20b9-40a9-a4a4-c18dbff8ced1", 452 | "metadata": {}, 453 | "outputs": [ 454 | { 455 | "name": "stdout", 456 | "output_type": "stream", 457 | "text": [ 458 | "CPU times: total: 2.8 s\n", 459 | "Wall time: 2.82 s\n" 460 | ] 461 | }, 462 | { 463 | "name": "stderr", 464 | "output_type": "stream", 465 | "text": [ 466 | "AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | \n", 467 | "Model metadata: {'general.name': 'mlabonne_neuralbeagle14-7b', 'general.architecture': 'llama', 'llama.context_length': '32768', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '2', 'general.file_type': '17', 'llama.attention.head_count_kv': '8', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.freq_base': '10000.000000', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.padding_token_id': '2', 'tokenizer.chat_template': \"{% for message in messages %}{{'<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}\"}\n" 468 | ] 469 | } 470 | ], 471 | "source": [ 472 | "%%time\n", 473 | "\n", 474 | "model_path = r\"C:\\users\\joshua.bailey\\downloads\\neuralbeagle14-7b.Q5_K_M.gguf\"\n", 475 | "\n", 476 | "llm_kwargs = {\n", 477 | " \"temperature\": 0.75,\n", 478 | " \"max_tokens\": 5000,\n", 479 | " \"top_p\": 1,\n", 480 | " # The following settings are from https://python.langchain.com/docs/integrations/llms/llamacpp\n", 481 | " # These settings used about 5.7GB GPU RAM on my system\n", 482 | " \"n_gpu_layers\": 40, # Change this value based on your model and your GPU VRAM pool.\n", 483 | " \"n_batch\": 512, # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.\n", 484 | " # Callbacks support token-wise streaming\n", 485 | " \"callback_manager\": CallbackManager([StreamingStdOutCallbackHandler()]),\n", 486 | " # Verbose is required to pass to the callback manager\n", 487 | " \"verbose\": True,\n", 488 | "}\n", 489 | "\n", 490 | "if not os.path.exists(model_path):\n", 491 | " raise FileNotFoundError(model_path)\n", 492 | "\n", 493 | "llm = LlamaCpp(\n", 494 | " model_path=model_path,\n", 495 | " **llm_kwargs,\n", 496 | ")" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": 8, 502 | "id": "4a109420-cc1b-4187-bc36-ac69f82f7670", 503 | "metadata": {}, 504 | "outputs": [], 505 | "source": [ 506 | "template = \"\"\"Question: {question}\n", 507 | "\n", 508 | "Answer: Let's work this out in a step by step way to be sure we have the right answer.\"\"\"\n", 509 | "\n", 510 | "llm_chain = PromptTemplate(template=template, input_variables=[\"question\"]) | llm" 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": 9, 516 | "id": "bc0ce5dd-4f11-4239-91c8-d676978ef77b", 517 | "metadata": {}, 518 | "outputs": [], 519 | "source": [ 520 | "question = \"What are Scrub-Jays? 
Should a biologist expect to find them in Virginia?\"" 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": 10, 526 | "id": "43ae770c-a826-468a-8dcb-be9e6641217c", 527 | "metadata": {}, 528 | "outputs": [ 529 | { 530 | "name": "stdout", 531 | "output_type": "stream", 532 | "text": [ 533 | " First, let's consider what a Scrub-Jay is. There are two main types of Scrub-Jays found in North America: Western Scrub-Jay and the Eastern Scrub-Jay (also known as Blue Jay). The latter is also called the Florida Scrub-Jay because it has a limited range compared to the former, mostly restricted to peninsular Florida. This means that the Eastern or Florida Scrub-Jay would not be found in Virginia since its range does not extend there. On the other hand, the Western Scrub-Jay is widely distributed across the western United States and parts of Mexico. It's range includes California, Arizona, Nevada, Utah, Colorado, Oregon and Washington.\n", 534 | "\n", 535 | "However, Virginia is located on the east coast of the USA and is in close proximity to the Atlantic Ocean, which is outside the distribution range of Western Scrub-Jays. Nonetheless, there is a third kind of JAY that can be found in Virginia, and it's known as the Blue Jay (Cyanocitta cristata) - this is an eastern species and is quite distinct from the Eastern or Florida Scrub-Jay mentioned earlier.\n", 536 | "\n", 537 | "So, to summarize, a biologist should not expect to find Scrub-Jays (specifically, Western or Eastern Scrub-Jays) in Virginia. However, they would indeed find Blue Jays in that state.\n", 538 | "\n", 539 | "## Related Questions\n", 540 | "\n", 541 | "Below you may find additional questions related to the topic:\n", 542 | "\n", 543 | "### Is a scrub jay an endangered species?\n", 544 | "\n", 545 | "No, the Western Scrub-Jay is not considered an endangered species. However, as mentioned earlier, there is another type of Scrub-Jay referred to as the Eastern or Florida Scrub-Jay which has a more restricted distribution and is classified by IUCN Red List as Near Threatened due to its limited range and habitat loss.\n", 546 | "\n", 547 | "### Do scrub jays migrate?\n", 548 | "\n", 549 | "Both Western and Eastern Scrub-Jays are permanent residents in their respective ranges. 
The Western Scrub-Jay resides in the western parts of North America while the Eastern or Florida Scrub-CPU times: total: 13.9 s\n", 550 | "Wall time: 12.5 s\n" 551 | ] 552 | } 553 | ], 554 | "source": [ 555 | "%%time\n", 556 | "\n", 557 | "answer = llm_chain.invoke(dict(question=question))" 558 | ] 559 | }, 560 | { 561 | "cell_type": "markdown", 562 | "id": "f2b59055-7276-4827-a437-90c320b152ee", 563 | "metadata": { 564 | "jp-MarkdownHeadingCollapsed": true 565 | }, 566 | "source": [ 567 | "## View LangSmith run" 568 | ] 569 | }, 570 | { 571 | "cell_type": "markdown", 572 | "id": "7bf8d1e6-4a5a-407f-ba8b-1ff5645afa4b", 573 | "metadata": { 574 | "jp-MarkdownHeadingCollapsed": true 575 | }, 576 | "source": [ 577 | "I've shared the resulting run [here](https://smith.langchain.com/public/0451496b-78a6-4cf1-b2e0-e58d6997d0ad/r).\n", 578 | "\n", 579 | "Time to first token: 207 ms\n", 580 | "\n", 581 | "Total tokens: 457 tokens\n", 582 | "\n", 583 | "Latency: 12.45 seconds" 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "source": [ 589 | "## View logs in terminal" 590 | ], 591 | "metadata": { 592 | "collapsed": false 593 | }, 594 | "id": "7f9b3de7629f5768" 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "source": [ 599 | "```text\n", 600 | "ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no\n", 601 | "ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes\n", 602 | "ggml_init_cublas: found 1 CUDA devices:\n", 603 | " Device 0: NVIDIA RTX A4000 Laptop GPU, compute capability 8.6, VMM: yes\n", 604 | "llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from C:\\users\\joshua.bailey\\downloads\\neuralbeagle14-7b.Q5_K_M.gguf (version GGUF V3 (latest))\n", 605 | "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n", 606 | "llama_model_loader: - kv 0: general.architecture str = llama\n", 607 | "llama_model_loader: - kv 1: general.name str = mlabonne_neuralbeagle14-7b\n", 608 | "llama_model_loader: - kv 2: llama.context_length u32 = 32768\n", 609 | "llama_model_loader: - kv 3: llama.embedding_length u32 = 4096\n", 610 | "llama_model_loader: - kv 4: llama.block_count u32 = 32\n", 611 | "llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336\n", 612 | "llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128\n", 613 | "llama_model_loader: - kv 7: llama.attention.head_count u32 = 32\n", 614 | "llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8\n", 615 | "llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010\n", 616 | "llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000\n", 617 | "llama_model_loader: - kv 11: general.file_type u32 = 17\n", 618 | "llama_model_loader: - kv 12: tokenizer.ggml.model str = llama\n", 619 | "llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = [\"\", \"\", \"\", \"<0x00>\", \"<...\n", 620 | "llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...\n", 621 | "llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n", 622 | "llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1\n", 623 | "llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2\n", 624 | "llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0\n", 625 | "llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 2\n", 626 | "llama_model_loader: - kv 20: tokenizer.chat_template str = {% for message in 
messages %}{{'<|im_...\n", 627 | "llama_model_loader: - kv 21: general.quantization_version u32 = 2\n", 628 | "llama_model_loader: - type f32: 65 tensors\n", 629 | "llama_model_loader: - type q5_K: 193 tensors\n", 630 | "llama_model_loader: - type q6_K: 33 tensors\n", 631 | "llm_load_vocab: special tokens definition check successful ( 259/32000 ).\n", 632 | "llm_load_print_meta: format = GGUF V3 (latest)\n", 633 | "llm_load_print_meta: arch = llama\n", 634 | "llm_load_print_meta: vocab type = SPM\n", 635 | "llm_load_print_meta: n_vocab = 32000\n", 636 | "llm_load_print_meta: n_merges = 0\n", 637 | "llm_load_print_meta: n_ctx_train = 32768\n", 638 | "llm_load_print_meta: n_embd = 4096\n", 639 | "llm_load_print_meta: n_head = 32\n", 640 | "llm_load_print_meta: n_head_kv = 8\n", 641 | "llm_load_print_meta: n_layer = 32\n", 642 | "llm_load_print_meta: n_rot = 128\n", 643 | "llm_load_print_meta: n_embd_head_k = 128\n", 644 | "llm_load_print_meta: n_embd_head_v = 128\n", 645 | "llm_load_print_meta: n_gqa = 4\n", 646 | "llm_load_print_meta: n_embd_k_gqa = 1024\n", 647 | "llm_load_print_meta: n_embd_v_gqa = 1024\n", 648 | "llm_load_print_meta: f_norm_eps = 0.0e+00\n", 649 | "llm_load_print_meta: f_norm_rms_eps = 1.0e-05\n", 650 | "llm_load_print_meta: f_clamp_kqv = 0.0e+00\n", 651 | "llm_load_print_meta: f_max_alibi_bias = 0.0e+00\n", 652 | "llm_load_print_meta: n_ff = 14336\n", 653 | "llm_load_print_meta: n_expert = 0\n", 654 | "llm_load_print_meta: n_expert_used = 0\n", 655 | "llm_load_print_meta: rope scaling = linear\n", 656 | "llm_load_print_meta: freq_base_train = 10000.0\n", 657 | "llm_load_print_meta: freq_scale_train = 1\n", 658 | "llm_load_print_meta: n_yarn_orig_ctx = 32768\n", 659 | "llm_load_print_meta: rope_finetuned = unknown\n", 660 | "llm_load_print_meta: model type = 7B\n", 661 | "llm_load_print_meta: model ftype = Q5_K - Medium\n", 662 | "llm_load_print_meta: model params = 7.24 B\n", 663 | "llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)\n", 664 | "llm_load_print_meta: general.name = mlabonne_neuralbeagle14-7b\n", 665 | "llm_load_print_meta: BOS token = 1 ''\n", 666 | "llm_load_print_meta: EOS token = 2 ''\n", 667 | "llm_load_print_meta: UNK token = 0 ''\n", 668 | "llm_load_print_meta: PAD token = 2 ''\n", 669 | "llm_load_print_meta: LF token = 13 '<0x0A>'\n", 670 | "llm_load_tensors: ggml ctx size = 0.22 MiB\n", 671 | "llm_load_tensors: offloading 32 repeating layers to GPU\n", 672 | "llm_load_tensors: offloading non-repeating layers to GPU\n", 673 | "llm_load_tensors: offloaded 33/33 layers to GPU\n", 674 | "llm_load_tensors: CPU buffer size = 85.94 MiB\n", 675 | "llm_load_tensors: CUDA0 buffer size = 4807.05 MiB\n", 676 | "...................................................................................................\n", 677 | "llama_new_context_with_model: n_ctx = 512\n", 678 | "llama_new_context_with_model: freq_base = 10000.0\n", 679 | "llama_new_context_with_model: freq_scale = 1\n", 680 | "llama_kv_cache_init: CUDA0 KV buffer size = 64.00 MiB\n", 681 | "llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB\n", 682 | "llama_new_context_with_model: CUDA_Host input buffer size = 9.01 MiB\n", 683 | "llama_new_context_with_model: CUDA0 compute buffer size = 80.30 MiB\n", 684 | "llama_new_context_with_model: CUDA_Host compute buffer size = 8.80 MiB\n", 685 | "llama_new_context_with_model: graph splits (measure): 3\n", 686 | "\n", 687 | "llama_print_timings: load time = 126.88 ms\n", 688 | "llama_print_timings: sample time = 
60.32 ms / 464 runs ( 0.13 ms per token, 7692.56 tokens per second)\n", 689 | "llama_print_timings: prompt eval time = 126.79 ms / 48 tokens ( 2.64 ms per token, 378.57 tokens per second)\n", 690 | "llama_print_timings: eval time = 10946.64 ms / 463 runs ( 23.64 ms per token, 42.30 tokens per second)\n", 691 | "llama_print_timings: total time = 12363.03 ms / 511 tokens\n", 692 | "```" 693 | ], 694 | "metadata": { 695 | "collapsed": false 696 | }, 697 | "id": "567d575896b4b301" 698 | }, 699 | { 700 | "cell_type": "markdown", 701 | "source": [ 702 | "## Check `nvidia-smi` to see current GPU usage" 703 | ], 704 | "metadata": { 705 | "collapsed": false 706 | }, 707 | "id": "f3502d40d330f10b" 708 | }, 709 | { 710 | "cell_type": "code", 711 | "execution_count": 11, 712 | "id": "c9435585-9b61-434e-9c1b-4f9fc0e22184", 713 | "metadata": {}, 714 | "outputs": [ 715 | { 716 | "name": "stdout", 717 | "output_type": "stream", 718 | "text": [ 719 | "Sun Jan 28 17:30:04 2024 \n", 720 | "+---------------------------------------------------------------------------------------+\n", 721 | "| NVIDIA-SMI 537.79 Driver Version: 537.79 CUDA Version: 12.2 |\n", 722 | "|-----------------------------------------+----------------------+----------------------+\n", 723 | "| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", 724 | "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", 725 | "| | | MIG M. |\n", 726 | "|=========================================+======================+======================|\n", 727 | "| 0 NVIDIA RTX A4000 Laptop GPU WDDM | 00000000:01:00.0 Off | N/A |\n", 728 | "| N/A 66C P0 99W / 100W | 5705MiB / 8192MiB | 81% Default |\n", 729 | "| | | N/A |\n", 730 | "+-----------------------------------------+----------------------+----------------------+\n", 731 | " \n", 732 | "+---------------------------------------------------------------------------------------+\n", 733 | "| Processes: |\n", 734 | "| GPU GI CI PID Type Process name GPU Memory |\n", 735 | "| ID ID Usage |\n", 736 | "|=======================================================================================|\n", 737 | "| 0 N/A N/A 17084 C ...\\conda\\envs\\neuralbeagle\\python.exe N/A |\n", 738 | "+---------------------------------------------------------------------------------------+\n" 739 | ] 740 | } 741 | ], 742 | "source": [ 743 | "!nvidia-smi" 744 | ] 745 | } 746 | ], 747 | "metadata": { 748 | "kernelspec": { 749 | "display_name": "Python 3 (ipykernel)", 750 | "language": "python", 751 | "name": "python3" 752 | }, 753 | "language_info": { 754 | "codemirror_mode": { 755 | "name": "ipython", 756 | "version": 3 757 | }, 758 | "file_extension": ".py", 759 | "mimetype": "text/x-python", 760 | "name": "python", 761 | "nbconvert_exporter": "python", 762 | "pygments_lexer": "ipython3", 763 | "version": "3.11.7" 764 | } 765 | }, 766 | "nbformat": 4, 767 | "nbformat_minor": 5 768 | } 769 | --------------------------------------------------------------------------------