├── test-scripts
    ├── swaptensorgopu
    ├── stressgpu
    ├── i-haz-tokenz
    ├── torchamp
    ├── README.md
    ├── tensorgpu
    ├── lintest
    ├── numpytime
    ├── numpyprof
    └── numpybench
├── .gitignore
├── .github
    └── FUNDING.yml
├── macOS_Apple_Silicon_QuickStart.md
├── macOS-Install.md
└── README.md


/test-scripts/swaptensorgopu:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | import torch
 4 | 
 5 | # Set the device to MPS
 6 | device = torch.device("mps", 0)
 7 | 
 8 | # Create random data
 9 | N = 10000
10 | a = torch.randn([N, N], device=device)
11 | b = torch.randn([N, N], device=device)
12 | 
13 | # Perform matrix multiplication
14 | for _ in range(10):
15 |     a @ b
16 | 
17 | 


--------------------------------------------------------------------------------
/test-scripts/stressgpu:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | import torch
 4 | 
 5 | # Check if a GPU is available and if not, use a CPU
 6 | device = torch.device("cuda" if torch.cuda.is_available() else "mps")
 7 | 
 8 | # Create two random 5000x5000 matrices
 9 | mat1 = torch.randn(10000, 10000, device=device)
10 | mat2 = torch.randn(10000, 10000, device=device)
11 | 
12 | # Perform a matrix multiplication
13 | result = torch.mm(mat1, mat2)
14 | 
15 | # Print the result
16 | print(result)
17 | 
18 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | cache
 2 | characters
 3 | training/datasets
 4 | extensions/silero_tts/outputs
 5 | extensions/elevenlabs_tts/outputs
 6 | extensions/sd_api_pictures/outputs
 7 | extensions/multimodal/pipelines
 8 | logs
 9 | loras
10 | models
11 | presets
12 | repositories
13 | softprompts
14 | torch-dumps
15 | *pycache*
16 | */*pycache*
17 | */*/pycache*
18 | venv/
19 | .venv/
20 | .vscode
21 | .idea/
22 | *.bak
23 | *.ipynb
24 | *.log
25 | 
26 | settings.json
27 | settings.yaml
28 | notification.mp3
29 | img_bot*
30 | img_me*
31 | prompts/[0-9]*
32 | models/config-user.yaml
33 | 
34 | Thumbs.db
35 | .DS_Store
36 | 


--------------------------------------------------------------------------------
/test-scripts/i-haz-tokenz:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | import nltk
 4 | import sys
 5 | #nltk.download('punkt')  # Download the Punkt tokenizer models
 6 | 
 7 | def tokenize_text(text):
 8 |     tokens = nltk.word_tokenize(text)
 9 |     return len(tokens)
10 | 
11 | # Check if filename is passed as argument
12 | if len(sys.argv) > 1:
13 |     filename = sys.argv[1]
14 |     with open(filename, 'r') as file:
15 |         text = file.read()
16 |         num_tokens = tokenize_text(text)
17 |         print(f"Number of tokens in file: {num_tokens}")
18 | # If no filename is passed, read from STDIN
19 | else:
20 |     text = sys.stdin.read()
21 |     num_tokens = tokenize_text(text)
22 |     print(f"Number of tokens from STDIN: {num_tokens}")
23 | 
24 | 


--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
 1 | github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
 2 | patreon: unixwzrd
 3 | open_collective: # Replace with a single Open Collective username
 4 | ko_fi: unixwzrd
 5 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
 6 | community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
 7 | liberapay: # Replace with a single Liberapay username
 8 | issuehunt: # Replace with a single IssueHunt username
 9 | otechie: # Replace with a single Otechie username
10 | lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry
11 | custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2']ko_fi: unixwzrd
12 | 


--------------------------------------------------------------------------------
/test-scripts/torchamp:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | import torch
 4 | 
 5 | # List of data types to test
 6 | dtypes = [torch.float32, torch.float16, torch.bfloat16, torch.int64, torch.int32, torch.int16, torch.int8]
 7 | 
 8 | for dtype in dtypes:
 9 |     print(f"*************************************************************I")
10 |     print(f"               Testing dtype: {dtype}")
11 |     print(f"*************************************************************I")
12 |     # Create some tensors of the specified data type
13 |     a = torch.randn(10000, 10000, dtype=dtype)
14 |     b = torch.randn(10000, 10000, dtype=dtype)
15 | 
16 |     with torch.autocast('mps'):
17 |         c = a + b
18 |         d = (a * b).mean()
19 | 
20 |     print(c)
21 |     print(d)
22 | 
23 | 


--------------------------------------------------------------------------------
/test-scripts/README.md:
--------------------------------------------------------------------------------
 1 | # Just Some Scripts
 2 | 
 3 | These are some scripts to stress test the GPU with meaningless tensors. Can be used with CUDA. MPS. or CPU computing engines.  This will let you see if your GPU is actually being used, and I'm sure of someone would like they could use them to determine the capacity of their GPU.
 4 | 
 5 | Enjoy.
 6 | 
 7 | 
 8 | The only one which really gives any information is this:
 9 | 
10 | There are now several scripts which give timing and profiling information for comparing different VENV's configurations, bit in setup and config and with some timing and profiling. For now, these are for NumPy.
11 | 
12 | numpybench  - displays numpy config and times tests using time.
13 | numpyprof   - displays numpy config and profiles tests using cProf.
14 | numpytime   - displays numpy config and times tests using timit.
15 | 
16 | 
17 | Environment variable NO_TEST is set in order to just get configuration and bypass the testing which may be time-consuming.
18 | 
19 |     export NO_TEST=1    # Turn off testong 
20 | 
21 |     unset NO_TEST       # Turn on testing (Default)


--------------------------------------------------------------------------------
/test-scripts/tensorgpu:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | import torch
 4 | 
 5 | # Set the device to GPU
 6 | device = 'cuda' if torch.cuda.is_available() else ( 'mps' if torch.backends.mps.is_available() else 'cpu' )
 7 | 
 8 | # Increase the size of the tensors
 9 | N = 10000  # Number of rows
10 | D_in = 10000  # Input dimension
11 | H = 10000  # Hidden layer dimension
12 | D_out = 10000  # Output dimension
13 | 
14 | # Create random input and output data
15 | x = torch.randn(N, D_in, device=device)
16 | y = torch.randn(N, D_out, device=device)
17 | 
18 | # Randomly initialize weights
19 | w1 = torch.randn(D_in, H, device=device)
20 | w2 = torch.randn(H, D_out, device=device)
21 | 
22 | learning_rate = 1e-6
23 | for t in range(500):
24 |     # Forward pass: compute predicted y
25 |     h = x.mm(w1)
26 |     h_relu = h.clamp(min=0)
27 |     y_pred = h_relu.mm(w2)
28 | 
29 |     # Compute and print loss
30 |     loss = (y_pred - y).pow(1.5).sum().item()
31 |     print(t, loss)
32 | 
33 |     # Backprop to compute gradients of w1 and w2 with respect to loss
34 |     grad_y_pred = 1.75 * (y_pred - y)
35 |     grad_w2 = h_relu.t().mm(grad_y_pred)
36 |     grad_h_relu = grad_y_pred.mm(w2.t())
37 |     grad_h = grad_h_relu.clone()
38 |     grad_h[h < 0] = 0
39 |     grad_w1 = x.t().mm(grad_h)
40 | 
41 |     # Update weights using gradient descent
42 |     w1 -= learning_rate * grad_w1
43 |     w2 -= learning_rate * grad_w2
44 | 


--------------------------------------------------------------------------------
/test-scripts/lintest:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | import numpy as np
 4 | import numpy.random as npr
 5 | import torch
 6 | import time
 7 | 
 8 | 
 9 | if not torch.backends.mps.is_available():
10 |    if not torch.backends.mps.is_built():
11 |       print("MPS not available because the current PyTorch install was not "
12 |       "built with MPS enabled.")
13 |    else:
14 |       print("MPS not available because the current MacOS version is not 12.3+ "
15 |       "and/or you do not have an MPS-enabled device on this machine.")
16 |  
17 | # --- Test 1
18 | N = 1
19 | n = 1000
20 |  
21 | A = npr.randn(n,n)
22 | B = npr.randn(n,n)
23 |  
24 | t = time.time()
25 | for i in range(N):
26 |     C = np.dot(A, B)
27 | td = time.time() - t
28 | print("dotted two (%d,%d) matrices in %0.1f ms" % (n, n, 1e3*td/N))
29 |  
30 | # --- Test 2
31 | N = 100
32 | n = 4000
33 |  
34 | A = npr.randn(n)
35 | B = npr.randn(n)
36 |  
37 | t = time.time()
38 | for i in range(N):
39 |     C = np.dot(A, B)
40 | td = time.time() - t
41 | print("dotted two (%d) vectors in %0.2f us" % (n, 1e6*td/N))
42 |  
43 | # --- Test 3
44 | m,n = (2000,1000)
45 |  
46 | A = npr.randn(m,n)
47 |  
48 | t = time.time()
49 | [U,s,V] = np.linalg.svd(A, full_matrices=False)
50 | td = time.time() - t
51 | print("SVD of (%d,%d) matrix in %0.3f s" % (m, n, td))
52 |  
53 | # --- Test 4
54 | n = 1500
55 | A = npr.randn(n,n)
56 |  
57 | t = time.time()
58 | w, v = np.linalg.eig(A)
59 | td = time.time() - t
60 | print("Eigendecomp of (%d,%d) matrix in %0.3f s" % (n, n, td))
61 | 
62 | 


--------------------------------------------------------------------------------
/test-scripts/numpytime:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | 
 3 | import numpy as np
 4 | import os
 5 | import timeit
 6 | 
 7 | def main():
 8 |     size = 2000
 9 |     A = np.random.rand(size, size)
10 |     B = np.random.rand(size, size)
11 | 
12 |     # Number of iterations
13 |     iterations = 10
14 | 
15 |     # Matrix multiplication
16 |     multiplication_time = timeit.timeit(lambda: np.dot(A, B), number=iterations)
17 |     print(f"Time for matrix multiplication: {multiplication_time/iterations:.4f} seconds")
18 | 
19 |     # Matrix transposition
20 |     transposition_time = timeit.timeit(lambda: np.transpose(A), number=iterations)
21 |     print(f"Time for matrix transposition: {transposition_time/iterations:.4f} seconds")
22 | 
23 |     # Eigenvalue computation
24 |     eigenvalue_time = timeit.timeit(lambda: np.linalg.eigvals(A), number=iterations)
25 |     print(f"Time for eigenvalue computation: {eigenvalue_time/iterations:.4f} seconds")
26 | 
27 |     # Fourier transformation
28 |     fft_time = timeit.timeit(lambda: np.fft.fft(A), number=iterations)
29 |     print(f"Time for Fourier transformation: {fft_time/iterations:.4f} seconds")
30 | 
31 |     # Summation
32 |     summation_time = timeit.timeit(lambda: np.sum(A), number=iterations)
33 |     print(f"Time for summation: {summation_time/iterations:.4f} seconds")
34 | 
35 | if __name__ == "__main__":
36 | 
37 |     print("Producing information for VENV ----> ", os.getenv("CONDA_DEFAULT_ENV"))
38 | 
39 |     np.show_config()
40 | 
41 |     if os.getenv('NO_TEST') == "1":
42 |         print("############### Skipping performance checks.")
43 |     else:
44 |         main()
45 | 
46 | 


--------------------------------------------------------------------------------
/test-scripts/numpyprof:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | 
 3 | import numpy as np
 4 | import os
 5 | import timeit
 6 | import cProfile
 7 | 
 8 | 
 9 | 
10 | def main():
11 |     size = 2000
12 |     A = np.random.rand(size, size)
13 |     B = np.random.rand(size, size)
14 | 
15 |     # Number of iterations
16 |     iterations = 10
17 | 
18 |     # Matrix multiplication
19 |     multiplication_time = timeit.timeit(lambda: np.dot(A, B), number=iterations)
20 |     print(f"Time for matrix multiplication: {multiplication_time/iterations:.4f} seconds")
21 | 
22 |     # Matrix transposition
23 |     transposition_time = timeit.timeit(lambda: np.transpose(A), number=iterations)
24 |     print(f"Time for matrix transposition: {transposition_time/iterations:.4f} seconds")
25 | 
26 |     # Eigenvalue computation
27 |     eigenvalue_time = timeit.timeit(lambda: np.linalg.eigvals(A), number=iterations)
28 |     print(f"Time for eigenvalue computation: {eigenvalue_time/iterations:.4f} seconds")
29 | 
30 |     # Fourier transformation
31 |     fft_time = timeit.timeit(lambda: np.fft.fft(A), number=iterations)
32 |     print(f"Time for Fourier transformation: {fft_time/iterations:.4f} seconds")
33 | 
34 |     # Summation
35 |     summation_time = timeit.timeit(lambda: np.sum(A), number=iterations)
36 |     print(f"Time for summation: {summation_time/iterations:.4f} seconds")
37 | 
38 | if __name__ == "__main__":
39 | 
40 |     print("Producing information for VENV ----> ", os.getenv("CONDA_DEFAULT_ENV"))
41 | 
42 |     np.show_config()
43 | 
44 |     if os.getenv('NO_TEST') == "1":
45 |         print("############### Skipping performance checks.")
46 |     else:
47 |         profiler = cProfile.Profile()
48 |         profiler.enable()
49 |         main()
50 |         profiler.disable()
51 |         profiler.print_stats(sort='cumulative')
52 | 


--------------------------------------------------------------------------------
/test-scripts/numpybench:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | """
 3 | This will give a high level overview of teh performance of various
 4 | matrix operations on an array and iterate throught he calculktions
 5 | a number of times. There is nothing useful in these calculations,
 6 | this is simply to run throug soem calcutions a number of times to
 7 | see the relative performance of the GPU or CPU on larce matrices.
 8 | """
 9 | import os
10 | import sys
11 | import time
12 | from argparse import ArgumentParser
13 | from datetime import datetime
14 | from io import StringIO
15 | import numpy as np
16 | 
17 | # Initialize argparse
18 | parser = ArgumentParser(description="Run NumPy benchmarks and output results.")
19 | parser.add_argument("-d", "--datafile", type=str, nargs='?',
20 |                     default=f"{os.getenv('CONDA_DEFAULT_ENV', 'default')}-timing.txt",
21 |                     help="Specify the datafile to write the output to.")
22 | parser.add_argument("-s", "--skip-tests", action="store_true", help="Skip time-consuming tests.")
23 | parser.add_argument("-c", "--count", type=int, default=1, help="Number of iterations.")
24 | 
25 | # Parse the arguments
26 | args = parser.parse_args()
27 | 
28 | datafile = None
29 | if args.datafile:
30 |     datafile = open(args.datafile, 'w', encoding="utf-8")
31 | 
32 | def print_with_timestamp(message, file=None):
33 |     """ Function to print messages with a timestamp """
34 |     output = f"{datetime.now().strftime('%Y-%m-%d %H:%M:%S')} {message}"
35 |     print(output)
36 |     if file:
37 |         file.write(output + '\n')
38 | 
39 | 
40 | def do_tests():
41 |     """ Perform all the basic tests """
42 |     size = 2500
43 |     A = np.random.rand(size, size)
44 |     B = np.random.rand(size, size)
45 | 
46 |     # Number of iterations
47 |     iterations = args.count
48 | 
49 |     tests = [
50 |         ("Matrix multiplication", lambda: np.dot(A, B)),
51 |         ("Matrix transposition", lambda: np.transpose(A)),
52 |         ("Eigenvalue computation", lambda: np.linalg.eigvals(A)),
53 |         ("Fourier transformation", lambda: np.fft.fft(A)),
54 |         ("Summation", lambda: np.sum(A))
55 |     ]
56 | 
57 |     for name, test_func in tests:
58 |         print_with_timestamp(f"BEGIN TEST: {name}", file=datafile)
59 |         start = time.time()
60 |         for _ in range(iterations):
61 |             test_func()
62 |         end = time.time()
63 |         print_with_timestamp(f"Time for {name.lower()}: {(end - start):.4f} seconds",
64 |                              file=datafile)
65 |         print_with_timestamp("END TEST / BEGIN NEXT TEST", file=datafile)
66 | 
67 | 
68 | def main():
69 |     """ Main script """
70 |     print_with_timestamp(f"Producing information for VENV ----> {os.getenv('CONDA_DEFAULT_ENV')}",
71 |                          file=datafile)
72 | 
73 |     # Capture np.show_config() output and print it line by line with timestamps
74 |     old_stdout = sys.stdout
75 |     new_stdout = StringIO()
76 |     sys.stdout = new_stdout
77 |     np.show_config()
78 |     sys.stdout = old_stdout
79 | 
80 |     for line in new_stdout.getvalue().split("\n"):
81 |         print_with_timestamp(line, file=datafile)
82 | 
83 |     if args.skip_tests:
84 |         print_with_timestamp("############### SKIPPING PERFORMANCE CHECKS", file=datafile)
85 |     else:
86 |         do_tests()
87 | 
88 |     if datafile:
89 |         datafile.close()
90 | 
91 | if __name__ == "__main__":
92 |     main()


--------------------------------------------------------------------------------
/macOS_Apple_Silicon_QuickStart.md:
--------------------------------------------------------------------------------
  1 | # oobabooga macOS Apple Silicon Quick Start for the Impatient
  2 | 
  3 | Make sure Xcode at the minimum is installed.
  4 | 
  5 | If you are really in a rush and feeling brave, copy all of these lines into a text file and edit the uncomment line for version for the type of install you want. Uncomment the lines you wish to use and paste them in one at time into a terminal session of your choice. Use the script created as a template for your start script.
  6 | 
  7 | These instructions have been tested with a non-admin, plain user, so they should work for most everyone, but do let me know if something doesn't and I'll fix it. Typos and copy and paste sometimes have a away for going wrong.
  8 | 
  9 | ## DO NOT JUST COPY AND PASTE UNLESS YOU HAVE READ AND UNDERSTAND THE INSTRUCTIONS - YOU MAY NEED TO CHANGE THEM FOR YOUR SYSTEM
 10 | 
 11 | ## 15 Sep 2024 - updated instructions, you may need to update your CMake, I did.
 12 | 
 13 | This has ben updated with a few new items, like CMake, installing in the user's home directory.
 14 | 
 15 | ```bash
 16 | #!/bin/bash
 17 | ## These instructions assume you are using the Bash shell. I also sugget getting a copy
 18 | ## of iTerm2, it will make your life better, iut is much better than the default terminal
 19 | ## on macOS.
 20 | ## 
 21 | ## If you are using zsh, do this first, do it even if you are running bash,
 22 | ## it will not hurt anything.
 23 | 
 24 | ## This will give you a login shell with bash.
 25 | exec bash -l
 26 | 
 27 | cd "${HOME}"
 28 | 
 29 | umask 022
 30 | 
 31 | ### Choose a target directory for everything to be put into, I'm using "${HOME}/projects/ai-projects" You
 32 | ### may use whatever you wish. This must be exported because we will exec a new login shell later.
 33 | export TARGET_DIR="${HOME}/projects/ai-projects"
 34 | 
 35 | mkdir -p "${TARGET_DIR}"
 36 | cd "${TARGET_DIR}"
 37 | 
 38 | # This will add to your path and DYLD_LIBRARY_PATH if they aren't already seyt up.
 39 | # export PATH=${HOME}/local/bin
 40 | # export DYLD_LIBRARY_PATH=${HOME}/local/lib:$DYLD_LIBRARY_PATH
 41 | 
 42 | ### Be sure to add ${HOME}/local/bin to your path  **Add to your .profile, .bashrc, etc...**
 43 | export PATH=${HOME}/local/bin:${PATH}
 44 | 
 45 | ### Thwe following Sed line will add it permanantly to your .bashrc if it's not already there.
 46 | sed -i.bak '
 47 |   /export PATH=/ {
 48 |     h; s|$|:${HOME}/local/bin|
 49 |   }
 50 |   ${
 51 |     x; /./ { x; q0 }
 52 |     x; s|.*|export PATH=${HOME}/local/bin:\$PATH|; h
 53 |   }
 54 |   /export DYLD_LIBRARY_PATH=/ {
 55 |     h; s|$|:${HOME}/local/lib|
 56 |   }
 57 | ' ~/.bashrc && source ~/.bashrc
 58 | 
 59 | ## Install Miniconda
 60 | 
 61 | #### We will set this heer and it will be used later when we source .bashrc later.
 62 | echo 'export MACOS_LLAMA_ENV="macOS-llama-env"' >> ~/.bashrc
 63 | 
 64 | ### Download the miniconda installer
 65 | curl  https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o miniconda.sh
 66 | 
 67 | ### Run the installer in non-destructive mode in order to preserve any existing installation.
 68 | sh miniconda.sh -b -u
 69 | . "${HOME}/miniconda3/bin/activate"
 70 | 
 71 | conda init $(basename "${SHELL}")
 72 | conda update -n base -c defaults conda -y
 73 |  
 74 | #### Get a new login shell no that conda is activated to your shell profile.
 75 | exec bash -l
 76 | 
 77 | umask 022
 78 | 
 79 | #### Just in case your startup login environment scripts do some thing like change to another directory.
 80 | #### Get back into teh target directory for teh build.
 81 | cd "${TARGET_DIR}"
 82 | 
 83 | #### Set the name of the VENV to whatever you wish it to be. This will be used later when the procedure
 84 | #### creates a script for sourcing in the Conda environment and activating the one set here when you installed.
 85 | #### Create the base Python 3.10 and the llama-env VENV.
 86 | conda create -n ${MACOS_LLAMA_ENV} python=3.10 -y
 87 | conda activate ${MACOS_LLAMA_ENV}
 88 | 
 89 | ## Build and install CMake
 90 | 
 91 | ### Clone the CMake repository, build, and install CMake
 92 | git clone https://github.com/Kitware/CMake.git
 93 | cd CMake
 94 | git checkout tags/v3.30.2
 95 | mkdir build
 96 | cd build
 97 | 
 98 | ### This will configure the installation of cmake to be in your home directory under local, rather than /usr/local
 99 | ../bootstrap --prefix=${HOME}/local
100 | make -j
101 | make -j test
102 | make install
103 | 
104 | ### Verify the installation
105 | which cmake       # Should say $HOME/local/bin
106 | ### Verify you are running cmake z3.29.3
107 | cmake --version
108 | cd  "${TARGET_DIR}"
109 | 
110 | 
111 | ## Get my oobabooga and checkout macOS-test branch
112 | git clone https://github.com/unixwzrd/text-generation-webui-macos.git textgen-macOS
113 | cd textgen-macOS
114 | git checkout macOS-dev
115 | pip install -r requirements.txt
116 | 
117 | ## llamacpp-python
118 | CMAKE_ARGS="-DLLAMA_METAL=on" \
119 | FORCE_CMAKE=1 \
120 | PATH=/usr/local/bin:$PATH \
121 | pip install llama-cpp-python==0.2.90 --force-reinstall --no-cache --no-binary :all: --compile --no-deps --no-build-isolation
122 | 
123 | ## Pip install from daily build
124 | pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu --no-deps --force-reinstall
125 | 
126 | ## NumPy Rebuild with Pip
127 | CFLAGS="-I/System/Library/Frameworks/vecLib.framework/Headers -Wl,-framework -Wl,Accelerate -framework Accelerate" \
128 | pip install numpy==1.26.* --force-reinstall --no-deps --no-cache --no-binary :all: --no-build-isolation --compile -Csetup-args=-Dblas=accelerate -Csetup-args=-Dlapack=accelerate -Csetup-args=-Duse-ilp64=true
129 | 
130 | ## CTransformers
131 | export CFLAGS="-I/System/Library/Frameworks/vecLib.framework/Headers -Wl,-framework -Wl,Accelerate -framework Accelerate"
132 | export CT_METAL=1
133 | pip install ctransformers --no-binary :all: --no-deps --no-build-isolation --compile --force-reinstall
134 | 
135 | ### Unset all the stuff we set while building.
136 | unset CMAKE_ARGS FORCE_CMAKE CFLAGS CT_METAL
137 | 
138 | 
139 | ## This will create a startup script whcih shoudl be clickable in finder.
140 | 
141 | ### Set the startup options you wish to use
142 | 
143 | # Add any startup options you wich to this here:
144 | START_OPTIONS=
145 | #START_OPTIONS="--verbose "
146 | #START_OPTIONS="--verbose --listen"
147 | 
148 | cat <<_EOT_ > start-webui.sh
149 | #!/bin/bash
150 | 
151 | # >>> conda initialize >>>
152 | __conda_setup="$('${HOME}/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
153 | if [ $? -eq 0 ]; then
154 |   eval "$__conda_setup"
155 | else
156 |   if [ -f "${HOME}/miniconda3/etc/profile.d/conda.sh" ]; then
157 |     . "${HOME}/miniconda3/etc/profile.d/conda.sh"
158 |   else
159 |     export PATH="${HOME}/miniconda3/bin:$PATH"
160 |   fi
161 | fi
162 | unset __conda_setup
163 | # <<< conda initialize <<<
164 | 
165 | cd "${TARGET_DIR}/textgen-macOS"
166 | 
167 | conda activate ${MACOS_LLAMA_ENV}
168 | 
169 | python server.py ${START_OPTIONS}
170 | _EOT_
171 | 
172 | 
173 | chmod +x start-webui.sh
174 | ```
175 | 
176 | ## Starting the Web UI
177 | 
178 | ### This will create the file in the current directory which is displayed here feel free to move it
179 | 
180 | ````bash
181 | ${PWD}
182 | ```
183 | 
184 | ### Feel free to move it to another location.
185 | 
186 | ```bash
187 | ./start-webui.sh
188 | ```
189 | 


--------------------------------------------------------------------------------
/macOS-Install.md:
--------------------------------------------------------------------------------
  1 | # Apple Silicon Support for oobabooga text-generation-webui
  2 | 
  3 | This guide provides instructions on how to build and run the oobabooga text-generation-webui on macOS, specifically on Apple Silicon.
  4 | 
  5 | This repository is primarily for oobabooga users at the moment, many of the Python libraries and packages used here may also be used for Data Analytics, Machine Learning and other purposes.
  6 | 
  7 | I have a new repository on the way to assist with Apple Silicon M1/M2 and GPU performance VENV builds. This will produce configurable, repeatable, consistent VENV builds for Python packages and modules in all types of layering/stacking and at some point soon, branching builds. This will allow different installation procedures to be compared and evaluated for performance and through regression tests. Getting these consistent, working builds has been a bit difficult as new packages come out all the time and there are many cross-module/package dependencies, some incompatible, and some in conflict.
  8 | 
  9 | ## 02 Jun 2024 - Rolled back Jinja, should be fine now
 10 | 
 11 | Latest != Greatest, Latest + Greatest != Best, Stable == None
 12 | 
 13 | ## TL;DR
 14 | 
 15 | 1. **Python**: Install Python 3.10 using Miniconda. Create a virtual environment and install pip.
 16 | 1. **CMake**: Install CMake from source to avoid potential issues with universal binaries. This is used for building other software.
 17 | 1. **oobabooga Base**: Clone the oobabooga GitHub repository and install the Python modules listed in its requirements.txt file.
 18 | 1. **Llama for macOS and MPS**: Uninstall any existing version of llama-cpp-python, then reinstall it with specific CMake arguments to enable Metal support.
 19 | 1. **PyTorch for macOS and MPS**: Install PyTorch, torchvision, and torchaudio from the PyTorch Conda channel.
 20 | 
 21 | Check out [oobabooga macOS Apple Silicon Quick Start for the Impatient](https://github.com/unixwzrd/oobabooga-macOS/blob/main/macOS_Apple_Silicon_QuickStart.md) for the short method without explanations.
 22 | 
 23 | Throughout the process, you're advised to create clones of your Conda environments at various stages. This allows you to easily roll back to a previous state if something goes wrong.
 24 | 
 25 | Please note that the guide is incomplete and is expected to be continued.
 26 | 
 27 | - [Apple Silicon Support for oobabooga text-generation-webui](#apple-silicon-support-for-oobabooga-text-generation-webui)
 28 |   - [02 Jun 2024 - Rolled back Jinja, should be fine now](#02-jun-2024---rolled-back-jinja-should-be-fine-now)
 29 |   - [TL;DR](#tldr)
 30 |   - [Building for macOS and Apple Silicon](#building-for-macos-and-apple-silicon)
 31 |   - [Pre-requisites](#pre-requisites)
 32 |   - [Some initial setup](#some-initial-setup)
 33 |   - [Get Conda (Miniconda)](#get-conda-miniconda)
 34 |   - [CMake](#cmake)
 35 |   - [Verify we have everything set up for the rest of the build and install](#verify-we-have-everything-set-up-for-the-rest-of-the-build-and-install)
 36 |   - [Clone my oobabooga macOS GitHub Repository](#clone-my-oobabooga-macos-github-repository)
 37 |   - [Pip Install the PyTorch Daily Build](#pip-install-the-pytorch-daily-build)
 38 |   - [Llama for macOS and MPS (Metal Performance Shaders)](#llama-for-macos-and-mps-metal-performance-shaders)
 39 |     - [Using Pip for PyTorch](#using-pip-for-pytorch)
 40 |   - [NunPy](#nunpy)
 41 |   - [CTransformers](#ctransformers)
 42 |   - [Nearly finished](#nearly-finished)
 43 |   - [Where We Are](#where-we-are)
 44 |   - [Extensions](#extensions)
 45 | 
 46 | 
 47 | 
 48 | This guide is quite comprehensive and covers everything from getting the necessary prerequisites to building and installing all the required components. It also includes a section on how to clone and install the oobabooga repository and its requirements. The guide is still a work in progress and will be updated with more information in the future.
 49 | 
 50 | ## Building for macOS and Apple Silicon
 51 | 
 52 | You will likely need the pre-requisites regardless. This document is a work in progress. If you notice anything incorrect, unclear, or outdated, please let me know.
 53 | 
 54 | Many people might suggest using Brew, but I am old-school and have been building Open Source before package managers existed. Package managers are both a blessing and a curse. I've had bad experiences with Brew and other package managers that manage Open Source and other source-distributed installations.
 55 | 
 56 | - Advantages of Package Managers:
 57 | 
 58 |   - Package managers handle dependencies.
 59 |   - Package managers help keep your system up to date.
 60 |   - With package managers, everything is pre-configured.
 61 |   - Package managers automatically provide updates.
 62 | 
 63 | - Disadvantages of Package Managers:
 64 | 
 65 |   - Package managers handle dependencies, which can sometimes lead to unwanted changes.
 66 |   - Package managers keep your system up to date, but sometimes you might want to stick with a specific version.
 67 |   - With package managers, everything is pre-configured, which can limit customization.
 68 |   - Package managers automatically provide updates, which can sometimes break things.
 69 | 
 70 | These points illustrate why package managers can be both good and not so good. If you want maximum control over your environment, build it yourself, document it, write some scripts to help automate the process, and figure out something that works for you. However, building everything yourself comes at a cost: it's time-consuming, you need to keep things up to date (though version inconsistencies can still exist either way), and you need to know what you're doing to debug odd problems during a build.
 71 | 
 72 | Building is sometimes the best option, such as when you need special options built in or want to use a version other than what is typically distributed. There are many ways to do this, but I'm going to present one method. While it may not be the best, could probably be improved upon, or there's always another way, this is what we're going to do and hopefully, it's simple enough for anyone to follow the directions. In fact, as I am updating this file, I have completely torn down my build environment (making a backup) and am going to follow the steps through here to validate.
 73 | 
 74 | I did it the long way so I could ensure I had the proper versions of libraries and modules which, for many reasons, get overlaid, reverted, or uninstalled, and a different version gets installed from a different repository. Some repositories are better in sync than others, but I tried going to the source for these things. I mention using the --dry-run argument, but sometimes the output is difficult to sift through. I will also explain setting up virtual environments or VENV using Conda.
 75 | 
 76 | ## Pre-requisites
 77 | 
 78 | Before you begin, there are a few things you'll need.
 79 | 
 80 | 1. **iTerm2**
 81 | 
 82 |     This should be the first thing you download.
 83 | 
 84 |     Download iTerm2 here: <https://iterm2.com/downloads.html>
 85 | 
 86 |     If you spend any time on the command line, this is a must-have, unless you're content with Terminal.app. There are many configuration options to explore. PROTIP: Set it up for tabbed windows.
 87 | 
 88 |     **IMPORTANT NOTE:** This is a universal application. Before you run it, find the application where you installed it, "Right Click" on it, select "Get Info", and ensure that "Open using Rosetta" is not checked. If it is, iTerm will think it's running on an Intel machine, which can cause problems during software builds.
 89 | 
 90 | 2. **Xcode**
 91 | 
 92 |     You'll need a compiler. While it would be ideal if GCC ran on macOS, Xcode is a sufficient alternative.
 93 | 
 94 |     You can download Xcode from the App Store.
 95 | 
 96 |     **IMPORTANT NOTE:** If you ONLY want the command line tools and not the complete Xcode IDE, you can get just the command line tools for Xcode by running the xcode-select command. Open up the iTerm2 you downloaded and installed earlier or Terminal.app in /Applications/Utilities and get the command line tools for Xcode like this:
 97 | 
 98 |       ```bash
 99 |       xcode-select --install
100 |       ```
101 | 
102 | 3. **VSCode**
103 | 
104 |     Yes, two IDE's, but there are many plugins for VSCode. Unless you're developing macOS or other Apple apps, this is a great IDE. I like it because there's a Vi/Vim mode for it. If you're familiar with the keystrokes for Vi, you're good to go. An integrated terminal means you can run a command line while in the IDE, and it can even do ssh tunneling so you can develop on a remote machine appearing as though everything was local. There are many options, settings, and plugins for this, and finding the right ones may be challenging. However, if you're working with AI or Data Science, you'll likely want this for the Jupyter Notebooks support alone. It also integrates seamlessly with GitHub.
105 | 
106 |     Make sure you get the "Apple Silicon" zip file. The universal and the Intel version caused me problems when I migrated from my Intel Mac to Apple Silicon because it would run in Rosetta, and everything on the system would report that it was running on Intel when using the terminal. Universal could possibly run inside Rosetta as there is an option on some applications, like iTerm, where when you open the "Get Info" for the application, there is an option to run it using Rosetta. Make sure you don't have the universal build and this box is unchecked.
107 | 
108 |     Download VSCode here: <https://code.visualstudio.com/Download>
109 | 
110 |     Unzip the file wherever your downloads are and copy the application to your preferred location.
111 | 
112 | 4. **GNU Coreutils (Optional, but a matter of taste)**
113 | 
114 |     This isn't strictly a requirement, but more of a personal preference. You might want to use something like Brew to install this. The problem is that the new ls command that comes with macOS displays directory listings in color, just like GNU ls which comes with most Linux distributions. However, the colors and configuration of your colors for macOS is not compatible with GNU ls, is extremely difficult to configure using the scant information in the man page, but the bottom line is the colors they chose for things gives me a headache, so any screenshots of my terminal will be done using GNU ls for directory listings.
115 | 
116 |     My theory is they are trying to actively discourage people from using the command line.
117 | 
118 | ## Some initial setup
119 | 
120 | You will need to have your environment set up for all the following steps to work. These need to be done so your installation will go as smoothly as possible. You may wish to change some of these items for how you like to do things. However they should work for pretty much any non-privileged user.
121 | 
122 | ```bash
123 | ### These commands are for a bash shell. I tryed switching to Zsh, but I have to much legacy with bash it
124 | ### wasted a lot of my time trying to get all my accumulated stuff to work with Zsh.
125 | ###
126 | ### So, let's make sure we are using a fresh bash shell.
127 | exec bash -l
128 | 
129 | cd "${HOME}"
130 | 
131 | ### Choose a target directory for everything to be put into, I'm using "${HOME}/projects/ai-projects" You
132 | ### may use whatever you wish. These must be exported because we will exec a new login shell later. "Normal" shell variables will not be passed to th enew login shell, we are just setting them up front.
133 | export TARGET_DIR="${HOME}/projects/ai-projects"
134 | 
135 | ### Run this commadn to ad the MACOS_LLAMA_ENV variable to your .bashrc
136 | ### we will being it inrothe environment after teh PATH is modified below.
137 | echo 'export MACOS_LLAMA_ENV="macOS-llama-env"' >> ~/.bashrc
138 | 
139 | ### Set a reasonable umask - this controls the default permissions for your files when they are created.
140 | umask 0022
141 | 
142 | 
143 | ### Create the target directory where we sill be dowloading, building and installing from.
144 | mkdir -p "${TARGET_DIR}"
145 | cd "${TARGET_DIR}"
146 | 
147 | ### Be sure to add ${HOME}/local/bin to your path  **Add to your .profile, .bashrc, etc...**
148 | export PATH=${HOME}/local/bin:${PATH}
149 | 
150 | ### Thwe following Sed line will add it permanantly to your .bashrc if it's not already there.
151 | sed -i.bak '
152 |   /export PATH=/ {
153 |     h; s|$|:${HOME}/local/bin|
154 |   }
155 |   ${
156 |     x; /./ { x; q0 }
157 |     x; s|.*|export PATH=${HOME}/local/bin:\$PATH|; h
158 |   }
159 |   /export DYLD_LIBRARY_PATH=/ {
160 |     h; s|$|:${HOME}/local/lib|
161 |   }
162 | ' ~/.bashrc && source ~/.bashrc
163 | ```
164 | 
165 | ## Get Conda (Miniconda)
166 | 
167 | **NOTE:** If Conda is already installed on your machine, skip this step, but this will also ensure your Conda setup is up-to-date. We're going to skip over the NumPy rebuild here because the llama-cpp-python build will bring NumPy along with it, and the Conda installation of PyTorch also brings along a different NumPy with support libraries in a "hidden" package called "numpy-base".
168 | 
169 | During this process, be cautious as some libraries require the properly compiled version rather than the version that comes with pip or conda. This is important because some extensions for oobabooga may uninstall perfectly fine versions of libraries and downgrade them due to dependencies. This can lead to performance loss and troubleshooting issues. This has happened to me with NumPy and llama.cpp. My goal here is to pay close attention to the libraries during the construction of the environment for running and managing LLMs using oobabooga. I aim to catch as many potential issues as possible.
170 | 
171 | One way to avoid conflicts, downgrades, and other issues is to use the "--dry-run" argument. This will show you what it plans to do without actually doing it. The output can be lengthy and you might miss things. As an extra precaution, I clone my virtual environments (VENV), then switch to the new one before making any potentially harmful changes.
172 | 
173 | ```bash
174 | cd
175 | mkdir tmp
176 | cd tmp
177 | curl  https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o miniconda.sh
178 | # Do a non-destructive Conda install whcih will preserve existing VENV's
179 | sh miniconda.sh -b -u
180 | 
181 | # Activate the conda environment.
182 | source ${HOME}/miniconda3/bin/activate
183 | 
184 |  # Initialize Conda which will add initialization functions to your shell's profile.
185 | conda init $(basename ${SHELL})
186 | 
187 |  # Update your Conda environment with the latest updates to th ebase environment
188 | conda update -n base -c defaults conda -y
189 | 
190 | # Grab a new login shell - this will work for any shell aand you wil enter back in the
191 | # tmp directory we just created..
192 | exec bash -l
193 | 
194 | umask 022
195 | ```
196 | 
197 | Create a new VENV using Python 3.10. This will serve as your base virtual environment for anything you wish to use with Python 3.10. This is the version you need for running oobabooga. If you have another project, you can always return to the base and build from there. This helps avoid the issue of conflicting versions resulting from using package managers.
198 | 
199 | ```bash
200 | #### Create the base Python 3.10 and the llama-env VENV.
201 | conda create -n ${MACOS_LLAMA_ENV} python=3.10 -y
202 | conda activate ${MACOS_LLAMA_ENV}
203 | ```
204 | 
205 | This gives us a clean environment to return to as a base. I tend to clone my conda VENVs so it's easy to roll back any changes that have negatively impacted my environment. It saves time to be able to roll back to a known good environment and move forward again. These VENVs are useful for rolling back to a known configuration. I recommend cloning your good VENV, activating it, and applying any changes to that. Many packages or updates affect multiple python modules at once, and this is an easy way to roll back and then move forward, creating a new VENV cloned from the previous one. Then, new items are installed into that VENV. When it's working, clone that one, activate it, and do the next round of updates or changes. At any point, VENVs can be completely removed and even renamed. So, you can take your final VENV, if you're happy with it, and rename it back to the base for your application. I will try to do this as I go along in this installation, taking VENV checkpoints which I can roll back to if needed.
206 | 
207 | Cloning a VENV can also help you quickly determine if a compile, or some other module, provides any performance advantage. I can explain some of these techniques at another time.
208 | 
209 | ## CMake
210 | 
211 | You will need the latest version of CMake, at least version 3.29.3. Make sure you have it installed and working. You may already have CMake installed, if you do, skip this step, but verify you are using the proper version. Many dependencies rely on CMake, which is beneficial as it builds based on the original hardware and software configuration of your machine.
212 | 
213 | You can find it here: <https://cmake.org/download/>
214 | 
215 | CMake is easy to install and will be needed for later steps like llama.cpp, llama-cpp-python, and other modules.
216 | 
217 | Download the latest source version of CMake. Avoid using the packages as they are universal binaries, and you might accidentally end up building something with x86_64 architecture. This is unverified, but it's better to be safe.
218 | 
219 | A lot of issues surrounding getting all this to work stem from various machines building libraries and packages running universal binaries through Rosetta. This allows them to run on macOS, but not necessarily take advantage of the M1/M2 system on chip and unified memory. I discovered that a number of libraries are universal binaries, which could be an issue. I first noticed this when I was looking in the "Activity Monitor" and was surprised when oobabooga came up running as "Intel". This was a result of my VSCode running using Rosetta.
220 | 
221 | **NOTE:** I am using a recent copy of GNU Make, which is a parallelizing make. Apple's make with macOS is an older version of GNU Make - 3.81, so it should be fine as well.
222 | 
223 | **NOTE** This will want to install in ${HOME}/local/bin. Alternatively, you could install in /usr/local but you will need administrator access and possibly have to disable Apples SIP (System Integrity Protection), a process I will not go into here as it affects overall system protection. Either way, you will need to make sure that where ever you install it, it is in your path.  I Am going to assume ${HOME}/local/bin in these instructions.
224 | 
225 | **NOTE:** This will want to install in /usr/local. You may not want it installing there, and there are some special things you may have to do for it to install there. I will update this later with information on how to get it installed in something like ${HOME}/local/bin, which works just fine too, as long as it's in your PATH.
226 | 
227 | The steps are pretty simple and only take about 5 minutes:
228 | 
229 | ```bash
230 | ### Clone the CMake repository, build, and install CMake
231 | git clone https://github.com/Kitware/CMake.git
232 | cd CMake
233 | git checkout tags/v3.29.3
234 | mkdir build
235 | cd build
236 | 
237 | ### This will configure the installation of cmake to be in your home directory under local, rather than /usr/local
238 | ### This is just preference and will work for a non-privilged user.
239 | ../bootstrap --prefix=${HOME}/local
240 | make -j
241 | make -j test
242 | make install
243 | ```
244 | 
245 | This creates 24 compile jobs. I have 12 cores on my MBP, so I use 2 times cores. This works rather well and builds quickly. Make should parallelize as much as it can based on dependencies.
246 | 
247 | ## Verify we have everything set up for the rest of the build and install
248 | 
249 | Make sure we can find the CMAke we installed earlier and make sure we are in the target directory.
250 | 
251 | ```bash
252 | ### Be sure to add ${HOME}/local/bin to your path  **Add to your .profile, .bashrc, etc...**
253 | export PATH=${HOME}/local/bin:${PATH}
254 | 
255 | ### Verify the installation
256 | which cmake       # Should say $HOME/local/bin
257 | 
258 | ### Verify you are running cmake v3.30.2
259 | cmake --version
260 | 
261 | ### Change to the target directory.
262 | cd "${TARGET_DIR}"
263 | ```
264 | 
265 | ## Clone my oobabooga macOS GitHub Repository
266 | 
267 | **NOTE THIS IS A DEVELOPMENT BUILD - IT WILL BE PROMOTED TO TEET SOON = I need feeback**
268 | 
269 | At this point, get started setting up oobabooga in your working location, we'll use it later, referring to the requirements.txt to see which Python modules we will need.
270 | 
271 | Pick a good location for your clone of the project. I have a projects directory with several sub-directories off of it to contain certain projects and source code, but you can pick any place you want. For me, I use ~/.projects/AI as the location where I place anything related to AI. So, open up iTerm and create a location and get the oobabooga text-generation-webui repository.
272 | 
273 | This will pull clone my repository which has some changes that allow it to run with GPU acceleration.  This is unsupported, by the oobabooga people, but I will try to keep my information as up-to-date as possible along with merging code into the repository on a regular basis.
274 | 
275 | ```bash
276 | ## Get my oobabooga and checkout macOS-dev branch
277 | git clone https://github.com/unixwzrd/text-generation-webui-macos.git textgen-macOS
278 | cd textgen-macOS
279 | git checkout macOS-dev
280 | pip install -r requirements.txt
281 | ```
282 | 
283 | ## Pip Install the PyTorch Daily Build
284 | 
285 | ## Llama for macOS and MPS (Metal Performance Shaders)
286 | 
287 | The one loaded with the requirements for oobabooga is not compiled for MPS (Metal Performance Shaders) installed from PyPi at this time. It is also probably best to build your own anyway.
288 | 
289 | You're going to need the llama library and the Python module for it. You should recompile it, and I have validated that my build using OpenBLAS. I will also add instructions later for building a stand-alone llama.cpp which can run by itself. This is handy in case you don't want the entire UI running, you want to use it for testing, or you only need the stand-alone version.
290 | 
291 | The application llama.cpp compiles with MPS support. I'm not sure if the cmake configuration takes care of it in the llama-cpp repository build, but the flag -DLLAMA_METAL=on is required here.  When I compiled llama-cpp in order to compare its performance to the llama-cpp-python. I didn’t have to specify any flags and it just built right out of the box. This could have been due to the configuration of CMake as it thoroughly probes the system for its installed software and capabilities in order to make decisions when it creates the makefile. It is required in this case.
292 | 
293 | ```bash
294 | ## llamacpp-python
295 | CMAKE_ARGS="-DLLAMA_METAL=on" \
296 | FORCE_CMAKE=1 \
297 | PATH=/usr/local/bin:$PATH \
298 | pip install llama-cpp-python==0.2.90 --force-reinstall --no-cache --no-binary :all: --compile --no-deps --no-build-isolation
299 | 
300 | ```
301 | 
302 | ### Using Pip for PyTorch
303 | 
304 | This will install the latest PyTorch optimized for Apple Silicon.
305 | 
306 | ```bash
307 | ## Pip install from daily build
308 | pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu --force-reinstall --no-deps
309 | ```
310 | 
311 | Now, at this point, we have everything we need to run the basic server with no extensions. However, we should have a look at the llama.cpp and llama-cpp-python as we may need to build them ourselves.
312 | 
313 | ## NunPy
314 | 
315 | NumPy is finally supporting Apple Silicon. You will have to compile it on install. Many packages I've found want to install their preferred version of NumPy or other NumPy support libraries. This will likely uninstall your NumPy in your VENV. You should be on the lookout when you install anything new that it does not overlay your NumPy with a previous version or a different installation of the current version.
316 | 
317 | ```bash
318 | CFLAGS="-I/System/Library/Frameworks/vecLib.framework/Headers -Wl,-framework -Wl,Accelerate -framework Accelerate" pip install numpy==1.26.* --force-reinstall --no-deps --no-cache --no-binary :all: --compile -Csetup-args=-Dblas=accelerate -Csetup-args=-Dlapack=accelerate -Csetup-args=-Duse-ilp64=true
319 | ```
320 | 
321 | ## CTransformers
322 | 
323 | I include this one, but haven't tested it and it's unclear is it works properly on macOS.
324 | 
325 | ```bash
326 | CFLAGS="-I/System/Library/Frameworks/vecLib.framework/Headers -Wl,-framework -Wl,Accelerate -framework Accelerate"  CT_METAL=1  pip install ctransformers --no-binary :all: --no-deps --compile --force-reinstall
327 | ```
328 | 
329 | ## Nearly finished
330 | 
331 | While there are more advanced instructions in the "QuickStart" guide, basically you are now finished. In the command window you are in, you can set the preferred VENV to use, and the start options for the webui. This creates a short script for starting the webui from the command line or you may open Finder to the location you gave installed and simply double click on the file "start-webui.sh and it should run in a terminal window. The options may be edited later in the start-webui.sh created here.
332 | 
333 | ```bash
334 | # Add any startup options you wich to this here:
335 | START_OPTIONS=""
336 | #START_OPTIONS="--verbose "
337 | #START_OPTIONS="--verbose --listen"
338 | 
339 | cat <<_EOT_
340 | #!/bin/bash
341 | 
342 | # >>> conda initialize >>>
343 | __conda_setup="$('${HOME}/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
344 | if [ $? -eq 0 ]; then
345 |     eval "$__conda_setup"
346 | else
347 |     if [ -f "${HOME}/miniconda3/etc/profile.d/conda.sh" ]; then
348 |         . "${HOME}/miniconda3/etc/profile.d/conda.sh"
349 |     else
350 |         export PATH="${HOME}/miniconda3/bin:$PATH"
351 |     fi
352 | fi
353 | unset __conda_setup
354 | # <<< conda initialize <<<
355 | 
356 | cd "${TARGET_DIR}/textgen-macOS"
357 | 
358 | conda activate ${PREFERRED_VENV}
359 | 
360 | python server.py ${START_OPTIONS}
361 | _EOT_ > start-webui.sh
362 | 
363 | chmod +x start-webui.sh
364 | ```
365 | 
366 | ## Where We Are
367 | 
368 | This is a lot to cover, but there are more modules which get mis-installed and need to be repaired, re-installed, or built from source. This package has a lot of modules and a lot of dependencies, so expect breakage from time to time.  Making checkpoints for rollback along the way will help a lot if you get a bad module, you won't have to destroy your whole VENV or figure out which modules need to be uninstalled and re-installed.
369 | 
370 | Once you feel comfortable with your checkpoints and working VENV, you can remove some of the ones you aren't using and this will improve Conda's performance.
371 | 
372 | At his point, LLaMA models should start up just fine as long as they are GGUF                      formatted models and you should see a noticeable performance improvement.  Put as many GPU layers as you possibly can and set the threads at a reasonable number like 8.
373 | 
374 | Some other numbers, and parameters of note which I have verified through testing.  If you have others, please let me know and I'll have a look at them and add them is they work well.
375 | 
376 | | Parameter    |  Value                                                                           |
377 | | ------------ | -------------------------------------------------------------------------------- |
378 | | n_gpu_layers | Set this to the number of n_layers in the output of llama.cpp when it starts     |
379 | | mlock        | Set this on, this will pin the memory so it doesn't get paged out or compressed  |
380 | | n_batch      | the number of batches for each iteration, if someone has guidance for this, please let me know. |
381 | | no_mmap      | I use this because with mlock set, there should be no need to reference the model as a file. |
382 | 
383 | ## Extensions
384 | 
385 | There are a number of extensions you can use with oobabooga textgen, but som break other things with their requirements. At this point, here are the extensions I have had no problems with so far:
386 | 
387 | All the extensions should work with this version 
388 | 
389 | - Elevenlabs
390 | 
391 |   I've included this one because i use it, but am working on other TTS options which will run locally, like AllTalk and other Coqui-based solutions.
392 |   
393 | - Silero
394 | 
395 |     These are both for TTS, Text To Speech, one relies on ElevenLabs to generate the speech, and the other runs locally using a torch speech model. I'd be interested in discovering other TTS packages available which could be hosted locally without relying on th eInternet.
396 | 
397 | - Whisper
398 | 
399 |   This is the speech to text utility, modules or libraries from OpenAI, which I believe may be hosted locally as the model it uses is fairly small, though I haven't had a chance to check it out yet.
400 | 
401 |   The issue with Whisper is that it requires some older Python packages which will cause NumPy to be downgraded and there you have a problem.  Hopefully this will be sorted out soon.
402 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Use the GPU on your Apple Silicon Mac
  2 | 
  3 | ## Latest Update
  4 | 
  5 | - [12 Feb 2025 - Virtual Environment Utilities as been updates and released](#12-feb-2025---virtual-environment-utilities-as-been-updates-and-released)
  6 | - [28 Nov 2024 - Announcing Venvutil: Streamlining Python Virtual Environments](#28-nov-2024---announcing-venvutil-streamlining-python-virtual-environments)
  7 | - [16 Nov 2024 - NumPy build for Apple Silicon: NumPy 1.26 solved](#16-nov-2024---numpy-build-for-apple-silicon-numpy-1.26-solved)
  8 | - [01 Oct 2024 - Library dependencies have changed](#01-oct-2024---library-dependencies-have-changed)
  9 | - [16 Sep 2024 - Basic testing, yes it works, and is kinda fast](#16-sep-2024---basic-testing-yes-it-works-and-is-kinda-fast)
 10 | 
 11 | ## Background
 12 | 
 13 | This stared out as a guide to getting oobabooga working with Apple Silicon better, but has turned out to contain now useful information regarding how to get numerical analysis, data science, and AI core software running to take advantage of the Apple Silicon M1 and M2 processor technologies. There is information in the guides for installing OpenBLAS, LAPACK, Pandas, NumPy, PyTorch/Torch and llama-cpp-python. I will probably create a new repository for all things Apple Silicon in the interest of getting maximum performance out of the M1 and M2 architecture.
 14 | 
 15 | ## You probably want this: [Building Apple Silicon Support for oobabooga text-generation-webui](https://github.com/unixwzrd/oobabooga-macOS/blob/main/macOS-Install.md)
 16 | 
 17 | ## If you hate standing in line at the bank: [oobabooga macOS Apple Silicon Quick Start for the Impatient](https://github.com/unixwzrd/oobabooga-macOS/blob/main/macOS_Apple_Silicon_QuickStart.md)
 18 | 
 19 | In the test-scripts directory, there are some random Python scripts using tensors to test things like data types for MPS and other compute engines.  Nothing special, just hacked together in a few minutes for checking GPU utilization and AutoCast Data Typing. BLAS and LAPACK are no longer required to be build.
 20 | 
 21 | ## 12 Feb 2025 - Virtual Environment Utilities as been updates and released
 22 | 
 23 | I've updated the Venvutil repository and released a new version.  It now includes a new tool, `vdiff`  which will compare two different virtual environments and list the differences.  It will also list the differences between the same virtual environment at different points in time.  It will also list the differences between the Python packages in the virtual environment and the Python packages in the system.
 24 | 
 25 | All the test scripts here have been moved in to that repository and there are many other tools for working with Python Virtual Environments and LLM's.
 26 | 
 27 | I was able to get oobabooga to work, but had to make a couple of tweaks.  I think I'm looking at other alternatives for the future, but this will work for now.
 28 | 
 29 | ## 28 Nov 2024 - Announcing Venvutil: Streamlining Python Virtual Environments  
 30 | 
 31 | I’m excited to release **Venvutil**, a versatile toolset for building and managing Python Virtual Environments. While it’s still evolving, the current release offers several powerful features and solutions for common challenges, including workarounds for Meson builds for NumPy 1.26.4.
 32 | 
 33 | If you are thinking about using NumPy for anything, you should use `numpy-compile` in the [Virtual Environment Tools repository](https://github.com/unixwzrd/venvutil) to build NumPy for Apple Silicon. There are some scripts you can run in there which will measure the performance of NumPy so you can see how it compares to other versions like the pre-compiled or bundled versions.
 34 | 
 35 | ### Addressing the Meson Build Issue  
 36 | 
 37 | NumPy 1.26.4 requires Meson for building, but Meson’s use of `--version` (instead of `-v`) to query the linker (`ld`, `c++`, `g++`) creates issues with Apple’s linker. This problem can prevent successful builds, particularly on macOS. Venvutil addresses this by providing hard-linked scripts that correct the flag, allowing NumPy to compile and take full advantage of the Accelerate framework for Apple Silicon.  
 38 | 
 39 | ### Installation  
 40 | 
 41 | Setting up Venvutil is straightforward. Run the following commands:  
 42 | 
 43 | ```bash
 44 | git clone https://github.com/unixwzrd/venvutil.git
 45 | cd venvutil
 46 | bash setup.sh install
 47 | ```
 48 | 
 49 | This installs Venvutil in $HOME/local/venvutil/bin. The installer is designed to be non-destructive and includes:
 50 |   • Conda setup
 51 |   • NLTK and Rich installation
 52 |   • The core Venvutil payload
 53 | 
 54 | Key Features
 55 | 
 56 |   • Environment Tracking: Logs and tracks all changes to your Python virtual environments, aiding in recreation and debugging.
 57 |   • Basic `vdiff`: Compare two virtual environments easily.
 58 |   • Meson Workarounds: Custom scripts replace --version with -v, enabling successful builds on macOS and RHEL 9.
 59 |   • Accelerate Framework: Leverages Apple Silicon’s performance advantages for NumPy compilation.
 60 | 
 61 | Tested Platforms
 62 | 
 63 | Venvutil has been tested on:
 64 |   • macOS 15.3.1 Sequoia
 65 |   • macOS 15.3.0 Sequoia
 66 |   • macOS 12.7.6 Monterey
 67 |   • RHEL 9
 68 |   • RHEL 8
 69 | 
 70 | For detailed NumPy compilation steps, see this repo and the Venvutil README.
 71 | 
 72 | Feedback and Support
 73 | 
 74 | Give Venvutil a try! If you encounter any issues or have suggestions, please report them in the repository’s Issues section. Your feedback is invaluable in making Venvutil even better.s always you can buy me a coffee at [BuyMeACoffee](https://www.buymeacoffee.com/unixwzrd). or support me on [my Patreon](https://patreon.com/unixwzrd).
 75 | 
 76 | ## 16 Nov 2024 - NumPy build for Apple Silicon NumPy 1.26 solved
 77 | 
 78 | **Note this involves a hack** is more than I can write up here.  The basic issue is Meson is not passing the correct flags to detect the linker `ld` version correctly and using `--version` instead of `-v`. I spent a lot of time diving into Meson and even began looking at a NumPy build from source code. Given that and the macOS updates, rebuilding my GNU toolchain from source, it was taking way too much time. I have wrappers for the tools which meson is using and I replace `--version` with `-v` and it works just fine these "hacks" will be included in venvutil which also has a number of others useful tools for working with LLM's and VENV's.
 79 | 
 80 | Meson and a source build of NumPy were both deep dark rabbit holes consuming a lot of my time, but I am able to compile using a Pip recompile and take advantage of the Apple Silicon using this for the build:
 81 | 
 82 | ```bash
 83 | # NumPy Rebuild with Pip
 84 | CFLAGS="-I/System/Library/Frameworks/vecLib.framework/Headers -Wl,-framework -Wl,Accelerate -framework Accelerate" pip install numpy==1.26.* --force-reinstall --no-deps --no-cache --no-binary :all: --compile -Csetup-args=-Dblas=accelerate -Csetup-args=-Dlapack=accelerate -Csetup-args=-Duse-ilp64=true
 85 | ```
 86 | 
 87 | This will not work without some supplementary tools, scripts and just plain hacks, but I have verified that it improves the performance of NumPy on Apple Silicon dramatically. I'll post an update here when I get the files in the venvutil repository. and instructions for how to use it.
 88 | 
 89 | ---
 90 | 
 91 | The new VENV build process here is, [venvutil](https://github.com/unixwzrd/venvutil). It's a set of shell functions and hopefully soon a way to get reproducible builds. Using Git, pip, conda and user definable functions.  There's still a few issues I need to work out, but it will eventually track your installed python packages and even do diffs between different VENV's and points in time.  Not quite there yet, but hoping for this to be a way to track and rebuild VENV's no matter if you use Conda, pip, and possibly a few others at some point.
 92 | 
 93 | There's also a lot of helpful shell functions in there, one in particular if a way to lookup and use POSIX return codes for return values and exit codes - `errno` and `errfind`.  There's also `errno_warn` and `errno_exit` for scripts. If you have program which uses POSIX return codes, you would be able to do this:
 94 | 
 95 | ```bash
 96 | someprogram
 97 | errno $?
 98 | ```
 99 | 
100 | Let's say it returned 15 as an exit code, you would get this sent to STDERR:
101 | 
102 | ```
103 | errno 15
104 | (ENOTBLK: 15): Block device required
105 | ```
106 | 
107 | errno_warn will return after sending the code and message to STDERR, and errno_exit will cause your script to exit after writing the error code and message to STDERR.
108 | 
109 | **Anyone wishing to provide any additional information or assistance, pleas feel free.  If you are interested in working on this with me, please let me know as well. It's still only myself and a few volunteers assisting me at the moment. Keeping up with call this does take a good bit of time to keep up with and organize in this rapidly changing world, so any help would be appreciated.**
110 | 
111 | ## 10 Oct 2024 - Change of Plans With Sequoia plus GCC
112 | 
113 | I upgraded to macOS Sequoia 15.0.1, additionally, I've done a new merge with oobabooga 1.15, their latest.  I am working on getting the libraries and packages built, but am having some difficulty with . I'm rebuilding lots of things from source right now an want to do some checking for perfomance, especially with PyTorch and Numpy which seem to be the ones most dependent on the Accelerate Framework.
114 | 
115 | Right now I'm trying to figure out how to get mason to build scikit for Python. It's having issues with using certain invalid flags to ld. I wil get this figured out. Additionally, I've built the Gnu Compiler Collection, including GFORTRAN, so I may have additional performance improvements soon.
116 | 
117 | Simultaneously I am tightening up the code in my [venvutil](https://github.com/unixwzrd/venvutil), which helps manage Python Virtual environments, but also has a lot of other helpful functions. Along with that I've been working on setting up web pages for my consulting services using Jekyll, a work in progress, but it's getting there. I is up and running so check it out at [Distributed Thinking Systems](https://unixwzrd.ai)
118 | 
119 | Things like this journal will be moving over there and it will be a central location for everything I am working on. I would appreciate any feedback and suggestions anyone might have.
120 | 
121 | ## 01 Oct 2024 - Library dependencies have changed
122 | 
123 | I just finished up with creating a web site and I now know way too much about Jekyll, also OpeAI's new model o1-mini writes shell scripts just like it's writing Python, complete with a call to function "main" at the end. What was supposed to take me two weeks and at most four, took almost three months to accomplish.  Not all of that is based on getting to know Jekyll, some was waiting for other people, which I have no control over. The site is up now, is it perfect? Not really, it will be a work in progress. I will be moving posts like this over there as I will keep a blog about projects I am working on and other things happening in life.
124 | 
125 | Anyway, there are two major libraries I can see right off the top which are both critical to oobabooga and critical to have the proper version. I'm still nailing down some potential issues, but that's what happens when you want to upgrade libraries and try the latest and greatest. Ok so here they are so far:
126 | 
127 | - llama-cpp-python --> 0.2.90
128 | - numpy --> 1.26.4
129 | 
130 | There may be others and I've updated the instructions and will be pushing a new version of the requirements up with the new library versions. These two should be built manually or compiled locally using the instructions I have as they will give the best performance as far as I know.
131 | 
132 | I may be supporting this less and less as I have several other things I have been working on and would like to shift to them.  One is a Python module which will assist in porting things from CUDA to Apple Silicon.  Another is yet another LLM framework, but I hope it is simpler and easier to maintain as I have learned a lot while trying to keep bandaids on the original oobabooga keeping it working on macOS.  That's been on hold the past three months and I need to get back to it.
133 | 
134 | I also made a port or Coqui and AllTalk to run on macOS using Apple Silicon, but haven't created branches of them and only have them in local repositories for now. There's a lot I'd like to be doing, but I have been scraping to get by for more than a year now. I am still looking for employment or project funding. I got cut loose while on FMLA and haven't been able to find any thing to help pay the bills.
135 | 
136 | ## 16 Sep 2024 - Basic testing, yes it works, and is kinda fast?
137 | 
138 | Ok, I just got finished making another commit to the repository for the macOS oobabooga. It's working and well, whether it's working well, is a matter of opinion. Would I trust it for production systems? Definitely not. It is kind of fun to play with and tinker with, but that's about all I can see for now. It needs a lot of work to make it solid production grade software. I've been into the code a number of times and some parts have improves, and others have not. I'll support this release as much as I can, and will continue to add bits to it and make improvements as I can too.
139 | 
140 | My thoughts is that it attempts to be all things to everyone and has way too many people trying to contribute code to it, all in varying degrees of success and whether it is good or not, well, it shows signs of all things to everyone, and too many cooks. I had considered making a fork of the code (actually I did) and the two are divergent, so each release must be gone over carefully to ensure the upstream origin hasn't botched something in it's development that has hosed my branch.
141 | 
142 | I have had plans of developing a system for LLM's and AI in general which should in, in effect, portable between macOS and Linus as they are both POSIX operating systems. SO, apart from anything specifically written to the hardware level could be portable across systems. In order to do this, device interfaces would have to be abstracted away to some common denominator. This would be done in a similar manner as I have done with the TorchDevice to bridge the gap between MPS and CUDA. It's a shim, but will also help people who are porting code from one platform to the other by highlighting he parts of the code which make hardware specific actions. It's not a long-term solution, though it could be a way of creating device independent code.
143 | 
144 | For my plan for an LLM/AI system, I am taking some lessons from Plan-9 from Bell Labs. They took the concept of Unix and extended it in all directions and extended it a bit. Unix was a first attempt at making something simple and clean where each part of the system did one thing and did it well. It also is one of the first systems to treat everything like a file, though some things couldn't be as easily done that way and network computing was growing rapidly, so things were added where they made sense. One of the divergences which is inherent in the *nix lineage today is the difference between socket access and file access. One must use socket calls to open and maintain a socket on the network, when in reality it is just stream of data. Plan-9 was an attempt to look at Unix and what the original developers of it would do differently, and one of the things is they designed it to make everything a file, or act like one anyway, giving them the ability to create an operating system with (my memory gets a bit hazy here) only about 8 actual system calls so all access to system resources was through a familiar interface just as though it was a file.
145 | 
146 | I used to be very active in the Unix Community on the East Coast and we were lucky that we had Bell Labs so close by. I remember attending a presentation given by Rob Pike about the Plan-9 Operating System, which was only proof of concept, later leading to Lucent's Inferno Operating System. The system calls were, open, close, read, write, position, delete, rfork, and a few others. The whole idea was to allow everything to communicate through a well known interface.
147 | 
148 | The same applies to data passing between the user, an agent, an LLM, or really any part of the system and if these are distributed, they could run anywhere and still need to communicate with each other. From there any component or object in the system, using this standard interface, could be used to communicate with each other, no matter whether it is on the same machine or on another machine in some other location. This also provides the ability for processes to move to the data and process it in place which reduces costs and redundancies of moving data around over networks. These small units of code could be sent to all nodes in a network in advance to minimize the possibility of executing arbitrary or malicious code, they could then be started up as requested and managed similar to micro-services.
149 | 
150 | That's the idea anyway, and as long as long as all objects/entities/actors in the system use the same standard interface, they can communicate with each other. AAfter all there is really no difference between a data stream to a user interface, one to an agent, or another LLM. Even simpler, this could all be done using URL's to locate and access resources or other services. This would be a fairly large project, but it is something I am working on, I'm looking for others who would be interested in helping out either as a sponsor or as a contributor.
151 | 
152 | ## 15 Sep 2024 - New oobabooga macOS-dev merge with 1.14 ready for testing
153 | 
154 | I have a n updated version of the macOS oobabooga merged with the main oobabooga branch 1.14. There are many other things I am working on right now, and will try to get another update of this dome if they tag it as soon as possible.
155 | 
156 | My web site will be launching, hopefully this week and I will move things around and my web site will be a central location for everything I am working on and more.  This repository will likely be used for the oobabooga-macOS development, and I am also working in a rewrite, which is more of a new effort than a rewrite because I wanted to add a lot of functionality for collecting metrics and more. My plan was to have my LLM backend and GUI to be somewhat compatible with oobabooga, but I have pretty much started from scratch. It is completely object oriented in design and the code should be a lot cleaner and more maintainable.
157 | 
158 | I have also finished up with and plan to release in a new repo a module which intercepts CUDA functions and methods for PyTorch and redirects them into PyTorch MPS functions and methods.  The idea here is to drop this module into your code and be able to run code written for CUDA run on macOS. When it receives a CUDA function or method intercept it and send a log message out identifying the function or method and where it was called from. The goal is to locate the CUDA use and then go back into the code and add the functionality for MPS PyTorch support. I will have that ready in the next day or two. I believe I have it written so it will also work with CUDA, but I don't have any Nvidia hardware to test it on.
159 | 
160 | Installation instructions for this are completed and the macOS-dev branch will be ready for testing in the next few minutes. I haven't had a chance to test it myself, but will be in the process of doing that. The intent is to put the merge as a commit before I begin testing. Let me know if you have any problems with it.
161 | 
162 | ## 22 Jul 2024 - Watch this space, I have something interesting coming soon
163 | 
164 | I have been working on rebuilding oobabooga from the ground up and creating my own object model for a text generation web UI. Right now I am concentrating on the object model for everything to do with the front-end to support conversations, or a series of turns between one or more other "Actors". Conversations of dialogues will be able to be able to branch and metadata for conversations/dialog will be retained with any turn where they changed, including changing the underlying LLM. Performance metrics will also be kept for each turn for doing testing and benchmarking. Playback is also a possibility, but I have quiet a bot planned and it may be rather ambitious, but I will be getting as much put in as I can,
165 | 
166 | This effort is a departure from the oobabooga code-base completely and will have a separate repository. I do plan to try to keep it as backwards compatible as possible and plan to support llama.cpp/llama-cpp-python to begin with. Please let me know if you have any suggestions or things you would like to see in a completely overhauled product.
167 | 
168 | ## 25 Jun 2024 - NumPY updated and breaks things
169 | 
170 | Had to change the instruction just slightly for NumPy since it breaks Numba.
171 | 
172 | ## 02 Jun 2024 - Rolled back Jinja, should be fine now
173 | 
174 | Rolled back to prior working version, should be fine now.
175 | 
176 | **I have a problem between my user and the test user.  Wait to download and install, unless you can debug the issue and you are welcome to do so.**
177 | 
178 | ## 01 Jun 2024 - Minor update, branches are synced
179 | 
180 | I'm updating my instructions to use the main branch of the repository as I've done some housekeeping and all are in sync at this point.
181 | 
182 | ## 31 May 2024 - Well, it's been a while and it's time for an update
183 | 
184 | Actually, not a whole lot has changed except for an updated to oobabooga being folded in. to the dev version of oobabooga dev from 20 May 2024. I will try to merge in other bits as soon as possible. But this is ready for anyone brave enough to test.
185 | 
186 | I have created a clone of the AllTalk TTS extension modified to use MPS instead of CUDA. It's not the speediest thing, but it works and has lots of features I haven't really had a chance to explore, like using your own clones/samples voices, and much more. This also depends on CoQui TTS, which I have also been able to get running using MPS as well and must be installed from a local clone. I will try to get that in a repository.
187 | 
188 | I am looking into possibly getting GPTQ, Transformers working, but I am thinking it might be nicer to build a native MLX module for macOS, so that is in the works a bit too. I think that might be a better pursuit instead of trying to worm all the CUDA code out of the other modules, though I was working on a package which could install and intercept PyTorch CUDA calls, provide a traceback indicating where the code was intercepted for future revision, and route them to MPS. If there is interest in any of these, let me know.
189 | 
190 | There are a number of things I'd like to do with this, so get in the GitHub Discussion for this project or open an "Issue' with an enhancement or fix request.  As always contributions are welcome
191 | 
192 | See the updated instruction  for install in the places at the top of this page. Please leave your comments, suggestions and issues which I will get back to as soon as I can.
193 | 
194 | I am also still looking for a paying gig, is if anyone knows someone who is hiring someone like me or you could use some assistance with a project, please get in touch. I have just set up a new LLC and am willing to work with anyone to help keep a roof over my head and the lights on.
195 | 
196 | I have set up a Discord which you are welcome to join and help build a community around oobabooga for macOS. I am on Discord a lot recently as I am testing the OpenAI app for macOS and providing feedback. The community link is here [unixwzrd Discord Community](https://discord.com/channels/1153784977964675172/1245878190338084996)
197 | 
198 | ## 31 Oct 2023 - NumPy uses the GPU on M1/M2 and presumably M3 Processors
199 | 
200 | **NOTE** I have updated the install instructions with the latest build information.
201 | 
202 | ## TL;DR
203 | 
204 | **NumPy on Apple Silicon GPUs (M1/M2/M3)**:
205 | 
206 | - NumPy has improved its compatibility with Apple Silicon GPUs.
207 | - Installation still requires specific steps beyond a simple pip install.
208 | - Key Points:
209 |   - NumPy now uses the Accelerate Framework.
210 |   - Special flags are needed for `llama.cpp` during build.
211 |   - PyTorch should be installed from daily builds.
212 |   - Numpy's build process has transitioned to Meson.
213 |   - Beware of library conflicts; NumPy might prioritize pre-existing libraries over the macOS Accelerate Framework.
214 |   - New switches have replaced environment variables for recompiling NumPy.
215 |   - A basic benchmarking tool, `numpybench`, is available to test the GPU build.
216 |   - Despite certain outputs suggesting otherwise, the recompiled NumPy does link to the Accelerate Framework.
217 |   - As of NumPy version 1.26.1, a straightforward compile might yield similar results to a custom compile, but using force flags is still recommended.
218 |   - Avoid simple Pip installs without specific flags as they may not optimize for the GPU and could over-utilize the CPU.
219 | 
220 | I've been doing a good bit of testing to see what configurations work using the GPU on Apple Silicon machines and while it's still not as straightforward as just doing a plain pip install, the changes to getting NumPy to run taking advantage of Apple Silicon GPU's has stabilized. I stated in another update that I'd been taking some time to allow things to settle, and it appears they have to some extent. Here's where things are now.
221 | 
222 | ## Numpy, llamas, and PyTorch, oh my
223 | 
224 | - [NumPy builds for the Accelerate Framework](#numpy)
225 | - [llama.cpp still needs special flags to build correctly](#llama-cpp-python)
226 | - [PyTorch still needs to be installed from the daily builds](#pytorch)
227 |   
228 |   ### NumPY
229 | 
230 |   Alright, let's take each of those things in order. Numpy changed their build process from what they had to using Meson to generate the build files and actually build. We used to use `NPY_BLAS_ORDER` and `NPY_LAPACK_ORDER`, well as I mentioned before, the new build will try to determine if you have a libBLAS and libLAPACK dynamic library installed on your system.  There is a set order of precedence it will use to search for these libraries and I discovered that it would use previously installed libraries in /usr/local/lib if it found them there before it would use the Accelerate Framework on macOS. I haven't tested the updated Meson install to see if it checks for the framework first or if it just grabs the first library it sees, in any event, I moved the OpenBLAS and BLIS libraries I'd build previously out of the way to be sure.
231 |   
232 |   Instead of environment variables there are now switches which need to be used when recompiling NumPy with Pip, and I would suggest doing a forced install if your NumPy is ever overlaid. I will show some rough numbers from my very basic benchmark, found in this repository in the test-scripts directory called `numpybench` which will give you the build configuration of your NumPy as well as do som every simple tests to exercise the GPU. At one point I needed the NPY environment variables to build, but now it's just like this:
233 | 
234 |   ```bash
235 |   pip install numpy --force-reinstall --no-deps --no-cache --no-binary :all: --compile \
236 |        -Csetup-args=-Dblas=accelerate \
237 |        -Csetup-args=-Dlapack=accelerate \
238 |        -Csetup-args=-Duse-ilp64=true
239 |   ```
240 | 
241 |   This will build and Accelerate Framework capable NumPy. When you do this install, depending on how you do it, like in verbose mode, you may notice and in the output of [numpybench](test-scripts/numpybench) that it is *not* using accelerate for `lapack`, but this very strange thing, `dep4365539152` which is some internal symbol which points to `liblapack_lite` which is included in the NumPy package, but is not linked to the Accelerate Framework but an internal `libblas` or `libcblas`, I forget which, but when you do the recompile as specified above, you do indeed get the Accelerate Framework linked in, even though it's just a wrapper. In macOS, `otool` will confirm this and on other systems, such as Linux or Unix you would likely use `ldd`. The only thing this does not seem to respect is the ilp64, 64 bit integers, bit because the Accelerate Framework doesn't support  it, it's because `lapack_lite` doesn't for Accelerate.
242 | 
243 |   **HOWEVER**, I was just double checking all this and it seems NumPy has gone to 1.26.1 now and what I just wrote, which still works, it seems to get the same results as doing a straight compile when installing with Pip. It appears this will work just as well to install using Pip, feel free to use, but I'm going to stick with the forced flags. Either way 64 bit is not enabled in either one. I would not recommend installing with a simple Pip without the `--no-binary` and `--compile` also set. Here is the install with and without the compiles.
244 | 
245 |   ```bash
246 |   pip install numpy --force-reinstall --no-deps --no-cache --no-binary :all: --compile
247 |   ```
248 | 
249 |   and without, really not a good idea because this includes a copy of the `libopenblas` library precompiled, and it will run all over your CPU.
250 | 
251 |   ```bash
252 |   pip install numpy --force-reinstall --no-deps --no-cache
253 |   ```
254 |   
255 |   Another interesting observation is it seems linking with the Accelerate Framework actually uses no GPU or CPU, I can only guess they are using the Neural Engine in the SOC. I would be curious to see the performance of llama-cpp-python if it was linked with the Accelerate Framework's `libBLAS`, right now `ligggml` is taking care of any linear algebra issues. The `libllamacpp` uses a lot of GPU for processing, in fact it usually has the GPU pegged at 100%.
256 | 
257 |   ### llama-cpp-python
258 |   
259 |   `llama-cpp-python` will insist on getting a new NumPy for you, but we can no longer compile NumPy during the rebuild of `llama-cpp-python` or rather the sub-package `llama.cpp`, and needs to be done the same as before, it's just we cannot specify the extra flags to Pip and using CFLAGS didn't work. I'm not a Meson expert or a Pip expert, but how enough to be dangerous. The procedure for rebuilding `llama-cpp-python` is like this still, but not installing its dependencies with `--no-deps`. Don't even bother trying the `-Csetup-args` with this, it will fail.
260 | 
261 |   ```bash
262 |   CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 \
263 |   pip install llama-cpp-python --force-reinstall --no-cache --no-binary :all: --compile --no-deps
264 |   ```
265 | 
266 |   ### PyTorch
267 | 
268 |   Not really anything new about this one, it's still taken from the daily build until they get the PyPi and Conda packages up to date with the next release.
269 | 
270 |   ```bash
271 |   pip install --pre torch torchvision torchaudio \
272 |               --index-url https://download.pytorch.org/whl/nightly/cpu
273 |   ```
274 | 
275 |   ### numpybench and llama.cpp output and performance
276 | 
277 |   A word about the GPU, and this first part applies to *any* GPU you might be using.  The output of llama.cpp called from llama-cpp-python gives this message in verbose mode:
278 | 
279 |   ```bash
280 |   AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
281 |   ```
282 | 
283 |   It is not the `BLAS = 1` which indicates you are using the SIMD or ASIMD extensions in the M1/M2/M3 AARM processor, but the `NEON = 1`
284 | 
285 |     To quote [Introducing NEON Development Article](https://developer.arm.com/documentation/dht0002/a/Introducing-NEON/What-is-NEON-) from the arm developer documentation,
286 |      > What is NEON?
287 |      >
288 |      > ARMv7 architecture introduced the Advanced SIMD extension as an optional extension to the ARMv7-A and ARMv7-R profiles. It extends the SIMD concept by defining groups of instructions operating on vectors stored in 64-bit D, doubleword, registers and 128-bit Q, quadword, vector registers.
289 |      >
290 |      > The implementation of the Advanced SIMD extension used in ARM processors is called NEON, and this is the common terminology used outside architecture specifications. NEON technology is implemented on all current ARM Cortex-A series processors.
291 | 
292 |    The BLAS simply indicates that it linked with a Basic Linear Algebra System, could be any of them `libblas, libBLAS, libcblas, libopenblas, libcublas, libblis`... Could be any of them really, so you need to look further to determine whether you're using the GPU, on any machine really.
293 | 
294 |    `numpybench` actually gives more information than that and it has a number of options. running it from the command line with the help option you get this:
295 | 
296 |    ```bash
297 |    (numpy.07.oobacurrentnpy216llama211) [unixwzrd@xanax characters]$ numpybench --help
298 | 
299 |     usage: numpybench [-h] [-d [DATAFILE]] [-s] [-c COUNT]
300 | 
301 |     Run NumPy benchmarks and output results.
302 | 
303 |     options:
304 |         -h, --help            show this help message and exit
305 |         -d [DATAFILE], --datafile [DATAFILE]
306 |                               Specify the datafile to write the output to.
307 |         -s, --skip-tests      Skip time-consuming tests.
308 |         -c COUNT, --count COUNT
309 |                               Number of iterations.
310 |                         
311 |     (numpy.07.oobacurrentnpy216llama211) [unixwzrd@xanax characters]$ numpybench -d /dev/null -c 20
312 |     2023-11-01 03:17:02 Producing information for VENV ----> numpy.07.oobacurrentnpy216llama211
313 |     2023-11-01 03:17:02 Build Dependencies:
314 |     2023-11-01 03:17:02   blas:
315 |     2023-11-01 03:17:02     detection method: system
316 |     2023-11-01 03:17:02     found: true
317 |     2023-11-01 03:17:02     include directory: unknown
318 |     2023-11-01 03:17:02     lib directory: unknown
319 |     2023-11-01 03:17:02     name: accelerate
320 |     2023-11-01 03:17:02     openblas configuration: unknown
321 |     2023-11-01 03:17:02     pc file directory: unknown
322 |     2023-11-01 03:17:02     version: unknown
323 |     2023-11-01 03:17:02   lapack:
324 |     2023-11-01 03:17:02     detection method: internal
325 |     2023-11-01 03:17:02     found: true
326 |     2023-11-01 03:17:02     include directory: unknown
327 |     2023-11-01 03:17:02     lib directory: unknown
328 |     2023-11-01 03:17:02     name: dep4364251104
329 |     2023-11-01 03:17:02     openblas configuration: unknown
330 |     2023-11-01 03:17:02     pc file directory: unknown
331 |     2023-11-01 03:17:02     version: 1.26.1
332 |     2023-11-01 03:17:02 Compilers:
333 |     2023-11-01 03:17:02   c:
334 |     2023-11-01 03:17:02     commands: cc
335 |     2023-11-01 03:17:02     linker: ld64
336 |     2023-11-01 03:17:02     name: clang
337 |     2023-11-01 03:17:02     version: 15.0.0
338 |     2023-11-01 03:17:02   c++:
339 |     2023-11-01 03:17:02     commands: c++
340 |     2023-11-01 03:17:02     linker: ld64
341 |     2023-11-01 03:17:02     name: clang
342 |     2023-11-01 03:17:02     version: 15.0.0
343 |     2023-11-01 03:17:02   cython:
344 |     2023-11-01 03:17:02     commands: cython
345 |     2023-11-01 03:17:02     linker: cython
346 |     2023-11-01 03:17:02     name: cython
347 |     2023-11-01 03:17:02     version: 3.0.5
348 |     2023-11-01 03:17:02 Machine Information:
349 |     2023-11-01 03:17:02   build:
350 |     2023-11-01 03:17:02     cpu: aarch64
351 |     2023-11-01 03:17:02     endian: little
352 |     2023-11-01 03:17:02     family: aarch64
353 |     2023-11-01 03:17:02     system: darwin
354 |     2023-11-01 03:17:02   host:
355 |     2023-11-01 03:17:02     cpu: aarch64
356 |     2023-11-01 03:17:02     endian: little
357 |     2023-11-01 03:17:02     family: aarch64
358 |     2023-11-01 03:17:02     system: darwin
359 |     2023-11-01 03:17:02 Python Information:
360 |     2023-11-01 03:17:02   path: /Users/unixwzrd/miniconda3/envs/numpy.07.oobacurrentnpy216llama211/bin/python
361 |     2023-11-01 03:17:02   version: '3.10'
362 |     2023-11-01 03:17:02 SIMD Extensions:
363 |     2023-11-01 03:17:02   baseline:
364 |     2023-11-01 03:17:02   - NEON
365 |     2023-11-01 03:17:02   - NEON_FP16
366 |     2023-11-01 03:17:02   - NEON_VFPV4
367 |     2023-11-01 03:17:02   - ASIMD
368 |     2023-11-01 03:17:02   found:
369 |     2023-11-01 03:17:02   - ASIMDHP
370 |     2023-11-01 03:17:02   not found:
371 |     2023-11-01 03:17:02   - ASIMDFHM
372 |     2023-11-01 03:17:02
373 |     2023-11-01 03:17:02
374 |     2023-11-01 03:17:02 BEGIN TEST: Matrix multiplication
375 |     2023-11-01 03:17:03 Time for matrix multiplication: 1.0580 seconds
376 |     2023-11-01 03:17:03 END TEST / BEGIN NEXT TEST
377 |     2023-11-01 03:17:03 BEGIN TEST: Matrix transposition
378 |     2023-11-01 03:17:03 Time for matrix transposition: 0.0000 seconds
379 |     2023-11-01 03:17:03 END TEST / BEGIN NEXT TEST
380 |     2023-11-01 03:17:03 BEGIN TEST: Eigenvalue computation
381 |     2023-11-01 03:17:52 Time for eigenvalue computation: 49.0088 seconds
382 |     2023-11-01 03:17:52 END TEST / BEGIN NEXT TEST
383 |     2023-11-01 03:17:52 BEGIN TEST: Fourier transformation
384 |     2023-11-01 03:17:53 Time for fourier transformation: 0.7249 seconds
385 |     2023-11-01 03:17:53 END TEST / BEGIN NEXT TEST
386 |     2023-11-01 03:17:53 BEGIN TEST: Summation
387 |     2023-11-01 03:17:53 Time for summation: 0.0221 seconds
388 |     2023-11-01 03:17:53 END TEST / BEGIN NEXT TEST
389 |     ```
390 | 
391 |     What you should really beconcerned about here is the `blas` and `lapack` entries. You can see that it is using the Accelerate Framework by this line:
392 | 
393 |     ```
394 |     2023-11-01 03:17:02     name: accelerate
395 |     ```
396 | 
397 |     However, hereps what the output looks like from a NumPy whic hhas been installed from PyPi:
398 | 
399 |     ```bash
400 |    (dbug.01.torchtest) [unixwzrd@xanax include]$ numpybench -d /dev/null -c 20
401 |     2023-11-01 03:54:27 Producing information for VENV ----> dbug.01.torchtest
402 |     2023-11-01 03:54:27 Build Dependencies:
403 |     2023-11-01 03:54:27   blas:
404 |     2023-11-01 03:54:27     detection method: pkgconfig
405 |     2023-11-01 03:54:27     found: true
406 |     2023-11-01 03:54:27     include directory: /opt/arm64-builds/include
407 |     2023-11-01 03:54:27     lib directory: /opt/arm64-builds/lib
408 |     2023-11-01 03:54:27     name: openblas64
409 |     2023-11-01 03:54:27     openblas configuration: USE_64BITINT=1 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS=
410 |     2023-11-01 03:54:27       NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= SANDYBRIDGE MAX_THREADS=3
411 |     2023-11-01 03:54:27     pc file directory: /usr/local/lib/pkgconfig
412 |     2023-11-01 03:54:27     version: 0.3.23.dev
413 |     2023-11-01 03:54:27   lapack:
414 |     2023-11-01 03:54:27     detection method: internal
415 |     2023-11-01 03:54:27     found: true
416 |     2023-11-01 03:54:27     include directory: unknown
417 |     2023-11-01 03:54:27     lib directory: unknown
418 |     2023-11-01 03:54:27     name: dep4364960240
419 |     2023-11-01 03:54:27     openblas configuration: unknown
420 |     2023-11-01 03:54:27     pc file directory: unknown
421 |     2023-11-01 03:54:27     version: 1.26.1
422 |     2023-11-01 03:54:27 Compilers:
423 |     2023-11-01 03:54:27   c:
424 |     2023-11-01 03:54:27     commands: cc
425 |     2023-11-01 03:54:27     linker: ld64
426 |     2023-11-01 03:54:27     name: clang
427 |     2023-11-01 03:54:27     version: 14.0.0
428 |     2023-11-01 03:54:27   c++:
429 |     2023-11-01 03:54:27     commands: c++
430 |     2023-11-01 03:54:27     linker: ld64
431 |     2023-11-01 03:54:27     name: clang
432 |     2023-11-01 03:54:27     version: 14.0.0
433 |     2023-11-01 03:54:27   cython:
434 |     2023-11-01 03:54:27     commands: cython
435 |     2023-11-01 03:54:27     linker: cython
436 |     2023-11-01 03:54:27     name: cython
437 |     2023-11-01 03:54:27     version: 3.0.3
438 |     2023-11-01 03:54:27 Machine Information:
439 |     2023-11-01 03:54:27   build:
440 |     2023-11-01 03:54:27     cpu: aarch64
441 |     2023-11-01 03:54:27     endian: little
442 |     2023-11-01 03:54:27     family: aarch64
443 |     2023-11-01 03:54:27     system: darwin
444 |     2023-11-01 03:54:27   host:
445 |     2023-11-01 03:54:27     cpu: aarch64
446 |     2023-11-01 03:54:27     endian: little
447 |     2023-11-01 03:54:27     family: aarch64
448 |     2023-11-01 03:54:27     system: darwin
449 |     2023-11-01 03:54:27 Python Information:
450 |     2023-11-01 03:54:27   path: /private/var/folders/76/zy5ktkns50v6gt5g8r0sf6sc0000gn/T/cibw-run-27utctq_/cp310-macosx_arm64/build/venv/bin/python
451 |     2023-11-01 03:54:27   version: '3.10'
452 |     2023-11-01 03:54:27 SIMD Extensions:
453 |     2023-11-01 03:54:27   baseline:
454 |     2023-11-01 03:54:27   - NEON
455 |     2023-11-01 03:54:27   - NEON_FP16
456 |     2023-11-01 03:54:27   - NEON_VFPV4
457 |     2023-11-01 03:54:27   - ASIMD
458 |     2023-11-01 03:54:27   found:
459 |     2023-11-01 03:54:27   - ASIMDHP
460 |     2023-11-01 03:54:27   not found:
461 |     2023-11-01 03:54:27   - ASIMDFHM
462 |     2023-11-01 03:54:27
463 |     2023-11-01 03:54:27
464 |     2023-11-01 03:54:27 BEGIN TEST: Matrix multiplication
465 |     2023-11-01 03:54:30 Time for matrix multiplication: 2.8783 seconds
466 |     2023-11-01 03:54:30 END TEST / BEGIN NEXT TEST
467 |     2023-11-01 03:54:30 BEGIN TEST: Matrix transposition
468 |     2023-11-01 03:54:30 Time for matrix transposition: 0.0000 seconds
469 |     2023-11-01 03:54:30 END TEST / BEGIN NEXT TEST
470 |     2023-11-01 03:54:30 BEGIN TEST: Eigenvalue computation
471 |     2023-11-01 03:55:58 Time for eigenvalue computation: 87.9183 seconds
472 |     2023-11-01 03:55:58 END TEST / BEGIN NEXT TEST
473 |     2023-11-01 03:55:58 BEGIN TEST: Fourier transformation
474 |     2023-11-01 03:55:58 Time for fourier transformation: 0.7570 seconds
475 |     2023-11-01 03:55:58 END TEST / BEGIN NEXT TEST
476 |     2023-11-01 03:55:58 BEGIN TEST: Summation
477 |     2023-11-01 03:55:58 Time for summation: 0.0219 seconds
478 |     2023-11-01 03:55:58 END TEST / BEGIN NEXT TEST
479 |    ```
480 | 
481 |     The numbers here to really take note of is that for 20 iterations of the tests I have in my Python script, the NumPy install from PyPi precompiled, takes almost twice as long due to the pre-compiled OpenBLAS library which in brings with it. Here they are, and standing out specifically for the Eigenvalue computation:
482 | 
483 |     ```
484 |     Pip installed NumPy Binaries
485 |     2023-11-01 03:55:58 Time for eigenvalue computation: 87.9183 seconds
486 | 
487 |     Locally Compiled NumPy
488 |     2023-11-01 03:17:52 Time for eigenvalue computation: 49.0088 seconds
489 |     ```
490 | 
491 | ## 20 Oct 2023 - Where things stand right now
492 | 
493 | I have the discussions turned on for this repository and the subject was brought up regarding CoreML and I weighed with a rather lengthy response. Please feel free to add to the discussion if you like.  it was regarding this paper about [Swift and CoreML LLMs](https://huggingface.co/blog/swift-coreml-llm) and rather use an Apple native solution rather than the patchwork of python and C libraries we have here.  Well, [it's complicated, you can read all about it here](https://github.com/unixwzrd/oobabooga-macOS/discussions/2#discussioncomment-7286842). GPT-4 was nice enough to give this...
494 | 
495 | ### TL;DR
496 | 
497 | - Apple's CoreML is powerful but remains largely within the Apple ecosystem, making it niche.
498 | 
499 | - llama.cpp aims for model portability and could become a standard for running models, especially with its GGUF file format.
500 | - oobabooga is feature-rich but has performance and stability issues, and it's too PC-focused.
501 | - Dependency hell and update cycles are significant challenges, especially with libraries like NumPy and PyTorch.
502 | - ctransformers could be a game-changer, but you haven't had a chance to explore it yet.
503 | - You're considering a new project that would be more modular and flexible than oobabooga, possibly using a message-passing architecture.
504 | 
505 | ## 15 Oct 2023 - Update coming soon
506 | 
507 | I haven't gone away, still around, but waiting on the dust to settle before updating.  Lots of things have changed.
508 | 
509 | - PyTorch has had a few updates and is for-the-most-part, able to run using Apple Silicon M1/M2 GPU
510 | - NumPy has had a major update, but last time I updated, the Python distributions did not have NumPy using Apple Silicon GPU by default.
511 | - llama.cpp is using the Apple Silicon GPU and has reasonable performance. While for llama-cpp-python there is a dependency for NumPy, it doesn't require it for integrating it into oobabooga, though other things require NumPy.
512 | - BLAS/LAPACK for NumPy, last I checked, point to the Accelerate Framework in macOS 13.5 (I have tested 13.5 thru 13.6.) and higher.
513 | - macOS sent it an upgrade for their OS macOS 14.0.  I have not tested this but I had numerous things change on my system and break with the 13.6 update. So, I decided to give things a rest until things settled down. Everything was running in my config below, and I didn't need the latest and greatest features of anything, so I decided to wait until things stabilized.
514 | - iOS Updates. Yeah, that too.  If it wasn't enough for everything else, Apple also put out iOS updates to iOS 17.
515 | 
516 | Basically, I was bouncing from one update to the next and hardly time to take a break. The repo I have for oobabooga-macos had a buglet or two patched, but it does llama2 support, though with configurations I have specified below and on my instructions, it is supported to version 1.6 as far as I can tell, and should run fine.  If you have any problems with it, feel free to reach out and let me know.
517 | 
518 | I'm also staying away from macOS 14.0 until I am sure everything works.  MAcGPG MAil is broken with the new release, Mail.app changed the way it handles plugins.  I'm waiting until that works before I upgrade my OS. A lot of things have changed in macSO 14, but I'll wait a bit until everything works for me.
519 | 
520 | ## 17 Sep 2023 - Too many moving parts
521 | 
522 | Ok, so lots of things broke over the weekend.  llama-cpp-python went to 02.26, NumPy sometime this morning went to 1.26.0 and I need to gather the Metal/MPS build instructions and test. Guaranteed there will be something else this week that breaks things. People gotta figure out whether they want the latest and greatest or "stable and works." then pick the one that meets your needs.  The latest and greatest may have cool new features, but at the cost of time.  I will have updates later, probably tomorrow.
523 | 
524 | ## 15 Sep 2023 - They aren't making this easy
525 | 
526 | So many dependencies between packages each needing the other version of each other sometimes incompatible. Containers and VENV's offer a good way of handling all this, but here's the latest news in moving targets.  It seems llama.cpp is moving very quickly, and the llama-cpp-python is behind, or there are bad links to vendor packages on GitHub, or really who knows.  Bottom line is things don't work. After a bit of debugging and chasing package chasing here's what I have as the latest information on building a stack which will run oobabooga on macOS with Apple Silicon M1/M2 GPU Acceleration.
527 | 
528 | Basically, there's no change from my last update.  The order is still the same, but the versions are moving targets right now. SO, real quickly, just a few paragraphs down, thee are the actual instruction in the proper order for stacking the libraries and packages.
529 | 
530 | - Build a clean VENV with Python 3.10
531 | - pip install the daily PyTorch build. It will not have total Apple Silicon GPU support in it, and it adds some libraries for BLAS and LAPACK. These are opaque and might link with something else you use as they are kind of drop-in replacements for the NumPy libraries, in a sub-package called numpy-base which is installed with their flavor of NumPy. I have experienced problems removing PyTorch's NumPy completely due to this.
532 | - pip install the oobabooga requirements.txt
533 | - pip rebuild in one shot llama-cpp-python and NumPu together with Metal/Accelerate Framework. Using anything else will be considerably slower.
534 | 
535 | The only thing new to add is to specify which version of llama-cpp-python you need like this for version 0.2.5 which his the most current and works as far as my testing is concerned. The line for llama-cpp-python needs to have the version number put on the end like this (works for every other package this way for version numbers too):
536 | 
537 | ```bash
538 |     llama-cpp-python==0.2.5
539 | ```
540 | 
541 | That's it for llama.cpp and the Python API. Nothing else as it seems that some versions didn't. Be aware that the llama.cpp person/team, is not the same as the llama-cpp-python. llama.cpp is cranking out code with commits sometimes hourly throughout the day. With this quick a release cycle, you have to expect something may not go as planned, especially with all the moving ports.  No package is necessarily better than another right now, things are changing that quickly.  But this is one argument in favor of not always being on the latest and greatest version of software as it may not be fully integrated or tested yet.
542 | 
543 | I'm guilty of this to some extent, I have things I need to get released, but something else pops up.  Right now, my oobabooga webui "main" branch is a bit stale, and probably best left alone.  Right now the most stable version is in my "test" branch, dev is fairly stable, but might break as I am working on it. both dev and test branches need to be promoted. So, if you want it, grab the test branch until I announce a change, but I will likely promote dev all the way up to "main" since version 1.5 is no longer very relevant and I've squashed a few macOS issues with CUDA code.
544 | 
545 | ## 12 Sep 2023 - Dependency Hell... Again
546 | 
547 | For some reason the distribution of PyTorch, llama-cpp-python, and something in the oobabooga requirements.txt, all conflict with each other, including installing over an existing NumPy install or even downgrading it from 1.25.3 to 1.24.0. The PyTorch distribution at PyTorch also had some conflicting dependencies with other things as well, so you can't just do a --force-reinstall/install with things, and get the daily build of PyTorch because something has change in the distribution. I did manage to fins a combination which works solves the problems ad posted it in the #macOS-setup of the oobabooga Discord server. To save a bit, here's what I wrote:
548 | 
549 | `RuntimeError: MPS does not support cumsum op with int64 input.` This has something to do with PyTorch and their Metal support on macOS using the Accelerate Framework for the M1/M2 GPU acceleration, in particular the version of NumPy and associated "numpy-base" package which is distributed with PyTorch is also problematic. Some time ago, this problem was solved, but seems to have returned. There are several issues with the install process you followed and most of it is due to ever-changing libraries, modules and dependencies. Not only all that, but it does not seem to use the Apple Silicon M1/M2 GPU acceleration, so simply installing things in the proper order doesn't particularly help either.
550 | 
551 | You could probably repair the modules and libraries in your Python installation, but the quickest and probably the easiest way will be to create a new VENV and install things in this order:
552 | 
553 | - Create new VENV and make it active.
554 | - `pip install --upgrade --no-deps --force-reinstall --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu`
555 | - `pip install -r requirements_nocuda.txt`
556 | - (suggested) `NPY_BLAS_ORDER='accelerate' NPY_LAPACK_ORDER='accelerate' CMAKE_ARGS='-DLLAMA_METAL=on' FORCE_CMAKE=1 pip install --force-reinstall --no-cache --no-binary :all: --compile llama-cpp-python`
557 | 
558 | All of these things may be done at the command line and in the directory you installed the webui.  Hopefully I have them in the correct order and haven't forgotten anything, and I did check this a couple of times.
559 | 
560 | The torch install will install the one from the nightly build of PyTorch. Though if you are re-installing/upgrading torch from a prior install, you probably want to do it like this as well using the --no-deps option, though no guarantee that you don't need to reinstall everything which comes along with PyTorch. (including it's NumPy as versions other than 1.25.2 will produce an error)
561 | 
562 | The install of requirements_nocuda.txt will ***downgrade*** your NumPy version to 1.24.0 if you have 1.25.2 installed so you may want to re-install or upgrade your NumPy version anyway. Upgrades of NumPy will complain that you have a incompatible versions of torch, torchvision and torchaudio, all of which I think can be safely ignored as those packages are not aware of the new development builds of PyTorch. So the warning may be safely ignored, even though it's in nice red text.
563 | 
564 | The last, (optional) instruction will get you llama-cpp-python so you can use GGUF models with M1/M2 GPU acceleration on Apple Silicon. It will also build a new NumPy which is one of the requirements for llama-cpp-python. The accelerated NumPy will speed up anything like PyTorch or other modules which use NumPy fro matrix manipulation. A word of caution is that NumPy team had this initially in NumPy, but dropped it in version 1.20 or 1.21, I forget which, due to inconsistent results.  However, while it fails some of its regression tests, this is typical of NumPy to fail a couple, but I haven't had a chance to look into the exact failures. NumPy pre-built fails some regression tests as well, and it looks like it may be the same regression tests.
565 | 
566 | My latest oobabooga-macOS was going to be a merge of the tagged release of oobabooga 1.5, but I have added some basic level of support for Llama2 and now that the GGUF file format is out, I am right now getting many of the new oobabooga features in their current main branch incorporated into mine for macOS and have stopped adding things to the 1.3.1a version in my test tree on GitHub, but GGUF and Llama2 support are in my dev branch and are working just fine, but not all the changes are implemented. If you'd like this my "in dev" version of oobabooga-macOS, you can clone the repository, even better, if you haven't modified any of the distribution files, you may be able to do a fetch, but here's how to clone.
567 | 
568 | ```bash
569 | # cd to directory wheer you want the clone to be placed.
570 | git clone --branch dev https://github.com/unixwzrd/text-generation-webui-macos.git webui-macOS
571 | cd webui-macOS
572 | python server.py --chat
573 | ```
574 | 
575 | ## 06 See 2023 This thread
576 | 
577 | [Exactly this. PR #1728](https://github.com/oobabooga/text-generation-webui/issues/1728#issuecomment-1708455881)
578 | 
579 | Follow the link to the PR, and I was wrong in my reply, it's been almost two months. Time certainly does fly by...
580 | 
581 | @101is5 you get a special mention here because this made my day.
582 | 
583 | ## 06 Sep 2023
584 | 
585 | Some people have reviewed the installation instructions and given me changes. They have been updated. Hopefully you were able to work through it, though comments and suggestions are always welcome. The scripted method for installing is coming soon.  I know I keep saying this, but I am in final testing of it and will let everyone know here when it's complete.
586 | 
587 | Thanks again to all who have helped out!
588 | 
589 | ## 05 Sep 2023 - Testing and Updates
590 | 
591 | GPT-4 was offline a good portion of the week last week, and I have my Internet connection upgraded. The upgrade to my connection was so I could download more than 1.2TB per month. The good folks at Comcast/Xfinity were kind enough to add an additional $25 to my monthly bill for this. I've never heard of a cap like this and it was amazing that about 20 years ago in Japan, I could get Fibre to MY HOUSE for about $70/month with 100MB speed up and down. Toss in an extra $10 and you could get 1GB up and down (Yes, that's Giga).  That was way more than mu WiFi router could handle, but the machines directly connected, it was AMAZING! There were about 5 companies all competing for business, wchih was good for the consumer and drove prices down. With Comcast/Xfinity, I pay more and get worse service, they are the only provider I can get where I live, so they have me over the proverbial barrel.
592 | 
593 | I've updated the instructions quite a bit and I've been given some feedback from testing and walkthroughs of the instructions. I haven't had much time to get things running and play with the new llama.cpp, but it is quite fast and looks like it has many new features, including being able to convert just about any model into the new GGUF format.
594 | 
595 | Still working on the scripted VENV builds. Hope to have them up this week, but that';s the hope every week. I also and working through the GitHub Workflows to determine what needs to be done to create pre-built binaries and installs for the llama.cpp,python wheels to support oobabooga and macOS. Ive never created GitHub workflows before or wheels for Python, it may take a bit to get this going, but doesn't look like too much to get done.
596 | 
597 | It's been a very busy couple of weeks for me as things change quickly every day.  Please feel free to comment in the discussions, and as always thanks goes out to the oobabooga team.
598 | 
599 | Living the dream of high speed Internet and unlimited data.
600 | 
601 | ## 29 Aug 2023 - GGML -> GGUF
602 | 
603 | - **NOTICE GGML File Format Change to GGUF**
604 | 
605 | The new llama.cpp is quite fast now and seems to take advantage of MPS now nicely. But in order to use the latest, any GGML files will need to be converted to GGUF files. It's quite simple to do. The script to do the conversion is in the llama.cpp repo and there is only one requirement is to import gguf into your Python installation.
606 | 
607 | [pip install gguf](https://github.com/ggerganov/llama.cpp/tree/master/gguf-py)
608 | 
609 | then run the script:
610 | 
611 | [convert-llama-ggmlv3-to-gguf.py](https://github.com/ggerganov/llama.cpp/blob/master/convert-llama-ggmlv3-to-gguf.py)
612 | 
613 | There are other conversion scripts in the [llama.cpp](https://github.com/ggerganov/llama.cpp) for converting other formats. There is a notice regarding the latest repository to support GGML and that GGUF will be given a priority for now. This is great news for everyone who wants to use models.
614 | 
615 | Last llama.cpp commit to support GGML: [master-ef3f333](https://github.com/ggerganov/llama.cpp/releases/tag/master-ef3f333)
616 | 
617 | This is the PR Discussing the new GGUF format: [PR: #2398](https://github.com/ggerganov/llama.cpp/pull/2398)
618 | 
619 | ## 29 Aug 2023 - Updated Instructions
620 | 
621 | Added and changed a few things in the instructions. Updated the dev and main repositories with new requirements. A few other items, not really significant. Continuing to test performance and for any issues or problems.
622 | 
623 | There does seem to be an inconsistency with one of my llama builds with torch, and I'm tracking it down now, but the one which is the base packages before rebuilding anything is due to the SciPy support that gets loaded with the oobabooga requirements, they took a different approach to their builds and combining their own Numpy embedded with the SciPy package somehow, or at least that's what it appears to be to me.  I will continue to investigate and have an update soon.
624 | 
625 | ## 28 Aug 2023 - Performance Improvements And LLaMa2 works
626 | 
627 | - UPDATED QUICK INSTALL. PLEASE NOTE REGARDING GGML FILES, 0.1.78 llama-cpp-python must be used with GGML files. Both that version and the latest will work with LLaMa2.
628 | 
629 | Have spent much time looking at Python packages for numerical analysis, data analytics and AI. There are many different combinations of libraries, depending on which order you install them and whether you compile in the Accelerate Framework. I believe it is working for the most part, but I haven't tested it completely. However, PyTorch now fully supports Apple Silicon, but other Python modules dance on some of the libraries installed by NumPy and PyTorch, specifically the BLAS libraries. I have a configuration tool I was putting together in Bash, but the logic ran me into a wall regarding graph traversal, so I am planning to have that re-done in Python.
630 | 
631 | I will be putting up the Bash script, likely in its own repository, since other VENV build tools will be added and I plan to add benchmarking and regression testing to the process. There is value in it now as it can help simplify your builds for your VENVs and create consistent builds every time and in a sequence you configure. It also contains a portion for setting up/updating your personal Conda. Through this process I have discovered a lot about the state of Linear Algebra packages available, who uses them and what configurations they support.
632 | 
633 | Presently, there are no BLAS packages which directly support Apple Silicon which I am able to find.  If someone does discover one, I will add it to the mix. The build configurations I've checked are OpenBLAS, BLIS, Apple's Accelerate Framework, and the various libraries included with Python packages. There is a lot here and I will have more on that, hopefully soon.
634 | 
635 | The library which llama.cpp uses, formerly GGML, is now GGUF, and will require any GGML models to be converted to GGUF format. This process is fairly quick, but involves installing a Python package from source, though it may be available soon on PyPi or other location.  I will write that up as soon as I have time. This new version seems to be very fast, relative to things from the past, especially on Apple Silicon, making use of the Unified Memory and M1/M2 GPU. I am updating my "dev" version of oobabooga, specifically for macOS, and it is now available for download and will run with LLaMa2. It will even load the 70B LLaMa2 model on a MacBook Pro with 96GB Unified Memory.
636 | 
637 | I know it's been a little while since I had anything to write, but there's a lot to report.  I am planning on more updates later today and over the next few days. Please feel free to take advantage of the "Discussions" here on my GitHub if you like or you can reach out to me as I am usually in the oobabooga Discord in the #mac-setup channel.
638 | 
639 | Thank you to all who have helped support me in working on this project, your kind assistance is very much appreciated, of all varieties.
640 | 
641 | ## 18 Aug 2023 - Current Status of Testing and Configuring
642 | 
643 | While most of this may seem like a moving target, I have discovered inconsistencies in how packages will overlay each other's dynamic linked libraries  and will also not completely uninstall themselves using either Pip or Conda. I have searched most everywhere I can find information about linear algebra and matrix manipulation. It was way more than I ever really wanted to, but it's given me more understanding of how things work inside language models and given me a deeper appreciation for understanding less about how they store and retrieve information the way they do. I completely understand the theory of how they are seeking a minima in the matrix space, but exactly how things get coded in their matrix and how they can retrieve things, still amazes me. I suppose those who say they are simply statistical models which are good at predicting the next token in sequence, but even considering that, it opens up a whole can of worms, technically and even philosophically. Maybe everything is deterministic and it's tru we have no free will, only the illusion of it. Anyway, that's an entirely different discussion.
644 | 
645 | I am texting my VENV build, benchmarking, and regression test automation.  I have run through the process manually for a few iterations, but it quickly become error-prone to continue manually. I have automated all aspects of the VENV build down to the BLAS libraries I'll be using.  I know I keep saying it will be just another few days, but I want to make sure everything is tested and I have a flexible enough build and test framework constructed so if new conditions arise, the framework is still usable or rather re-usable when new situations arise.
646 | 
647 | I also updated the information in the build process regarding some of my findings about package managers which only support my view that they can cause more problems than they solve if they are not carefully coordinated between various packages and package build teams. Sometimes common areas are subject to pollution.
648 | 
649 | ## 14 Aug 2023 New Direction Forward
650 | 
651 | So far it's taken a bit more than a week and I've been in contact with a few others to tests do of our ideas on optimizing the various Python modules/packages on Apple Silicon and here my progress so far:
652 | 
653 | ### My progress so far
654 | 
655 | - Put development and work on next release of oobabooga for macOS on hold until performance issues are investigated. This means merging oobagooba code for their 1.5 release into my codebase is on hold along with any changes I am putting into the code until there was an explanation or understanding of the performance issues.
656 | - Test a full build of the environment using modules which are required by the oobabooga/text-generation-webui.
657 | - Collecting some basic timing metrics along the way as the environment is built up.
658 | - Look for changes in those metrics in order to identify possible issues.
659 | - Compare GGML models running in obabooga and native llama.cpp to see if there is any difference between them.
660 | - Gather as much information from as many sources regarding performance and issues arising from using the M1/M2 GPU.
661 | 
662 | ### Some things I've learned along the way
663 | 
664 | - NumPy rejected using the Apple Accelerate Framework, stating it was giving inaccurate results during testing, so in their 1.20.1 release, they deprecated its use.
665 | - There are numerous BLAS and LAPACK packages, including BLIS, which I have tested, though primarily to see if they were using the GPU for processing and so far I cannot see any using it.
666 | - Apple's Accelerate Framework is "kinda" built into many of the numerical analysis packages, but it is difficult to see if they actually support it or not.
667 | - NumPy who officially state they do not approve use of Accelerate, include it in their build, if you build it in a particular manner. I discovered this by accident after doing a recompile without any BLAS/LAPACK libraries in a searchable location for it to link with.  It linked with the Accelerate framework.
668 | - While things seem to compile with the Accelerate Framework, it is unclear to me where the libraries are I am linking to exist. Running otool on my final linked package reveals that I am linking to a location where symbolic links exist for the libraries, but they are all broken.
669 | 
670 | My conclusions so far is there are performance issues related to inefficient processing is,  there are problems, they need more investigation, there is much contradictory information floating around, there is more testing to be done. These issues should not hold up further release of code into the "dev" tree for my branch of oobabooga. I intend to do this very soon, in the next day or two. After that, it will be available for download by anyone to test and report any issues related to running on macOS. I will go back to looking for performance issues again in the various modules.
671 | 
672 | My reasons for releasing is to allow people to use and test some of the new features like increased context length, and llama2 support mainly. This seems to make sense because the issues will still be with us in the near future, and holding up any software release doesn't change that. Further, even though NumPy doesn't "officially" support the Accelerate Framework, the latest releases of PyTorch and I assume SciPy as well both support Apple Silicon and both of these sit on top of NumPy, so, if they certify things work on Apple Silicon, then I can only assume they are happy with the NumPy ability to work with it too.
673 | 
674 | I'll continue testing and benchmarking things and will try to get some real numbers produced and presented soon. No matter what, keep watching this spot for my latest updates on this issue.
675 | 
676 | ## 11 Aug 2023 - This Kind of Explains the Issue With pip, conda, etc al
677 | 
678 | Well, I haven't tried the latest main branch of oobagooba as I'm still on a working 1.3.1 I have in my repository.  I'm sorting some performance and library compatibility issues out now, but hope to be back to getting a 1.5 release which is tested and running on macOS using Metal.  Metal also happens to be the piece I'm looking into deeply because there seem yo be issues about it using GPU or CPU or both, I have just about got a test framework setup for different combinations of things like NumPy, Pandas, PyTorch, and will test them in various configurations.
679 | 
680 | One item I discovered is depending on what was installed when and whether you have BLAS and LAPACK libraries, you can get distinctly different  results, so I'll also run the regression tests for NymPy especially.
681 | 
682 | There are two options for matrix manipulation, use GGML (in llama.cpp) or use LAPACK and BLAS, all available from different places. I have not found a BLAS/LAPACK which runs on the Apple Silicon GPU, which is why GGML exists, as far as I can read. The only thing which does run on the M1/M2 GPU is the BLAS and LAPACK Apple includes with their Accelerate Framework.  NumPy does not recommend using this as it gives inconsistent results past version 1.20.1 of NumPy, however, I'm trying the newer versions of the libraries because the speedup is about 5-7x.
683 | 
684 | There are so many funny interdependencies in the libraries and library managers between pip and conda, sometimes they leave bits of what they installed after you uninstall them.  This only makes matters worse, because if you are using NumPy, you could be using a library from just about anywhere.
685 | 
686 | I'm also writing the Apple Dev Team a letter letting them know I am not particularly thrilled with their response to problems with Accelerate being basically the same as the NumPy, PyTorch and SciPy people give is "Oh, it's not out problem, contact the vendor (Apple)"  When I look at Apple's site, their response is, "Oh, it's not our problem, contact the developer (Open Source Team)."  Both groups need to talk with each other otherwise nothing will get done. and  Apple should be much more supportive of the OpenSource Community or they will loose out to Nvidia and Hugging Face.
687 | 
688 | ## 10 Aug 2023 - Have Something Interesting
689 | 
690 | Wanted to give an update. Have found an issue with re-installing and installing new modules and packages. All seems to be linked back to BLAS and LAPACK.  it seems the installation order affects how the BLAS and LAPACK libraries are handles, even when not doing a recompile of the module or package. I'm able to reproduce my results, but haven't found any definite answer to the performance issues encountered using oobabooga. Many people with many different opinions on the cause, but I haven't seen a real solution yet.
691 | 
692 | I've been combing through anything I can find, but it's all very limited in content. Apple doesn't seem to be talking about how to use their high performance architecture and I even found in the release notes for NumPy that they had issues a couple of releases ago regarding inconsistent results from the Accelerate Framework, advising anyone who had a problem with this to contact Apple. Not cool on either party's part in pointing fingers. From what I can tell, no one from the NumPy team or Apple has actually tried to sit down and resolve the issues together, though this is speculation gathered from what I have read and been able to find on the Internet, which is sparse.
693 | 
694 | I have other conjectures as well, that the whole reason that llama-cpp-python exists if due to this dependency issue and it comes with its own required linear algebra support library. Probably due to what I discovered that packages which need this will bring it along and drop it in the Python library, but also don't uninstall it when they are installed, so there's a libblas sitting in the lib directory of your venv's "root" hierarchy.
695 | 
696 | After looking at the release notes for NumPy, I figured it would be a good idea to get the source for NumPy and try the regression tests from the various configurations I find being installed. I even just today found a new one which gets configured and not really mentioned anywhere I can recall, of "openblas64" which was pulled from my /usr/local/lib build I did of OpenBLAS. To muddy things further, depending on which libraries and where they come from, there are differences in the  results of regression testing. Even when using the stock standard NumPy install. All needs tracking down which I am working on doing as quickly as possible.
697 | 
698 | Not alone in this, several have approached me willing to assist in testing in with various configurations. So far, we replicated my results, but still not sure how to proceed. Bottom line is, I am hopeful regarding using the Accelerate Framework, but I must do more testing on the combinations, ordering and dependencies of the Python modules and packages. I'll keep updating here with my progress. More to come...
699 | 
700 | ## 07 Aug 2023 - Numpy Accelerated with Apple Silicon
701 | 
702 | I Spent a good portion of today and yesterday evening rebuilding my machine learning and data analytics Python packages required for oobabooga support, the information is also helpful for anyone who uses Apple Silicon Macs for these purposes.
703 | 
704 | Need to look into this some more.  Don't want to lead anyone down a blind alley, but I think I have found a significant performance solution for Python Data Analytics and AI packages. I'll need to collect more information, but I have repeatable tests which show dramatic increases and performance with NumPy and very reduced CPU utilization. More to come...
705 | 
706 | ## 07 Aug 2023 - Optimizing The Environment, Silver Lining in macOS  13.5
707 | 
708 | I'm have to rebuild my venv because I accidentally created inconsistencies in my Python packages, and this seems like a good a point as any to try to optimize the packages. In doing this I found something interesting when getting NumPy installed. Looking for a faster alternative to OpenBLAS or a way to have use the Apple Silicon GPU, I discovered BLIS, a faster linear algebra library.  To compare I put together a quick Python script t benchmark different builds of NumPy but noticed  when Pip does a recompile of NumPy, it actually looks for and finds the Apple [Accelerate Framework](https://developer.apple.com/accelerate/) which leverages their superscalar GPU and Neural engine technology for processing vectors and tensors. I assume this is due in part by the recent upgrade to macOS 13.5 and the NumPy team putting it into their package when you compile at install time using Pip.
709 | 
710 | More info to come as I try to optimize these packages which are used for AI, but also Data Science and Analytics. Keep an eye out here and you will see updated information posted in this repository regarding package optimization soon.
711 | 
712 | ## 06 Aug 2023 - Performance issues and Python Packages
713 | 
714 | **THIS IS IMPORTANT**
715 | Whatever terminal package you use to access the command line, make absolutely sure the "Open using Rosetta" checkbox is unchecked in the "Get Info" pop-up.  Someone reported an issue with the pip installation instructions, and I was able to replicate the issue I believe because I had set iTerm2 to open in Rosetta a while back to do some testing and forgot to change it back. I was in the process of rebuilding my venvs have to rebuild a couple of them because running in Rosetta can produce inconsistent results, especially for things like builds. Also, if your Python packages and libraries are installed in this manner, they will perform slower because the Intel code has to be translated to run.
716 | 
717 | Another thing, NOT DO the macOS 13.5 update, it is broken in many ways. WebKit crashes every now and then, Private Relay breaks things like accessing OpenAI and more, there are issues with the overall UI as well.
718 | 
719 | ## 03 Aug 2023 - Coming Soon oobabooga 1.5 integration and Coqui for macOS
720 | 
721 | Currently I am finishing up changes to the release 1.5 oobabooga and integrating them into my fork and may or may not have additional performance improvements I have identified for Apple Silicon M1/M2 GPU acceleration.  I hope by the end of this week.  I will release it in my test fork I created to allow people to test and provide feedback.
722 | 
723 | I also got distracted earlier in the week by looking for another TTS alternative to Elevenlabs which runs locally and am also working on incorporating full Apple Silicon into [Coqui TTS](https://github.com/coqui-ai/TTS) soon as well, first as a stand-alone system, but then as an extension alternative for Elevenlabs and Silero. Getting sidetracked with Coqui delayed ny progress on the 1.5 oobabooga effort. However, I should have a complete set of modifications for Coqui to support Apple Silicon GPU acceleration and I plan to create a pull request for Coqui, so they may integrate my changes. Honestly, their code seems very well written and it was very easy to read and comprehend. They did a great job with it.
724 | 
725 | As always, please leave comments, suggestions and issues you find so I can make sure they are addressed. Testers, developers and volunteers are also welcome to help out.  Please let me know if you would like to help out.
726 | 
727 | ## 30 Jul 2023 - Patched and Working
728 | 
729 | I forked the last working oobagooba/text-generation-webui I knew of that worked with macOS. I had to make some change=s to its code so it would process most of the model using Apple Silicon M1/M2 GPU. I am working on adding some of the new features of the latest oobabooga release, and have found further areas for optimization with Apple Silicon and macOS. I am working as fast as I can to get it upgraded as GGML encoded models are working quite well in my release. I have found some issues with object references in Python being corrupted and causing some processing to fall back to CPU. This is likely a problem for CUDA users due to the extensive use of global variables in the core oobabooga code. It's taking quite a bit of effort to decouple things, but after I do some of that, performance should improve even more.  Once I have that done, I want to incorporate RoPE, SupertHOT 8K context windows, and new Llama2 support. the last item shouldn’t be terribly difficult since it's built into the GGML libraries which hare part of llama.cpp.
730 | 
731 | If you are interested in trying out the macOS patched version, please grab it from here: [text-generation-webui-macos](https://github.com/unixwzrd/text-generation-webui-macos)
732 | 
733 | I hope to have an update out within the week. Again, anyone who want to test, provide feedback, comments, or ideas, let me know or use the "Discussions", at the top of the GitHub page and add to the discussion or start a new one. Let's help the personal AI on Apple Silicon and macOS grow together.
734 | 
735 | ## 28 Jul 2023 - More Testers (QA)
736 | 
737 | I've had a fe more people contact me with issues and that's a good thing because it shows me there is an interest in what I am trying to do here and that people are actually trying my procedures out and having decent success.
738 | 
739 | I want to start getting more features into the fork I created like Llama2 support. If I can do that, the next ting I will likely do is start looking at some of the performance enhancements I have thought of as well as trying to fix a couple of UI/UX annoyances and a scripted installation and...
740 | 
741 | If anyone would like to help out, please let me know.
742 | 
743 | ## 27 Jul 2023 - More llama.cpp Testing
744 | 
745 | Earlier problems with the new llama-cpp-python worked out. Seems setting **--n-gpu-layers** to very big numbers is not good anymore. It will result in overallocation of the context's memory pool and this error:
746 | 
747 | ggml_new_tensor_impl: not enough space in the context's memory pool (needed 19731968, available 16777216)
748 | Segmentation fault: 11
749 | 
750 | An easy way to see how many layers a model uses is to turn on verbose mode and look for this in the output of STDERR:
751 | 
752 | **llama_model_load_internal: n_layer    = 60**
753 | 
754 | It's right near the start of the output when loading the model. Apparently the huge numbers above the number of layers is not best to, "*Set this to 1000000000 to offload all layers to the GPU.*" breaks the context's memory pool. I haven't figured out the proper high setting for this, but you can get the number easily enough by loading your model and looking for the **n_layer** line, then unload the model and put that into n-gpu-layers in the Models tab. BE sure to set it so it's save for the nest time you load the same model.
755 | 
756 | The output of STDERR is also a good place to validate if your GPU is actually being accessed if you see lines with **ggml_metal_init** at the start of them. It doesn't necessarily mean it's being used, only that llamacpp sees it and is loading supported code for it. Unload the model and then load it again with the new settings.
757 | 
758 | Someone gets a HUGE thank you for being the first person to give feedback and help me make things better! They actually went through my instructions and gave me some feedback, spotted a few typos and found things to be useful.  You know who you are! 👍
759 | 
760 | Someone else also asked if this would work for Intel, I tried, but the python which comes with Conda is compiled for i386, which should work(?) but doesn't and should be x86_64.  Might work for Intel macOS, but would be difficult when you try getting Conda to install PyTorch, that won't work well. I'm sure I could hack it to make it work, but that would be a nasty hack. Not only that, I was trying to run things on a 32GB MacBook Pro and having memory issues, I doubt many Intel Macs out there have much more than 32GB and even though they have unified memory, my bet is they would still be slow. I gave up when I figured out that Conda wouldn't install on my 16GB Intel MacBook Pro. Never thought I'd need tat much RAM, but initially I was going to get 64GB and then swapped my 36GB Apple Silicon MBP for 96BG. 😮
761 | 
762 | If anyone is interested in helping out with this effort, please let me know.  I'm in the oobaboga Discord #mac-setup channel a good bit, or you may reach me through GitHub.
763 | 
764 | ## 25 Jul 2023 - macOS version patched and working
765 | 
766 | I managed to get the code back together from an unwanted pull of future commits, I had things mis-configured on my side. The patches are applied and it just needs some testing.  So far I have only really briefly tested with a llama 30B 4bit quantized model and I am getting very reasonable response times, though there it is running a range of 1-12 tokens per second. It seemed like more yesterday, but it's still reasonable.
767 | 
768 | I have not tested much more than a basic llama which was 4 bit quantized. I will try to test more today and tomorrow.
769 | 
770 | If anyone else is interested in testing and validating what works and what doesn't, please let me know.
771 | 
772 | ## 25 Jul 2023 - Wrong Commit Point
773 | 
774 | I merged with one commit too far ahead when I created the created the dev-ms branch with a merge back to the oobabooga main branch. I'll need a bit of time to sort the code out.  Until then, I don't know of a working version around. I'll have to sort through my local repository and see if I have something I can create a new repository with or revert to a previous commit.
775 | 
776 | I'll update the status on my repository and here when I get it sorted out.
777 | 
778 | ## 24 Jul 2023 - macOS Broken with oobabooga Llama2 support
779 | 
780 | The new oobabooga does not support macOS anymore.  I am removing the fork I was working on because there are code changed specifically for Windows and Linux which are not installed on macOS, so the default repository is now the one I generated a pull request for to fix things so Apple Silicon M1 and M2 machines would use GPU's.  It's going to get it sorted out, but I will do it as soon as I can.  Here's the command to clone the repository and if you have any problems with it, let me know.
781 | 
782 | ```bash
783 | git clone https://github.com/unixwzrd/text-generation-webui-macos.git
784 | ```
785 | 
786 | ## 24 July 2023 - LLaMa Python Package Bumped
787 | 
788 | New Python llama-cpp-python out. Need to be installed before loading running the new version of oobabooga with Llama2 support.
789 | 
790 | Same command top update as yesterday, it will grab llama-cpp-python.0.1.77.
791 | 
792 | I'm trying things out now here.
793 | 
794 | ## 23 Jul 2023 - LLaMA support in llama-cpp-python
795 | 
796 | Ok, a big week for LLaMa users, increased context size roiling out with RoPE and LLaMA 2.  I think I have a new recipe which works for getting the llama-cpp-python package working with MPS/Metal support on Apple Silicon.  I will go into it in more detail in another document, but wanted to get this out to as many as possible, as soon as possible.  It seems to work and I am getting reasonable response times, though some hallucinating. CAn't be sure where the hallucinations are coming from, my hyperparameter settings, or incompatibilities in various submodule versions which will take a bit of time to catch up. Here's how to update llama-cpp-python quickly. I will go into more detail later.
797 | 
798 | ### Installing from PyPi
799 | 
800 | ```bash
801 | # Take a checkpoint of your venv, incase you have to roll back.
802 | conda create --clone ${CONDA_DEFAULT_ENV} -n new-llama-cpp
803 | conda activate new-llama-cpp
804 | pip uninstall -y llama-cpp-python
805 | CMAKE_ARGS="--fresh -DLLAMA_METAL=ON -DLLAMA_OPENBLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
806 |     FORCE_CMAKE=1 \
807 |     pip install --no-cache --no-binary :all: --upgrade --compile llama-cpp-python
808 | ```
809 | 
810 | The --fresh in the CMAKE_FLAGS is not really necessary, but won't affect anything unless you decide to download the llama-cpp-python repository, build, and install from source.  That's bleeding edge, but if you want to do that you also need to use this git command line and update your local package source directory of just create a new one with teh git clone. The BLAS setting changed and only apply if you've built and installed OpenBlAS yourself. Instructions are in my two guides mentioned above.
811 | 
812 | ### Installing from source
813 | 
814 | ```bash
815 | conda create --clone ${CONDA_DEFAULT_ENV} --n new-llama-cpp
816 | conda activate new-llama-cpp
817 | git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
818 | pip uninsatll -y llama-cpp-python
819 | cd llama-cpp-python
820 | CMAKE_ARGS="--fresh -DLLAMA_METAL=ON -DLLAMA_OPENBLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
821 |     FORCE_CMAKE=1 \
822 |     pip install --no-cache --no-binary :all: --upgrade --compile -e .
823 | ```
824 | 
825 | **NOTE** when you run this you will need to make sure whatever application is using this is specifying number of GPU or GPU layers greater than zero, it should be at least one for teh GGML library to allocate space in the Apple Silicon M1 or M2 GPU space.
826 | 
827 | ## 23 Jul 2023 - Things are in a state of flux for Llamas
828 | 
829 | It seems that there have been many updates the past few days to account for handling the LLaMa 2 release and the software is so new, not all the bugs are out yet. In the past three days, I have updated my llama-cpp-python module about 3 times and now I'm on release 0.1.74. I'm not sure when things will stabilize, but right before the flurry of LLaMa updates, I saw much improved performance on language models using the modules and packages installed using my procedures here.  My token generation was up to a fairly consistent 6 tokens/sec with good response time for inference. I'm going to see how this new llama-cpp-python works and then turn my attention elsewhere until the dust settles.
830 | 
831 | I submitted a couple of changes to oobabooga/text-generation-webui, but not sure when those changes will be pushed out. I will probably fork a copy of the repository and path it here, making it available until my changes are incorporated into the main branch for general availability. I should, hopefully have that a little later today, as long as git cooperates with me. I will be the first to admit I am not great with Git, so learning VSCode and using Git have been kinda rough on me as I come from a very non-Windows environment and have used many other version control systems, but never used Git much. I will probably get the hang of it soon and finish making the transition from using vi in a terminal window to a GUI development environment like VSCode.  At least it has a Vim module to plugin, now if they can get "focus follows mouse" to work within a window for the different frames, I'll be very happy.
832 | 
833 | ## 20 Jul 2023 - Rebuilt things *Again* because many modules were updated
834 | 
835 | Many modules were bumped in version and some support was added for the new LLaMa 2 models.  I don't seem to have everything working, but did identify one application issue which will increase performance fro MPS, if not for CUDA.
836 | 
837 | The two TTS modules use the same Global model variable in them, so model gets clobbered if you use them. I've submitted a pull request for this. [Dev ms #3232](https://github.com/oobabooga/text-generation-webui/pull/3232) and filed a bug report [Use of global variable model in ElevenLabs and Silero extensions clobbers application global model](https://github.com/oobabooga/text-generation-webui/issues/3234). This was my first time submitting a pull request and submitting a bug report, took a long time to actually figure out how to do it, but maybe there is an easier way than what I did.  Anyway, with this fix, macOS users with M1/M2 processors should see a vast performance improvement if you are using either of these TTS extensions.
838 | 
839 | ## 19 Jul 2023 - New information on building llama-cpp-python
840 | 
841 | Instructions have been updated.  Also, there were some corrections as I was rushed getting this done.  If you find any errors are think or a better way to do things, let me know.
842 | 
843 | ## 19 Jul 2023 - NEW llama-cpp-python
844 | 
845 | Haven't tested it yet, but here's how to update yours.  Will change this with the results of my testing.
846 | 
847 | ```bash
848 | CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_OPENBLAS=on -DLLAMA_BLAS_VENDOR=OpenBLAS" \
849 |     FORCE_CMAKE=1 \
850 |     pip install --no-cache --no-binary :all: --force-reinstall --upgrade --compile llama-cpp-python
851 | ```
852 | 


--------------------------------------------------------------------------------