├── src
│   ├── __init__.py
│   ├── data_processing
│   │   ├── book_processing.py
│   │   ├── data_creation_ideas.md
│   │   ├── baseline_chunker.py
│   │   ├── convert_to_markdown.py
│   │   ├── bm25_func.py
│   │   ├── README.md
│   │   ├── semantic_chunker.py
│   │   ├── jsonl_utils.py
│   │   ├── prepare_training_data.py
│   │   └── convert_hf_dataset_format.py
│   ├── finetuning
│   │   ├── lora_config.yaml
│   │   ├── download_qwen3.py
│   │   ├── finetune_qwen3.sh
│   │   └── convert_qwen3.py
│   ├── inference
│   │   ├── generate_qwen_vlm_notebook.py
│   │   ├── generate-qwen-vlm.py
│   │   └── generate_qwen3.py
│   └── evaluations
│       └── run_evaluations.py
├── .python-version
├── main.py
├── mlx-quantization
│   ├── eval_mlx-community_GLM-4.5-Air-5bit_0.4.9_mmlu_pro_computer_science
│   ├── requirements.txt
│   ├── LICENSE
│   ├── CHANGELOG.md
│   ├── CLAUDE.md
│   ├── README.md
│   ├── dwq_quantization.ipynb
│   └── awq_quantization.ipynb
├── CONVERT
│   ├── eval_Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr3e-7_0.4.9_mmlu_pro_computer_science
│   ├── eval_Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-8_0.4.9_mmlu_pro_computer_science
│   ├── eval_Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-8_0.4.9_mmlu_pro_computer_science
│   ├── eval_many_4.sh
│   ├── eval_many2.sh
│   ├── eval_many3.sh
│   ├── eval_many_2a.md
│   ├── convert_qwen_coder3.sh
│   ├── convert_many2.sh
│   ├── eval_many.sh
│   ├── convert_qwen_coder.sh
│   ├── convert_qwen_coder2.sh
│   ├── conversion_recipies.md
│   ├── results.md
│   └── convert_many.sh
├── .gitignore
├── project_setup.sh
├── pyproject.toml
├── .cursor
│   └── rules
│       └── repo-overview.mdc
├── old_pyproj.txt
├── examples_scratchpad.sh
├── README.md
└── split_pdf_pages.py
/src/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------
1 | 3.13.9
2 |
--------------------------------------------------------------------------------
/src/data_processing/book_processing.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | def main():
2 | print("Hello from mlx-finetune-demo!")
3 |
4 |
5 | if __name__ == "__main__":
6 | main()
7 |
--------------------------------------------------------------------------------
/mlx-quantization/eval_mlx-community_GLM-4.5-Air-5bit_0.4.9_mmlu_pro_computer_science:
--------------------------------------------------------------------------------
1 | {
2 | "mmlu_pro_computer_science": {
3 | "alias": "computer_science",
4 | "exact_match,custom-extract": 0.7634146341463415,
5 | "exact_match_stderr,custom-extract": 0.021014183737081388
6 | }
7 | }
--------------------------------------------------------------------------------
/CONVERT/eval_Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr3e-7_0.4.9_mmlu_pro_computer_science:
--------------------------------------------------------------------------------
1 | {
2 | "mmlu_pro_computer_science": {
3 | "alias": "computer_science",
4 | "exact_match,custom-extract": 0.7926829268292683,
5 | "exact_match_stderr,custom-extract": 0.020044980247224457
6 | }
7 | }
--------------------------------------------------------------------------------
/CONVERT/eval_Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-8_0.4.9_mmlu_pro_computer_science:
--------------------------------------------------------------------------------
1 | {
2 | "mmlu_pro_computer_science": {
3 | "alias": "computer_science",
4 | "exact_match,custom-extract": 0.7878048780487805,
5 | "exact_match_stderr,custom-extract": 0.02021693788475414
6 | }
7 | }
--------------------------------------------------------------------------------
/CONVERT/eval_Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-8_0.4.9_mmlu_pro_computer_science:
--------------------------------------------------------------------------------
1 | {
2 | "mmlu_pro_computer_science": {
3 | "alias": "computer_science",
4 | "exact_match,custom-extract": 0.7926829268292683,
5 | "exact_match_stderr,custom-extract": 0.020044980247224453
6 | }
7 | }
--------------------------------------------------------------------------------
/src/finetuning/lora_config.yaml:
--------------------------------------------------------------------------------
1 | lora_parameters:
2 | rank: 256 # LoRA rank (dimension of the adapter matrices)
3 | dropout: 0.05 # Dropout applied to the LoRA matrices
4 | scale: 12.0 # Scaling factor for the LoRA update (higher means more influence) - original 10.0
5 | learning_rate: 8e-6 # Overrides LEARNING_RATE in the bash script
6 |
--------------------------------------------------------------------------------
/CONVERT/eval_many_4.sh:
--------------------------------------------------------------------------------
1 |
2 |
3 | echo "mlx-community/XBai-o4-8bit"
4 | mlx_lm.evaluate --model mlx-community/XBai-o4-8bit --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
5 |
6 |
7 | echo "mlx-community/XBai-o4-4bit-DWQ"
8 | mlx_lm.evaluate --model mlx-community/XBai-o4-4bit-DWQ --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | DATA
2 | *.json
3 | **/*.json
4 | **/runs
5 | **/MLX-Llama*
6 | mlx-pretrain/runs/*
7 | mlx-pretrain/MLX-Llama*
8 | **/*.jsonl
9 | *.jsonl
10 | mlx-pretrain/MLX-*
11 | **/__pycache__
12 | __pycache__
13 | ADAPTERS
14 | mlx_models
15 | **/sacredhunger.txt
16 | **/allthekingsmen.txt
17 | Qwen*DWQ*
18 | **/*.egg-info
19 | mlx-quantization/models
20 | uv.lock
21 | text_output*.md
22 | .DS_Store
23 | output*.md
24 |
--------------------------------------------------------------------------------
/CONVERT/eval_many2.sh:
--------------------------------------------------------------------------------
1 | echo "mlx-community/Qwen3-Coder-30B-A3B-Instruct-bf16"
2 | mlx_lm.evaluate --model mlx-community/Qwen3-Coder-30B-A3B-Instruct-bf16 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
3 |
4 | echo "mlx-community/Qwen3-30B-A3B-Thinking-2507-bf16"
5 | mlx_lm.evaluate --model mlx-community/Qwen3-30B-A3B-Thinking-2507-bf16 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
--------------------------------------------------------------------------------
/CONVERT/eval_many3.sh:
--------------------------------------------------------------------------------
1 | echo "mlx-community/cogito-v2-preview-llama-70B-4Bit"
2 | mlx_lm.evaluate --model mlx-community/cogito-v2-preview-llama-70B-4Bit --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
3 |
4 | echo "mlx-community/GLM-4.5-Air-5bit"
5 | mlx_lm.evaluate --model mlx-community/GLM-4.5-Air-5bit --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
6 |
7 | # https://github.com/cs2764/mlx-quantization
8 | # dynamic quantization
9 |
10 |
--------------------------------------------------------------------------------
/project_setup.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | set -e # Exit immediately if a command exits with a non-zero status
3 |
4 | # Upgrade pip and install poetry
5 | pip install --upgrade pip
6 | pip install poetry
7 |
8 | # Update the lock file if necessary
9 | poetry lock
10 |
11 | # Install dependencies and the project
12 | poetry install
13 |
14 | # Create and install the IPython kernel for the project
15 | python -m ipykernel install --sys-prefix --name=mlx3129 --display-name "MLX 3.12.9"
16 |
17 | echo "Jupyter kernel 'mlx3129' has been installed."
18 |
19 |
20 | echo "Project setup complete!"
--------------------------------------------------------------------------------
/CONVERT/eval_many_2a.md:
--------------------------------------------------------------------------------
1 | echo "Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-7"
2 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-7 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
3 |
4 | echo "Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr1e-8"
5 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr1e-8 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
6 |
7 | echo "Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-9"
8 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-9 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [project]
2 | name = "mlx-finetune-demo"
3 | version = "0.1.0"
4 | description = "Add your description here"
5 | readme = "README.md"
6 | requires-python = ">=3.13"
7 | dependencies = [
8 | #"mlx @ git+https://github.com/ml-explore/mlx",
9 | "mlx-lm @ git+https://github.com/ml-explore/mlx-lm.git",
10 | "mlx",
11 | "mlx-vlm",
12 | #"mlx-lm",
13 | "datasets",
14 | "transformers",
15 | "huggingface_hub",
16 | "ipykernel",
17 | "jupyter",
18 | "ipywidgets",
19 | "torch",
20 | "torchvision",
21 | "lm_eval",
22 | "datasets",
23 | "accelerate",
24 | "sentencepiece",
25 | "protobuf",
26 | "evaluate",
27 | "hf_transfer",
28 | "gradio",
29 | "pymupdf",
30 | "pdf2image",
31 | ]
32 |
--------------------------------------------------------------------------------
/mlx-quantization/requirements.txt:
--------------------------------------------------------------------------------
1 | # Core MLX Framework
2 | mlx-lm>=0.12.0
3 |
4 | # Machine Learning Libraries
5 | torch>=2.0.0
6 | transformers>=4.40.0
7 | tokenizers>=0.15.0
8 | datasets>=2.16.0
9 | accelerate>=0.25.0
10 |
11 | # Hugging Face Integration
12 | huggingface_hub>=0.20.0
13 |
14 | # Text Processing
15 | sentencepiece>=0.1.99
16 | protobuf>=4.21.0
17 |
18 | # Quantization Support
19 | bitsandbytes>=0.41.0
20 |
21 | # Jupyter Environment
22 | jupyter>=1.0.0
23 | jupyterlab>=4.0.0
24 | ipywidgets>=8.0.0
25 |
26 | # Progress Bars and Utilities
27 | tqdm>=4.65.0
28 | numpy>=1.24.0
29 | scipy>=1.10.0
30 |
31 | # File Handling
32 | safetensors>=0.4.0
33 |
34 | # Optional Performance Libraries
35 | psutil>=5.9.0
36 | matplotlib>=3.7.0
37 | seaborn>=0.12.0
38 |
39 | # Development Tools (Optional)
40 | black>=23.0.0
41 | flake8>=6.0.0
42 | isort>=5.12.0
--------------------------------------------------------------------------------
/.cursor/rules/repo-overview.mdc:
--------------------------------------------------------------------------------
1 | ---
2 | description:
3 | globs:
4 | alwaysApply: true
5 | ---
6 | Most of this project's unique code lives in the `src` folder and the subfolders within it.
7 | There are multiple reference projects included within the project root. These folders are either prefixed with `mlx-` (for example `mlx-lm` and `mlx-vlm`) or are standalone reference repos such as `synthetic-data-kit`. Never modify existing files in those folders. Literally never. Instead, when you would like to modify anything there, always make a copy in the `src/copies` folder and make your changes there.
8 |
9 | Also note that `mlx-examples` overlaps with some of the other projects and should in most cases be disregarded, unless the information is not available elsewhere. For example, for LLM-related inference and post-training examples, the `mlx-lm` folder should be the preferred source, but occasionally information is only available in the `mlx-examples` folder.
--------------------------------------------------------------------------------
/src/data_processing/data_creation_ideas.md:
--------------------------------------------------------------------------------
1 | ARXIV PAPERS
2 | 1. Get arxiv paper and convert to markdown
3 | 2. Get the abstract and the first section - ask LLM to summarize these.
4 | 3. Prompt is: Given the above summary, write the next section of the paper titled: [section title].
5 | 4. Repeat, but keep re-doing the previous steps, so each summary will include more and more of the paper.
6 |
7 | Note: Need to have a pretty good rubric for instructing the LLM as to how it should create its summaries.
8 |
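A rough sketch of how the arxiv idea above could be turned into prompt/completion pairs. This is not existing code in this repo; the section splitter and the `summarize` callable are placeholders.

```python
def build_paper_examples(sections: list[dict], summarize) -> list[dict]:
    """sections: [{"title": ..., "text": ...}, ...] in paper order; summarize: str -> str."""
    examples = []
    # Start from a summary of the abstract + first section.
    summary_so_far = summarize(sections[0]["text"])
    for nxt in sections[1:]:
        prompt = (
            f"{summary_so_far}\n\n"
            f"Given the above summary, write the next section of the paper titled: {nxt['title']}"
        )
        examples.append({"prompt": prompt, "completion": nxt["text"]})
        # Re-summarize everything seen so far, so each summary covers more of the paper.
        summary_so_far = summarize(summary_so_far + "\n\n" + nxt["text"])
    return examples
```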
9 | BOOK SECTION CONTINUATION
10 | 1. Get the first section.
11 | 2. Prompt is: This is an excerpt from a novel. Write the next {1 paragraph, 2 paragraphs, etc.} of the book. Use the same style as the excerpt. Make sure that while stylistically similar, the new section moves the story forward and/or develops the characters and/or adds new information, or in some way continues on meaningfully from the previous section.
12 | 3. Repeat.
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
--------------------------------------------------------------------------------
/CONVERT/convert_qwen_coder3.sh:
--------------------------------------------------------------------------------
1 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr3e-7 --max-seq-length 2048 --batch-size 4 --learning-rate 3e-7 --group-size 32 --bits 6
2 | touch Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr3e-7/README.md
3 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr3e-7 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr3e-7
4 |
5 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr9e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 9e-8 --group-size 32 --bits 6
6 | touch Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr9e-8/README.md
7 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr9e-8 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr9e-8
8 |
9 | mlx_lm.evaluate --model Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr3e-7 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
10 | mlx_lm.evaluate --model Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e-8 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
11 |
--------------------------------------------------------------------------------
/mlx-quantization/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 MLX Quantization Toolkit
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/old_pyproj.txt:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["poetry-core"]
3 | build-backend = "poetry.core.masonry.api"
4 |
5 | [tool.poetry]
6 | name = "mlx_finetune_demo"
7 | version = "0.1.0"
8 | description = "MLX Finetuning Demo"
9 | authors = ["Jeff Coggshall "]
10 | readme = "README.md"
11 | packages = [
12 | { include = "src" }
13 | ]
14 | license = "MIT"
15 |
16 | [tool.poetry.dependencies]
17 | python = ">=3.12"
18 | pip = "*"
19 | accelerate = "*"
20 | mlx = "*"
21 | mlx-lm = "*"
22 | mlx_optimizers = "*"
23 | markitdown = "*"
24 | PyYAML = "*"
25 | tokenizers = "*"
26 | numpy = "*"
27 | pandas = "*"
28 | matplotlib = "*"
29 | datasets = "*"
30 | transformers = "*"
31 | huggingface_hub = "*"
32 | hf_transfer = "*"
33 | ipykernel = "*"
34 | sentencepiece = "*"
35 | torch = "*"
36 | torchao = "*"
37 | torchvision = "*"
38 | torchaudio = "*"
39 | fairscale = "*"
40 | fire = "*"
41 | jax = "*"
42 | flax = "*"
43 | optax = "*"
44 | einops = "*"
45 | diffusers = "*"
46 | tqdm = "*"
47 | rank-bm25 = {git = "https://github.com/dorianbrown/rank_bm25.git"}
48 | sentence-transformers = "*"
49 |
50 | [tool.poetry.scripts]
51 | # Add command line scripts here
52 |
53 |
--------------------------------------------------------------------------------
/CONVERT/convert_many2.sh:
--------------------------------------------------------------------------------
1 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-7 --max-seq-length 2048 --batch-size 4 --learning-rate 8e-7 --group-size 32 --bits 6
2 | touch Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-7/README.md
3 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-7 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-7
4 |
5 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr1e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 1e-8 --group-size 32 --bits 6
6 | touch Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr1e-8/README.md
7 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr1e-8 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr1e-8
8 |
9 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-9 --max-seq-length 2048 --batch-size 4 --learning-rate 5e-9 --group-size 32 --bits 6
10 | touch Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-9/README.md
11 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-9 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-9
12 |
13 | #========================================
14 |
--------------------------------------------------------------------------------
/src/data_processing/baseline_chunker.py:
--------------------------------------------------------------------------------
1 | # baseline_chunker.py
2 | import re
3 | from pathlib import Path
4 | from typing import List, Iterable
5 |
6 | def load_paragraphs(path: str | Path) -> List[str]:
7 | raw = Path(path).read_text(encoding="utf‑8")
8 | # collapse Windows/Mac line endings, then split on 2+ newlines
9 | return [p.strip() for p in re.split(r"\n{2,}", raw) if p.strip()]
10 |
11 | def chunk_paragraphs(paragraphs: Iterable[str],
12 | target_words: int = 1000,
13 | overlap_paragraphs: int = 0) -> List[str]:
14 | chunks, current, cur_count = [], [], 0
15 | for p in paragraphs:
16 | p_words = len(p.split())
17 | # if adding this paragraph would push us *over* the target,
18 | # flush what we’ve got (unless empty) and start anew
19 | if current and cur_count + p_words > target_words:
20 | chunks.append("\n\n".join(current))
21 | # start next chunk with optional overlap from the *end*
22 | current = current[-overlap_paragraphs:] if overlap_paragraphs else []
23 | cur_count = sum(len(x.split()) for x in current)
24 | current.append(p)
25 | cur_count += p_words
26 | if current:
27 | chunks.append("\n\n".join(current))
28 | return chunks
29 |
30 | if __name__ == "__main__":
31 | import argparse, json
32 | ap = argparse.ArgumentParser()
33 | ap.add_argument("book_path")
34 | ap.add_argument("--size", type=int, default=1000,
35 | help="≈ words per chunk (default 1000)")
36 | ap.add_argument("--overlap", type=int, default=0,
37 | help="paragraphs to repeat between chunks")
38 | args = ap.parse_args()
39 |
40 | paras = load_paragraphs(args.book_path)
41 | chunks = chunk_paragraphs(paras, args.size, args.overlap)
42 | print(json.dumps({"chunks": chunks, "count": len(chunks)}, indent=2))
43 |
--------------------------------------------------------------------------------
/CONVERT/eval_many.sh:
--------------------------------------------------------------------------------
1 | # mlx_lm.evaluate --model Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr1e-6 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
2 |
3 | # echo "Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr2e7"
4 | # mlx_lm.evaluate --model Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr2e7 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
5 |
6 | echo "Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr4e7"
7 | mlx_lm.evaluate --model Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr4e7 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
8 |
9 | echo "Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr5e-8"
10 | mlx_lm.evaluate --model Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr5e-8 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
11 |
12 | echo "Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8"
13 | mlx_lm.evaluate --model Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
14 |
15 | echo "Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr1e-7"
16 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr1e-7 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
17 |
18 | echo "Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr2e-7"
19 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr2e-7 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
20 |
21 | echo "Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr3e-8"
22 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr3e-8 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
23 |
24 | echo "Qwen3-30B-A3B-Instruct-2507-bit-DWQ-lr5e-7"
25 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-bit-DWQ-lr5e-7 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
26 |
27 | echo "Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr9e-8"
28 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr9e-8 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
29 |
--------------------------------------------------------------------------------
/CONVERT/convert_qwen_coder.sh:
--------------------------------------------------------------------------------
1 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr2e7 --max-seq-length 2048 --batch-size 4 --learning-rate 2e-7 --group-size 32 --bits 8 --data-path voxmenthe/merged-sft-coding-mix2
2 | touch Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr2e7/README.md
3 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr2e7 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr2e7
4 |
5 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr4e7 --max-seq-length 2048 --batch-size 4 --learning-rate 4e-7 --group-size 32 --bits 8 --data-path voxmenthe/merged-sft-coding-mix2
6 | touch Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr4e7/README.md
7 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr4e7 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr4e7
8 |
9 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8 --max-seq-length 2048 --batch-size 4 --learning-rate 9e-8 --group-size 32 --bits 8 --data-path voxmenthe/merged-sft-coding-mix2
10 | touch Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8/README.md
11 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8
12 |
13 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr5e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 5e-8 --group-size 32 --bits 8 --data-path voxmenthe/merged-sft-coding-mix2
14 | touch Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr5e-8/README.md
15 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr5e-8 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr5e-8
16 |
17 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr1e-6 --max-seq-length 2048 --batch-size 4 --learning-rate 1e-6 --group-size 32 --bits 8 --data-path voxmenthe/merged-sft-coding-mix2
18 | touch Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr1e-6/README.md
19 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr1e-6 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr1e-6
--------------------------------------------------------------------------------
/CONVERT/convert_qwen_coder2.sh:
--------------------------------------------------------------------------------
1 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr2e7 --max-seq-length 2048 --batch-size 4 --learning-rate 2e-7 --group-size 32 --bits 5 --data-path voxmenthe/merged-sft-coding-mix2
2 | touch Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr2e7/README.md
3 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr2e7 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr2e7
4 |
5 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr4e7 --max-seq-length 2048 --batch-size 4 --learning-rate 4e-7 --group-size 32 --bits 5 --data-path voxmenthe/merged-sft-coding-mix2
6 | touch Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr4e7/README.md
7 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr4e7 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr4e7
8 |
9 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr9e8 --max-seq-length 2048 --batch-size 4 --learning-rate 9e-8 --group-size 32 --bits 5 --data-path voxmenthe/merged-sft-coding-mix2
10 | touch Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr9e8/README.md
11 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr9e8 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr9e8
12 |
13 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr5e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 5e-8 --group-size 32 --bits 5 --data-path voxmenthe/merged-sft-coding-mix2
14 | touch Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr5e-8/README.md
15 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr5e-8 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr5e-8
16 |
17 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr1e-6 --max-seq-length 2048 --batch-size 4 --learning-rate 1e-6 --group-size 32 --bits 5 --data-path voxmenthe/merged-sft-coding-mix2
18 | touch Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr1e-6/README.md
19 | mlx_lm.upload --path ./Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr1e-6 --upload-repo mlx-community/Qwen3-Coder-30B-A3B-Instruct-5bit-DWQ-lr1e-6
--------------------------------------------------------------------------------
/mlx-quantization/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | # Changelog
2 |
3 | All notable changes to the MLX Quantization Toolkit will be documented in this file.
4 |
5 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6 | and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7 |
8 | ## [1.0.0] - 2025-01-30
9 |
10 | ### Added
11 | - Initial release of MLX Quantization Toolkit
12 | - Universal MLX converter for any Hugging Face model
13 | - DeepSeek-R1 AWQ to MLX conversion support
14 | - AWQ (Activation-aware Weight Quantization) implementation
15 | - DWQ (Distilled Weight Quantization) implementation
16 | - Dynamic mixed-precision quantization support
17 | - Comprehensive Jupyter notebook workflow
18 | - Automated model download and upload to Hugging Face
19 | - Performance benchmarking and validation tools
20 | - Apple Silicon optimization for M1/M2/M3/M4 devices
21 | - Robust error handling and fallback mechanisms
22 | - Complete documentation and usage examples
23 |
24 | ### Features
25 | - **5 Quantization Methods**: Universal, DeepSeek-R1, AWQ, DWQ, Dynamic
26 | - **Apple Silicon Optimized**: Native MLX framework integration
27 | - **Automated Workflows**: End-to-end conversion pipelines
28 | - **Performance Testing**: Built-in benchmarking tools
29 | - **Hugging Face Integration**: Seamless model upload/download
30 | - **Error Recovery**: Multiple fallback conversion methods
31 |
32 | ### Technical Specifications
33 | - **Python**: 3.8+ required
34 | - **Hardware**: Apple Silicon (M1/M2/M3/M4) required
35 | - **Storage**: 50GB+ recommended for large models
36 | - **Memory**: 16GB+ RAM recommended
37 | - **MLX Version**: 0.12.0+ supported
38 |
39 | ### Supported Models
40 | - Any Hugging Face transformer model
41 | - DeepSeek-R1 AWQ models (specialized support)
42 | - Large language models up to 70B+ parameters
43 | - Various architectures: Llama, Mistral, Qwen, etc.
44 |
45 | ### Performance Metrics
46 | - Model size reduction: 60-80%
47 | - Inference speed improvement: 2-4x on Apple Silicon
48 | - Quality retention: 95-99% of original performance
49 | - Memory usage reduction: 50-75%
50 |
51 | ### Documentation
52 | - Comprehensive README with setup instructions
53 | - Individual notebook documentation
54 | - Usage examples and best practices
55 | - Troubleshooting guide
56 | - Performance benchmarking results
--------------------------------------------------------------------------------
/src/data_processing/convert_to_markdown.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | """
3 | convert_to_markdown.py
4 |
5 | Usage
6 | -----
7 | python convert_to_markdown_jsonl.py \
8 | --input texts.txt # one snippet per line
9 | --output converted.jsonl
10 | """
11 |
12 | import argparse
13 | import json
14 | import re
15 | import tempfile
16 | from pathlib import Path
17 | from typing import List
18 |
19 | from markitdown import MarkItDown  # official API
20 |
21 |
22 | def guess_suffix(text: str) -> str:
23 | """Very small heuristic to give MarkItDown the right file extension."""
24 | if re.search(r"<\s*html[^>]*>", text, re.I):
25 | return ".html"
26 | if text.lstrip().startswith("#"):
27 | return ".md"
28 | return ".txt"
29 |
30 |
31 | def convert_snippets(snippets: List[str]) -> List[dict]:
32 | """Return a list of {'raw', 'markdown'} dictionaries."""
33 | md = MarkItDown(enable_plugins=False) # one converter for all calls
34 | records = []
35 |
36 | for snippet in snippets:
37 | # Write the snippet to a NamedTemporaryFile so MarkItDown
38 | # can treat it like a real file
39 | suffix = guess_suffix(snippet)
40 | with tempfile.NamedTemporaryFile("w+b", suffix=suffix, delete=True) as tf:
41 | tf.write(snippet.encode("utf-8"))
42 | tf.flush() # ensure bytes are written
43 | result = md.convert(tf.name) # convert returns a DocumentResult
44 | records.append({"raw": snippet, "markdown": result.text_content})
45 | return records
46 |
47 |
48 | def main() -> None:
49 | parser = argparse.ArgumentParser(description="Bulk convert text → Markdown (JSONL)")
50 | parser.add_argument("--input", required=True,
51 | help="Text file with one snippet per line")
52 | parser.add_argument("--output", required=True,
53 | help="Destination .jsonl file")
54 | args = parser.parse_args()
55 |
56 | # Load snippets (blank lines are ignored)
57 | with Path(args.input).expanduser().open(encoding="utf-8") as f:
58 | snippets = [line.rstrip("\n") for line in f if line.strip()]
59 |
60 | conversions = convert_snippets(snippets)
61 |
62 | # Write JSONL
63 | with Path(args.output).expanduser().open("w", encoding="utf-8") as out:
64 | for rec in conversions:
65 | out.write(json.dumps(rec, ensure_ascii=False) + "\n")
66 |
67 | print(f"✅ Wrote {len(conversions):,} records to {args.output}")
68 |
69 |
70 | if __name__ == "__main__":
71 | main()
72 |
--------------------------------------------------------------------------------
/CONVERT/conversion_recipies.md:
--------------------------------------------------------------------------------
1 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-4bit-DWQ --max-seq-length 2048 --batch-size 4 --learning-rate 1e-7 --group-size 32 --bits 4
2 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-4bit-DWQ --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit-DWQ
3 | mlx_lm.generate --model mlx-community/Qwen3-30B-A3B-Instruct-2507-4bit-DWQ --max-tokens 4096 --temp 0.7 -p "Explain why the Soviet Union didn't collapse earlier than it did"
4 |
5 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-8bit-DWQ --max-seq-length 2048 --batch-size 4 --learning-rate 8e-8 --group-size 32 --bits 8
6 | touch Qwen3-30B-A3B-Instruct-2507-8bit-DWQ/README.md
7 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-8bit-DWQ --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-8bit-DWQ
8 |
9 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-6bit-DWQ --max-seq-length 2048 --batch-size 4 --learning-rate 1e-7 --group-size 32 --bits 6
10 | touch Qwen3-30B-A3B-Instruct-2507-6bit-DWQ/README.md
11 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-6bit-DWQ --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit-DWQ
12 | TODO:
13 | mlx_lm.generate --model mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit-DWQ --max-tokens 4096 --temp 0.7 -p "Explain why the Soviet Union didn't collapse earlier than it did"
14 |
15 |
16 | "mlabonne/open-perfectblend"
17 |
18 | ValueError: Unsupported data format, check the supported formats here:
19 | https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/LORA.md#Data.
20 | --data-path
21 | "--data-path",
22 | type=str,
23 | default="allenai/tulu-3-sft-mixture",
24 | 'voxmenthe/merged-sft-coding-mix2'
25 | models: zai-org/GLM-4.5-Air
26 |
27 | mlx_lm.dwq --model zai-org/GLM-4.5-Air --mlx-path mlx-community/GLM-4.5-Air-8bit-DWQ --max-seq-length 2048 --batch-size 4 --learning-rate 8e-8 --group-size 32 --bits 8
28 |
29 |
30 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-6bit-DWQ --max-seq-length 2048 --batch-size 4 --learning-rate 1e-7 --group-size 32 --bits 6
31 |
32 | ======== evals
33 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-8 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
34 |
35 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr3e-7 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
36 |
37 | mlx_lm.evaluate --model Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-8 --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
38 |
39 | mlx_lm.dwq --model Qwen/Qwen3-Coder-30B-A3B-Instruct --mlx-path Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr2e7 --max-seq-length 2048 --batch-size 4 --learning-rate 2e-7 --group-size 32 --bits 8
42 |
--------------------------------------------------------------------------------
/examples_scratchpad.sh:
--------------------------------------------------------------------------------
1 |
2 |
3 | mlx_lm.lora \
4 | --model mlx_models/Qwen3-4B-mlx \
5 | --adapter-path ADAPTERS/qwen3_4b_lora_sacredhunger \
6 | --data DATA/SACREDHUNGER \
7 | --test
8 |
9 |
10 | cat temp_prompt.txt | python src/inference/generate_qwen3.py \
11 | --model-path mlx_models/Qwen3-4B-mlx \
12 | --adapter-path ADAPTERS/qwen3_4b_lora_sacredhunger \
13 | --prompt "-"
14 |
15 | cat temp_prompt.txt | python src/inference/generate_qwen3.py \
16 | --model-path mlx_models/Qwen3-4B-mlx \
17 | --adapter-path ADAPTERS/qwen3_4b_lora_sacredhunger \
18 | --prompt "-" \
19 | --repetition-penalty 1.1 \
20 | --temp 0.75 \
21 | --top-p 0.95
22 |
23 | # WITHOUT ADAPTER
24 | cat temp_prompt.txt | python src/inference/generate_qwen3.py \
25 | --model-path mlx_models/Qwen3-4B-mlx \
26 | --prompt "-" \
27 | --repetition-penalty 1.1 \
28 | --temp 0.75 \
29 | --top-p 0.95
30 |
31 | python prepare_training_data.py \
32 | --input_files semantic_chunks_480.json semantic_chunks_520.json semantic_chunks_680.json semantic_chunks_790.json \
33 | --output_dir ../../DATA/SACREDHUNGER/ \
34 | --train_ratio 0.9 \
35 | --seed 123
36 |
37 | python prepare_training_data.py \
38 | --input_files sacredhunger_350.json sacredhunger_480.json sacredhunger_520.json sacredhunger_570.json sacredhunger_680.json sacredhunger_730.json sacredhunger_790.json \
39 | --output_dir ../../DATA/SACREDHUNGER/ \
40 | --train_ratio 0.93 \
41 | --seed 211
42 |
43 | python prepare_training_data.py \
44 | --input_files allthekingsmen_480.json allthekingsmen_520.json allthekingsmen_680.json allthekingsmen_790.json \
45 | --output_dir ../../DATA/ALLTHEKINGSMEN/ \
46 | --train_ratio 0.9 \
47 | --seed 123
48 |
49 | cat temp_prompt.txt | python src/inference/generate_qwen3.py \
50 | --model-path mlx_models/Qwen3-14B-mlx \
51 | --adapter-path ADAPTERS/qwen3_14b_lora_sacredhunger_multi \
52 | --prompt "-" \
53 | --repetition-penalty 1.1 \
54 | --temp 0.75 \
55 | --top-p 0.95
56 |
57 | cat temp_prompt.txt | python src/inference/generate_qwen3.py \
58 | --model-path mlx_models/Qwen3-14B-mlx \
59 | --adapter-path ADAPTERS/qwen3_14b_dora_novels_sh_atkm \
60 | --prompt "-" \
61 | --repetition-penalty 1.1 \
62 | --temp 0.75 \
63 | --top-p 0.95
64 |
65 | books:
66 | Fatherland_HarrisRobert.txt
67 | Great Gatsby, The - Francis Scott Fitzgerald.txt
68 | Imperium_ANovelofAncientRo_HarrisRobert.txt
69 | LOTR.txt
70 | One Hundred Years of Solitude - Gabriel Garcia Marquez.txt
71 | OldManandtheSeaThe_ErnestHemingway.txt
72 | Pride_and_Prejudice.txt
73 | PaperTowns_JohnGreen.txt
74 | Pachinko_MinJinLee.txt
75 | RedSister-MarkLawrence.txt
76 | ToKillAMockingbird_HarperLee.txt
77 | TheMartian.txt
78 | TheMagicians1.txt
79 | TheMagicians2.txt
80 | TheMagicians3.txt
81 | TheLiontheWitchandtheWar_LewisCS_.txt
82 | TheGodfather.txt
83 | TheGraveyardBook.txt
84 | TheDaVinciCode_BrownDan.txt
85 | AWrinkleinTime(PuffinModer_LengleMadeleine.txt
86 | AdventuresofTomSawyerThe_MarkTwain.txt
87 | Bartimaeus1.txt
88 | Bartimaeus2.txt
89 | Bartimaeus3.txt
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # 🚀 MLX Finetuning Demo Project
2 |
3 | ✨ A complete guide to setting up and running the MLX finetuning pipeline for your custom datasets
4 |
5 | ## 🛠️ Setup Instructions
6 |
7 | 1. **Create and activate virtual environment**
8 | ```bash
9 | python -m venv mlx_venv
10 | source mlx_venv/bin/activate # Linux/Mac
11 | # OR
12 | mlx_venv\Scripts\activate # Windows
13 | ```
14 |
15 | 2. **Install dependencies**
16 | ```bash
17 | sh project_setup.sh
18 | ```
19 |
20 | 3. **Prepare your dataset**
21 | - Place your source data file (long text format) in `/data/raw/`
22 | - Example dataset structure:
23 | ```
24 | data/
25 | raw/
26 | my_dataset.txt
27 | ```
28 |
29 | ## 🔄 Data Processing Pipeline
30 |
31 | 1. **Run semantic chunker**
32 | ```bash
33 | python src/data_processing/semantic_chunker.py \
34 | --book_path data/raw/your_book.txt \
35 | --target 480 \
36 | --output_path data/processed/chunks.json
37 | ```
38 | - `--target`: Target word count per chunk (default: 480)
39 | - Uses `lightonai/modernbert-embed-large` model by default
40 |
41 | 2. **Prepare training data**
42 | ```bash
43 | python src/data_processing/prepare_training_data.py \
44 | --input_files data/processed/chunks.json \
45 | --output_dir data/final \
46 | --train_ratio 0.85
47 | ```
48 | - Creates `train.jsonl` and `valid.jsonl` files
49 | - Each sample contains a prompt/continuation pair
50 | - Default 85/15 train/validation split
51 |
52 | ## ⚙️ Configuration
53 |
54 | Edit `lora_config.yaml` with your settings:
55 | ```yaml
56 | lora_parameters:
57 |   rank: 256        # LoRA rank (dimension of the adapter matrices)
58 |   dropout: 0.05    # dropout applied to the LoRA matrices
59 |   scale: 12.0      # scaling factor for the LoRA update
60 | learning_rate: 8e-6
61 | # learning_rate here overrides LEARNING_RATE in finetune_qwen3.sh
62 | ```
63 |
64 | ## 🏋️ Training
65 |
66 | Start finetuning:
67 | ```bash
68 | sh src/finetuning/finetune_qwen3.sh \
69 | --tune-type dora \
70 | --config src/finetuning/lora_config.yaml
71 | ```
72 |
73 | Key parameters (edit in script):
74 | - `MODEL_PATH`: Path to MLX model directory
75 | - `DATA_PATH`: Directory containing `train.jsonl` and `valid.jsonl`
76 | - `ADAPTER_PATH`: Where to save adapters
77 | - `ITERS`: Number of training iterations (default: 5600)
78 | - `BATCH_SIZE`: Batch size (default: 1)
79 |
80 | ## 📊 Evaluation
81 |
82 | Run evaluations:
83 | ```bash
84 | python src/evaluations/run_evaluations.py \
85 | --model-path mlx_models/Qwen3-14B-mlx \
86 | --adapter-path ADAPTERS/qwen3_14b_dora_sacredhunger_multi \
87 | --valid-jsonl-path data/final/valid.jsonl \
88 | --output-dir eval_outputs \
89 | --num-examples 50
90 | ```
91 |
92 | Evaluation parameters:
93 | - `--temp`: Sampling temperature (default: 0.75)
94 | - `--top-p`: Top-p sampling (default: 0.95)
95 | - `--repetition-penalty`: Penalty for repeated tokens (default: 1.1)
96 |
97 | ## 📌 Tips
98 |
99 | - Monitor training with `tensorboard --logdir outputs/logs`
100 | - For large datasets, consider using `--num_workers` in data preparation
101 | - Adjust batch size based on your GPU memory
102 |
103 | 💡 For questions or issues, please open an issue in this repository!
--------------------------------------------------------------------------------
/src/finetuning/download_qwen3.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import subprocess
3 | import sys
4 | from pathlib import Path
5 |
6 | # Add mlx-lm to the Python path
7 | # Assumes the script is run from the project root or src/finetuning
8 | project_root = Path(__file__).resolve().parents[2]
9 | mlx_lm_path = project_root / "mlx-lm"
10 | sys.path.insert(0, str(mlx_lm_path))
11 |
12 | def download_and_convert(hf_repo_id: str, output_dir_base: Path):
13 | """
14 | Downloads a model from Hugging Face and converts it to MLX format.
15 |
16 | Args:
17 | hf_repo_id: The Hugging Face repository ID (e.g., 'Qwen/Qwen3-14B').
18 | output_dir_base: The base directory to save the converted MLX model within.
19 | A model-specific subdirectory will be created here.
20 | """
21 | # Ensure the base directory exists
22 | output_dir_base.mkdir(parents=True, exist_ok=True)
23 |
24 | # Construct the final model-specific path
25 | model_name = hf_repo_id.split("/")[-1] + "-mlx"
26 | final_model_path = output_dir_base / model_name
27 |
28 | print(f"Converting {hf_repo_id} to MLX format...")
29 | print(f"Final output directory: {final_model_path}")
30 |
31 | # Check if the final path *already* exists before attempting conversion
32 | if final_model_path.exists():
33 | print(f"Error: Target directory {final_model_path} already exists.")
34 | print("Please remove it or specify a different base directory if conversion is needed again.")
35 | sys.exit(1)
36 |
37 | try:
38 | # Use our custom conversion script
39 | command = [
40 | sys.executable, # Use the current Python interpreter
41 | str(project_root / "src" / "finetuning" / "convert_qwen3_custom.py"), # Path to custom script
42 | "--hf-path",
43 | hf_repo_id,
44 | "--mlx-path",
45 | str(final_model_path),
46 | # Potentially add "--dtype" if needed, default is float16
47 | # "--dtype", "bfloat16"
48 | ]
49 |
50 | # Note: Ensure transformers and huggingface_hub are installed:
51 | # pip install transformers huggingface_hub sentencepiece tiktoken
52 |
53 | print(f"Running command: {' '.join(command)}")
54 | subprocess.run(command, check=True, capture_output=True, text=True)
55 | print(f"Successfully converted {hf_repo_id} and saved to {final_model_path}")
56 |
57 | except subprocess.CalledProcessError as e:
58 | print(f"Error during conversion:")
59 | print(f"Command: {' '.join(e.cmd)}")
60 | print(f"Return code: {e.returncode}")
61 | print(f"Stdout: {e.stdout}")
62 | print(f"Stderr: {e.stderr}")
63 | sys.exit(1)
64 | except Exception as e:
65 | print(f"An unexpected error occurred: {e}")
66 | sys.exit(1)
67 |
68 |
69 | if __name__ == "__main__":
70 | parser = argparse.ArgumentParser(
71 | description="Download and convert a Hugging Face model to MLX format."
72 | )
73 | parser.add_argument(
74 | "--hf-repo-id",
75 | type=str,
76 | default="Qwen/Qwen3-14B",
77 | help="Hugging Face repository ID of the model to download and convert.",
78 | )
79 | parser.add_argument(
80 | "--output-dir",
81 | type=Path,
82 | default=Path("mlx_models"), # Base directory
83 | help="Base directory to save the converted MLX model (model-specific subfolder will be created).",
84 | )
85 | args = parser.parse_args()
86 |
87 | # Ensure output base dir is relative to project root if not absolute
88 | if not args.output_dir.is_absolute():
89 | args.output_dir = project_root / args.output_dir
90 |
91 | download_and_convert(args.hf_repo_id, args.output_dir)
--------------------------------------------------------------------------------
/src/data_processing/bm25_func.py:
--------------------------------------------------------------------------------
1 | import re
2 | try:
3 | from rank_bm25 import BM25Okapi
4 | _HAS_BM25 = True
5 | except ImportError:
6 | _HAS_BM25 = False
7 |
8 |
9 | # --- utilities --------------------------------------------------------------
10 | def _tokenise(text: str) -> list[str]:
11 | """Very light tokeniser → lowercase words A‑Z."""
12 | return re.findall(r"[A-Za-z']+", text.lower())
13 |
14 | # ---------------------------------------------------------------------------
15 |
16 |
17 | def bm25_gap_violation(boundary_chunks: tuple[str, str],
18 | entity: str,
19 | max_gap: int = 4,
20 | bm25_thresh: float = 0.0) -> bool:
21 | """
22 | Return ``True`` when *entity* (e.g. “Erasmus”) vanishes for more than
23 | ``max_gap`` paragraph units **across** the boundary formed by the two
24 | chunks supplied.
25 |
26 | Parameters
27 | ----------
28 | boundary_chunks : (prev_chunk, next_chunk)
29 | Tuple of the text immediately before and after the boundary.
30 | entity : str
31 | Name / term whose continuity we want to keep.
32 | max_gap : int, default 4
33 | Maximum allowed paragraph distance without seeing the entity.
34 | bm25_thresh : float, default 0.0
35 | Minimum BM25 score regarded as a “hit”. Leave at 0 to treat mere
36 | lexical presence as sufficient.
37 |
38 | Notes
39 | -----
40 | * If the third‑party package ``rank_bm25`` is present we build a very
41 | small per‑boundary BM25 index so that inflected or approximate
42 | mentions (“Mr Kemp”, “Kemp’s”) still register continuity.
43 | * If the package is missing we fall back to a fast
44 | case‑insensitive regex exact match.
45 | """
46 |
47 | prev_chunk, next_chunk = boundary_chunks
48 | prev_paras = re.split(r"\n{2,}", prev_chunk)
49 | next_paras = re.split(r"\n{2,}", next_chunk)
50 |
51 | # --------------------- helper to detect "entity present" ---------------
52 | entity_tokens = _tokenise(entity)
53 | if not entity_tokens:
54 | return False # nothing to look for
55 |
56 | if _HAS_BM25:
57 | # Build tiny BM25 index over paragraphs that straddle the boundary
58 | corpus_paras = prev_paras + next_paras
59 | corpus_tok = [_tokenise(p) for p in corpus_paras]
60 | bm25 = BM25Okapi(corpus_tok)
61 | scores = bm25.get_scores(entity_tokens)
62 | # Treat paragraph as containing the entity if BM25 > threshold
63 | contains = [s > bm25_thresh for s in scores]
64 | else:
65 | # Cheap lexical fallback
66 | pat = re.compile(rf"\b{re.escape(entity)}\b", flags=re.I)
67 | contains = [bool(pat.search(p)) for p in prev_paras + next_paras]
68 |
69 | # --------------------- measure the paragraph gap -----------------------
70 | # Index of *last* hit in the previous chunk
71 | last_prev_idx = None
72 | for i in reversed(range(len(prev_paras))):
73 | if contains[i]:
74 | last_prev_idx = len(prev_paras) - 1 - i # distance back from end
75 | break
76 |
77 | # Index of *first* hit in the next chunk
78 | offset = len(prev_paras) # shift into global index
79 | first_next_idx = None
80 | for j in range(len(next_paras)):
81 | if contains[offset + j]:
82 | first_next_idx = j
83 | break
84 |
85 | # Compute paragraphs without the entity spanning the join
86 | if last_prev_idx is None:
87 | gap_left = len(prev_paras) # no mention in prev ⇒ full length
88 | else:
89 | gap_left = last_prev_idx
90 |
91 | if first_next_idx is None:
92 | gap_right = len(next_paras) # no mention in next ⇒ full length
93 | else:
94 | gap_right = first_next_idx
95 |
96 | total_gap = gap_left + gap_right + 1 # +1 for the boundary itself
97 |
98 | return total_gap > max_gap
99 |
--------------------------------------------------------------------------------
/CONVERT/results.md:
--------------------------------------------------------------------------------
1 | All results are from the mmlu_pro_computer_science task.
2 | mlx_lm.evaluate --model <model> --tasks mmlu_pro_computer_science --max-tokens 5000 --no-apply-chat-template
3 |
4 | ============================================================
5 |
6 | Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-8
7 | {
8 | "alias": "computer_science",
9 | "exact_match,custom-extract": 0.7926829268292683,
10 | "exact_match_stderr,custom-extract": 0.020044980247224453
11 | }
12 |
13 |
14 | Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr3e-7 Results:
15 | {
16 | "alias": "computer_science",
17 | "exact_match,custom-extract": 0.7926829268292683,
18 | "exact_match_stderr,custom-extract": 0.020044980247224457
19 | }
20 |
21 | Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-8
22 | {
23 | "alias": "computer_science",
24 | "exact_match,custom-extract": 0.7878048780487805,
25 | "exact_match_stderr,custom-extract": 0.02021693788475414
26 | }
27 |
28 | Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr1e-6
29 | --data-path voxmenthe/merged-sft-coding-mix2
30 | {
31 | "alias": "computer_science",
32 | "exact_match,custom-extract": 0.6219512195121951,
33 | "exact_match_stderr,custom-extract": 0.023976756269796867
34 | }
35 |
36 | Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr2e7
37 | --data-path voxmenthe/merged-sft-coding-mix2
38 | Results:
39 | {
40 | "alias": "computer_science",
41 | "exact_match,custom-extract": 0.7292682926829268,
42 | "exact_match_stderr,custom-extract": 0.02197108846947813
43 | }
44 |
45 | Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr4e7
46 | --data-path voxmenthe/merged-sft-coding-mix2
47 | Results:
48 | {
49 | "alias": "computer_science",
50 | "exact_match,custom-extract": 0.697560975609756,
51 | "exact_match_stderr,custom-extract": 0.022711632302604486
52 | }
53 | Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr5e-8
54 | Results:
55 | {
56 | "alias": "computer_science",
57 | "exact_match,custom-extract": 0.7048780487804878,
58 | "exact_match_stderr,custom-extract": 0.022552572925167262
59 | }
60 | Qwen3-Coder-30B-A3B-Instruct-8bit-DWQ-lr9e8
61 | {
62 | "alias": "computer_science",
63 | "exact_match,custom-extract": 0.7292682926829268,
64 | "exact_match_stderr,custom-extract": 0.02197108846947813
65 | }
66 | mlx-community/GLM-4.5-Air-5bit
67 | {
68 | "alias": "computer_science",: 100%|
69 | "exact_match,custom-extract": 0.7634146341463415,
70 | "exact_match_stderr,custom-extract": 0.021014183737081388
71 | }
72 | mlx-community/Qwen3-Coder-30B-A3B-Instruct-bf16
73 | {
74 | "alias": "computer_science",
75 | "exact_match,custom-extract": 0.7268292682926829,
76 | "exact_match_stderr,custom-extract": 0.02203289844309934
77 | }
78 | Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr9e-8
79 | {
80 | "alias": "computer_science",
81 | "exact_match,custom-extract": 0.7170731707317073,
82 | "exact_match_stderr,custom-extract": 0.022271893903859002
83 | }
84 | Qwen3-Coder-30B-A3B-Instruct-6bit-DWQ-lr3e-7
85 | {
86 | "alias": "computer_science",
87 | "exact_match,custom-extract": 0.7414634146341463,
88 | "exact_match_stderr,custom-extract": 0.02164931770175753
89 | }
90 | mlx-community/Qwen3-30B-A3B-Thinking-2507-bf16
91 | {
92 | "alias": "computer_science",
93 | "exact_match,custom-extract": 0.7829268292682927,
94 | "exact_match_stderr,custom-extract": 0.020384591313839226
95 | }
--------------------------------------------------------------------------------
/src/data_processing/README.md:
--------------------------------------------------------------------------------
1 | ## 1 · Baseline, purely programmatic splitter
2 |
3 | **Core idea**
4 |
5 | 1. Read the text.
6 | 2. Isolate **paragraph units** (book already uses double line‑breaks as separators).
7 | 3. Greedily concatenate whole paragraphs until the running word‑count would exceed `target_size`; then start a new chunk.
8 | 4. Optionally create a *fixed paragraph overlap* for a little context bleed.
9 |
10 | *Configurable knobs*
11 |
12 | | parameter | purpose | sensible range |
13 | | ----------- | ------------------------------------ | -------------- |
14 | | `--size` | target words per chunk | 300‑2 000 |
15 | | `--overlap` | paragraphs repeated at each boundary | 0‑2 |
16 |
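A minimal usage sketch of the baseline splitter from Python (the input path is illustrative; the import assumes you run from this folder):

```python
# Illustrative only: call the baseline splitter programmatically instead of via the CLI.
from baseline_chunker import load_paragraphs, chunk_paragraphs

paras = load_paragraphs("data/raw/your_book.txt")   # assumed input location
chunks = chunk_paragraphs(paras, target_words=800, overlap_paragraphs=1)
avg_words = sum(len(c.split()) for c in chunks) // max(len(chunks), 1)
print(f"{len(chunks)} chunks, ~{avg_words} words each")
```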
17 | ---
18 |
19 | ## 2 · Semantic‑aware splitter (embedding + BM25 refinement)
20 |
21 | ### Rationale
22 |
23 | Even with paragraph‑respecting boundaries you can accidentally:
24 |
25 | * cut **mid‑scene** (a named character vanishes between chunks),
26 | * split **dialogue exchanges**, harming retrieval‑QA accuracy.
27 |
28 | We therefore **post‑process** the baseline boundaries:
29 |
30 | 1. **Initial pass** – call the greedy algorithm above with `size ≈ target × 0.9` (gives headroom for later shuffling).
31 |
32 | 2. **Compute embeddings** for each paragraph (or sentence) using a suitable model (e.g., `lightonai/modernbert-embed-large`).
33 | *Note: This model requires specific prefixes. The script currently uses `search_document:` for the text segments being compared.*
34 |
35 | 3. For every tentative boundary `B` between chunk *i* and *i+1*:
36 |
37 | * Take the last `tail_len` sentences of chunk *i* (`S_tail`) and the first `head_len` sentences of chunk *i+1* (`S_head`).
38 | * `sim = cosine(get_emb(S_tail), get_emb(S_head))`.
39 | * If `sim < thresh_low`, **shift** `B` *forward* until similarity rises or the size budget is hit.
40 | * If `sim > thresh_high`, optionally create a *sentence‑level overlap* so the teaser sentence appears in both chunks.
41 |
42 | 4. **BM25 character check** – build a BM25 index over paragraphs. For each main character name (Erasmus, Paris, Thurso, etc.) ensure that it doesn't disappear for > `gap` paragraphs. If a gap occurs across a boundary, shift the boundary backward by one paragraph.
43 |
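As a rough sketch of the boundary check in step 3 above (how the embedding model is loaded, and the helper names, are assumptions; `semantic_chunker.py` may differ in detail):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("lightonai/modernbert-embed-large")

def boundary_similarity(tail_sentences: list[str], head_sentences: list[str]) -> float:
    """Cosine similarity between the join windows on either side of a tentative boundary."""
    tail_emb, head_emb = model.encode([
        "search_document: " + " ".join(tail_sentences),
        "search_document: " + " ".join(head_sentences),
    ])
    return float(np.dot(tail_emb, head_emb) /
                 (np.linalg.norm(tail_emb) * np.linalg.norm(head_emb)))

# sim < thresh_low  -> shift the boundary forward until similarity recovers (or the size budget is hit)
# sim > thresh_high -> optionally duplicate the teaser sentence into both chunks
```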
44 |
45 | ### Configurable aspects
46 |
47 | * `--target` – desired words / chunk
48 | * `tail_len`, `head_len` – size of *join windows*
49 | * `thresh_low`, `thresh_high` – similarity action thresholds
50 | * `char_names` + `max_bm25_gap` – continuity heuristics
51 | * `max_size` – hard cap after refinement
52 |
53 | ---
54 |
55 | **Measuring chunk boundaries**
56 |
57 | Measure the distance across a boundary by counting paragraphs from the last occurrence of an entity in one chunk to its first occurrence in the next. The check looks for the entity in both chunks, computes the gap, and returns `True` if it exceeds the defined maximum number of paragraphs, using careful indexing on both sides of the join (a case-insensitive regex locates the entity when BM25 is unavailable).
58 |
59 | ```python
60 | import re
61 | 
62 | def _tokenise(text: str) -> list[str]:
63 |     """Very light tokeniser → lowercase word tokens, enough for BM25."""
64 |     return re.findall(r"[A-Za-z']+", text.lower())
65 | 
66 | try:
67 |     from rank_bm25 import BM25Okapi   # optional BM25 backend
68 |     _HAS_BM25 = True
69 | except ImportError:
70 |     _HAS_BM25 = False                 # fall back to a plain word-boundary regex
71 | ```
72 | ## `bm25_gap_violation`
73 |
74 | ### How it works
75 |
76 | 1. **Tokenisation** – a tiny regex picks out alphabetic tokens and lower‑cases them, enough for BM25.
77 | 2. **Dual mode**
78 |
79 | * If `rank_bm25` is available, we calculate paragraph‑level BM25 scores for the entity; any paragraph scoring above `bm25_thresh` (0 → "contains at least one query term") counts as a hit.
80 | * Without the library, we revert to a fast exact word‑boundary regex.
81 | 3. **Gap detection** – walk backward from the end of the *previous* chunk and forward from the start of the *next* chunk to locate the two nearest mentions. The number of paragraphs between those two mentions (counting across the join) is the **gap**. If it exceeds `max_gap`, the function flags a violation so your boundary‑refinement logic can shift or duplicate paragraphs.
82 |
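One possible shape of the function, building on the utilities above; the signature and defaults are illustrative, and the version imported by `semantic_chunker.py` may differ:

```python
def bm25_gap_violation(prev_chunk: list[str], next_chunk: list[str],
                       entity: str, max_gap: int = 6,
                       bm25_thresh: float = 0.0) -> bool:
    """True when `entity` is absent for more than `max_gap` paragraphs around
    the boundary between two chunks (each passed as a list of paragraphs)."""
    pattern = re.compile(rf"\b{re.escape(entity)}\b", re.IGNORECASE)

    def mentions(paragraphs: list[str]) -> list[bool]:
        if _HAS_BM25:
            bm25 = BM25Okapi([_tokenise(p) for p in paragraphs])
            scores = bm25.get_scores(_tokenise(entity))
            return [s > bm25_thresh for s in scores]
        return [bool(pattern.search(p)) for p in paragraphs]

    prev_hits, next_hits = mentions(prev_chunk), mentions(next_chunk)

    # paragraphs after the last mention in the previous chunk ...
    back = next((i for i, hit in enumerate(reversed(prev_hits)) if hit), len(prev_hits))
    # ... plus paragraphs before the first mention in the next chunk
    fwd = next((i for i, hit in enumerate(next_hits) if hit), len(next_hits))

    return back + fwd > max_gap
```
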
83 | You can now import the function directly in `semantic_chunker.py`, run the script, and the continuity‑checking step will operate deterministically—optionally strengthened by BM25 when the library is installed.
84 |
85 |
--------------------------------------------------------------------------------
/mlx-quantization/CLAUDE.md:
--------------------------------------------------------------------------------
1 | # CLAUDE.md
2 |
3 | This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
4 |
5 | ## Project Overview
6 |
7 | This repository contains Jupyter notebooks for converting and quantizing large language models using Apple's MLX framework. The project focuses on optimizing models for Apple Silicon devices through various quantization techniques.
8 |
9 | ## Repository Structure
10 |
11 | The repository is organized around Jupyter notebooks that handle different aspects of MLX model conversion and quantization:
12 |
13 | - **deepseek_r1_mlx_conversion.ipynb**: Converts DeepSeek-R1 AWQ models to MLX format
14 | - **universal_mlx_converter.ipynb**: Universal converter for any Hugging Face model to MLX format
15 | - **dwq_quantization.ipynb**: Distilled Weight Quantization implementation
16 | - **awq_quantization.ipynb**: Activation-aware Weight Quantization implementation
17 | - **dynamic_quantization.ipynb**: Dynamic quantization with mixed precision
18 | - **models/**: Directory for storing downloaded and converted models
19 |
20 | ## Core Dependencies
21 |
22 | All notebooks require these essential packages:
23 | - `mlx-lm`: Apple's MLX language model framework
24 | - `transformers`: Hugging Face transformers library
25 | - `torch`: PyTorch (for compatibility)
26 | - `huggingface_hub`: For model download/upload
27 | - `datasets`, `accelerate`, `sentencepiece`, `protobuf`: Supporting libraries
28 |
29 | ## Common Workflow Pattern
30 |
31 | Each notebook follows a consistent structure:
32 | 1. Environment setup and dependency installation
33 | 2. MLX import testing and verification
34 | 3. Model configuration and directory setup
35 | 4. Original model download from Hugging Face
36 | 5. Model conversion/quantization using MLX tools
37 | 6. Converted model testing and validation
38 | 7. Optional performance comparison
39 | 8. Optional Hugging Face upload
40 | 9. Cleanup and summary
41 |
42 | ## MLX Conversion Commands
43 |
44 | The project uses MLX command-line tools for conversions:
45 |
46 | ### Basic Conversion
47 | ```bash
48 | python -m mlx_lm.convert --hf-path <hf-model-id> --mlx-path <output-dir>
49 | ```
50 |
51 | ### DWQ Quantization
52 | ```bash
53 | python -m mlx_lm.dwq --model <hf-model-id> --mlx-path <output-dir> --bits 4 --num-samples 1024
54 | ```
55 |
56 | ### AWQ Quantization
57 | ```bash
58 | python -m mlx_lm.awq --model <hf-model-id> --mlx-path <output-dir> --bits 4 --num-samples 32
59 | ```
60 |
61 | ### Dynamic Quantization
62 | ```bash
63 | python -m mlx_lm.dynamic_quant --model <hf-model-id> --mlx-path <output-dir> --target-bpw 4.0
64 | ```
65 |
66 | ## Environment Setup Requirements
67 |
68 | **Critical**: This project requires macOS with Apple Silicon (M1/M2/M3/M4). The notebooks include specific handling for:
69 | - numpy/gfortran library conflicts in JupyterLab Desktop
70 | - MLX framework import verification
71 | - Automatic package installation with error handling
72 | - Kernel restart recommendations for import issues
73 |
74 | ## Model Storage Architecture
75 |
76 | The project uses a standardized directory structure:
77 | - `models/`: Root directory for all model storage
78 | - `models/<model-name>/`: Original downloaded models
79 | - `models/<model-name>_<method>_<bits>/`: Quantized outputs
80 | - `sensitivities/`: Layer sensitivity analysis files (for dynamic quantization)
81 |
82 | ## Error Handling Patterns
83 |
84 | All notebooks implement robust error handling:
85 | - Multiple conversion method attempts with fallbacks
86 | - Comprehensive import testing before execution
87 | - File existence checks and cleanup procedures
88 | - Detailed error reporting with troubleshooting guidance
89 |
90 | ## Hugging Face Integration
91 |
92 | The notebooks include full Hugging Face workflow:
93 | - Secure token-based authentication
94 | - Model download with resume capability
95 | - Automatic model card generation
96 | - Repository creation and file upload
97 | - Upload verification and file listing
98 |
99 | ## Performance Testing
100 |
101 | Each quantization method includes:
102 | - Model loading and inference testing
103 | - Multi-prompt validation
104 | - Performance timing comparisons
105 | - Size reduction calculations
106 | - Quality evaluation options using standard datasets
107 |
108 | ## Important Notes
109 |
110 | - AWQ models require dequantization before MLX conversion (`--dequantize` flag)
111 | - Directory paths must be absolute, not relative
112 | - Large models require significant disk space (50GB+ for full-size models)
113 | - Model conversion can be time-intensive depending on model size
114 | - Always test converted models before deployment or upload
--------------------------------------------------------------------------------
/CONVERT/convert_many.sh:
--------------------------------------------------------------------------------
1 | # mlx_lm.dwq --model zai-org/GLM-4.5-Air --mlx-path mlx-community/GLM-4.5-Air-8bit-DWQ-lr8e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 8e-8 --group-size 32 --bits 5
2 | # touch GLM-4.5-Air-8bit-DWQ-lr8e-8/README.md
3 | # mlx_lm.upload --path ./GLM-4.5-Air-8bit-DWQ-lr8e-8 --upload-repo mlx-community/GLM-4.5-Air-8bit-DWQ-lr8e-8
4 |
5 | # mlx_lm.dwq --model zai-org/GLM-4.5-Air --mlx-path mlx-community/GLM-4.5-Air-8bit-DWQ-lr7e-3 --max-seq-length 2048 --batch-size 4 --learning-rate 7e-3 --group-size 32 --bits 5
6 | # touch GLM-4.5-Air-8bit-DWQ-lr7e-3/README.md
7 | # mlx_lm.upload --path ./GLM-4.5-Air-8bit-DWQ-lr7e-3 --upload-repo mlx-community/GLM-4.5-Air-8bit-DWQ-lr7e-3
8 |
9 | # mlx_lm.dwq --model zai-org/GLM-4.5-Air --mlx-path mlx-community/GLM-4.5-Air-4bit-DWQ-lr8e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 8e-8 --group-size 32 --bits 4
10 | # touch GLM-4.5-Air-4bit-DWQ-lr8e-8/README.md
11 | # mlx_lm.upload --path ./GLM-4.5-Air-4bit-DWQ-lr8e-8 --upload-repo mlx-community/GLM-4.5-Air-4bit-DWQ-lr8e-8
12 |
13 | # mlx_lm.dwq --model zai-org/GLM-4.5-Air --mlx-path mlx-community/GLM-4.5-Air-4bit-DWQ-lr7e-3 --max-seq-length 2048 --batch-size 4 --learning-rate 7e-3 --group-size 32 --bits 4
14 | # touch GLM-4.5-Air-4bit-DWQ-lr7e-3/README.md
15 | # mlx_lm.upload --path ./GLM-4.5-Air-4bit-DWQ-lr7e-3 --upload-repo mlx-community/GLM-4.5-Air-4bit-DWQ-lr7e-3
16 |
17 | #========================================
18 |
19 | # mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr3e-7 --max-seq-length 2048 --batch-size 4 --learning-rate 3e-7 --group-size 32 --bits 6
20 | # touch Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr3e-7/README.md
21 | # mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr3e-7 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr3e-7
22 |
23 | # mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 5e-8 --group-size 32 --bits 6
24 | # touch Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-8/README.md
25 | # mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-8 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr5e-8
26 |
27 | # mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 8e-8 --group-size 32 --bits 6
28 | # touch Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-8/README.md
29 | # mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-8 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-6bit-DWQ-lr8e-8
30 |
31 | #========================================
32 |
33 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr5e-7 --max-seq-length 2048 --batch-size 4 --learning-rate 5e-7 --group-size 32 --bits 5
34 | touch Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr5e-7/README.md
35 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr5e-7 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr5e-7
36 |
37 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr3e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 3e-8 --group-size 32 --bits 5
38 | touch Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr3e-8/README.md
39 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr3e-8 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr3e-8
40 |
41 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr2e-7 --max-seq-length 2048 --batch-size 4 --learning-rate 2e-7 --group-size 32 --bits 5
42 | touch Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr2e-7/README.md
43 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr2e-7 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr2e-7
44 |
45 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr9e-8 --max-seq-length 2048 --batch-size 4 --learning-rate 9e-8 --group-size 32 --bits 5
46 | touch Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr9e-8/README.md
47 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr9e-8 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr9e-8
48 |
49 | mlx_lm.dwq --model Qwen/Qwen3-30B-A3B-Instruct-2507 --mlx-path Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr1e-7 --max-seq-length 2048 --batch-size 4 --learning-rate 1e-7 --group-size 32 --bits 5
50 | touch Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr1e-7/README.md
51 | mlx_lm.upload --path ./Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr1e-7 --upload-repo mlx-community/Qwen3-30B-A3B-Instruct-2507-5bit-DWQ-lr1e-7
52 |
53 |
54 | mlx_lm.dwq --model zai-org/GLM-4.5-Air --mlx-path mlx-community/GLM-4.5-Air-5bit-DWQ-lr2e-7 --max-seq-length 2048 --batch-size 4 --learning-rate 2e-7 --group-size 32 --bits 5
--------------------------------------------------------------------------------
/mlx-quantization/README.md:
--------------------------------------------------------------------------------
1 | # MLX Model Quantization Toolkit
2 |
3 | A comprehensive collection of Jupyter notebooks for converting and quantizing large language models using Apple's MLX framework, optimized for Apple Silicon devices.
4 |
5 | ## 🚀 Features
6 |
7 | - **Universal Model Conversion**: Convert any Hugging Face model to MLX format
8 | - **Multiple Quantization Methods**: Support for AWQ, DWQ, and Dynamic Quantization
9 | - **Apple Silicon Optimized**: Built specifically for M1/M2/M3/M4 devices
10 | - **Automated Workflows**: Complete pipeline from download to deployment
11 | - **Performance Testing**: Built-in benchmarking and validation tools
12 |
13 | ## 📋 Requirements
14 |
15 | - **Hardware**: macOS with Apple Silicon (M1/M2/M3/M4)
16 | - **Python**: 3.8 or higher
17 | - **Storage**: 50GB+ free space for large models
18 |
19 | ## 🛠 Installation
20 |
21 | 1. Clone the repository:
22 | ```bash
23 | git clone https://github.com/cs2764/mlx-quantization.git
24 | cd mlx-quantization
25 | ```
26 |
27 | 2. Install dependencies:
28 | ```bash
29 | pip install -r requirements.txt
30 | ```
31 |
32 | 3. Launch Jupyter:
33 | ```bash
34 | jupyter lab
35 | ```
36 |
37 | ## 📚 Notebooks Overview
38 |
39 | ### Core Notebooks
40 |
41 | | Notebook | Description | Use Case |
42 | |----------|-------------|----------|
43 | | `universal_mlx_converter.ipynb` | Universal converter for any HF model | General model conversion |
44 | | `awq_quantization.ipynb` | Activation-aware Weight Quantization | High-quality 4-bit quantization |
45 | | `dwq_quantization.ipynb` | Distilled Weight Quantization | Fast quantization with good quality |
46 | | `dynamic_quantization.ipynb` | Dynamic mixed-precision quantization | Optimal size/quality balance |
47 |
48 | ### Quantization Methods Comparison
49 |
50 | | Method | Speed | Quality | Size Reduction | Best For |
51 | |--------|-------|---------|----------------|----------|
52 | | **AWQ** | Medium | High | ~75% | Production deployment |
53 | | **DWQ** | Fast | Good | ~70% | Quick prototyping |
54 | | **Dynamic** | Slow | Highest | Variable | Research/experimentation |
55 |
56 | ## 🔄 Common Workflow
57 |
58 | Each notebook follows this standardized pattern:
59 |
60 | 1. **Environment Setup** - Dependency installation and MLX verification
61 | 2. **Model Configuration** - Set up directories and parameters
62 | 3. **Model Download** - Fetch original model from Hugging Face
63 | 4. **Conversion/Quantization** - Apply selected quantization method
64 | 5. **Validation** - Test converted model functionality
65 | 6. **Performance Analysis** - Compare speed and quality metrics
66 | 7. **Optional Upload** - Push to Hugging Face Hub
67 | 8. **Cleanup** - Remove temporary files
68 |
69 | ## 📁 Directory Structure
70 |
71 | ```
72 | mlx-quantization/
73 | ├── models/ # Model storage
74 | │   ├── <model-name>/ # Original models
75 | │   └── <model-name>_<method>_<bits>/ # Quantized outputs
76 | ├── sensitivities/ # Layer analysis files
77 | ├── *.ipynb # Conversion notebooks
78 | ├── requirements.txt # Dependencies
79 | └── README.md # This file
80 | ```
81 |
82 | ## 🚀 Quick Start
83 |
84 | 1. **Choose your quantization method** based on your requirements
85 | 2. **Open the corresponding notebook** in Jupyter Lab
86 | 3. **Follow the step-by-step instructions** in each cell
87 | 4. **Monitor the conversion process** and review results
88 | 5. **Test the quantized model** before deployment
89 |
90 | ## 📊 Performance Benchmarks
91 |
92 | Typical results on Apple M2 Pro:
93 |
94 | - **Model Size Reduction**: 60-80% smaller than original
95 | - **Inference Speed**: 2-4x faster on Apple Silicon
96 | - **Quality Retention**: 95-99% of original performance
97 | - **Memory Usage**: 50-75% reduction
98 |
99 | ## 🔧 MLX Commands Reference
100 |
101 | ### Basic Conversion
102 | ```bash
103 | python -m mlx_lm.convert --hf-path <hf-model-id> --mlx-path <output-dir>
104 | ```
105 |
106 | ### AWQ Quantization
107 | ```bash
108 | python -m mlx_lm.awq --model <hf-model-id> --mlx-path <output-dir>