Exploring Model Kinship for Merging LLMs

├── .gitignore
├── LICENSE
├── README.md
├── metrics
    ├── __init__.py
    ├── calculate.py
    ├── calculate_split.py
    ├── run_command.py
    └── utility.py
├── pics
    ├── evolution.jpg
    └── logo.jpg
├── pyproject.toml
└── scripts
    └── Iterative_merge.ipynb


/.gitignore:
--------------------------------------------------------------------------------
1 | /.idea/.gitignore
2 | /.idea/merge.iml
3 | /.idea/misc.xml
4 | /.idea/modules.xml
5 | /.idea/inspectionProfiles/profiles_settings.xml
6 | /.idea/inspectionProfiles/Project_Default.xml
7 | /.idea/vcs.xml
8 | /merge.egg-info/
9 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2024 ZJUNLP
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | <div align="center">
  2 | <img src="pics/logo.jpg" width="400"/>
  3 | <h1 align="center"> Exploring Model Kinship for Merging LLMs </h1>
  4 | <b align="center">The degree of
  5 | similarity or relatedness between LLMs, analogous to biological evolution</b>
  6 | 
  7 | 
  8 |   <a href="https://arxiv.org/abs/2410.12613">📄arXiv</a> •
  9 |   <a href="https://potatobearp.github.io/publication/modelkinship/">📒 Blog</a>•
 10 |   <a href="https://huggingface.co/collections/zjunlp/model-kinship-670d59c69bf3598b0bed1cbd">🤗 HF</a> •
 11 |   <a href="https://notebooklm.google.com/notebook/720d2f64-fdd2-47e2-a086-7870b8db23e5/audio">🎧NotebookLM Audio</a>
 12 | 
 13 | 
 14 | 
 15 | [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
 16 | ![](https://img.shields.io/github/last-commit/zjunlp/ModelKinship?color=green)
 17 | <a href="https://colab.research.google.com/drive/141VAI89emgSIcwkswATEXSEENoAMywTO?usp=sharing">
 18 |         <img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg">
 19 | </a>
 20 | </div>
 21 | 
 22 | We introduce [Model Kinship](https://arxiv.org/pdf/2410.12613), the metric for degree of similarity or relatedness between LLMs for continual model merging, analogous to biological evolution. 
 23 | 
 24 | Currently, we support **Model Kinship** with 3 Similarity Metrics, others will be supported in the future. 
 25 | 
 26 | ---
 27 | 
 28 | ## Table of Contents
 29 | 
 30 | - [Overview](#overview)
 31 | - [Installation](#installation)
 32 | - [Usage](#usage)
 33 | - [Reproduction](#reproduction)
 34 | - [Supported Metrics](#supported-metrics)
 35 | - [Notebook](#notebook)
 36 | - [Acknowledgement](#acknowledgement)
 37 | - [Citation](#citation)
 38 | 
 39 | ## Overview
 40 | 
 41 | Model merging provides a novel paradigm to leverage information from multiple models without the need of additional training.  Recently, the development of a model merging toolkit has enabled non-experts to conduct merging experiments, spurring a trend of model merging on the Hugging Face Open LLM Leaderboard.
 42 | 
 43 | Currently, the model merging community has built powerful models through iterative merging steps. This process resembles artificial selection—a concept from biology in which humans deliberately select for or against specific traits in organisms.
 44 | 
 45 | ![](pics/evolution.jpg)
 46 | 
 47 | However, the reasons behind the success of this process remain unknown, resulting in numerous trial-and-error attempts for slight performance improvements.
 48 | Drawing inspiration from evolutionary biology, our project examines the weight changes that occur during post pre-training stages (e.g., fine-tuning, merging). We propose **Model Kinship**, a metric that evaluates the relatedness between two models by calculating the similarity of their weight changes, analogous to genetic variance in inheritance. In our paper we show that **Model Kinship** can be used for optimising the merging strategy.
 49 | 
 50 | This toolkit provides a simple way to calculate **Model Kinship** for model merging.
 51 | 
 52 | ---
 53 | 
 54 | ## Installation
 55 | 
 56 | ```bash
 57 | git clone https://github.com/zjunlp/ModelKinship.git
 58 | pip install -e ./ModelKinship
 59 | ```
 60 | 
 61 | ---
 62 | 
 63 | ## Usage
 64 | 
 65 | ```bash
 66 | # Input Format
 67 | merge_cal model-1 model-2 model-base metrics [options]
 68 | 
 69 | # Calculate Model Kinship based on Euclidean Distance (CPU)
 70 | merge_cal OpenPipe/mistral-ft-optimized-1218 \
 71 | mlabonne/NeuralHermes-2.5-Mistral-7B \
 72 | mistralai/Mistral-7B-v0.1 \
 73 | ed
 74 | 
 75 | # Multiple Calculation (CPU)
 76 | merge_cal OpenPipe/mistral-ft-optimized-1218 \
 77 | mlabonne/NeuralHermes-2.5-Mistral-7B \
 78 | mistralai/Mistral-7B-v0.1 \
 79 | cs,pcc,ed
 80 | 
 81 | ```
 82 | 
 83 | ### Optional Arguments
 84 | 
 85 | - `--low-precision`: Enable 8-bit quantization for parameter extraction. Reduces memory usage but may slightly affect accuracy.
 86 |   ```bash
 87 |   merge_cal model-1 model-2 model-base metrics --low-precision
 88 |   ```
 89 | 
 90 | - `--split-calculation`: Calculate metrics per layer/key instead of full vector to reduce RAM usage.
 91 |   ```bash
 92 |   merge_cal model-1 model-2 model-base metrics --split-calculation
 93 |   ```
 94 | 
 95 | Example with multiple options:
 96 | ```bash
 97 | merge_cal OpenPipe/mistral-ft-optimized-1218 \
 98 | mlabonne/NeuralHermes-2.5-Mistral-7B \
 99 | mistralai/Mistral-7B-v0.1 \
100 | pcc cs --low-precision --split-calculation
101 | ```
102 | 
103 | ---
104 | 
105 | ## Reproduction
106 | To reproduce our experiments, both an evaluation toolkit and a merging toolkit for large language models are required. We recommend using the following tools:
107 | 
108 | - [lm-evaluation-harness by EleutherAI](https://github.com/EleutherAI/lm-evaluation-harness)
109 | - [mergekit by arcee-ai](https://github.com/arcee-ai/mergekit)
110 | 
111 | Merged Models in Our Experiments are Open Access:
112 | - [Merged Models Repository](https://huggingface.co/PotatoB)
113 | 
114 | ---
115 | 
116 | ## Supported Metrics:
117 | - Cosine Similarity - cs
118 | - Pearson Correlation Coefficient - pcc
119 | - Euclidean Distance - ed
120 | 
121 | ---
122 | 
123 | ## Notebook:
124 | 
125 | To conduct iterative merging experiments, you can use following notebook for a quick start.
126 | 
127 | - [Notebook for Iterative Merging](https://colab.research.google.com/drive/141VAI89emgSIcwkswATEXSEENoAMywTO?usp=sharing)
128 | 
129 | This notebook includes 3 main functions:
130 | - Selection - calculate the model kinship to predict the potential benefit of providing merge.
131 | - Merging - merge the providing models.
132 | - Recycling - upload the merged model (evaluation is optional).
133 | ---
134 | 
135 | ## Acknowledgement
136 | 
137 | We would like to express our gratitude to the developers and contributors of the following external toolkits, which were instrumental in the success of our research on model merging and kinship analysis:
138 | 
139 | - [lm-evaluation-harness by EleutherAI](https://github.com/EleutherAI/lm-evaluation-harness) for providing a comprehensive evaluation framework for large language models.
140 | - [mergekit by arcee-ai](https://github.com/arcee-ai/mergekit) for offering an essential toolkit for model merging experiments.
141 | 
142 | These toolkits have significantly contributed to our ability to conduct and reproduce large-scale experiments, and their open-source availability has been invaluable to the broader research community.
143 | 
144 | ---
145 | 
146 | ## Citation:
147 | 
148 | ```bash
149 | @misc{hu2024exploringmodelkinshipmerging,
150 |       title={Exploring Model Kinship for Merging Large Language Models}, 
151 |       author={Yedi Hu and Yunzhi Yao and Ningyu Zhang and Shumin Deng and Huajun Chen},
152 |       year={2024},
153 |       eprint={2410.12613},
154 |       archivePrefix={arXiv},
155 |       primaryClass={cs.CL},
156 |       url={https://arxiv.org/abs/2410.12613}, 
157 | }
158 | ```


--------------------------------------------------------------------------------
/metrics/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zjunlp/ModelKinship/2369ef49244a91ac4db2dc110e0562408cd9faac/metrics/__init__.py


--------------------------------------------------------------------------------
/metrics/calculate.py:
--------------------------------------------------------------------------------
 1 | import logging
 2 | from typing import List
 3 | from metrics.utility import Metric
 4 | import torch
 5 | import numpy
 6 | 
 7 | logging.basicConfig(level=logging.INFO, force=True)
 8 | 
 9 | def cosine_similarity(a, b):
10 |     similarity = numpy.sqrt(numpy.dot(a, b) ** 2 / (numpy.dot(a, a) * numpy.dot(b, b)))
11 |     return similarity
12 | 
13 | 
14 | def calculate_model_kinship(
15 |         delta1: numpy.ndarray,
16 |         delta2: numpy.ndarray,
17 |         metrics: List[str]
18 | ) -> dict:
19 |     """
20 |     Calculate model kinship using specified metrics.
21 | 
22 |     Args:
23 |         delta1: Delta parameters for first model
24 |         delta2: Delta parameters for second model
25 |         metrics: List of metrics to calculate
26 | 
27 |     Returns:
28 |         dict: Dictionary of metric names and their calculated values
29 |     """
30 |     results = {}
31 |     for metric in metrics:
32 |         try:
33 |             if metric not in Metric.list():
34 |                 raise ValueError(f"Unsupported metric: {metric}")
35 |             results[metric] = calculate_metric(delta1, delta2, metric)
36 |         except Exception as e:
37 |             results[metric] = f"Error calculating {metric}: {str(e)}"
38 |     return results
39 | 
40 | 
41 | 
42 | def calculate_metric(d_vector_1: torch.Tensor, d_vector_2: torch.Tensor, metric: str) -> str:
43 |     """
44 |     Calculate the specified metric between two delta vectors.
45 | 
46 |     Args:
47 |         d_vector_1 (torch.Tensor): Delta parameters for model 1.
48 |         d_vector_2 (torch.Tensor): Delta parameters for model 2.
49 |         metric (str): The metric to calculate ('pcc', 'ed', 'cs').
50 | 
51 |     Returns:
52 |         str: A formatted string with the result of the chosen metric.
53 |     """
54 |     logging.info(f'Starting calculation of {metric.upper()} metric...')
55 | 
56 |     # Pearson Correlation Coefficient (PCC)
57 |     if metric == 'pcc':
58 |         # Stack the two vectors and calculate the Pearson correlation coefficient
59 |         stack = torch.stack((d_vector_1, d_vector_2), dim=0)
60 |         pcc = torch.corrcoef(stack)[0, 1].item()
61 |         return f"Model Kinship based on Pearson Correlation Coefficient: {pcc}"
62 | 
63 |     # Euclidean Distance (ED)
64 |     elif metric == 'ed':
65 |         # Compute the Euclidean distance between the vectors
66 |         distance = torch.dist(d_vector_1, d_vector_2).item()
67 |         return f"Model Kinship based on Euclidean Distance: {distance}"
68 | 
69 |     # Cosine Similarity (CS)
70 |     elif metric == 'cs':
71 |         # Compute cosine similarity
72 |         cs = cosine_similarity(d_vector_1, d_vector_2)
73 |         return f"Model Kinship based on Cosine Similarity: {cs}"
74 | 
75 |     # If metric is not recognized
76 |     else:
77 |         return "Invalid metric specified."
78 | 
79 | 


--------------------------------------------------------------------------------
/metrics/calculate_split.py:
--------------------------------------------------------------------------------
  1 | import logging
  2 | from typing import List,Dict
  3 | from metrics.utility import Metric, quantize_8bit, load_model_state_dict
  4 | import torch
  5 | import numpy
  6 | from tqdm import tqdm 
  7 | 
  8 | logging.basicConfig(level=logging.INFO, force=True)
  9 | 
 10 | def cosine_similarity(a, b):
 11 |     similarity = numpy.sqrt(numpy.dot(a, b) ** 2 / (numpy.dot(a, a) * numpy.dot(b, b)))
 12 |     return similarity
 13 | 
 14 | def calculate_model_kinship_split(
 15 |         model_1_name: str,
 16 |         model_2_name: str,
 17 |         model_base_name: str,
 18 |         low_precision: bool,
 19 |         metrics: List[str],
 20 |         device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
 21 | ) -> dict:
 22 | 
 23 |     # Extract state dictionaries from models
 24 |     state_dict_1 = load_model_state_dict(model_1_name, device)
 25 |     state_dict_2 = load_model_state_dict(model_2_name, device)
 26 |     state_dict_base = load_model_state_dict(model_base_name, device)
 27 |     results = {}
 28 | 
 29 |     # Validate metrics before processing
 30 |     valid_metrics = Metric.list()
 31 |     for metric in metrics:
 32 |         try:
 33 |             if metric not in valid_metrics:
 34 |                 raise ValueError(f"Unsupported metric: {metric}. Valid metrics are: {', '.join(valid_metrics)}")
 35 |             results[metric] = calculate_metrics_by_split(
 36 |                 state_dict_1, 
 37 |                 state_dict_2,
 38 |                 state_dict_base,
 39 |                 low_precision, 
 40 |                 metric
 41 |             )
 42 |         except Exception as e:
 43 |             logging.error(f"Error calculating {metric}: {str(e)}")
 44 |             results[metric] = f"Error calculating {metric}: {str(e)}"
 45 |     
 46 |     return results
 47 | 
 48 | 
 49 | def calculate_metrics_by_split(
 50 |     state_dict_1: dict,
 51 |     state_dict_2: dict,
 52 |     state_dict_base: dict,
 53 |     low_precision: bool,
 54 |     metric: str,
 55 | ) -> str:
 56 |     """
 57 |     Calculate metrics for each key and integrate results.
 58 |     
 59 |     Args:
 60 |         state_dict_1 (dict): State dictionary of first model
 61 |         state_dict_2 (dict): State dictionary of second model
 62 |         state_dict_base (dict): State dictionary of base model
 63 |         low_precision (bool): Whether to use 8-bit quantization
 64 |         metric (str): Metric to calculate ('pcc', 'ed', 'cs')
 65 |         
 66 |     Returns:
 67 |         str: Integrated metric result as formatted string
 68 |     """
 69 |     total_similarity = 0.0
 70 |     total_weight = 0.0
 71 |     split_results = {}
 72 | 
 73 |     # Determine the number of layers
 74 |     num_layers = state_dict_base['lm_head.weight'].shape[0]
 75 | 
 76 |     # Check architectures
 77 |     if state_dict_1['lm_head.weight'].shape[0] != state_dict_2['lm_head.weight'].shape[0]:
 78 |         shape_1 = state_dict_1['lm_head.weight'].shape
 79 |         shape_2 = state_dict_2['lm_head.weight'].shape
 80 |         logging.warning(f'Warning: Model architectures do not match. '
 81 |                       f'Using sub weight space instead.\n'
 82 |                       f'Vocab sizes in model 1: {shape_1[0]}, '
 83 |                       f'Vocab sizes in model 2: {shape_2[0]}')
 84 | 
 85 |     # Process each key
 86 |     for key, base_params in tqdm(state_dict_base.items(), desc=f"Processing {metric.upper()} by key"):
 87 |         try:
 88 |             if key not in state_dict_1 or key not in state_dict_2:
 89 |                 logging.warning(f'Key {key} not found in one of the models')
 90 |                 continue
 91 | 
 92 |             # Get parameters and calculate deltas
 93 |             params_1 = state_dict_1[key][:num_layers]
 94 |             params_2 = state_dict_2[key][:num_layers]
 95 |             
 96 |             delta_1 = (params_1 - base_params).view(-1)
 97 |             delta_2 = (params_2 - base_params).view(-1)
 98 | 
 99 |             if low_precision:
100 |                 delta_1 = quantize_8bit(delta_1)
101 |                 delta_2 = quantize_8bit(delta_2)
102 | 
103 |             # Calculate weight based on parameter count
104 |             weight = delta_1.numel()
105 | 
106 |             # Calculate metric for current key
107 |             if metric == 'pcc':
108 |                 stack = torch.stack((delta_1, delta_2), dim=0)
109 |                 split_similarity = torch.corrcoef(stack)[0, 1].item()
110 |             elif metric == 'ed':
111 |                 split_similarity = torch.dist(delta_1, delta_2).item()
112 |             elif metric == 'cs':
113 |                 split_similarity = cosine_similarity(delta_1, delta_2)
114 |             else:
115 |                 raise ValueError(f"Unsupported metric: {metric}")
116 |             
117 |             # Skip NaN values
118 |             if torch.isnan(torch.tensor(split_similarity)):
119 |                 logging.warning(f'Skipping key {key} due to NaN result')
120 |                 continue
121 | 
122 |             # Store valid result
123 |             split_results[key] = split_similarity
124 | 
125 |             # Update weighted average only for valid results
126 |             weight = delta_1.numel()
127 |             total_similarity += split_similarity * weight
128 |             total_weight += weight
129 | 
130 |             # Log progress for large layers
131 |             if weight > 1000000:
132 |                 logging.info(f'Layer {key}: {metric.upper()} = {split_similarity:.4f}, parameters = {weight}')
133 | 
134 |             # Free memory
135 |             del delta_1, delta_2
136 | 
137 |         except Exception as e:
138 |             logging.error(f'Error processing key {key}: {str(e)}')
139 |             continue
140 | 
141 |         
142 |     # Calculate final weighted average
143 |     if total_weight > 0:
144 |         final_result = total_similarity / total_weight
145 |         
146 |         # Log summary statistics
147 |         logging.info(f'\nSummary for {metric.upper()}:')
148 |         logging.info(f'Total parameters: {total_weight}')
149 |         
150 |         # Log detailed results for valid splits
151 |         logging.info(f'\nDetailed {metric.upper()} results by key:')
152 |         for key, value in split_results.items():
153 |             logging.info(f'{key}: {value:.4f}')
154 |         
155 |         metric_names = {
156 |             'pcc': 'Pearson Correlation Coefficient',
157 |             'ed': 'Euclidean Distance',
158 |             'cs': 'Cosine Similarity'
159 |         }
160 |         
161 |         return f"Model Kinship based on {metric_names[metric]} (weighted average): {final_result:.4f}"
162 |     else:
163 |         return f"Error: No valid parameters found for {metric.upper()} calculation"


--------------------------------------------------------------------------------
/metrics/run_command.py:
--------------------------------------------------------------------------------
 1 | import click
 2 | from metrics.calculate import calculate_model_kinship
 3 | from metrics.calculate_split import calculate_model_kinship_split
 4 | from metrics.utility import validate_models, extract_delta_parameters
 5 | 
 6 | @click.command("merge_cal")
 7 | @click.argument("model_1_name", type=str)
 8 | @click.argument("model_2_name", type=str)
 9 | @click.argument("model_base_name", type=str)
10 | @click.argument("metric", type=str)
11 | @click.option(
12 |     "--low-precision",
13 |     is_flag=True,
14 |     default=False,
15 |     help="Use low precision for parameter extraction"
16 | )
17 | @click.option(
18 |     "--split-calculation",
19 |     is_flag=True,
20 |     default=False,
21 |     help="Calculate similarity per split instead of full vector"
22 | )
23 | def main(
24 |     model_1_name: str,
25 |     model_2_name: str,
26 |     model_base_name: str,
27 |     metric: str,
28 |     low_precision: bool,
29 |     split_calculation: bool,
30 | ):
31 |     """
32 |         This function calculates the model kinship between model_1 and model_2
33 |         relative to a base model, model_base_name.
34 |         """
35 |     # Extract delta parameters between models for calculation
36 |     try:
37 |         # Validate input models
38 |         validate_models(model_1_name, model_2_name, model_base_name)
39 | 
40 |         # Parse metrics
41 |         metrics = metric.split()
42 |         if not metrics:
43 |             raise click.BadParameter("At least one metric must be specified")
44 | 
45 |         if split_calculation:
46 |         # Extract parameters
47 |             results = calculate_model_kinship_split(
48 |                 model_1_name,
49 |                 model_2_name,
50 |                 model_base_name,
51 |                 low_precision=low_precision,
52 |                 metrics=metrics
53 |             )
54 |         else:
55 |             d1, d2 = extract_delta_parameters(
56 |                 model_1_name,
57 |                 model_2_name,
58 |                 model_base_name,
59 |                 low_precision=low_precision
60 |             )
61 | 
62 |             results = calculate_model_kinship(d1, d2, metrics)
63 |         for metric_name, value in results.items():
64 |             click.echo(f"{metric_name}: {value}")
65 | 
66 |     except Exception as e:
67 |         click.echo(f"Error: {str(e)}", err=True)
68 |         raise click.Abort()
69 | 
70 | 
71 | if __name__ == '__main__':
72 |     main()
73 | 


--------------------------------------------------------------------------------
/metrics/utility.py:
--------------------------------------------------------------------------------
  1 | from transformers import AutoConfig, PretrainedConfig, AutoModelForCausalLM, AutoTokenizer
  2 | import torch
  3 | import logging
  4 | from tqdm import tqdm
  5 | import click
  6 | from typing import List
  7 | from enum import Enum
  8 | 
  9 | logging.basicConfig(level=logging.INFO, force=True)
 10 | 
 11 | class Metric(str, Enum):
 12 |     """Enumeration of supported metrics"""
 13 |     PCC = 'pcc'
 14 |     ED = 'ed'
 15 |     CS = 'cs'
 16 | 
 17 |     @classmethod
 18 |     def list(cls) -> List[str]:
 19 |         """Return list of supported metric values"""
 20 |         return [metric.value for metric in cls]
 21 |     
 22 | 
 23 | def get_config(model: str, trust_remote_code: bool = False) -> PretrainedConfig:
 24 |     """
 25 |     Fetch the configuration of a pretrained model from HuggingFace.
 26 | 
 27 |     Args:
 28 |         model (str): The name or path of the model to load configuration for.
 29 |         trust_remote_code (bool, optional): Whether to trust remote code during loading.
 30 |                                             Defaults to False.
 31 | 
 32 |     Returns:
 33 |         PretrainedConfig: The configuration object of the specified model.
 34 |     """
 35 |     # Fetch the configuration from HuggingFace's model hub.
 36 |     config = AutoConfig.from_pretrained(
 37 |         model,
 38 |         trust_remote_code=trust_remote_code,  # Whether to allow remote code execution.
 39 |     )
 40 |     return config
 41 | 
 42 | 
 43 | def validate_models(
 44 |         model_1: str,
 45 |         model_2: str,
 46 |         base_model: str
 47 | ) -> None:
 48 |     """
 49 |     Validate model names to ensure they are different and exist.
 50 | 
 51 |     Args:
 52 |         model_1: Name of the first model
 53 |         model_2: Name of the second model
 54 |         base_model: Name of the base model
 55 | 
 56 |     Raises:
 57 |         click.BadParameter: If validation fails
 58 |     """
 59 |     if model_1 == model_2 or model_1 == base_model or model_2 == base_model:
 60 |         raise click.BadParameter("All model names must be different")
 61 | 
 62 | 
 63 | def quantize_8bit(x: torch.Tensor) -> torch.Tensor:
 64 |     # Get absolute min and max values
 65 |     abs_max = torch.max(torch.abs(x))
 66 |     
 67 |     # Scale to [-127, 127] range for 8-bit signed integers
 68 |     # Using 127 instead of 128 to keep zero exactly representable
 69 |     scaled = 127 * (x / abs_max)
 70 |     
 71 |     # Round to nearest integer
 72 |     quantized = torch.round(scaled)
 73 |     
 74 |     # Clamp values to ensure they stay in valid range
 75 |     quantized = torch.clamp(quantized, -127, 127)
 76 |     
 77 |     return quantized
 78 | 
 79 | 
 80 | def load_model_state_dict(model_name: str, device: str) -> dict:
 81 |     """
 82 |     Load a model and return its state dictionary.
 83 |     
 84 |     Args:
 85 |         model_name (str): Name or path of the model to load
 86 |         device (str): Device to load the model on ('cuda' or 'cpu')
 87 |         
 88 |     Returns:
 89 |         dict: State dictionary of the loaded model
 90 |     """
 91 |     logging.info(f"Loading model: {model_name}")
 92 |     model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
 93 |     state_dict = model.state_dict()
 94 |     del model  # Free memory
 95 |     return state_dict
 96 | 
 97 | 
 98 | def extract_delta_parameters(
 99 |         model_1_name: str,
100 |         model_2_name: str,
101 |         model_base_name: str,
102 |         low_precision: bool,
103 |         device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
104 | ) -> tuple[torch.Tensor, torch.Tensor]:
105 | 
106 |     """
107 |     Extract the delta parameters (weight differences) between two models
108 |     relative to a base model.
109 | 
110 |     Args:
111 |         model_1_name (str): Name or path of the first model.
112 |         model_2_name (str): Name or path of the second model.
113 |         model_base_name (str): Name or path of the base model for comparison.
114 |         low_precision (bool): Whether to use low precision weights
115 | 
116 |     Returns:
117 |         (torch.Tensor, torch.Tensor): Delta parameters of model_1 and model_2 relative to base model.
118 |     """
119 | 
120 |     # Extract state dictionaries from models
121 |     state_dict_1 = load_model_state_dict(model_1_name, device)
122 |     state_dict_2 = load_model_state_dict(model_2_name, device)
123 |     state_dict_base = load_model_state_dict(model_base_name, device)
124 | 
125 |     # Determine the number of layers
126 |     num_layers = state_dict_base['lm_head.weight'].shape[0]
127 | 
128 |     # Check if model architectures match, log a warning if not
129 |     if state_dict_1['lm_head.weight'].shape[0] != state_dict_2['lm_head.weight'].shape[0]:
130 |         shape_1 = state_dict_1['lm_head.weight'].shape
131 |         shape_2 = state_dict_2['lm_head.weight'].shape
132 |         logging.warning(f'Warning: Model architectures do not match. '
133 |                         f'Using sub weight space instead.\n'
134 |                         f'Vocab sizes in model 1: {shape_1[0]}, '
135 |                         f'Vocab sizes in model 2: {shape_2[0]}')
136 | 
137 |     # Initialize lists to store delta parameters for both models
138 |     d_vector_1, d_vector_2 = [], []
139 | 
140 |     # Iterate over keys in the base model's state dictionary with tqdm
141 |     for key, base_params in tqdm(state_dict_base.items(), desc="Processing keys", unit="key"):
142 |         # Only proceed if key exists in both models
143 |         try:
144 |             if key not in state_dict_1 or key not in state_dict_2:
145 |                 logging.warning(f'Key {key} not found in one of the models')
146 |                 continue
147 |         except Exception as e:
148 |             logging.error(f'Error processing key {key}: {str(e)}')
149 | 
150 |         # Get the parameters for each model (truncate to num_layers for consistency)
151 |         params_1 = state_dict_1[key][:num_layers]
152 |         params_2 = state_dict_2[key][:num_layers]
153 | 
154 |         # Compute the deltas relative to the base model
155 |         delta_1 = (params_1 - base_params).view(-1)
156 |         delta_2 = (params_2 - base_params).view(-1)
157 | 
158 |         # Accumulate deltas
159 |         d_vector_1.append(delta_1)
160 |         d_vector_2.append(delta_2)
161 | 
162 |     # Clear memory
163 |     del state_dict_1, state_dict_2, state_dict_base
164 | 
165 |     logging.info('Concatenating delta vectors...')
166 | 
167 |     d_vector_1 = torch.cat(d_vector_1)
168 |     d_vector_2 = torch.cat(d_vector_2)
169 | 
170 |     if low_precision:
171 |         logging.info('Quantizing delta vectors to 8-bit precision...')
172 |         d_vector_1 = quantize_8bit(d_vector_1)
173 |         d_vector_2 = quantize_8bit(d_vector_2)
174 |         logging.info('Quantization complete')
175 | 
176 |     return d_vector_1, d_vector_2
177 | 


--------------------------------------------------------------------------------
/pics/evolution.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zjunlp/ModelKinship/2369ef49244a91ac4db2dc110e0562408cd9faac/pics/evolution.jpg


--------------------------------------------------------------------------------
/pics/logo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zjunlp/ModelKinship/2369ef49244a91ac4db2dc110e0562408cd9faac/pics/logo.jpg


--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
 1 | [build-system]
 2 | requires = ["setuptools"]
 3 | build-backend = "setuptools.build_meta"
 4 | 
 5 | [project]
 6 | name = "merge"
 7 | description = "Tools for calculating assitant metrics"
 8 | readme = "README.md"
 9 | license = { text = "LGPL-3.0-or-later" }
10 | version = "0.0.0.1"
11 | authors = [{ name = "Yedi Hu", email = "yedihu.pub@gmail.com" }]
12 | dependencies = [
13 |     "transformers~=4.37.2",
14 |     "click~=8.1.7",
15 | ]
16 | 
17 | [project.urls]
18 | repository = "https://github.com/zjunlp/ModelKinship"
19 | 
20 | 
21 | [project.scripts]
22 | merge_cal = "metrics.run_command:main"
23 | 
24 | [tool.setuptools]
25 | packages = [
26 |     "metrics",
27 | ]
28 | 


--------------------------------------------------------------------------------
/scripts/Iterative_merge.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |   "cells": [
  3 |     {
  4 |       "cell_type": "markdown",
  5 |       "metadata": {
  6 |         "id": "o12O0YjJvvLW"
  7 |       },
  8 |       "source": [
  9 |         "# Iterative Merging with Model Kinship\n",
 10 |         "> Our Github [Merge Assistant Toolkit](https://github.com/zjunlp/ModelKinship).\n",
 11 |         "\n",
 12 |         "> Open in Colab [This Notebook](https://colab.research.google.com/drive/141VAI89emgSIcwkswATEXSEENoAMywTO?usp=sharing)\n",
 13 |         "\n",
 14 |         "This notebook demonstrates iterative merging using external tools to enhance large language model performance.\n",
 15 |         "It leverages the following:\n",
 16 |         "\n",
 17 |         "- [lm-evaluation-harness by EleutherAI](https://github.com/EleutherAI/lm-evaluation-harness) for providing a comprehensive evaluation framework for large language models.\n",
 18 |         "- [mergekit by arcee-ai](https://github.com/arcee-ai/mergekit) for offering an essential toolkit for model merging experiments.\n",
 19 |         "\n",
 20 |         "We express our gratitude to the contributors of these valuable tools."
 21 |       ]
 22 |     },
 23 |     {
 24 |       "cell_type": "code",
 25 |       "source": [
 26 |         "# @title ## Install requirements\n",
 27 |         "\n",
 28 |         "# @markdown ### Run this section to install the required dependencies and log in to Hugging Face Hub.\n",
 29 |         "\n",
 30 |         "!git clone https://github.com/EleutherAI/lm-evaluation-harness\n",
 31 |         "!pip install -e ./lm-evaluation-harness\n",
 32 |         "!git clone https://github.com/zjunlp/ModelKinship.git\n",
 33 |         "!pip install -e ./ModelKinship\n",
 34 |         "!git clone https://github.com/cg123/mergekit.git\n",
 35 |         "!pip install -e ./mergekit\n",
 36 |         "!pip install -qU huggingface_hub\n",
 37 |         "\n",
 38 |         "username = \"xxx\" # @param {\"type\":\"string\"}\n",
 39 |         "HF_Token = \"hf_xxxx\" # @param {\"type\":\"string\"}\n",
 40 |         "\n",
 41 |         "from huggingface_hub import login\n",
 42 |         "login(HF_Token)\n",
 43 |         "api = HfApi(token=HF_Token)"
 44 |       ],
 45 |       "metadata": {
 46 |         "cellView": "form",
 47 |         "id": "HK6EceEVeyS_"
 48 |       },
 49 |       "execution_count": null,
 50 |       "outputs": []
 51 |     },
 52 |     {
 53 |       "cell_type": "code",
 54 |       "source": [
 55 |         "# @title ## Select Models\n",
 56 |         "\n",
 57 |         "# @markdown ### Model 1\n",
 58 |         "# @markdown Select your first model in huggingface hub.\n",
 59 |         "\n",
 60 |         "Model_1 = \"openchat/openchat-3.5-1210\" # @param {\"type\":\"string\"}\n",
 61 |         "\n",
 62 |         "# @markdown ### Model 1\n",
 63 |         "# @markdown Select your second model in huggingface hub.\n",
 64 |         "\n",
 65 |         "Model_2 = \"meta-math/MetaMath-Mistral-7B\" # @param {\"type\":\"string\"}\n",
 66 |         "\n",
 67 |         "# @markdown ### Similarity metric for Model Kinship\n",
 68 |         "# @markdown Select similarity metric for model kinship. Details can be found in our paper.\n",
 69 |         "\n",
 70 |         "Similarity_metric = \"Pearson Correlation Coefficient\" # @param [\"Pearson Correlation Coefficient\",\"Cosine Similarity\",\"Euclidean Distance\"]\n",
 71 |         "# Save config as yaml file\n",
 72 |         "with open('config.yaml', 'w', encoding=\"utf-8\") as f:\n",
 73 |         "    f.write(yaml_config)\n",
 74 |         "\n",
 75 |         "cli = \"merge_cal\"\n",
 76 |         "cli += \" \" + Model_1\n",
 77 |         "cli += \" \" + Model_2\n",
 78 |         "\n",
 79 |         "\n",
 80 |         "# Similarity Metrics\n",
 81 |         "if Similarity_metric == \"Euclidean Distance\":\n",
 82 |         "    cli += \" ed\"\n",
 83 |         "elif Similarity_metric == \"Cosine Similarity\":\n",
 84 |         "    cli += \" cs\"\n",
 85 |         "elif Similarity_metric == \"Pearson Correlation Coefficient\"\n",
 86 |         "    cli += \" pcc\"\n",
 87 |         "\n",
 88 |         "print(cli)\n",
 89 |         "\n",
 90 |         "# Calculation\n",
 91 |         "!{cli}"
 92 |       ],
 93 |       "metadata": {
 94 |         "cellView": "form",
 95 |         "id": "yO9AVaNYeuiX"
 96 |       },
 97 |       "execution_count": null,
 98 |       "outputs": []
 99 |     },
100 |     {
101 |       "cell_type": "code",
102 |       "source": [
103 |         "# @title ## Merge Models\n",
104 |         "\n",
105 |         "# @markdown ### Name for the merged model\n",
106 |         "# @markdown Name your merged model.\n",
107 |         "Model_merge = \"Merge_1_1\" # @param {\"type\":\"string\"}\n",
108 |         "\n",
109 |         "# @markdown ### Model 1 (base model)\n",
110 |         "# @markdown Select your first model in huggingface hub.\n",
111 |         "\n",
112 |         "Model_1 = \"openchat/openchat-3.5-1210\" # @param {\"type\":\"string\"}\n",
113 |         "\n",
114 |         "# @markdown ### Model 2\n",
115 |         "# @markdown Select your second model in huggingface hub.\n",
116 |         "\n",
117 |         "Model_2 = \"meta-math/MetaMath-Mistral-7B\" # @param {\"type\":\"string\"}\n",
118 |         "\n",
119 |         "config = f\"\"\"\n",
120 |         "slices:\n",
121 |         "  - sources:\n",
122 |         "      - model: {Model_1}\n",
123 |         "        layer_range: [0, 32]\n",
124 |         "      - model: {Model_2}\n",
125 |         "        layer_range: [0, 32]\n",
126 |         "merge_method: slerp\n",
127 |         "base_model: {Model_1}\n",
128 |         "parameters:\n",
129 |         "  t:\n",
130 |         "    - filter: self_attn\n",
131 |         "      value: [0, 0.5, 0.3, 0.7, 1]\n",
132 |         "    - filter: mlp\n",
133 |         "      value: [1, 0.5, 0.7, 0.3, 0]\n",
134 |         "    - value: 0.5\n",
135 |         "dtype: bfloat16\n",
136 |         "\"\"\"\n",
137 |         "\n",
138 |         "with open('config.yaml', 'w', encoding=\"utf-8\") as f:\n",
139 |         "    f.write(config)\n",
140 |         "\n",
141 |         "!mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle"
142 |       ],
143 |       "metadata": {
144 |         "cellView": "form",
145 |         "id": "b63EhHIFm7Im"
146 |       },
147 |       "execution_count": null,
148 |       "outputs": []
149 |     },
150 |     {
151 |       "cell_type": "code",
152 |       "source": [
153 |         "# @title ## Recycle Models\n",
154 |         "\n",
155 |         "# @markdown ### Run this section to upload merged model.\n",
156 |         "\n",
157 |         "from huggingface_hub import ModelCard, ModelCardData\n",
158 |         "from jinja2 import Template\n",
159 |         "\n",
160 |         "template_text = \"\"\"\n",
161 |         "---\n",
162 |         "license: apache-2.0\n",
163 |         "tags:\n",
164 |         "{%- for model in models %}\n",
165 |         "- {{ model }}\n",
166 |         "{%- endfor %}\n",
167 |         "---\n",
168 |         "\n",
169 |         "{%- for model in models %}\n",
170 |         "* [{{ model }}](https://huggingface.co/{{ model }})\n",
171 |         "{%- endfor %}\n",
172 |         "\"\"\"\n",
173 |         "\n",
174 |         "jinja_template = Template(template_text.strip())\n",
175 |         "\n",
176 |         "data = yaml.safe_load(yaml_config)\n",
177 |         "models = [data[\"slices\"][i][\"sources\"][0][\"model\"] for i in range(len(data[\"slices\"]))]\n",
178 |         "content = jinja_template.render(\n",
179 |         "    model_name=Model_name,\n",
180 |         "    models=models,\n",
181 |         "    username=username,\n",
182 |         ")\n",
183 |         "\n",
184 |         "# Save the model card\n",
185 |         "card = ModelCard(content)\n",
186 |         "card.save('merge/README.md')\n",
187 |         "\n",
188 |         "api.create_repo(\n",
189 |         "    repo_id=f\"{username}/{MODEL_NAME}\",\n",
190 |         "    repo_type=\"model\"\n",
191 |         ")\n",
192 |         "api.upload_folder(\n",
193 |         "    repo_id=f\"{username}/{MODEL_NAME}\",\n",
194 |         "    folder_path=\"merge\",\n",
195 |         ")\n"
196 |       ],
197 |       "metadata": {
198 |         "cellView": "form",
199 |         "id": "O_xv1W2ep3WT"
200 |       },
201 |       "execution_count": null,
202 |       "outputs": []
203 |     }
204 |   ],
205 |   "metadata": {
206 |     "accelerator": "TPU",
207 |     "colab": {
208 |       "gpuType": "V28",
209 |       "provenance": []
210 |     },
211 |     "kernelspec": {
212 |       "display_name": "Python 3",
213 |       "name": "python3"
214 |     },
215 |     "language_info": {
216 |       "name": "python"
217 |     }
218 |   },
219 |   "nbformat": 4,
220 |   "nbformat_minor": 0
221 | }


--------------------------------------------------------------------------------