├── .gitignore
├── LICENSE
├── README.md
├── metrics
│   ├── __init__.py
│   ├── calculate.py
│   ├── calculate_split.py
│   ├── run_command.py
│   └── utility.py
├── pics
│   ├── evolution.jpg
│   └── logo.jpg
├── pyproject.toml
└── scripts
    └── Iterative_merge.ipynb
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
/.idea/.gitignore
/.idea/merge.iml
/.idea/misc.xml
/.idea/modules.xml
/.idea/inspectionProfiles/profiles_settings.xml
/.idea/inspectionProfiles/Project_Default.xml
/.idea/vcs.xml
/merge.egg-info/
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 ZJUNLP

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
![logo](pics/logo.jpg)

# Exploring Model Kinship for Merging LLMs

*The degree of similarity or relatedness between LLMs, analogous to biological evolution*

📄 [arXiv](https://arxiv.org/pdf/2410.12613) • 📒 Blog • 🤗 HF • 🎧 NotebookLM Audio

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
![](https://img.shields.io/github/last-commit/zjunlp/ModelKinship?color=green)

[Open In Colab](https://colab.research.google.com/drive/141VAI89emgSIcwkswATEXSEENoAMywTO?usp=sharing)

We introduce [Model Kinship](https://arxiv.org/pdf/2410.12613), a metric for the degree of similarity or relatedness between LLMs, analogous to relatedness in biological evolution, intended to guide continual model merging.

Currently, we support **Model Kinship** with three similarity metrics; others will be supported in the future.

---

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Usage](#usage)
- [Reproduction](#reproduction)
- [Supported Metrics](#supported-metrics)
- [Notebook](#notebook)
- [Acknowledgement](#acknowledgement)
- [Citation](#citation)

## Overview

Model merging provides a novel paradigm for leveraging information from multiple models without the need for additional training. Recently, the development of model merging toolkits has enabled non-experts to conduct merging experiments, spurring a trend of model merging on the Hugging Face Open LLM Leaderboard.

Currently, the model merging community has built powerful models through iterative merging steps. This process resembles artificial selection—a concept from biology in which humans deliberately select for or against specific traits in organisms.

![](pics/evolution.jpg)

However, the reasons behind the success of this process remain unknown, resulting in numerous trial-and-error attempts for slight performance improvements.
Drawing inspiration from evolutionary biology, our project examines the weight changes that occur during post-pre-training stages (e.g., fine-tuning, merging). We propose **Model Kinship**, a metric that evaluates the relatedness between two models by calculating the similarity of their weight changes, analogous to genetic variance in inheritance. In our paper, we show that **Model Kinship** can be used to optimise the merging strategy.

This toolkit provides a simple way to calculate **Model Kinship** for model merging.

---

## Installation

```bash
git clone https://github.com/zjunlp/ModelKinship.git
pip install -e ./ModelKinship
```

---

## Usage

```bash
# Input format
merge_cal model-1 model-2 model-base metrics [options]

# Calculate Model Kinship based on Euclidean Distance (CPU)
merge_cal OpenPipe/mistral-ft-optimized-1218 \
  mlabonne/NeuralHermes-2.5-Mistral-7B \
  mistralai/Mistral-7B-v0.1 \
  ed

# Calculate multiple metrics in one run (comma-separated, CPU)
merge_cal OpenPipe/mistral-ft-optimized-1218 \
  mlabonne/NeuralHermes-2.5-Mistral-7B \
  mistralai/Mistral-7B-v0.1 \
  cs,pcc,ed
```

### Optional Arguments

- `--low-precision`: Enable 8-bit quantization for parameter extraction. Reduces memory usage but may slightly affect accuracy.
```bash
merge_cal model-1 model-2 model-base metrics --low-precision
```

- `--split-calculation`: Calculate metrics per layer/key instead of over the full delta vector, to reduce RAM usage.
```bash
merge_cal model-1 model-2 model-base metrics --split-calculation
```

Example with multiple options:
```bash
merge_cal OpenPipe/mistral-ft-optimized-1218 \
  mlabonne/NeuralHermes-2.5-Mistral-7B \
  mistralai/Mistral-7B-v0.1 \
  pcc,cs --low-precision --split-calculation
```
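
Under the hood, each metric is computed over the two models' *delta parameters*, i.e., their flattened weight differences from the shared base model. Below is a minimal sketch of that computation on hypothetical toy tensors (illustrative values only; no real checkpoints are involved):

```python
import torch

# Hypothetical flattened weights for two fine-tuned models and their shared base.
base    = torch.tensor([0.10, -0.20, 0.30, 0.40])
model_1 = torch.tensor([0.15, -0.10, 0.25, 0.40])
model_2 = torch.tensor([0.12, -0.15, 0.35, 0.45])

# Delta parameters: how far each model moved away from the base.
d1 = model_1 - base
d2 = model_2 - base

# The three supported metrics, computed over the delta vectors.
cs = (torch.dot(d1, d2) / (torch.norm(d1) * torch.norm(d2))).item()  # cosine similarity
pcc = torch.corrcoef(torch.stack((d1, d2)))[0, 1].item()             # Pearson correlation
ed = torch.dist(d1, d2).item()                                       # Euclidean distance

print(f"cs={cs:.4f}, pcc={pcc:.4f}, ed={ed:.4f}")
```

`merge_cal` performs essentially this computation, with the delta vectors built by concatenating the differences of every parameter tensor shared by the three checkpoints.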

---

## Reproduction

To reproduce our experiments, both an evaluation toolkit and a merging toolkit for large language models are required. We recommend using the following tools:

- [lm-evaluation-harness by EleutherAI](https://github.com/EleutherAI/lm-evaluation-harness)
- [mergekit by arcee-ai](https://github.com/arcee-ai/mergekit)

The merged models from our experiments are open access:

- [Merged Models Repository](https://huggingface.co/PotatoB)

---

## Supported Metrics

- Cosine Similarity - `cs`
- Pearson Correlation Coefficient - `pcc`
- Euclidean Distance - `ed`
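
For reference, writing $\delta_1$ and $\delta_2$ for the two models' flattened delta vectors relative to the base model, the three metrics are computed as:

```math
\mathrm{cs}(\delta_1,\delta_2)=\frac{\delta_1\cdot\delta_2}{\lVert\delta_1\rVert\,\lVert\delta_2\rVert},\qquad
\mathrm{pcc}(\delta_1,\delta_2)=\frac{\operatorname{cov}(\delta_1,\delta_2)}{\sigma_{\delta_1}\,\sigma_{\delta_2}},\qquad
\mathrm{ed}(\delta_1,\delta_2)=\lVert\delta_1-\delta_2\rVert_2
```

Higher `cs` and `pcc` values indicate closer kinship, whereas for `ed` a smaller distance does.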

---

## Notebook

To conduct iterative merging experiments, you can use the following notebook for a quick start:

- [Notebook for Iterative Merging](https://colab.research.google.com/drive/141VAI89emgSIcwkswATEXSEENoAMywTO?usp=sharing)

This notebook includes three main steps:
- Selection - calculate the Model Kinship of candidate models to predict the potential benefit of the prospective merge.
- Merging - merge the selected models.
- Recycling - upload the merged model (evaluation is optional).

---

## Acknowledgement

We would like to express our gratitude to the developers and contributors of the following external toolkits, which were instrumental in the success of our research on model merging and kinship analysis:

- [lm-evaluation-harness by EleutherAI](https://github.com/EleutherAI/lm-evaluation-harness) for providing a comprehensive evaluation framework for large language models.
- [mergekit by arcee-ai](https://github.com/arcee-ai/mergekit) for offering an essential toolkit for model merging experiments.

These toolkits have significantly contributed to our ability to conduct and reproduce large-scale experiments, and their open-source availability has been invaluable to the broader research community.

---

## Citation

```bibtex
@misc{hu2024exploringmodelkinshipmerging,
      title={Exploring Model Kinship for Merging Large Language Models},
      author={Yedi Hu and Yunzhi Yao and Ningyu Zhang and Shumin Deng and Huajun Chen},
      year={2024},
      eprint={2410.12613},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.12613},
}
```
--------------------------------------------------------------------------------
/metrics/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zjunlp/ModelKinship/2369ef49244a91ac4db2dc110e0562408cd9faac/metrics/__init__.py
--------------------------------------------------------------------------------
/metrics/calculate.py:
--------------------------------------------------------------------------------
import logging
from typing import List
from metrics.utility import Metric
import torch

logging.basicConfig(level=logging.INFO, force=True)


def cosine_similarity(a, b):
    # Signed cosine similarity: dot(a, b) / (|a| * |b|).
    # The previous sqrt-of-squared-dot formulation returned the absolute
    # value, silently discarding the sign of the similarity.
    return (torch.dot(a, b) / (torch.norm(a) * torch.norm(b))).item()


def calculate_model_kinship(
        delta1: torch.Tensor,
        delta2: torch.Tensor,
        metrics: List[str]
) -> dict:
    """
    Calculate model kinship using specified metrics.

    Args:
        delta1: Delta parameters for first model
        delta2: Delta parameters for second model
        metrics: List of metrics to calculate

    Returns:
        dict: Dictionary of metric names and their calculated values
    """
    results = {}
    for metric in metrics:
        try:
            if metric not in Metric.list():
                raise ValueError(f"Unsupported metric: {metric}")
            results[metric] = calculate_metric(delta1, delta2, metric)
        except Exception as e:
            results[metric] = f"Error calculating {metric}: {str(e)}"
    return results


def calculate_metric(d_vector_1: torch.Tensor, d_vector_2: torch.Tensor, metric: str) -> str:
    """
    Calculate the specified metric between two delta vectors.

    Args:
        d_vector_1 (torch.Tensor): Delta parameters for model 1.
        d_vector_2 (torch.Tensor): Delta parameters for model 2.
        metric (str): The metric to calculate ('pcc', 'ed', 'cs').

    Returns:
        str: A formatted string with the result of the chosen metric.
    """
    logging.info(f'Starting calculation of {metric.upper()} metric...')

    # Pearson Correlation Coefficient (PCC)
    if metric == 'pcc':
        # Stack the two vectors and calculate the Pearson correlation coefficient
        stack = torch.stack((d_vector_1, d_vector_2), dim=0)
        pcc = torch.corrcoef(stack)[0, 1].item()
        return f"Model Kinship based on Pearson Correlation Coefficient: {pcc}"

    # Euclidean Distance (ED)
    elif metric == 'ed':
        # Compute the Euclidean distance between the vectors
        distance = torch.dist(d_vector_1, d_vector_2).item()
        return f"Model Kinship based on Euclidean Distance: {distance}"

    # Cosine Similarity (CS)
    elif metric == 'cs':
        # Compute cosine similarity
        cs = cosine_similarity(d_vector_1, d_vector_2)
        return f"Model Kinship based on Cosine Similarity: {cs}"

    # If metric is not recognized
    else:
        return "Invalid metric specified."
--------------------------------------------------------------------------------
/metrics/calculate_split.py:
--------------------------------------------------------------------------------
import logging
from typing import List
from metrics.utility import Metric, quantize_8bit, load_model_state_dict
import torch
from tqdm import tqdm

logging.basicConfig(level=logging.INFO, force=True)


def cosine_similarity(a, b):
    # Signed cosine similarity: dot(a, b) / (|a| * |b|).
    return (torch.dot(a, b) / (torch.norm(a) * torch.norm(b))).item()


def calculate_model_kinship_split(
        model_1_name: str,
        model_2_name: str,
        model_base_name: str,
        low_precision: bool,
        metrics: List[str],
        device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
) -> dict:

    # Extract state dictionaries from models
    state_dict_1 = load_model_state_dict(model_1_name, device)
    state_dict_2 = load_model_state_dict(model_2_name, device)
    state_dict_base = load_model_state_dict(model_base_name, device)
    results = {}

    # Validate metrics before processing
    valid_metrics = Metric.list()
    for metric in metrics:
        try:
            if metric not in valid_metrics:
                raise ValueError(f"Unsupported metric: {metric}. "
                                 f"Valid metrics are: {', '.join(valid_metrics)}")
            results[metric] = calculate_metrics_by_split(
                state_dict_1,
                state_dict_2,
                state_dict_base,
                low_precision,
                metric
            )
        except Exception as e:
            logging.error(f"Error calculating {metric}: {str(e)}")
            results[metric] = f"Error calculating {metric}: {str(e)}"

    return results


def calculate_metrics_by_split(
        state_dict_1: dict,
        state_dict_2: dict,
        state_dict_base: dict,
        low_precision: bool,
        metric: str,
) -> str:
    """
    Calculate metrics for each key and integrate results.

    Args:
        state_dict_1 (dict): State dictionary of first model
        state_dict_2 (dict): State dictionary of second model
        state_dict_base (dict): State dictionary of base model
        low_precision (bool): Whether to use 8-bit quantization
        metric (str): Metric to calculate ('pcc', 'ed', 'cs')

    Returns:
        str: Integrated metric result as formatted string
    """
    total_similarity = 0.0
    total_weight = 0.0
    split_results = {}

    # Vocab size of the base model's lm_head; used below to truncate
    # embedding-sized tensors so that models with extended vocabularies
    # are compared in the shared sub weight space.
    num_rows = state_dict_base['lm_head.weight'].shape[0]

    # Check architectures
    if state_dict_1['lm_head.weight'].shape[0] != state_dict_2['lm_head.weight'].shape[0]:
        shape_1 = state_dict_1['lm_head.weight'].shape
        shape_2 = state_dict_2['lm_head.weight'].shape
        logging.warning(f'Warning: Model architectures do not match. '
                        f'Using sub weight space instead.\n'
                        f'Vocab sizes in model 1: {shape_1[0]}, '
                        f'Vocab sizes in model 2: {shape_2[0]}')

    # Process each key
    for key, base_params in tqdm(state_dict_base.items(), desc=f"Processing {metric.upper()} by key"):
        try:
            if key not in state_dict_1 or key not in state_dict_2:
                logging.warning(f'Key {key} not found in one of the models')
                continue

            # Get parameters and calculate deltas
            params_1 = state_dict_1[key][:num_rows]
            params_2 = state_dict_2[key][:num_rows]

            delta_1 = (params_1 - base_params).view(-1)
            delta_2 = (params_2 - base_params).view(-1)

            if low_precision:
                delta_1 = quantize_8bit(delta_1)
                delta_2 = quantize_8bit(delta_2)

            # Calculate metric for current key
            if metric == 'pcc':
                stack = torch.stack((delta_1, delta_2), dim=0)
                split_similarity = torch.corrcoef(stack)[0, 1].item()
            elif metric == 'ed':
                split_similarity = torch.dist(delta_1, delta_2).item()
            elif metric == 'cs':
                split_similarity = cosine_similarity(delta_1, delta_2)
            else:
                raise ValueError(f"Unsupported metric: {metric}")

            # Skip NaN values
            if torch.isnan(torch.tensor(split_similarity)):
                logging.warning(f'Skipping key {key} due to NaN result')
                continue

            # Store valid result
            split_results[key] = split_similarity

            # Weight each key by its parameter count and update the weighted average
            weight = delta_1.numel()
            total_similarity += split_similarity * weight
            total_weight += weight

            # Log progress for large layers
            if weight > 1000000:
                logging.info(f'Layer {key}: {metric.upper()} = {split_similarity:.4f}, parameters = {weight}')

            # Free memory
            del delta_1, delta_2

        except Exception as e:
            logging.error(f'Error processing key {key}: {str(e)}')
            continue
    # Calculate final weighted average
    if total_weight > 0:
        final_result = total_similarity / total_weight

        # Log summary statistics
        logging.info(f'\nSummary for {metric.upper()}:')
        logging.info(f'Total parameters: {total_weight}')

        # Log detailed results for valid splits
        logging.info(f'\nDetailed {metric.upper()} results by key:')
        for key, value in split_results.items():
            logging.info(f'{key}: {value:.4f}')

        metric_names = {
            'pcc': 'Pearson Correlation Coefficient',
            'ed': 'Euclidean Distance',
            'cs': 'Cosine Similarity'
        }

        return f"Model Kinship based on {metric_names[metric]} (weighted average): {final_result:.4f}"
    else:
        return f"Error: No valid parameters found for {metric.upper()} calculation"
--------------------------------------------------------------------------------
/metrics/run_command.py:
--------------------------------------------------------------------------------
import click
from metrics.calculate import calculate_model_kinship
from metrics.calculate_split import calculate_model_kinship_split
from metrics.utility import validate_models, extract_delta_parameters


@click.command("merge_cal")
@click.argument("model_1_name", type=str)
@click.argument("model_2_name", type=str)
@click.argument("model_base_name", type=str)
@click.argument("metric", type=str)
@click.option(
    "--low-precision",
    is_flag=True,
    default=False,
    help="Use low precision for parameter extraction"
)
@click.option(
    "--split-calculation",
    is_flag=True,
    default=False,
    help="Calculate similarity per split instead of full vector"
)
def main(
        model_1_name: str,
        model_2_name: str,
        model_base_name: str,
        metric: str,
        low_precision: bool,
        split_calculation: bool,
):
    """
    Calculate the model kinship between model_1 and model_2
    relative to a base model, model_base_name.
    """
    try:
        # Validate input models
        validate_models(model_1_name, model_2_name, model_base_name)

        # Parse metrics: a comma-separated list, e.g. "cs,pcc,ed"
        metrics = [m.strip() for m in metric.split(',') if m.strip()]
        if not metrics:
            raise click.BadParameter("At least one metric must be specified")

        if split_calculation:
            # Per-key calculation keeps peak memory usage low
            results = calculate_model_kinship_split(
                model_1_name,
                model_2_name,
                model_base_name,
                low_precision=low_precision,
                metrics=metrics
            )
        else:
            # Extract delta parameters between models for calculation
            d1, d2 = extract_delta_parameters(
                model_1_name,
                model_2_name,
                model_base_name,
                low_precision=low_precision
            )
            results = calculate_model_kinship(d1, d2, metrics)

        for metric_name, value in results.items():
            click.echo(f"{metric_name}: {value}")

    except Exception as e:
        click.echo(f"Error: {str(e)}", err=True)
        raise click.Abort()


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/metrics/utility.py:
--------------------------------------------------------------------------------
from transformers import AutoConfig, PretrainedConfig, AutoModelForCausalLM
import torch
import logging
from tqdm import tqdm
import click
from typing import List
from enum import Enum

logging.basicConfig(level=logging.INFO, force=True)


class Metric(str, Enum):
    """Enumeration of supported metrics"""
    PCC = 'pcc'
    ED = 'ed'
    CS = 'cs'

    @classmethod
    def list(cls) -> List[str]:
        """Return list of supported metric values"""
        return [metric.value for metric in cls]


def get_config(model: str, trust_remote_code: bool = False) -> PretrainedConfig:
    """
    Fetch the configuration of a pretrained model from HuggingFace.

    Args:
        model (str): The name or path of the model to load configuration for.
        trust_remote_code (bool, optional): Whether to trust remote code during loading.
            Defaults to False.

    Returns:
        PretrainedConfig: The configuration object of the specified model.
    """
    # Fetch the configuration from HuggingFace's model hub.
    config = AutoConfig.from_pretrained(
        model,
        trust_remote_code=trust_remote_code,  # Whether to allow remote code execution.
    )
    return config


def validate_models(
        model_1: str,
        model_2: str,
        base_model: str
) -> None:
    """
    Validate model names to ensure they are all different.

    Args:
        model_1: Name of the first model
        model_2: Name of the second model
        base_model: Name of the base model

    Raises:
        click.BadParameter: If validation fails
    """
    if model_1 == model_2 or model_1 == base_model or model_2 == base_model:
        raise click.BadParameter("All model names must be different")


def quantize_8bit(x: torch.Tensor) -> torch.Tensor:
    # Get the absolute maximum value, used as the scaling factor
    abs_max = torch.max(torch.abs(x))

    # Guard against an all-zero tensor: nothing to scale
    if abs_max == 0:
        return x

    # Scale to [-127, 127] range for 8-bit signed integers
    # Using 127 instead of 128 to keep zero exactly representable
    scaled = 127 * (x / abs_max)

    # Round to nearest integer
    quantized = torch.round(scaled)

    # Clamp values to ensure they stay in valid range
    quantized = torch.clamp(quantized, -127, 127)

    return quantized


def load_model_state_dict(model_name: str, device: str) -> dict:
    """
    Load a model and return its state dictionary.

    Args:
        model_name (str): Name or path of the model to load
        device (str): Device to load the model on ('cuda' or 'cpu')

    Returns:
        dict: State dictionary of the loaded model
    """
    logging.info(f"Loading model: {model_name}")
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    state_dict = model.state_dict()
    del model  # Free memory
    return state_dict


def extract_delta_parameters(
        model_1_name: str,
        model_2_name: str,
        model_base_name: str,
        low_precision: bool,
        device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
) -> tuple[torch.Tensor, torch.Tensor]:
    """
    Extract the delta parameters (weight differences) between two models
    relative to a base model.

    Args:
        model_1_name (str): Name or path of the first model.
        model_2_name (str): Name or path of the second model.
        model_base_name (str): Name or path of the base model for comparison.
        low_precision (bool): Whether to use low precision weights

    Returns:
        (torch.Tensor, torch.Tensor): Delta parameters of model_1 and model_2 relative to base model.
    """

    # Extract state dictionaries from models
    state_dict_1 = load_model_state_dict(model_1_name, device)
    state_dict_2 = load_model_state_dict(model_2_name, device)
    state_dict_base = load_model_state_dict(model_base_name, device)

    # Vocab size of the base model's lm_head; used below to truncate
    # embedding-sized tensors so that models with extended vocabularies
    # are compared in the shared sub weight space.
    num_rows = state_dict_base['lm_head.weight'].shape[0]

    # Check if model architectures match, log a warning if not
    if state_dict_1['lm_head.weight'].shape[0] != state_dict_2['lm_head.weight'].shape[0]:
        shape_1 = state_dict_1['lm_head.weight'].shape
        shape_2 = state_dict_2['lm_head.weight'].shape
        logging.warning(f'Warning: Model architectures do not match. '
                        f'Using sub weight space instead.\n'
                        f'Vocab sizes in model 1: {shape_1[0]}, '
                        f'Vocab sizes in model 2: {shape_2[0]}')

    # Initialize lists to store delta parameters for both models
    d_vector_1, d_vector_2 = [], []

    # Iterate over keys in the base model's state dictionary with tqdm
    for key, base_params in tqdm(state_dict_base.items(), desc="Processing keys", unit="key"):
        # Only proceed if the key exists in both models
        if key not in state_dict_1 or key not in state_dict_2:
            logging.warning(f'Key {key} not found in one of the models')
            continue

        # Get the parameters for each model (truncate rows for consistency)
        params_1 = state_dict_1[key][:num_rows]
        params_2 = state_dict_2[key][:num_rows]

        # Compute the deltas relative to the base model
        delta_1 = (params_1 - base_params).view(-1)
        delta_2 = (params_2 - base_params).view(-1)

        # Accumulate deltas
        d_vector_1.append(delta_1)
        d_vector_2.append(delta_2)

    # Clear memory
    del state_dict_1, state_dict_2, state_dict_base

    logging.info('Concatenating delta vectors...')

    d_vector_1 = torch.cat(d_vector_1)
    d_vector_2 = torch.cat(d_vector_2)

    if low_precision:
        logging.info('Quantizing delta vectors to 8-bit precision...')
        d_vector_1 = quantize_8bit(d_vector_1)
        d_vector_2 = quantize_8bit(d_vector_2)
        logging.info('Quantization complete')

    return d_vector_1, d_vector_2
--------------------------------------------------------------------------------
/pics/evolution.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zjunlp/ModelKinship/2369ef49244a91ac4db2dc110e0562408cd9faac/pics/evolution.jpg
--------------------------------------------------------------------------------
/pics/logo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zjunlp/ModelKinship/2369ef49244a91ac4db2dc110e0562408cd9faac/pics/logo.jpg
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "merge"
description = "Tools for calculating model kinship metrics"
readme = "README.md"
# Matches the MIT LICENSE file shipped with the repository.
license = { text = "MIT" }
version = "0.0.0.1"
authors = [{ name = "Yedi Hu", email = "yedihu.pub@gmail.com" }]
dependencies = [
    "transformers~=4.37.2",
    "click~=8.1.7",
    # torch and tqdm are imported directly by the metrics package
    "torch",
    "tqdm",
]

[project.urls]
repository = "https://github.com/zjunlp/ModelKinship"

[project.scripts]
merge_cal = "metrics.run_command:main"

[tool.setuptools]
packages = [
    "metrics",
]
--------------------------------------------------------------------------------
/scripts/Iterative_merge.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "o12O0YjJvvLW"
   },
   "source": [
    "# Iterative Merging with Model Kinship\n",
    "> Our GitHub [Merge Assistant Toolkit](https://github.com/zjunlp/ModelKinship).\n",
    "\n",
    "> Open [this notebook in Colab](https://colab.research.google.com/drive/141VAI89emgSIcwkswATEXSEENoAMywTO?usp=sharing).\n",
    "\n",
    "This notebook demonstrates iterative merging using external tools to enhance large language model performance.\n",
    "It leverages the following:\n",
    "\n",
    "- [lm-evaluation-harness by EleutherAI](https://github.com/EleutherAI/lm-evaluation-harness) for providing a comprehensive evaluation framework for large language models.\n",
    "- [mergekit by arcee-ai](https://github.com/arcee-ai/mergekit) for offering an essential toolkit for model merging experiments.\n",
    "\n",
    "We express our gratitude to the contributors of these valuable tools."
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "# @title ## Install requirements\n",
    "\n",
    "# @markdown ### Run this section to install the required dependencies and log in to Hugging Face Hub.\n",
    "\n",
    "!git clone https://github.com/EleutherAI/lm-evaluation-harness\n",
    "!pip install -e ./lm-evaluation-harness\n",
    "!git clone https://github.com/zjunlp/ModelKinship.git\n",
    "!pip install -e ./ModelKinship\n",
    "!git clone https://github.com/cg123/mergekit.git\n",
    "!pip install -e ./mergekit\n",
    "!pip install -qU huggingface_hub\n",
    "\n",
    "username = \"xxx\" # @param {\"type\":\"string\"}\n",
    "HF_Token = \"hf_xxxx\" # @param {\"type\":\"string\"}\n",
    "\n",
    "# HfApi is needed later for repo creation and upload\n",
    "from huggingface_hub import login, HfApi\n",
    "login(HF_Token)\n",
    "api = HfApi(token=HF_Token)"
   ],
   "metadata": {
    "cellView": "form",
    "id": "HK6EceEVeyS_"
   },
   "execution_count": null,
   "outputs": []
  },
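  {
   "cell_type": "code",
   "source": [
    "# @title ## Sanity check (added example)\n",
    "\n",
    "# @markdown A minimal check, assuming the editable installs above succeeded:\n",
    "# @markdown `merge_cal --help` should print the usage string generated by click.\n",
    "\n",
    "!merge_cal --help"
   ],
   "metadata": {
    "cellView": "form",
    "id": "sanity_check_example"
   },
   "execution_count": null,
   "outputs": []
  },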
  {
   "cell_type": "code",
   "source": [
    "# @title ## Select Models\n",
    "\n",
    "# @markdown ### Model 1\n",
    "# @markdown Select your first model on the Hugging Face Hub.\n",
    "\n",
    "Model_1 = \"openchat/openchat-3.5-1210\" # @param {\"type\":\"string\"}\n",
    "\n",
    "# @markdown ### Model 2\n",
    "# @markdown Select your second model on the Hugging Face Hub.\n",
    "\n",
    "Model_2 = \"meta-math/MetaMath-Mistral-7B\" # @param {\"type\":\"string\"}\n",
    "\n",
    "# @markdown ### Base model\n",
    "# @markdown Select the shared base model of the two models above\n",
    "# @markdown (both defaults are Mistral-7B-v0.1 fine-tunes).\n",
    "\n",
    "Model_base = \"mistralai/Mistral-7B-v0.1\" # @param {\"type\":\"string\"}\n",
    "\n",
    "# @markdown ### Similarity metric for Model Kinship\n",
    "# @markdown Select the similarity metric for model kinship. Details can be found in our paper.\n",
    "\n",
    "Similarity_metric = \"Pearson Correlation Coefficient\" # @param [\"Pearson Correlation Coefficient\",\"Cosine Similarity\",\"Euclidean Distance\"]\n",
    "\n",
    "# Build the merge_cal command: model-1 model-2 model-base metric\n",
    "cli = \"merge_cal\"\n",
    "cli += \" \" + Model_1\n",
    "cli += \" \" + Model_2\n",
    "cli += \" \" + Model_base\n",
    "\n",
    "# Similarity Metrics\n",
    "if Similarity_metric == \"Euclidean Distance\":\n",
    "    cli += \" ed\"\n",
    "elif Similarity_metric == \"Cosine Similarity\":\n",
    "    cli += \" cs\"\n",
    "elif Similarity_metric == \"Pearson Correlation Coefficient\":\n",
    "    cli += \" pcc\"\n",
    "\n",
    "print(cli)\n",
    "\n",
    "# Calculation\n",
    "!{cli}"
   ],
   "metadata": {
    "cellView": "form",
    "id": "yO9AVaNYeuiX"
   },
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "source": [
    "# @title ## Merge Models\n",
    "\n",
    "# @markdown ### Name for the merged model\n",
    "# @markdown Name your merged model.\n",
    "Model_merge = \"Merge_1_1\" # @param {\"type\":\"string\"}\n",
    "\n",
    "# @markdown ### Model 1 (base model)\n",
    "# @markdown Select your first model on the Hugging Face Hub.\n",
    "\n",
    "Model_1 = \"openchat/openchat-3.5-1210\" # @param {\"type\":\"string\"}\n",
    "\n",
    "# @markdown ### Model 2\n",
    "# @markdown Select your second model on the Hugging Face Hub.\n",
    "\n",
    "Model_2 = \"meta-math/MetaMath-Mistral-7B\" # @param {\"type\":\"string\"}\n",
    "\n",
    "config = f\"\"\"\n",
    "slices:\n",
    "  - sources:\n",
    "      - model: {Model_1}\n",
    "        layer_range: [0, 32]\n",
    "      - model: {Model_2}\n",
    "        layer_range: [0, 32]\n",
    "merge_method: slerp\n",
    "base_model: {Model_1}\n",
    "parameters:\n",
    "  t:\n",
    "    - filter: self_attn\n",
    "      value: [0, 0.5, 0.3, 0.7, 1]\n",
    "    - filter: mlp\n",
    "      value: [1, 0.5, 0.7, 0.3, 0]\n",
    "    - value: 0.5\n",
    "dtype: bfloat16\n",
    "\"\"\"\n",
    "\n",
    "with open('config.yaml', 'w', encoding=\"utf-8\") as f:\n",
    "    f.write(config)\n",
    "\n",
    "!mergekit-yaml config.yaml merge --copy-tokenizer --allow-crimes --out-shard-size 1B --lazy-unpickle"
   ],
   "metadata": {
    "cellView": "form",
    "id": "b63EhHIFm7Im"
   },
   "execution_count": null,
   "outputs": []
  },
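  {
   "cell_type": "code",
   "source": [
    "# @title ## Evaluate the merged model (optional example)\n",
    "\n",
    "# @markdown A minimal evaluation sketch, assuming a recent lm-evaluation-harness\n",
    "# @markdown build that exposes the `lm_eval` entry point. The merged weights are\n",
    "# @markdown read from the local `merge` folder written by mergekit above; the\n",
    "# @markdown task and batch size are illustrative choices.\n",
    "\n",
    "!lm_eval --model hf \\\n",
    "    --model_args pretrained=./merge \\\n",
    "    --tasks arc_easy \\\n",
    "    --batch_size 8"
   ],
   "metadata": {
    "cellView": "form",
    "id": "eval_merged_model_example"
   },
   "execution_count": null,
   "outputs": []
  },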
  {
   "cell_type": "code",
   "source": [
    "# @title ## Recycle Models\n",
    "\n",
    "# @markdown ### Run this section to upload the merged model.\n",
    "\n",
    "import yaml\n",
    "from huggingface_hub import ModelCard, ModelCardData\n",
    "from jinja2 import Template\n",
    "\n",
    "template_text = \"\"\"\n",
    "---\n",
    "license: apache-2.0\n",
    "tags:\n",
    "{%- for model in models %}\n",
    "- {{ model }}\n",
    "{%- endfor %}\n",
    "---\n",
    "\n",
    "{%- for model in models %}\n",
    "* [{{ model }}](https://huggingface.co/{{ model }})\n",
    "{%- endfor %}\n",
    "\"\"\"\n",
    "\n",
    "jinja_template = Template(template_text.strip())\n",
    "\n",
    "# Collect the source models from the merge config written by the previous cell\n",
    "data = yaml.safe_load(config)\n",
    "models = [src[\"model\"] for s in data[\"slices\"] for src in s[\"sources\"]]\n",
    "content = jinja_template.render(\n",
    "    model_name=Model_merge,\n",
    "    models=models,\n",
    "    username=username,\n",
    ")\n",
    "\n",
    "# Save the model card\n",
    "card = ModelCard(content)\n",
    "card.save('merge/README.md')\n",
    "\n",
    "api.create_repo(\n",
    "    repo_id=f\"{username}/{Model_merge}\",\n",
    "    repo_type=\"model\"\n",
    ")\n",
    "api.upload_folder(\n",
    "    repo_id=f\"{username}/{Model_merge}\",\n",
    "    folder_path=\"merge\",\n",
    ")\n"
   ],
   "metadata": {
    "cellView": "form",
    "id": "O_xv1W2ep3WT"
   },
   "execution_count": null,
   "outputs": []
  }
 ],
 "metadata": {
  "accelerator": "TPU",
  "colab": {
   "gpuType": "V28",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
--------------------------------------------------------------------------------