├── result.png
├── ballot21.csv
├── ballot22.csv
├── ballot23final.csv
├── ballot24final.csv
├── README.md
├── .gitignore
└── analyze_linear_transfer.py

/result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harelc/elections-vote-transfer/HEAD/result.png
--------------------------------------------------------------------------------
/ballot21.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harelc/elections-vote-transfer/HEAD/ballot21.csv
--------------------------------------------------------------------------------
/ballot22.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harelc/elections-vote-transfer/HEAD/ballot22.csv
--------------------------------------------------------------------------------
/ballot23final.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harelc/elections-vote-transfer/HEAD/ballot23final.csv
--------------------------------------------------------------------------------
/ballot24final.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/harelc/elections-vote-transfer/HEAD/ballot24final.csv
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# elections-vote-transfer
## Analysis of vote transfer between two elections
### Harel Cain, September 2019

The code analyzes the ballot results for ~10,000 polling stations with
the same identity (locale and polling station number) between the elections for the 21st and the 22nd Knesset.

It assumes there is a linear transfer matrix $M$ such that $V_{21} M \approx V_{22}$.
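
As a rough illustration of this assumption (not the repository's own code), the sketch below builds hypothetical toy data in which every polling station applies the same transfer matrix, then recovers that matrix with an unconstrained least-squares fit. The names `V_prev`, `V_cur`, `M_true` and `M_est`, the station count, and the numbers in `M_true` are all made up for the example. It uses the convention $V_{prev} M \approx V_{cur}$, so each row of $M$ sums to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 polling stations, 3 parties in the earlier election.
n_stations, n_prev = 200, 3
V_prev = rng.integers(50, 500, size=(n_stations, n_prev)).astype(float)

# Hypothetical "true" transfer matrix: row i describes how the voters of
# earlier party i split among the later parties, so each row sums to 1.
M_true = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])

# The linear model: every station applies the same transfer matrix.
V_cur = V_prev @ M_true

# Unconstrained least-squares estimate of M, i.e. argmin_M ||V_prev @ M - V_cur||.
M_est, *_ = np.linalg.lstsq(V_prev, V_cur, rcond=None)

print(np.round(M_est, 3))   # estimated transfer fractions
print(M_est.sum(axis=1))    # row sums of the estimate
```

On noise-free toy data the fit recovers the matrix that generated it. Real ballot counts are noisy, so an unconstrained fit can produce negative entries or entries above 1.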

It solves for the matrix $M$ ($\arg\min_M \lVert V_{21} M - V_{22} \rVert$) in one of three ways:

1. Closed-form linear least squares. This way $M$ is not constrained and may contain negative numbers as well as numbers > 1; its columns are also not guaranteed to sum to 1.
2. Non-negative least squares. This handles the non-negative constraint but still doesn't guarantee stochasticity.
3. Convex optimization with the SCS solver, with constraints on 0
Based on analysis of {n_ballots} precincts whose serial number appeared in both.
Created by Harel Cain on {time.strftime('%d.%m.%Y %H:%M')}. All rights reserved.
Source code: https://github.com/harelc/elections-vote-transfer/
""", title_font_size=16, font_size=14)
    fig.write_html("index.html")
    fig.show()


if __name__ == '__main__':
    method = "convex solver"  # "nnls", "closed form"
    df_previous = pd.read_csv('ballot24final.csv', encoding='iso8859_8')
    df_current = pd.read_csv('ballot25.csv')

    # 23rd Knesset
    # parties_previous_full = 'יש_עתיד ליכוד המשותפת ש״ס ישראל_ביתנו יהדות_התורה ימינה העבודה'.split()
    # parties_previous = 'פה מחל ודעם שס ל ג טב אמת'.split()

    # 24th Knesset
    parties_previous_full = 'יש_עתיד הליכוד המשותפת ש״ס ישראל_ביתנו יהדות_התורה הציונות_הדתית העבודה ימינה מרצ רע״ם כחול_לבן תקווה_חדשה '.split()
    parties_previous = 'פה מחל ודעם שס ל ג ט אמת ב מרצ עם כן ת'.split()

    # 25th Knesset
    parties_current_full = 'העבודה אביר_קארה הבית_היהודי יהדות_התורה בל״ד חד״ש_תע״ל הציונות_הדתית המחנה_הממלכתי ישראל_ביתנו הליכוד מרצ רע״ם יש_עתיד ש״ס'.split()
    parties_current = 'אמת אצ ב ג ד ום ט כן ל מחל מרצ עם פה שס'.split()

    df_previous, df_previous_full = adapt_df(df_previous, parties_previous, parties_previous_full, include_no_vote=True)
    df_current, df_current_full = adapt_df(df_current, parties_current, parties_current_full, include_no_vote=True)

    # Keep only precincts that appear in both elections.
    merged_df = pd.merge(df_previous, df_current, how='inner', left_index=True, right_index=True)

    print('Analyzing {} precincts common to both elections. Largest ballot has {} votes.'.format(
        len(merged_df),
        merged_df.sum(axis=1).max()
    ))
    values_previous = df_previous.loc[merged_df.index].values
    values_current = df_current.loc[merged_df.index].values

    # In every branch, transfer_matrix has shape (n_current_parties, n_previous_parties).
    if method == "closed form":
        # method 1: closed-form solution with no non-negative constraint
        transfer_matrix = values_current.T @ values_previous @ np.linalg.pinv(values_previous.T @ values_previous)

    elif method == "nnls":
        # method 2: non-negative least squares solution, one current party at a time
        transfer_matrix = np.zeros((values_current.shape[1], values_previous.shape[1]))
        for i in range(values_current.shape[1]):
            # nnls returns the solution and the residual 2-norm (not R^2)
            sol, rnorm = nnls(values_previous, values_current[:, i])
            transfer_matrix[i, :] = sol
            pred = values_previous @ sol
            res = pred - values_current[:, i]
            # print residual norm, MAE, sum of error
            # print(rnorm, np.mean(np.abs(res)), res.sum())

    elif method == "convex solver":
        # method 3: use convex solver with constraints
        transfer_matrix = solve_transfer_coefficients(values_previous, values_current, verbose=True).T

    # Goodness of fit over all precincts and parties.
    y_bar = values_current.mean(axis=0)
    ss_tot = ((values_current - y_bar) ** 2).sum()
    ss_res = ((values_current - values_previous @ transfer_matrix.T) ** 2).sum()
    print('R^2 is {:3.3f}'.format(1. - ss_res / ss_tot))
    print(transfer_matrix.sum(axis=0))
    print(transfer_matrix.sum(axis=1))

    # Scale each column by the previous party's total votes to get absolute vote flows.
    vote_movements = transfer_matrix * df_previous_full.sum(axis=0).values
    print('Removing vote movements smaller than 5000')
    vote_movements[vote_movements < 5000] = 0.

    sankey(vote_movements, df_previous.columns.values, df_current.columns.values, n_ballots=len(merged_df))
--------------------------------------------------------------------------------