├── .gitignore ├── README.md ├── combine_files.rb ├── create_open_secrets_db.sql ├── fec_individual_contribution_files.py ├── fec_individual_contribution_files.pyc ├── license.md ├── make_files.sh ├── raw_fec_files.py └── small_money_donors.sql /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | local_files/ 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Get The OpenSecrets Data Into A Postgres Database 2 | 3 | This repo is to help people easily (ok, semi-easily) get all the 4 | opensecrets campaign contribution data off the internet and into a 5 | postgres database. It took me a decent amount of time to automate lots 6 | of the things you would otherwise need to do manually and learn about 7 | the hard way. Hopefully this saves future researchers lots of time. 8 | 9 | ## The Easy Way 10 | 11 | For those that care only about getting the data, I uploaded a `pg_dump` 12 | of the database to S3. 13 | 14 | The specific command I used was: 15 | 16 | `pg_dump -Fc campaign_finance > campaign_finance.dump` 17 | 18 | You can download the database from this link: 19 | 20 | [database dump](https://s3.amazonaws.com/campaign_finance/campaign_finance.dump) 21 | 22 | Assuming you have postgres installed, run the following from your 23 | terminal command line from the folder containing the file you just 24 | downloaded: 25 | 26 | `pg_restore -C -d campaign_finance campaign_finance.dump` 27 | 28 | This will create a campaign finance database, and load it up with all 29 | the data you need. The total size of the database without indexes is 30 | about 5 gigs. 31 | 32 | I'm going to be updating the `create_open_secrets_db_sql` file with 33 | additional indexes as I find them necessary for my use case. If you want 34 | to add these indexes to your version of the data, just check at the 35 | bottom of that file, and copy the index commands after you psql into the 36 | database. 37 | 38 | ## The Hard Way 39 | 40 | This should work on any OSX or Linux systems, but won't work on 41 | Windows. 42 | 43 | 44 | ### Get The Files 45 | 46 | To start with, you need to go to [Open Secrets](http://www.opensecrets.org), create an account, and download 47 | all the yearly data files. There are 5 database tables, with files for 48 | every election season from 1990 through 2014. The five tables are 49 | committees, candidates, individual contributions, pacs and pac to pacs. 50 | You will need to download lots of files. 51 | 52 | Take all those files and put them into the same folder, without renaming 53 | them. Then copy the files from this repo into that folder. 54 | 55 | ### Combining The Yearly Files 56 | 57 | Run the bash script as follows to clean and combine all the files for 58 | easy upload into postgres. 59 | 60 | `sh make_files.sh` 61 | 62 | ### Getting The Data Into Postgres 63 | 64 | If you don't have postgres, you need to go download it and get it set 65 | up. 66 | 67 | Log into postgres and create a database named "campaign_finance". 68 | 69 | Then `psql campaign_finance` to enter your new database. 70 | 71 | Open up the `create_open_secrets_db.sql` file from this repo. 72 | 73 | Paste the commands one by one into your terminal. You could probably 74 | just run the whole file, and you would be ok, but I would recommend 75 | running the commands one by one, to make sure that there aren't any 76 | issues. 77 | 78 | The most likely issue you might find is a formatting error somewhere in 79 | the combined files while you are copying. Postgres will let you know the 80 | line number of the error, so you can just open up the file in a text 81 | editor, go directly to that line, and generally fix the formatting 82 | error. Save the file, and then try the copy command again. 83 | 84 | You will see that the old individual_contributions_combined file is 85 | split into multiple parts. This is because the file is too large to fit 86 | into memory on my computer, and had problems being copied into the 87 | database. 88 | 89 | I'm going to be adding indexes to this file as I use the data, and these 90 | are optional. They make an already large database larger, so you can 91 | decide which, if any, of these indexes you want to include for your 92 | research. I'll sort them by how useful they are to me. 93 | 94 | ### Questions / Problems 95 | 96 | Open up a github issue and I'll do my best to help. 97 | 98 | ### Contributing 99 | 100 | 1. Fork it 101 | 2. Create your feature branch (`git checkout -b my-new-feature`) 102 | 3. Commit your changes (`git commit -am 'Add some feature'`) 103 | 4. Push to the branch (`git push origin my-new-feature`) 104 | 5. Create new Pull Request 105 | 106 | ### Credits 107 | 108 | ![open_secrets](http://assets.opensecrets.org/MyOS/img/opensecrets_databy250x88.gif) 109 | -------------------------------------------------------------------------------- /combine_files.rb: -------------------------------------------------------------------------------- 1 | def combine_files(new_filename, filename_array) 2 | File.open(new_filename + ".txt", "w") do |combined_file| 3 | filename_array.each do |f| 4 | text = File.open(f, "r").read 5 | text.each_line do |line| 6 | combined_file << line 7 | end 8 | end 9 | end 10 | format_file(new_filename) 11 | end 12 | 13 | def format_file(new_filename) 14 | format_command = "iconv -f utf-8 -t utf-8 -c #{new_filename}.txt > #{new_filename}_formatted.txt" 15 | system(format_command) 16 | system("rm #{new_filename}.txt") 17 | system("mv -f #{new_filename}_formatted.txt #{new_filename}.txt") 18 | end 19 | 20 | def fix_file_formatting_errors 21 | system('sed -i ".original" -e "s/PREDKI, \|STANLEY/PREDKI, STANLEY/g" -e "s/THE \|HILLS/THE HILLS/g" -e "s/\|\|kirby, Rev Dr \|\|jerry/\|kirby, Rev Dr jerry/g" -e "s/MCCORMICK, GARY \|D/MCCORMICK, GARY D/g" -e "s/\|\|palmetto/|\palmetto/g" combined_old_individual.txt') 22 | system("mv -f combined_old_individual_new.txt combined_old_individual.txt") 23 | end 24 | 25 | # Combine all the groups except for individual contributions 26 | # because they have two file types 27 | def combine_non_individual 28 | file_groups = %w{pacs pac_other cmtes cands} 29 | 30 | file_groups.each do |g| 31 | puts "combining #{g}" 32 | files = Dir.entries('.').select{ |f| f[/^#{g}/] } 33 | filename = "combined_#{g}" 34 | combine_files(filename, files) 35 | end 36 | end 37 | 38 | def combine_individual 39 | new_individual_files = ["indivs12.txt", "indivs14.txt"] 40 | old_individual_files = Dir.entries('.').select{|f| f[/^indivs/]}.select{|f| !new_individual_files.include?(f) } 41 | old_combined_filename = "combined_old_individual" 42 | combine_files(old_combined_filename, old_individual_files) 43 | new_combined_filename = "combined_new_individual" 44 | combine_files(new_combined_filename, new_individual_files) 45 | end 46 | 47 | combine_non_individual 48 | combine_individual 49 | fix_file_formatting_errors 50 | -------------------------------------------------------------------------------- /create_open_secrets_db.sql: -------------------------------------------------------------------------------- 1 | DROP TABLE candidates; 2 | 3 | CREATE TABLE candidates( 4 | id bigserial primary key, 5 | cycle varchar(255), 6 | fec_cand_id varchar(255), 7 | cid varchar(255), 8 | first_last_party varchar(255), 9 | party varchar(255), 10 | dist_id_run_for varchar(255), 11 | dist_id_currently_held varchar(255), 12 | current_candidate varchar(255), 13 | cycle_candidate varchar(255), 14 | crpico varchar(255), 15 | recip_code varchar(255), 16 | nopacs varchar(255), 17 | raised_from_pacs integer, 18 | raised_from_individuals integer, 19 | raised_total integer, 20 | raised_unitemized integer 21 | ); 22 | 23 | COPY candidates(cycle, fec_cand_id, cid, first_last_party, party, dist_id_run_for, dist_id_currently_held, current_candidate, cycle_candidate, crpico, recip_code, nopacs) 24 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_cands.txt' 25 | WITH CSV QUOTE '|' DELIMITER ','; 26 | 27 | DROP TABLE committees; 28 | 29 | CREATE TABLE committees( 30 | id bigserial primary key, 31 | cycle varchar(255), 32 | committee_id varchar(255), 33 | pac_short varchar(255), 34 | affiliate varchar(255), 35 | ultorg varchar(255), 36 | recip_id varchar(255), 37 | recip_code varchar(255), 38 | fec_cand_id varchar(255), 39 | party varchar(255), 40 | prim_code varchar(255), 41 | source varchar(255), 42 | sensitive varchar(255), 43 | foreign_owned varchar(255), 44 | active integer 45 | ); 46 | 47 | COPY committees(cycle, committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned, active) 48 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_cmtes.txt' 49 | WITH CSV QUOTE '|' DELIMITER ','; 50 | 51 | DROP TABLE individual_contributions; 52 | 53 | CREATE TABLE individual_contributions( 54 | id bigserial primary key, 55 | cycle varchar(255), 56 | fec_trans_id varchar(255), 57 | contributor_id varchar(255), 58 | contributor_name varchar(255), 59 | recipient_id varchar(255), 60 | org_name varchar(255), 61 | ult_org varchar(255), 62 | real_code varchar(255), 63 | date timestamp, 64 | amount integer, 65 | street varchar(255), 66 | city varchar(255), 67 | state varchar(255), 68 | zip varchar(255), 69 | recip_code varchar(255), 70 | type varchar(255), 71 | committee_id varchar(255), 72 | other_id varchar(255), 73 | gender varchar(255), 74 | old_format_employer_occupation varchar(255), 75 | microfilm varchar(255), 76 | occupation varchar(255), 77 | employer varchar(255), 78 | source varchar(255) 79 | ); 80 | 81 | -- The format for the old pre-2012 files has an extra field for the combined employer/occupation field. 82 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, microfilm, occupation, employer, source) 83 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_new_individual.txt' 84 | WITH CSV QUOTE '|' DELIMITER ','; 85 | 86 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source) 87 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_aa' 88 | WITH CSV QUOTE '|' DELIMITER ','; 89 | 90 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source) 91 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ab' 92 | WITH CSV QUOTE '|' DELIMITER ','; 93 | 94 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source) 95 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ac' 96 | WITH CSV QUOTE '|' DELIMITER ','; 97 | 98 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source) 99 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ad' 100 | WITH CSV QUOTE '|' DELIMITER ','; 101 | 102 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source) 103 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ae' 104 | WITH CSV QUOTE '|' DELIMITER ','; 105 | 106 | DROP TABLE pacs; 107 | 108 | CREATE TABLE pacs( 109 | id bigserial primary key, 110 | cycle varchar(255), 111 | fec_rec_no varchar(255), 112 | pac_id varchar(255), 113 | cid varchar(255), 114 | amount integer, 115 | date timestamp, 116 | real_code varchar(255), 117 | type varchar(255), 118 | di varchar(255), 119 | fec_cand_id varchar(255) 120 | ); 121 | 122 | COPY pacs(cycle, fec_rec_no, pac_id, cid, amount, date, real_code, type, di, fec_cand_id) 123 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_pacs.txt' 124 | WITH CSV QUOTE '|' DELIMITER ','; 125 | 126 | DROP TABLE pac_to_pacs; 127 | 128 | CREATE TABLE pac_to_pacs( 129 | id bigserial primary key, 130 | cycle varchar(255), 131 | fec_rec_no varchar(255), 132 | filer_id varchar(255), 133 | donor_committee varchar(255), 134 | contrib_lend_trans varchar(255), 135 | city varchar(255), 136 | state varchar(255), 137 | zip varchar(255), 138 | fec_occ_emp varchar(255), 139 | prim_code varchar(255), 140 | date timestamp, 141 | amount decimal, 142 | recipient_id varchar(255), 143 | party varchar(255), 144 | other_id varchar(255), 145 | recip_code varchar(255), 146 | recip_prim_code varchar(255), 147 | amend varchar(255), 148 | report varchar(255), 149 | pg varchar(255), 150 | microfilm varchar(255), 151 | type varchar(255), 152 | real_code varchar(255), 153 | source varchar(255) 154 | ); 155 | 156 | COPY pac_to_pacs(cycle, fec_rec_no, filer_id, donor_committee, contrib_lend_trans, city, state, zip, fec_occ_emp, prim_code, date, amount, recipient_id, party, other_id, recip_code, recip_prim_code, amend, report, pg, microfilm, type, real_code, source ) 157 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_pac_other.txt' 158 | WITH CSV QUOTE '|' DELIMITER ','; 159 | 160 | DROP TABLE industry_codes; 161 | 162 | CREATE TABLE industry_codes( 163 | id bigserial primary key, 164 | category_code varchar(255), 165 | category_name varchar(255), 166 | industry_code varchar(255), 167 | industry_name varchar(255), 168 | sector varchar(255), 169 | sector_long varchar(255) 170 | ); 171 | 172 | -- Line 443 extra column at end of line needs to be deleted 173 | COPY industry_codes(category_code, category_name, industry_code, industry_name, sector, sector_long) 174 | from '/Users/sik/Downloads/CRP_Categories.txt' 175 | WITH CSV DELIMITER E'\t'; 176 | 177 | 178 | DROP TABLE politicians; 179 | 180 | CREATE TABLE politicians( 181 | id bigserial primary key, 182 | cid varchar(255), 183 | name varchar(255) 184 | ); 185 | 186 | INSERT INTO politicians(cid, name)( 187 | SELECT distinct on(cid) cid, left(first_last_party, -4) 188 | FROM candidates 189 | ); 190 | 191 | DROP TABLE pac_records; 192 | 193 | CREATE TABLE pac_records( 194 | id bigserial primary key, 195 | committee_id varchar(255), 196 | pac_short varchar(255), 197 | affiliate varchar(255), 198 | ultorg varchar(255), 199 | recip_id varchar(255), 200 | recip_code varchar(255), 201 | fec_cand_id varchar(255), 202 | party varchar(255), 203 | prim_code varchar(255), 204 | source varchar(255), 205 | sensitive varchar(255), 206 | foreign_owned varchar(255) 207 | ); 208 | 209 | INSERT INTO pac_records(committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned)( 210 | SELECT distinct on(committee_id) 211 | committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned 212 | FROM committees 213 | ); 214 | 215 | CREATE INDEX ON candidates (cid); 216 | CREATE INDEX ON individual_contributions (recipient_id); 217 | 218 | CREATE INDEX ON individual_contributions (real_code); 219 | CREATE INDEX ON pacs (cid); 220 | CREATE INDEX ON pacs (real_code); 221 | CREATE INDEX ON industry_codes (category_code); 222 | 223 | CREATE INDEX ON politicians (cid); 224 | CREATE INDEX ON politicians (name); 225 | 226 | CREATE INDEX ON individual_contributions (cycle); 227 | CREATE INDEX ON pacs (cycle); 228 | CREATE INDEX ON candidates (cycle); 229 | 230 | CREATE INDEX ON pac_records (committee_id); 231 | 232 | -- Denormalize candidates to show total raised for each candidate by election cycle 233 | 234 | with candidate_individual_contributions as ( 235 | SELECT c.id as candidate_id 236 | , SUM(i.amount) as total_individual 237 | FROM candidates c 238 | JOIN individual_contributions i ON i.recipient_id = c.cid AND i.cycle = c.cycle 239 | GROUP BY c.id 240 | ) 241 | 242 | UPDATE candidates SET raised_from_individuals = candidate_individual_contributions.total_individual 243 | FROM candidate_individual_contributions 244 | WHERE candidate_individual_contributions.candidate_id = candidates.id; 245 | 246 | 247 | with pac_contributions as ( 248 | SELECT c.id as candidate_id 249 | , SUM(p.amount) as total_pac 250 | FROM candidates c 251 | JOIN pacs p ON p.cid = c.cid AND p.cycle = c.cycle 252 | WHERE p.type != '24A' AND p.type != '24N' 253 | GROUP BY c.id 254 | ) 255 | 256 | UPDATE candidates SET raised_from_pacs = pac_contributions.total_pac 257 | FROM pac_contributions 258 | WHERE pac_contributions.candidate_id = candidates.id; 259 | 260 | with cycle_unitemized as ( 261 | SELECT f.cid as cid 262 | , f.cycle as cycle 263 | , SUM(f.individual_unitemized_contributions) as total_unitemized 264 | FROM fec_api_committees f 265 | WHERE f.designation = 'P' 266 | AND f.individual_unitemized_contributions IS NOT NULL 267 | GROUP BY f.cid, f.cycle 268 | ) 269 | 270 | UPDATE candidates SET raised_unitemized = cu.total_unitemized 271 | FROM cycle_unitemized cu 272 | WHERE candidates.cycle = cu.cycle 273 | AND candidates.cid = cu.cid; 274 | 275 | UPDATE candidates SET raised_total = COALESCE(raised_from_individuals, 0) + COALESCE(raised_from_pacs, 0) + COALESCE(raised_unitemized, 0); 276 | 277 | 278 | -- Add backers table, for the kickstarter backers (and other backers) 279 | 280 | DROP TABLE backers; 281 | 282 | CREATE TABLE backers( 283 | id bigserial primary key, 284 | cid varchar(255), 285 | name varchar(255), 286 | backer_level varchar(255), 287 | kickstarter boolean 288 | ); 289 | 290 | DROP TABLE user_submissions; 291 | 292 | CREATE TABLE user_submissions( 293 | id bigserial primary key, 294 | title text, 295 | description text, 296 | user_name varchar(255), 297 | url varchar(255), 298 | cid varchar(255), 299 | approved_at timestamp, 300 | approved_by varchar(255), 301 | promote boolean 302 | ); 303 | 304 | DROP TABLE fec_api_committees; 305 | 306 | CREATE TABLE fec_api_committees( 307 | id bigserial primary key, 308 | cid varchar(255), 309 | created_at timestamp, 310 | updated_at timestamp, 311 | cycle varchar(255), 312 | individual_unitemized_contributions integer, 313 | individual_itemized_contributions integer, 314 | individual_contributions integer, 315 | designation varchar(255), 316 | organization_type varchar(255), 317 | name text, 318 | committee_id varchar(255), 319 | committee_type varchar(255) 320 | ); 321 | 322 | CREATE UNIQUE INDEX unique_committee_and_cycle ON fec_api_committees (committee_id, cycle); 323 | CREATE INDEX ON fec_api_committees (cid); 324 | -------------------------------------------------------------------------------- /fec_individual_contribution_files.py: -------------------------------------------------------------------------------- 1 | import os 2 | import glob 3 | import fileinput 4 | import sys 5 | import subprocess 6 | from IPython.core.debugger import Tracer 7 | 8 | 9 | class OrganizeFiles(): 10 | def __init__(self, folder, unzip_name, new_prefix, header_file=None, unzip_year=None): 11 | self.folder = folder 12 | self.unzip_name = unzip_name 13 | self.new_prefix = new_prefix 14 | self.unzip_year = unzip_year 15 | self.header_file = header_file 16 | 17 | def unzip_committee_files(self): 18 | file_list = glob.glob('local_files/{}/*.zip'.format(self.folder)) 19 | #file_list = glob.glob('local_files/{}/indiv90.zip'.format(self.folder)) 20 | 21 | for f in file_list: 22 | year = f.split('.')[0][-2:] 23 | new_filename = self.new_prefix + year + '.txt' 24 | subprocess.call(['unzip', f]) 25 | print 'hi' 26 | if self.unzip_year is None: 27 | current_name = self.unzip_name + '.txt' 28 | else: 29 | print self.unzip_name + year + '.txt' 30 | current_name = self.unzip_name + year + '.txt' 31 | print current_name 32 | os.rename(current_name, 'local_files/{}/'.format(self.folder) + new_filename) 33 | 34 | def header_string(self): 35 | header_path = 'local_files/{}/{}'.format(self.folder, self.header_file) 36 | with open(header_path, 'r') as f: 37 | first_line = f.readline() 38 | 39 | return first_line.replace(',','|') 40 | 41 | def fixed_format_line(self, line): 42 | # remove microsoft encoding 43 | decoded = line.decode('cp1252').encode('utf8') 44 | 45 | # remove apostraphes because csvkit seems to have problems with them: 46 | return decoded.replace('\'','') 47 | 48 | 49 | def add_cycle_to_files(self): 50 | file_list = glob.glob('local_files/{}/*.txt'.format(self.folder)) 51 | #file_list = glob.glob('local_files/{}/individual_contributions90.txt'.format(self.folder)) 52 | for f in file_list: 53 | year = f.split('.')[0][-2:] 54 | if int(year) < 30: 55 | year = '20' + year 56 | else: 57 | year = '19' + year 58 | 59 | for line in fileinput.input([f], inplace=True): 60 | fixed_line = self.fixed_format_line(line) 61 | if fileinput.isfirstline(): 62 | sys.stdout.write('CYCLE|{l}'.format(l=self.header_string())) 63 | sys.stdout.write('{c}|{l}'.format(c=year, l=fixed_line)) 64 | else: 65 | sys.stdout.write('{c}|{l}'.format(c=year, l=fixed_line)) 66 | 67 | def combine_converted_files(self): 68 | full_file_list = glob.glob('local_files/{}/*.txt'.format(self.folder)) 69 | file_list = [f for f in full_file_list if "converted" in f] 70 | print file_list 71 | combined_filename = 'local_files/{}/combined_{}.csv'.format(self.folder, self.new_prefix) 72 | with open(combined_filename, 'w') as outfile: 73 | for fname in file_list: 74 | with open(fname) as infile: 75 | for line in infile: 76 | outfile.write(line) 77 | 78 | def import_files_to_postgres(self): 79 | full_file_list = glob.glob('local_files/{}/*.csv'.format(self.folder)) 80 | file_list = [f for f in full_file_list if "converted" in f] 81 | 82 | for f in file_list: 83 | print f 84 | command = 'psql -c "COPY fec_individual_contributions FROM \'/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/{}\' with DELIMITER \',\' CSV HEADER" campaign_finance'.format(f) 85 | subprocess.call(command, shell=True) 86 | 87 | def check_format_and_convert(self): 88 | file_list = glob.glob('local_files/{}/*.txt'.format(self.folder)) 89 | for f in file_list: 90 | print f 91 | year = f.split('.')[0][-2:] 92 | command = 'in2csv -d "|" -f csv -v local_files/{}/{}{}.txt > local_files/{}/converted{}.csv'\ 93 | .format(self.folder, self.new_prefix, year, self.folder, year) 94 | subprocess.call(command, shell=True) 95 | 96 | 97 | def run(self): 98 | print 'hik' 99 | #self.unzip_committee_files() 100 | #self.add_cycle_to_files() 101 | #self.check_format_and_convert() 102 | self.import_files_to_postgres() 103 | 104 | 105 | if __name__ == "__main__": 106 | #a = OrganizeFiles(folder='fec_candidate_summary_files', unzip_name='weball', new_prefix='candidate_summary', unzip_year=True) 107 | a = OrganizeFiles(folder='fec_individual_contributions', unzip_name='itcont', new_prefix='individual_contributions', header_file='indiv_header_file.csv') 108 | a.run() 109 | 110 | # get the header file, and then run head -n 200 >> header_file.csv 111 | # Create the table for the new table, but don't insert 112 | # csvsql --db {db here} --no-constraints header_file.csv 113 | # sed -i '' 's/_unnamed//g' combined_individual_contributions.csv 114 | # COPY fec_individual_contributions FROM '/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/local_files/fec_individual_contributions/combined_individual_contributions.csv' DELIMITER ',' CSV; 115 | # COPY fec_individual_contributions FROM '/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/local_files/fec_individual_contributions/.csv' DELIMITER ',' CSV HEADER; 116 | -------------------------------------------------------------------------------- /fec_individual_contribution_files.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Solomon/opensecrets_to_postgres/57cc69c9c6a0d4718befda9f1181d35164e43d81/fec_individual_contribution_files.pyc -------------------------------------------------------------------------------- /license.md: -------------------------------------------------------------------------------- 1 | Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 2 | -------------------------------------------------------------------------------- /make_files.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | echo "Combining Files" 4 | 5 | ruby combine_files.rb 6 | 7 | echo "Fixing Spacing Issues" 8 | 9 | sed -i '.original' -e ':a' -e 'N' -e '$!ba' -e 's/Oversight Bd\n/Oversight Bd/g' -e 's/Sciences\n/Sciences/g' combined_old_individual.txt 10 | 11 | echo "splitting combined individual contributions file" 12 | 13 | split -l 4000000 combined_old_individual.txt combined_old_individual_ 14 | 15 | -------------------------------------------------------------------------------- /raw_fec_files.py: -------------------------------------------------------------------------------- 1 | import os 2 | import glob 3 | import fileinput 4 | import sys 5 | import subprocess 6 | 7 | def unzip_committee_files(): 8 | file_list = glob.glob('local_files/fec_committees/*.zip') 9 | for f in file_list: 10 | new_fileprefix = f.split('/')[2][:4] 11 | year = new_fileprefix[2:4] 12 | new_filename = new_fileprefix + '.txt' 13 | subprocess.call(['unzip', f]) 14 | os.rename('cm.txt', 'local_files/fec_committees/' + new_filename) 15 | 16 | def add_cycle_to_files(): 17 | file_list = glob.glob('local_files/fec_committees/*.txt') 18 | for f in file_list: 19 | year = f.split('/')[2][2:4] 20 | if int(year) < 30: 21 | year = '20' + year 22 | else: 23 | year = '19' + year 24 | 25 | for line in fileinput.input([f], inplace=True): 26 | sys.stdout.write('{c}|{l}'.format(c=year, l=line)) 27 | 28 | def combine_committee_files(): 29 | file_list = glob.glob('local_files/fec_committees/*.txt') 30 | with open('local_files/fec_committees/combined_fec_committees.csv', 'w') as outfile: 31 | for fname in file_list: 32 | with open(fname) as infile: 33 | outfile.write(infile.read()) 34 | 35 | # Get table metadata here and add to the combined file: http://www.fec.gov/finance/disclosure/metadata/cm_header_file.csv 36 | # get rid of microsoft quotes: iconv -f cp1252 -t UTF-8 combined_fec_committees.csv > combined_new.csv 37 | # mv combined_new.csv combined_fec_committees.csv 38 | #sed -i '.original' 's/\x0//g' combined_fec_committees.csv 39 | # sed didn't completely work, go into vim and do the following: 40 | #Then search for NUL char with: 41 | #/000 42 | #(where is "control-v" key) 43 | #and I see the NUL chars highlighted. 44 | 45 | # run in2csv -d "|" -v combined_fec_committees.csv > fec_committees.csv 46 | # csvsql --db postgresql:///campaign_finance --table fec_committees --insert filename.csv 47 | -------------------------------------------------------------------------------- /small_money_donors.sql: -------------------------------------------------------------------------------- 1 | SELECT s."CYCLE" 2 | , s."CAND_NAME" 3 | , c."CMTE_NM" 4 | , s."TTL_INDIV_CONTRIB" 5 | 6 | FROM fec_candidate_summary s 7 | JOIN fec_committees c ON c."CAND_ID" = s."CAND_ID" AND c."CYCLE" = s."CYCLE" 8 | WHERE "CAND_NAME" = 'CROWLEY, JOSEPH' 9 | ORDER BY s."CYCLE" DESC 10 | 11 | 12 | SELECT s."CYCLE" as cycle 13 | , s."CAND_NAME" as cand_name 14 | , c."CMTE_ID" as cmte_id 15 | , c."CMTE_NM" as cmte_nm 16 | , s."TTL_INDIV_CONTRIB" as total_contributions 17 | , SUM(i."TRANSACTION_AMT") as sum_contributions 18 | , COUNT(i."TRANSACTION_AMT") as total_transactions 19 | , COUNT(CASE WHEN i."MEMO_CD" IS NULL THEN i."SUB_ID" ELSE NULL END) as total_without_code 20 | , SUM(CASE WHEN i."MEMO_CD" IS NULL THEN i."TRANSACTION_AMT" ELSE NULL END) as total_without_code 21 | FROM fec_candidate_summary s 22 | JOIN fec_committees c ON c."CAND_ID" = s."CAND_ID" AND c."CYCLE" = s."CYCLE" 23 | JOIN fec_individual_contributions i ON i."CMTE_ID" = c."CMTE_ID" AND i."CYCLE" = c."CYCLE" 24 | WHERE "CAND_NAME" = 'CROWLEY, JOSEPH' 25 | GROUP BY cycle, cand_name, cmte_id, cmte_nm, total_contributions 26 | ORDER BY s."CYCLE" DESC 27 | --------------------------------------------------------------------------------