├── .gitignore
├── README.md
├── combine_files.rb
├── create_open_secrets_db.sql
├── fec_individual_contribution_files.py
├── fec_individual_contribution_files.pyc
├── license.md
├── make_files.sh
├── raw_fec_files.py
└── small_money_donors.sql


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | local_files/
3 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | ## Get The OpenSecrets Data Into A Postgres Database
  2 | 
  3 | This repo is to help people easily (ok, semi-easily) get all the
  4 | opensecrets campaign contribution data off the internet and into a
  5 | postgres database. It took me a decent amount of time to automate lots
  6 | of the things you would otherwise need to do manually and learn about
  7 | the hard way. Hopefully this saves future researchers lots of time.
  8 | 
  9 | ## The Easy Way
 10 | 
 11 | For those that care only about getting the data, I uploaded a `pg_dump`
 12 | of the database to S3.
 13 | 
 14 | The specific command I used was:
 15 | 
 16 | `pg_dump -Fc campaign_finance > campaign_finance.dump`
 17 | 
 18 | You can download the database from this link:
 19 | 
 20 | [database dump](https://s3.amazonaws.com/campaign_finance/campaign_finance.dump)
 21 | 
 22 | Assuming you have postgres installed, run the following from your
 23 | terminal command line from the folder containing the file you just
 24 | downloaded:
 25 | 
 26 | `pg_restore -C -d campaign_finance campaign_finance.dump`
 27 | 
 28 | This will create a campaign finance database, and load it up with all
 29 | the data you need. The total size of the database without indexes is
 30 | about 5 gigs.
 31 | 
 32 | I'm going to be updating the `create_open_secrets_db_sql` file with
 33 | additional indexes as I find them necessary for my use case. If you want
 34 | to add these indexes to your version of the data, just check at the
 35 | bottom of that file, and copy the index commands after you psql into the
 36 | database.
 37 | 
 38 | ## The Hard Way
 39 | 
 40 | This should work on any OSX or Linux systems, but won't work on
 41 | Windows.
 42 | 
 43 | 
 44 | ### Get The Files
 45 | 
 46 | To start with, you need to go to [Open Secrets](http://www.opensecrets.org), create an account, and download
 47 | all the yearly data files. There are 5 database tables, with files for
 48 | every election season from 1990 through 2014. The five tables are
 49 | committees, candidates, individual contributions, pacs and pac to pacs.
 50 | You will need to download lots of files.
 51 | 
 52 | Take all those files and put them into the same folder, without renaming
 53 | them. Then copy the files from this repo into that folder.
 54 | 
 55 | ### Combining The Yearly Files
 56 | 
 57 | Run the bash script as follows to clean and combine all the files for
 58 | easy upload into postgres.
 59 | 
 60 | `sh make_files.sh`
 61 | 
 62 | ### Getting The Data Into Postgres
 63 | 
 64 | If you don't have postgres, you need to go download it and get it set
 65 | up.
 66 | 
 67 | Log into postgres and create a database named "campaign_finance".
 68 | 
 69 | Then `psql campaign_finance` to enter your new database.
 70 | 
 71 | Open up the `create_open_secrets_db.sql` file from this repo.
 72 | 
 73 | Paste the commands one by one into your terminal. You could probably
 74 | just run the whole file, and you would be ok, but I would recommend
 75 | running the commands one by one, to make sure that there aren't any
 76 | issues.
 77 | 
 78 | The most likely issue you might find is a formatting error somewhere in
 79 | the combined files while you are copying. Postgres will let you know the
 80 | line number of the error, so you can just open up the file in a text
 81 | editor, go directly to that line, and generally fix the formatting
 82 | error. Save the file, and then try the copy command again.
 83 | 
 84 | You will see that the old individual_contributions_combined file is
 85 | split into multiple parts. This is because the file is too large to fit
 86 | into memory on my computer, and had problems being copied into the
 87 | database.
 88 | 
 89 | I'm going to be adding indexes to this file as I use the data, and these
 90 | are optional. They make an already large database larger, so you can
 91 | decide which, if any, of these indexes you want to include for your
 92 | research. I'll sort them by how useful they are to me.
 93 | 
 94 | ### Questions / Problems
 95 | 
 96 | Open up a github issue and I'll do my best to help.
 97 | 
 98 | ### Contributing
 99 | 
100 | 1. Fork it
101 | 2. Create your feature branch (`git checkout -b my-new-feature`)
102 | 3. Commit your changes (`git commit -am 'Add some feature'`)
103 | 4. Push to the branch (`git push origin my-new-feature`)
104 | 5. Create new Pull Request
105 | 
106 | ### Credits
107 | 
108 | ![open_secrets](http://assets.opensecrets.org/MyOS/img/opensecrets_databy250x88.gif)
109 | 


--------------------------------------------------------------------------------
/combine_files.rb:
--------------------------------------------------------------------------------
 1 | def combine_files(new_filename, filename_array)
 2 |   File.open(new_filename + ".txt", "w") do |combined_file|
 3 |     filename_array.each do |f|
 4 |       text = File.open(f, "r").read
 5 |       text.each_line do |line|
 6 |         combined_file << line
 7 |       end
 8 |     end
 9 |   end
10 |   format_file(new_filename)
11 | end
12 | 
13 | def format_file(new_filename)
14 |   format_command = "iconv -f utf-8 -t utf-8 -c #{new_filename}.txt > #{new_filename}_formatted.txt"
15 |   system(format_command)
16 |   system("rm #{new_filename}.txt")
17 |   system("mv -f #{new_filename}_formatted.txt #{new_filename}.txt")
18 | end
19 | 
20 | def fix_file_formatting_errors
21 |   system('sed -i ".original" -e "s/PREDKI, \|STANLEY/PREDKI, STANLEY/g" -e "s/THE \|HILLS/THE HILLS/g" -e "s/\|\|kirby, Rev Dr \|\|jerry/\|kirby, Rev Dr jerry/g" -e "s/MCCORMICK, GARY \|D/MCCORMICK, GARY D/g" -e "s/\|\|palmetto/|\palmetto/g" combined_old_individual.txt')
22 |   system("mv -f combined_old_individual_new.txt combined_old_individual.txt")
23 | end
24 | 
25 | # Combine all the groups except for individual contributions
26 | # because they have two file types
27 | def combine_non_individual
28 |   file_groups = %w{pacs pac_other cmtes cands}
29 | 
30 |   file_groups.each do |g|
31 |     puts "combining #{g}"
32 |     files = Dir.entries('.').select{ |f| f[/^#{g}/] }
33 |     filename = "combined_#{g}"
34 |     combine_files(filename, files)
35 |   end
36 | end
37 | 
38 | def combine_individual
39 |   new_individual_files = ["indivs12.txt", "indivs14.txt"]
40 |   old_individual_files = Dir.entries('.').select{|f| f[/^indivs/]}.select{|f| !new_individual_files.include?(f) }
41 |   old_combined_filename = "combined_old_individual"
42 |   combine_files(old_combined_filename, old_individual_files)
43 |   new_combined_filename = "combined_new_individual"
44 |   combine_files(new_combined_filename, new_individual_files)
45 | end
46 | 
47 | combine_non_individual
48 | combine_individual
49 | fix_file_formatting_errors
50 | 


--------------------------------------------------------------------------------
/create_open_secrets_db.sql:
--------------------------------------------------------------------------------
  1 | DROP TABLE candidates;
  2 | 
  3 | CREATE TABLE candidates(
  4 |   id bigserial primary key,
  5 |   cycle varchar(255),
  6 |   fec_cand_id varchar(255),
  7 |   cid varchar(255),
  8 |   first_last_party varchar(255),
  9 |   party varchar(255),
 10 |   dist_id_run_for varchar(255),
 11 |   dist_id_currently_held varchar(255),
 12 |   current_candidate varchar(255),
 13 |   cycle_candidate varchar(255),
 14 |   crpico varchar(255),
 15 |   recip_code varchar(255),
 16 |   nopacs varchar(255),
 17 |   raised_from_pacs integer,
 18 |   raised_from_individuals integer,
 19 |   raised_total integer,
 20 |   raised_unitemized integer
 21 | );
 22 | 
 23 | COPY candidates(cycle, fec_cand_id, cid, first_last_party, party, dist_id_run_for, dist_id_currently_held, current_candidate, cycle_candidate, crpico, recip_code, nopacs) 
 24 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_cands.txt' 
 25 | WITH CSV QUOTE '|' DELIMITER ',';
 26 | 
 27 | DROP TABLE committees;
 28 | 
 29 | CREATE TABLE committees(
 30 |   id bigserial primary key,
 31 |   cycle varchar(255),
 32 |   committee_id varchar(255),
 33 |   pac_short varchar(255),
 34 |   affiliate varchar(255),
 35 |   ultorg varchar(255),
 36 |   recip_id varchar(255),
 37 |   recip_code varchar(255),
 38 |   fec_cand_id varchar(255),
 39 |   party varchar(255),
 40 |   prim_code varchar(255),
 41 |   source varchar(255),
 42 |   sensitive varchar(255),
 43 |   foreign_owned varchar(255),
 44 |   active integer
 45 | );
 46 | 
 47 | COPY committees(cycle, committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned, active)
 48 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_cmtes.txt' 
 49 | WITH CSV QUOTE '|' DELIMITER ',';
 50 | 
 51 | DROP TABLE individual_contributions;
 52 | 
 53 | CREATE TABLE individual_contributions(
 54 |   id bigserial primary key,
 55 |   cycle varchar(255),
 56 |   fec_trans_id varchar(255),
 57 |   contributor_id varchar(255),
 58 |   contributor_name varchar(255),
 59 |   recipient_id varchar(255),
 60 |   org_name varchar(255),
 61 |   ult_org varchar(255),
 62 |   real_code varchar(255),
 63 |   date timestamp,
 64 |   amount integer,
 65 |   street varchar(255),
 66 |   city varchar(255),
 67 |   state varchar(255),
 68 |   zip varchar(255),
 69 |   recip_code varchar(255),
 70 |   type varchar(255),
 71 |   committee_id varchar(255),
 72 |   other_id varchar(255),
 73 |   gender varchar(255),
 74 |   old_format_employer_occupation varchar(255),
 75 |   microfilm varchar(255),
 76 |   occupation varchar(255),
 77 |   employer varchar(255),
 78 |   source varchar(255)
 79 | );
 80 | 
 81 | -- The format for the old pre-2012 files has an extra field for the combined employer/occupation field.
 82 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, microfilm, occupation, employer, source)
 83 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_new_individual.txt'
 84 | WITH CSV QUOTE '|' DELIMITER ',';
 85 | 
 86 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
 87 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_aa'
 88 | WITH CSV QUOTE '|' DELIMITER ',';
 89 | 
 90 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
 91 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ab'
 92 | WITH CSV QUOTE '|' DELIMITER ',';
 93 | 
 94 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
 95 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ac'
 96 | WITH CSV QUOTE '|' DELIMITER ',';
 97 | 
 98 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
 99 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ad'
100 | WITH CSV QUOTE '|' DELIMITER ',';
101 | 
102 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
103 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ae'
104 | WITH CSV QUOTE '|' DELIMITER ',';
105 | 
106 | DROP TABLE pacs;
107 | 
108 | CREATE TABLE pacs(
109 |   id bigserial primary key,
110 |   cycle varchar(255),
111 |   fec_rec_no varchar(255),
112 |   pac_id varchar(255),
113 |   cid varchar(255),
114 |   amount integer,
115 |   date timestamp,
116 |   real_code varchar(255),
117 |   type varchar(255),
118 |   di varchar(255),
119 |   fec_cand_id varchar(255)
120 | );
121 | 
122 | COPY pacs(cycle, fec_rec_no, pac_id, cid, amount, date, real_code, type, di, fec_cand_id)
123 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_pacs.txt' 
124 | WITH CSV QUOTE '|' DELIMITER ',';
125 | 
126 | DROP TABLE pac_to_pacs;
127 | 
128 | CREATE TABLE pac_to_pacs(
129 |   id bigserial primary key,
130 |   cycle varchar(255),
131 |   fec_rec_no varchar(255),
132 |   filer_id varchar(255),
133 |   donor_committee varchar(255),
134 |   contrib_lend_trans varchar(255),
135 |   city varchar(255),
136 |   state varchar(255),
137 |   zip varchar(255),
138 |   fec_occ_emp varchar(255),
139 |   prim_code varchar(255),
140 |   date timestamp,
141 |   amount decimal,
142 |   recipient_id varchar(255),
143 |   party varchar(255),
144 |   other_id varchar(255),
145 |   recip_code varchar(255),
146 |   recip_prim_code varchar(255),
147 |   amend varchar(255),
148 |   report varchar(255),
149 |   pg varchar(255),
150 |   microfilm varchar(255),
151 |   type varchar(255),
152 |   real_code varchar(255),
153 |   source varchar(255)
154 | );
155 | 
156 | COPY pac_to_pacs(cycle, fec_rec_no, filer_id, donor_committee, contrib_lend_trans, city, state, zip, fec_occ_emp, prim_code, date, amount, recipient_id, party, other_id, recip_code, recip_prim_code, amend, report, pg, microfilm, type, real_code, source )
157 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_pac_other.txt'
158 | WITH CSV QUOTE '|' DELIMITER ',';
159 | 
160 | DROP TABLE industry_codes;
161 | 
162 | CREATE TABLE industry_codes(
163 |   id bigserial primary key,
164 |   category_code varchar(255),
165 |   category_name varchar(255),
166 |   industry_code varchar(255),
167 |   industry_name varchar(255),
168 |   sector varchar(255),
169 |   sector_long varchar(255)
170 | );
171 | 
172 | -- Line 443 extra column at end of line needs to be deleted
173 | COPY industry_codes(category_code, category_name, industry_code, industry_name, sector, sector_long)
174 | from '/Users/sik/Downloads/CRP_Categories.txt'
175 | WITH CSV DELIMITER E'\t';
176 | 
177 | 
178 | DROP TABLE politicians;
179 | 
180 | CREATE TABLE politicians(
181 |   id bigserial primary key,
182 |   cid varchar(255),
183 |   name varchar(255)
184 | );
185 | 
186 | INSERT INTO politicians(cid, name)(
187 |   SELECT distinct on(cid) cid, left(first_last_party, -4)
188 |   FROM candidates
189 | );
190 | 
191 | DROP TABLE pac_records;
192 | 
193 | CREATE TABLE pac_records(
194 |   id bigserial primary key,
195 |   committee_id varchar(255),
196 |   pac_short varchar(255),
197 |   affiliate varchar(255),
198 |   ultorg varchar(255),
199 |   recip_id varchar(255),
200 |   recip_code varchar(255),
201 |   fec_cand_id varchar(255),
202 |   party varchar(255),
203 |   prim_code varchar(255),
204 |   source varchar(255),
205 |   sensitive varchar(255),
206 |   foreign_owned varchar(255)
207 | );
208 | 
209 | INSERT INTO pac_records(committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned)(
210 |   SELECT distinct on(committee_id)
211 |     committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned
212 |   FROM committees
213 | );
214 | 
215 | CREATE INDEX ON candidates (cid);
216 | CREATE INDEX ON individual_contributions (recipient_id);
217 | 
218 | CREATE INDEX ON individual_contributions (real_code);
219 | CREATE INDEX ON pacs (cid);
220 | CREATE INDEX ON pacs (real_code);
221 | CREATE INDEX ON industry_codes (category_code);
222 | 
223 | CREATE INDEX ON politicians (cid);
224 | CREATE INDEX ON politicians (name);
225 | 
226 | CREATE INDEX ON individual_contributions (cycle);
227 | CREATE INDEX ON pacs (cycle);
228 | CREATE INDEX ON candidates (cycle);
229 | 
230 | CREATE INDEX ON pac_records (committee_id);
231 | 
232 | -- Denormalize candidates to show total raised for each candidate by election cycle
233 | 
234 | with candidate_individual_contributions as (
235 |   SELECT c.id as candidate_id
236 |   , SUM(i.amount) as total_individual
237 |   FROM candidates c
238 |   JOIN individual_contributions i ON i.recipient_id = c.cid AND i.cycle = c.cycle
239 |   GROUP BY c.id
240 | )
241 | 
242 | UPDATE candidates SET raised_from_individuals = candidate_individual_contributions.total_individual
243 | FROM candidate_individual_contributions
244 | WHERE candidate_individual_contributions.candidate_id = candidates.id;
245 | 
246 | 
247 | with pac_contributions as (
248 |   SELECT c.id as candidate_id
249 |   , SUM(p.amount) as total_pac
250 |   FROM candidates c
251 |   JOIN pacs p ON p.cid = c.cid AND p.cycle = c.cycle
252 |   WHERE p.type != '24A' AND p.type != '24N'
253 |   GROUP BY c.id
254 | )
255 | 
256 | UPDATE candidates SET raised_from_pacs = pac_contributions.total_pac
257 | FROM pac_contributions
258 | WHERE pac_contributions.candidate_id = candidates.id;
259 | 
260 | with cycle_unitemized as (
261 |   SELECT f.cid as cid
262 |   , f.cycle as cycle
263 |   , SUM(f.individual_unitemized_contributions) as total_unitemized
264 |   FROM fec_api_committees f
265 |   WHERE f.designation = 'P'
266 |   AND f.individual_unitemized_contributions IS NOT NULL
267 |   GROUP BY f.cid, f.cycle
268 | )
269 | 
270 | UPDATE candidates SET raised_unitemized = cu.total_unitemized
271 | FROM cycle_unitemized cu
272 | WHERE candidates.cycle = cu.cycle
273 | AND candidates.cid = cu.cid;
274 | 
275 | UPDATE candidates SET raised_total = COALESCE(raised_from_individuals, 0) + COALESCE(raised_from_pacs, 0) + COALESCE(raised_unitemized, 0);
276 | 
277 | 
278 | -- Add backers table, for the kickstarter backers (and other backers)
279 | 
280 | DROP TABLE backers;
281 | 
282 | CREATE TABLE backers(
283 |   id bigserial primary key,
284 |   cid varchar(255),
285 |   name varchar(255),
286 |   backer_level varchar(255),
287 |   kickstarter boolean
288 | );
289 | 
290 | DROP TABLE user_submissions;
291 | 
292 | CREATE TABLE user_submissions(
293 |   id bigserial primary key,
294 |   title text,
295 |   description text,
296 |   user_name varchar(255),
297 |   url varchar(255),
298 |   cid varchar(255),
299 |   approved_at timestamp,
300 |   approved_by varchar(255),
301 |   promote boolean
302 | );
303 | 
304 | DROP TABLE fec_api_committees;
305 | 
306 | CREATE TABLE fec_api_committees(
307 |   id bigserial primary key,
308 |   cid varchar(255),
309 |   created_at timestamp,
310 |   updated_at timestamp,
311 |   cycle varchar(255),
312 |   individual_unitemized_contributions integer,
313 |   individual_itemized_contributions integer,
314 |   individual_contributions integer,
315 |   designation varchar(255),
316 |   organization_type varchar(255),
317 |   name text,
318 |   committee_id varchar(255),
319 |   committee_type varchar(255)
320 | );
321 | 
322 | CREATE UNIQUE INDEX unique_committee_and_cycle ON fec_api_committees (committee_id, cycle);
323 | CREATE INDEX ON fec_api_committees (cid);
324 | 


--------------------------------------------------------------------------------
/fec_individual_contribution_files.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import glob
  3 | import fileinput
  4 | import sys
  5 | import subprocess
  6 | from IPython.core.debugger import Tracer
  7 | 
  8 | 
  9 | class OrganizeFiles():
 10 |     def __init__(self, folder, unzip_name, new_prefix, header_file=None, unzip_year=None):
 11 |         self.folder = folder
 12 |         self.unzip_name = unzip_name
 13 |         self.new_prefix = new_prefix
 14 |         self.unzip_year = unzip_year
 15 |         self.header_file = header_file
 16 | 
 17 |     def unzip_committee_files(self):
 18 |         file_list = glob.glob('local_files/{}/*.zip'.format(self.folder))
 19 |         #file_list = glob.glob('local_files/{}/indiv90.zip'.format(self.folder))
 20 | 
 21 |         for f in file_list:
 22 |             year = f.split('.')[0][-2:]
 23 |             new_filename = self.new_prefix + year + '.txt'
 24 |             subprocess.call(['unzip', f])
 25 |             print 'hi'
 26 |             if self.unzip_year is None:
 27 |                 current_name = self.unzip_name + '.txt'
 28 |             else:
 29 |                 print self.unzip_name + year + '.txt'
 30 |                 current_name = self.unzip_name + year + '.txt'
 31 |             print current_name
 32 |             os.rename(current_name, 'local_files/{}/'.format(self.folder) + new_filename)
 33 | 
 34 |     def header_string(self):
 35 |         header_path = 'local_files/{}/{}'.format(self.folder, self.header_file)
 36 |         with open(header_path, 'r') as f:
 37 |             first_line = f.readline()
 38 | 
 39 |         return first_line.replace(',','|')
 40 | 
 41 |     def fixed_format_line(self, line):
 42 |         # remove microsoft encoding
 43 |         decoded = line.decode('cp1252').encode('utf8')
 44 | 
 45 |         # remove apostraphes because csvkit seems to have problems with them:
 46 |         return decoded.replace('\'','')
 47 | 
 48 | 
 49 |     def add_cycle_to_files(self):
 50 |         file_list = glob.glob('local_files/{}/*.txt'.format(self.folder))
 51 |         #file_list = glob.glob('local_files/{}/individual_contributions90.txt'.format(self.folder))
 52 |         for f in file_list:
 53 |             year = f.split('.')[0][-2:]
 54 |             if int(year) < 30:
 55 |                 year = '20' + year
 56 |             else:
 57 |                 year = '19' + year
 58 | 
 59 |             for line in fileinput.input([f], inplace=True):
 60 |                 fixed_line = self.fixed_format_line(line)
 61 |                 if fileinput.isfirstline():
 62 |                     sys.stdout.write('CYCLE|{l}'.format(l=self.header_string()))
 63 |                     sys.stdout.write('{c}|{l}'.format(c=year, l=fixed_line))
 64 |                 else:
 65 |                     sys.stdout.write('{c}|{l}'.format(c=year, l=fixed_line))
 66 | 
 67 |     def combine_converted_files(self):
 68 |         full_file_list = glob.glob('local_files/{}/*.txt'.format(self.folder))
 69 |         file_list = [f for f in full_file_list if "converted" in f]
 70 |         print file_list
 71 |         combined_filename = 'local_files/{}/combined_{}.csv'.format(self.folder, self.new_prefix)
 72 |         with open(combined_filename, 'w') as outfile:
 73 |             for fname in file_list:
 74 |                 with open(fname) as infile:
 75 |                     for line in infile:
 76 |                         outfile.write(line)
 77 | 
 78 |     def import_files_to_postgres(self):
 79 |         full_file_list = glob.glob('local_files/{}/*.csv'.format(self.folder))
 80 |         file_list = [f for f in full_file_list if "converted" in f]
 81 | 
 82 |         for f in file_list:
 83 |             print f
 84 |             command = 'psql -c "COPY fec_individual_contributions FROM \'/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/{}\' with DELIMITER \',\' CSV HEADER" campaign_finance'.format(f)
 85 |             subprocess.call(command, shell=True)
 86 | 
 87 |     def check_format_and_convert(self):
 88 |         file_list = glob.glob('local_files/{}/*.txt'.format(self.folder))
 89 |         for f in file_list:
 90 |             print f
 91 |             year = f.split('.')[0][-2:]
 92 |             command = 'in2csv -d "|" -f csv -v local_files/{}/{}{}.txt > local_files/{}/converted{}.csv'\
 93 |                     .format(self.folder, self.new_prefix, year, self.folder, year)
 94 |             subprocess.call(command, shell=True)
 95 | 
 96 | 
 97 |     def run(self):
 98 |         print 'hik'
 99 |         #self.unzip_committee_files()
100 |         #self.add_cycle_to_files()
101 |         #self.check_format_and_convert()
102 |         self.import_files_to_postgres()
103 | 
104 | 
105 | if __name__ == "__main__":
106 |     #a = OrganizeFiles(folder='fec_candidate_summary_files', unzip_name='weball', new_prefix='candidate_summary', unzip_year=True)
107 |     a = OrganizeFiles(folder='fec_individual_contributions', unzip_name='itcont', new_prefix='individual_contributions', header_file='indiv_header_file.csv')
108 |     a.run()
109 | 
110 |     # get the header file, and then run head -n 200 >> header_file.csv
111 |     # Create the table for the new table, but don't insert
112 |     # csvsql --db {db here} --no-constraints header_file.csv
113 |     # sed -i '' 's/_unnamed//g' combined_individual_contributions.csv
114 |     # COPY fec_individual_contributions FROM '/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/local_files/fec_individual_contributions/combined_individual_contributions.csv' DELIMITER ',' CSV;
115 |     # COPY fec_individual_contributions FROM '/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/local_files/fec_individual_contributions/.csv' DELIMITER ',' CSV HEADER;
116 | 


--------------------------------------------------------------------------------
/fec_individual_contribution_files.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Solomon/opensecrets_to_postgres/57cc69c9c6a0d4718befda9f1181d35164e43d81/fec_individual_contribution_files.pyc


--------------------------------------------------------------------------------
/license.md:
--------------------------------------------------------------------------------
1 | <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License</a>
2 | 


--------------------------------------------------------------------------------
/make_files.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash
 2 | 
 3 | echo "Combining Files"
 4 | 
 5 | ruby combine_files.rb
 6 | 
 7 | echo "Fixing Spacing Issues"
 8 | 
 9 | sed -i '.original' -e ':a' -e 'N' -e '$!ba' -e 's/Oversight Bd\n/Oversight Bd/g' -e 's/Sciences\n/Sciences/g' combined_old_individual.txt
10 | 
11 | echo "splitting combined individual contributions file"
12 | 
13 | split -l 4000000 combined_old_individual.txt combined_old_individual_
14 | 
15 | 


--------------------------------------------------------------------------------
/raw_fec_files.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import glob
 3 | import fileinput
 4 | import sys
 5 | import subprocess
 6 | 
 7 | def unzip_committee_files():
 8 |     file_list = glob.glob('local_files/fec_committees/*.zip')
 9 |     for f in file_list:
10 |         new_fileprefix = f.split('/')[2][:4]
11 |         year = new_fileprefix[2:4]
12 |         new_filename = new_fileprefix + '.txt'
13 |         subprocess.call(['unzip', f])
14 |         os.rename('cm.txt', 'local_files/fec_committees/' + new_filename)
15 | 
16 | def add_cycle_to_files():
17 |     file_list = glob.glob('local_files/fec_committees/*.txt')
18 |     for f in file_list:
19 |         year = f.split('/')[2][2:4]
20 |         if int(year) < 30:
21 |             year = '20' + year
22 |         else:
23 |             year = '19' + year
24 | 
25 |         for line in fileinput.input([f], inplace=True):
26 |             sys.stdout.write('{c}|{l}'.format(c=year, l=line))
27 | 
28 | def combine_committee_files():
29 |     file_list = glob.glob('local_files/fec_committees/*.txt')
30 |     with open('local_files/fec_committees/combined_fec_committees.csv', 'w') as outfile:
31 |         for fname in file_list:
32 |             with open(fname) as infile:
33 |                 outfile.write(infile.read())
34 | 
35 | # Get table metadata here and add to the combined file: http://www.fec.gov/finance/disclosure/metadata/cm_header_file.csv
36 | # get rid of microsoft quotes: iconv -f cp1252 -t UTF-8 combined_fec_committees.csv > combined_new.csv
37 | # mv combined_new.csv combined_fec_committees.csv
38 | #sed -i '.original' 's/\x0//g' combined_fec_committees.csv
39 | # sed didn't completely work, go into vim and do the following:
40 | #Then search for NUL char with: 
41 | #/<c-v>000<Enter> 
42 | #(where <c-v> is "control-v" key) 
43 | #and I see the NUL chars highlighted. 
44 | 
45 | # run in2csv -d "|" -v combined_fec_committees.csv > fec_committees.csv
46 | # csvsql --db postgresql:///campaign_finance --table fec_committees --insert filename.csv
47 | 


--------------------------------------------------------------------------------
/small_money_donors.sql:
--------------------------------------------------------------------------------
 1 | SELECT s."CYCLE"
 2 | , s."CAND_NAME"
 3 | , c."CMTE_NM"
 4 | , s."TTL_INDIV_CONTRIB"
 5 | 
 6 | FROM fec_candidate_summary s
 7 | JOIN fec_committees c ON c."CAND_ID" = s."CAND_ID" AND c."CYCLE" = s."CYCLE"
 8 | WHERE "CAND_NAME" = 'CROWLEY, JOSEPH'
 9 | ORDER BY s."CYCLE" DESC
10 | 
11 | 
12 | SELECT s."CYCLE" as cycle
13 | , s."CAND_NAME" as cand_name
14 | , c."CMTE_ID" as cmte_id
15 | , c."CMTE_NM" as cmte_nm
16 | , s."TTL_INDIV_CONTRIB" as total_contributions
17 | , SUM(i."TRANSACTION_AMT") as sum_contributions
18 | , COUNT(i."TRANSACTION_AMT") as total_transactions
19 | , COUNT(CASE WHEN i."MEMO_CD" IS NULL THEN i."SUB_ID" ELSE NULL END) as total_without_code
20 | , SUM(CASE WHEN i."MEMO_CD" IS NULL THEN i."TRANSACTION_AMT" ELSE NULL END) as total_without_code
21 | FROM fec_candidate_summary s
22 | JOIN fec_committees c ON c."CAND_ID" = s."CAND_ID" AND c."CYCLE" = s."CYCLE"
23 | JOIN fec_individual_contributions i ON i."CMTE_ID" = c."CMTE_ID" AND i."CYCLE" = c."CYCLE"
24 | WHERE "CAND_NAME" = 'CROWLEY, JOSEPH'
25 | GROUP BY cycle, cand_name, cmte_id, cmte_nm, total_contributions
26 | ORDER BY s."CYCLE" DESC
27 | 


--------------------------------------------------------------------------------