├── .gitignore
├── README.md
├── combine_files.rb
├── create_open_secrets_db.sql
├── fec_individual_contribution_files.py
├── fec_individual_contribution_files.pyc
├── license.md
├── make_files.sh
├── raw_fec_files.py
└── small_money_donors.sql
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | local_files/
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## Get The OpenSecrets Data Into A Postgres Database
2 |
3 | This repo is to help people easily (ok, semi-easily) get all the
4 | opensecrets campaign contribution data off the internet and into a
5 | postgres database. It took me a decent amount of time to automate lots
6 | of the things you would otherwise need to do manually and learn about
7 | the hard way. Hopefully this saves future researchers lots of time.
8 |
9 | ## The Easy Way
10 |
11 | For those that care only about getting the data, I uploaded a `pg_dump`
12 | of the database to S3.
13 |
14 | The specific command I used was:
15 |
16 | `pg_dump -Fc campaign_finance > campaign_finance.dump`
17 |
18 | You can download the database from this link:
19 |
20 | [database dump](https://s3.amazonaws.com/campaign_finance/campaign_finance.dump)
21 |
22 | Assuming you have postgres installed, run the following from your
23 | terminal command line from the folder containing the file you just
24 | downloaded:
25 |
26 | `pg_restore -C -d campaign_finance campaign_finance.dump`
27 |
28 | This will create a campaign finance database, and load it up with all
29 | the data you need. The total size of the database without indexes is
30 | about 5 gigs.
31 |
32 | I'm going to be updating the `create_open_secrets_db_sql` file with
33 | additional indexes as I find them necessary for my use case. If you want
34 | to add these indexes to your version of the data, just check at the
35 | bottom of that file, and copy the index commands after you psql into the
36 | database.
37 |
38 | ## The Hard Way
39 |
40 | This should work on any OSX or Linux systems, but won't work on
41 | Windows.
42 |
43 |
44 | ### Get The Files
45 |
46 | To start with, you need to go to [Open Secrets](http://www.opensecrets.org), create an account, and download
47 | all the yearly data files. There are 5 database tables, with files for
48 | every election season from 1990 through 2014. The five tables are
49 | committees, candidates, individual contributions, pacs and pac to pacs.
50 | You will need to download lots of files.
51 |
52 | Take all those files and put them into the same folder, without renaming
53 | them. Then copy the files from this repo into that folder.
54 |
55 | ### Combining The Yearly Files
56 |
57 | Run the bash script as follows to clean and combine all the files for
58 | easy upload into postgres.
59 |
60 | `sh make_files.sh`
61 |
62 | ### Getting The Data Into Postgres
63 |
64 | If you don't have postgres, you need to go download it and get it set
65 | up.
66 |
67 | Log into postgres and create a database named "campaign_finance".
68 |
69 | Then `psql campaign_finance` to enter your new database.
70 |
71 | Open up the `create_open_secrets_db.sql` file from this repo.
72 |
73 | Paste the commands one by one into your terminal. You could probably
74 | just run the whole file, and you would be ok, but I would recommend
75 | running the commands one by one, to make sure that there aren't any
76 | issues.
77 |
78 | The most likely issue you might find is a formatting error somewhere in
79 | the combined files while you are copying. Postgres will let you know the
80 | line number of the error, so you can just open up the file in a text
81 | editor, go directly to that line, and generally fix the formatting
82 | error. Save the file, and then try the copy command again.
83 |
84 | You will see that the old individual_contributions_combined file is
85 | split into multiple parts. This is because the file is too large to fit
86 | into memory on my computer, and had problems being copied into the
87 | database.
88 |
89 | I'm going to be adding indexes to this file as I use the data, and these
90 | are optional. They make an already large database larger, so you can
91 | decide which, if any, of these indexes you want to include for your
92 | research. I'll sort them by how useful they are to me.
93 |
94 | ### Questions / Problems
95 |
96 | Open up a github issue and I'll do my best to help.
97 |
98 | ### Contributing
99 |
100 | 1. Fork it
101 | 2. Create your feature branch (`git checkout -b my-new-feature`)
102 | 3. Commit your changes (`git commit -am 'Add some feature'`)
103 | 4. Push to the branch (`git push origin my-new-feature`)
104 | 5. Create new Pull Request
105 |
106 | ### Credits
107 |
108 | 
109 |
--------------------------------------------------------------------------------
/combine_files.rb:
--------------------------------------------------------------------------------
1 | def combine_files(new_filename, filename_array)
2 | File.open(new_filename + ".txt", "w") do |combined_file|
3 | filename_array.each do |f|
4 | text = File.open(f, "r").read
5 | text.each_line do |line|
6 | combined_file << line
7 | end
8 | end
9 | end
10 | format_file(new_filename)
11 | end
12 |
13 | def format_file(new_filename)
14 | format_command = "iconv -f utf-8 -t utf-8 -c #{new_filename}.txt > #{new_filename}_formatted.txt"
15 | system(format_command)
16 | system("rm #{new_filename}.txt")
17 | system("mv -f #{new_filename}_formatted.txt #{new_filename}.txt")
18 | end
19 |
20 | def fix_file_formatting_errors
21 | system('sed -i ".original" -e "s/PREDKI, \|STANLEY/PREDKI, STANLEY/g" -e "s/THE \|HILLS/THE HILLS/g" -e "s/\|\|kirby, Rev Dr \|\|jerry/\|kirby, Rev Dr jerry/g" -e "s/MCCORMICK, GARY \|D/MCCORMICK, GARY D/g" -e "s/\|\|palmetto/|\palmetto/g" combined_old_individual.txt')
22 | system("mv -f combined_old_individual_new.txt combined_old_individual.txt")
23 | end
24 |
25 | # Combine all the groups except for individual contributions
26 | # because they have two file types
27 | def combine_non_individual
28 | file_groups = %w{pacs pac_other cmtes cands}
29 |
30 | file_groups.each do |g|
31 | puts "combining #{g}"
32 | files = Dir.entries('.').select{ |f| f[/^#{g}/] }
33 | filename = "combined_#{g}"
34 | combine_files(filename, files)
35 | end
36 | end
37 |
38 | def combine_individual
39 | new_individual_files = ["indivs12.txt", "indivs14.txt"]
40 | old_individual_files = Dir.entries('.').select{|f| f[/^indivs/]}.select{|f| !new_individual_files.include?(f) }
41 | old_combined_filename = "combined_old_individual"
42 | combine_files(old_combined_filename, old_individual_files)
43 | new_combined_filename = "combined_new_individual"
44 | combine_files(new_combined_filename, new_individual_files)
45 | end
46 |
47 | combine_non_individual
48 | combine_individual
49 | fix_file_formatting_errors
50 |
--------------------------------------------------------------------------------
/create_open_secrets_db.sql:
--------------------------------------------------------------------------------
1 | DROP TABLE candidates;
2 |
3 | CREATE TABLE candidates(
4 | id bigserial primary key,
5 | cycle varchar(255),
6 | fec_cand_id varchar(255),
7 | cid varchar(255),
8 | first_last_party varchar(255),
9 | party varchar(255),
10 | dist_id_run_for varchar(255),
11 | dist_id_currently_held varchar(255),
12 | current_candidate varchar(255),
13 | cycle_candidate varchar(255),
14 | crpico varchar(255),
15 | recip_code varchar(255),
16 | nopacs varchar(255),
17 | raised_from_pacs integer,
18 | raised_from_individuals integer,
19 | raised_total integer,
20 | raised_unitemized integer
21 | );
22 |
23 | COPY candidates(cycle, fec_cand_id, cid, first_last_party, party, dist_id_run_for, dist_id_currently_held, current_candidate, cycle_candidate, crpico, recip_code, nopacs)
24 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_cands.txt'
25 | WITH CSV QUOTE '|' DELIMITER ',';
26 |
27 | DROP TABLE committees;
28 |
29 | CREATE TABLE committees(
30 | id bigserial primary key,
31 | cycle varchar(255),
32 | committee_id varchar(255),
33 | pac_short varchar(255),
34 | affiliate varchar(255),
35 | ultorg varchar(255),
36 | recip_id varchar(255),
37 | recip_code varchar(255),
38 | fec_cand_id varchar(255),
39 | party varchar(255),
40 | prim_code varchar(255),
41 | source varchar(255),
42 | sensitive varchar(255),
43 | foreign_owned varchar(255),
44 | active integer
45 | );
46 |
47 | COPY committees(cycle, committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned, active)
48 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_cmtes.txt'
49 | WITH CSV QUOTE '|' DELIMITER ',';
50 |
51 | DROP TABLE individual_contributions;
52 |
53 | CREATE TABLE individual_contributions(
54 | id bigserial primary key,
55 | cycle varchar(255),
56 | fec_trans_id varchar(255),
57 | contributor_id varchar(255),
58 | contributor_name varchar(255),
59 | recipient_id varchar(255),
60 | org_name varchar(255),
61 | ult_org varchar(255),
62 | real_code varchar(255),
63 | date timestamp,
64 | amount integer,
65 | street varchar(255),
66 | city varchar(255),
67 | state varchar(255),
68 | zip varchar(255),
69 | recip_code varchar(255),
70 | type varchar(255),
71 | committee_id varchar(255),
72 | other_id varchar(255),
73 | gender varchar(255),
74 | old_format_employer_occupation varchar(255),
75 | microfilm varchar(255),
76 | occupation varchar(255),
77 | employer varchar(255),
78 | source varchar(255)
79 | );
80 |
81 | -- The format for the old pre-2012 files has an extra field for the combined employer/occupation field.
82 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, microfilm, occupation, employer, source)
83 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_new_individual.txt'
84 | WITH CSV QUOTE '|' DELIMITER ',';
85 |
86 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
87 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_aa'
88 | WITH CSV QUOTE '|' DELIMITER ',';
89 |
90 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
91 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ab'
92 | WITH CSV QUOTE '|' DELIMITER ',';
93 |
94 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
95 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ac'
96 | WITH CSV QUOTE '|' DELIMITER ',';
97 |
98 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
99 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ad'
100 | WITH CSV QUOTE '|' DELIMITER ',';
101 |
102 | COPY individual_contributions(cycle, fec_trans_id, contributor_id, contributor_name, recipient_id, org_name, ult_org, real_code, date, amount, street, city, state, zip, recip_code, type, committee_id, other_id, gender, old_format_employer_occupation, microfilm, occupation, employer, source)
103 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_old_individual_ae'
104 | WITH CSV QUOTE '|' DELIMITER ',';
105 |
106 | DROP TABLE pacs;
107 |
108 | CREATE TABLE pacs(
109 | id bigserial primary key,
110 | cycle varchar(255),
111 | fec_rec_no varchar(255),
112 | pac_id varchar(255),
113 | cid varchar(255),
114 | amount integer,
115 | date timestamp,
116 | real_code varchar(255),
117 | type varchar(255),
118 | di varchar(255),
119 | fec_cand_id varchar(255)
120 | );
121 |
122 | COPY pacs(cycle, fec_rec_no, pac_id, cid, amount, date, real_code, type, di, fec_cand_id)
123 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_pacs.txt'
124 | WITH CSV QUOTE '|' DELIMITER ',';
125 |
126 | DROP TABLE pac_to_pacs;
127 |
128 | CREATE TABLE pac_to_pacs(
129 | id bigserial primary key,
130 | cycle varchar(255),
131 | fec_rec_no varchar(255),
132 | filer_id varchar(255),
133 | donor_committee varchar(255),
134 | contrib_lend_trans varchar(255),
135 | city varchar(255),
136 | state varchar(255),
137 | zip varchar(255),
138 | fec_occ_emp varchar(255),
139 | prim_code varchar(255),
140 | date timestamp,
141 | amount decimal,
142 | recipient_id varchar(255),
143 | party varchar(255),
144 | other_id varchar(255),
145 | recip_code varchar(255),
146 | recip_prim_code varchar(255),
147 | amend varchar(255),
148 | report varchar(255),
149 | pg varchar(255),
150 | microfilm varchar(255),
151 | type varchar(255),
152 | real_code varchar(255),
153 | source varchar(255)
154 | );
155 |
156 | COPY pac_to_pacs(cycle, fec_rec_no, filer_id, donor_committee, contrib_lend_trans, city, state, zip, fec_occ_emp, prim_code, date, amount, recipient_id, party, other_id, recip_code, recip_prim_code, amend, report, pg, microfilm, type, real_code, source )
157 | from '/Users/sik/Solomon/programming/campaign_finance_projects/campaign_finance_code/combined_pac_other.txt'
158 | WITH CSV QUOTE '|' DELIMITER ',';
159 |
160 | DROP TABLE industry_codes;
161 |
162 | CREATE TABLE industry_codes(
163 | id bigserial primary key,
164 | category_code varchar(255),
165 | category_name varchar(255),
166 | industry_code varchar(255),
167 | industry_name varchar(255),
168 | sector varchar(255),
169 | sector_long varchar(255)
170 | );
171 |
172 | -- Line 443 extra column at end of line needs to be deleted
173 | COPY industry_codes(category_code, category_name, industry_code, industry_name, sector, sector_long)
174 | from '/Users/sik/Downloads/CRP_Categories.txt'
175 | WITH CSV DELIMITER E'\t';
176 |
177 |
178 | DROP TABLE politicians;
179 |
180 | CREATE TABLE politicians(
181 | id bigserial primary key,
182 | cid varchar(255),
183 | name varchar(255)
184 | );
185 |
186 | INSERT INTO politicians(cid, name)(
187 | SELECT distinct on(cid) cid, left(first_last_party, -4)
188 | FROM candidates
189 | );
190 |
191 | DROP TABLE pac_records;
192 |
193 | CREATE TABLE pac_records(
194 | id bigserial primary key,
195 | committee_id varchar(255),
196 | pac_short varchar(255),
197 | affiliate varchar(255),
198 | ultorg varchar(255),
199 | recip_id varchar(255),
200 | recip_code varchar(255),
201 | fec_cand_id varchar(255),
202 | party varchar(255),
203 | prim_code varchar(255),
204 | source varchar(255),
205 | sensitive varchar(255),
206 | foreign_owned varchar(255)
207 | );
208 |
209 | INSERT INTO pac_records(committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned)(
210 | SELECT distinct on(committee_id)
211 | committee_id, pac_short, affiliate, ultorg, recip_id, recip_code, fec_cand_id, party, prim_code, source, sensitive, foreign_owned
212 | FROM committees
213 | );
214 |
215 | CREATE INDEX ON candidates (cid);
216 | CREATE INDEX ON individual_contributions (recipient_id);
217 |
218 | CREATE INDEX ON individual_contributions (real_code);
219 | CREATE INDEX ON pacs (cid);
220 | CREATE INDEX ON pacs (real_code);
221 | CREATE INDEX ON industry_codes (category_code);
222 |
223 | CREATE INDEX ON politicians (cid);
224 | CREATE INDEX ON politicians (name);
225 |
226 | CREATE INDEX ON individual_contributions (cycle);
227 | CREATE INDEX ON pacs (cycle);
228 | CREATE INDEX ON candidates (cycle);
229 |
230 | CREATE INDEX ON pac_records (committee_id);
231 |
232 | -- Denormalize candidates to show total raised for each candidate by election cycle
233 |
234 | with candidate_individual_contributions as (
235 | SELECT c.id as candidate_id
236 | , SUM(i.amount) as total_individual
237 | FROM candidates c
238 | JOIN individual_contributions i ON i.recipient_id = c.cid AND i.cycle = c.cycle
239 | GROUP BY c.id
240 | )
241 |
242 | UPDATE candidates SET raised_from_individuals = candidate_individual_contributions.total_individual
243 | FROM candidate_individual_contributions
244 | WHERE candidate_individual_contributions.candidate_id = candidates.id;
245 |
246 |
247 | with pac_contributions as (
248 | SELECT c.id as candidate_id
249 | , SUM(p.amount) as total_pac
250 | FROM candidates c
251 | JOIN pacs p ON p.cid = c.cid AND p.cycle = c.cycle
252 | WHERE p.type != '24A' AND p.type != '24N'
253 | GROUP BY c.id
254 | )
255 |
256 | UPDATE candidates SET raised_from_pacs = pac_contributions.total_pac
257 | FROM pac_contributions
258 | WHERE pac_contributions.candidate_id = candidates.id;
259 |
260 | with cycle_unitemized as (
261 | SELECT f.cid as cid
262 | , f.cycle as cycle
263 | , SUM(f.individual_unitemized_contributions) as total_unitemized
264 | FROM fec_api_committees f
265 | WHERE f.designation = 'P'
266 | AND f.individual_unitemized_contributions IS NOT NULL
267 | GROUP BY f.cid, f.cycle
268 | )
269 |
270 | UPDATE candidates SET raised_unitemized = cu.total_unitemized
271 | FROM cycle_unitemized cu
272 | WHERE candidates.cycle = cu.cycle
273 | AND candidates.cid = cu.cid;
274 |
275 | UPDATE candidates SET raised_total = COALESCE(raised_from_individuals, 0) + COALESCE(raised_from_pacs, 0) + COALESCE(raised_unitemized, 0);
276 |
277 |
278 | -- Add backers table, for the kickstarter backers (and other backers)
279 |
280 | DROP TABLE backers;
281 |
282 | CREATE TABLE backers(
283 | id bigserial primary key,
284 | cid varchar(255),
285 | name varchar(255),
286 | backer_level varchar(255),
287 | kickstarter boolean
288 | );
289 |
290 | DROP TABLE user_submissions;
291 |
292 | CREATE TABLE user_submissions(
293 | id bigserial primary key,
294 | title text,
295 | description text,
296 | user_name varchar(255),
297 | url varchar(255),
298 | cid varchar(255),
299 | approved_at timestamp,
300 | approved_by varchar(255),
301 | promote boolean
302 | );
303 |
304 | DROP TABLE fec_api_committees;
305 |
306 | CREATE TABLE fec_api_committees(
307 | id bigserial primary key,
308 | cid varchar(255),
309 | created_at timestamp,
310 | updated_at timestamp,
311 | cycle varchar(255),
312 | individual_unitemized_contributions integer,
313 | individual_itemized_contributions integer,
314 | individual_contributions integer,
315 | designation varchar(255),
316 | organization_type varchar(255),
317 | name text,
318 | committee_id varchar(255),
319 | committee_type varchar(255)
320 | );
321 |
322 | CREATE UNIQUE INDEX unique_committee_and_cycle ON fec_api_committees (committee_id, cycle);
323 | CREATE INDEX ON fec_api_committees (cid);
324 |
--------------------------------------------------------------------------------
/fec_individual_contribution_files.py:
--------------------------------------------------------------------------------
1 | import os
2 | import glob
3 | import fileinput
4 | import sys
5 | import subprocess
6 | from IPython.core.debugger import Tracer
7 |
8 |
9 | class OrganizeFiles():
10 | def __init__(self, folder, unzip_name, new_prefix, header_file=None, unzip_year=None):
11 | self.folder = folder
12 | self.unzip_name = unzip_name
13 | self.new_prefix = new_prefix
14 | self.unzip_year = unzip_year
15 | self.header_file = header_file
16 |
17 | def unzip_committee_files(self):
18 | file_list = glob.glob('local_files/{}/*.zip'.format(self.folder))
19 | #file_list = glob.glob('local_files/{}/indiv90.zip'.format(self.folder))
20 |
21 | for f in file_list:
22 | year = f.split('.')[0][-2:]
23 | new_filename = self.new_prefix + year + '.txt'
24 | subprocess.call(['unzip', f])
25 | print 'hi'
26 | if self.unzip_year is None:
27 | current_name = self.unzip_name + '.txt'
28 | else:
29 | print self.unzip_name + year + '.txt'
30 | current_name = self.unzip_name + year + '.txt'
31 | print current_name
32 | os.rename(current_name, 'local_files/{}/'.format(self.folder) + new_filename)
33 |
34 | def header_string(self):
35 | header_path = 'local_files/{}/{}'.format(self.folder, self.header_file)
36 | with open(header_path, 'r') as f:
37 | first_line = f.readline()
38 |
39 | return first_line.replace(',','|')
40 |
41 | def fixed_format_line(self, line):
42 | # remove microsoft encoding
43 | decoded = line.decode('cp1252').encode('utf8')
44 |
45 | # remove apostraphes because csvkit seems to have problems with them:
46 | return decoded.replace('\'','')
47 |
48 |
49 | def add_cycle_to_files(self):
50 | file_list = glob.glob('local_files/{}/*.txt'.format(self.folder))
51 | #file_list = glob.glob('local_files/{}/individual_contributions90.txt'.format(self.folder))
52 | for f in file_list:
53 | year = f.split('.')[0][-2:]
54 | if int(year) < 30:
55 | year = '20' + year
56 | else:
57 | year = '19' + year
58 |
59 | for line in fileinput.input([f], inplace=True):
60 | fixed_line = self.fixed_format_line(line)
61 | if fileinput.isfirstline():
62 | sys.stdout.write('CYCLE|{l}'.format(l=self.header_string()))
63 | sys.stdout.write('{c}|{l}'.format(c=year, l=fixed_line))
64 | else:
65 | sys.stdout.write('{c}|{l}'.format(c=year, l=fixed_line))
66 |
67 | def combine_converted_files(self):
68 | full_file_list = glob.glob('local_files/{}/*.txt'.format(self.folder))
69 | file_list = [f for f in full_file_list if "converted" in f]
70 | print file_list
71 | combined_filename = 'local_files/{}/combined_{}.csv'.format(self.folder, self.new_prefix)
72 | with open(combined_filename, 'w') as outfile:
73 | for fname in file_list:
74 | with open(fname) as infile:
75 | for line in infile:
76 | outfile.write(line)
77 |
78 | def import_files_to_postgres(self):
79 | full_file_list = glob.glob('local_files/{}/*.csv'.format(self.folder))
80 | file_list = [f for f in full_file_list if "converted" in f]
81 |
82 | for f in file_list:
83 | print f
84 | command = 'psql -c "COPY fec_individual_contributions FROM \'/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/{}\' with DELIMITER \',\' CSV HEADER" campaign_finance'.format(f)
85 | subprocess.call(command, shell=True)
86 |
87 | def check_format_and_convert(self):
88 | file_list = glob.glob('local_files/{}/*.txt'.format(self.folder))
89 | for f in file_list:
90 | print f
91 | year = f.split('.')[0][-2:]
92 | command = 'in2csv -d "|" -f csv -v local_files/{}/{}{}.txt > local_files/{}/converted{}.csv'\
93 | .format(self.folder, self.new_prefix, year, self.folder, year)
94 | subprocess.call(command, shell=True)
95 |
96 |
97 | def run(self):
98 | print 'hik'
99 | #self.unzip_committee_files()
100 | #self.add_cycle_to_files()
101 | #self.check_format_and_convert()
102 | self.import_files_to_postgres()
103 |
104 |
105 | if __name__ == "__main__":
106 | #a = OrganizeFiles(folder='fec_candidate_summary_files', unzip_name='weball', new_prefix='candidate_summary', unzip_year=True)
107 | a = OrganizeFiles(folder='fec_individual_contributions', unzip_name='itcont', new_prefix='individual_contributions', header_file='indiv_header_file.csv')
108 | a.run()
109 |
110 | # get the header file, and then run head -n 200 >> header_file.csv
111 | # Create the table for the new table, but don't insert
112 | # csvsql --db {db here} --no-constraints header_file.csv
113 | # sed -i '' 's/_unnamed//g' combined_individual_contributions.csv
114 | # COPY fec_individual_contributions FROM '/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/local_files/fec_individual_contributions/combined_individual_contributions.csv' DELIMITER ',' CSV;
115 | # COPY fec_individual_contributions FROM '/Users/sik/Solomon/programming/campaign_finance_projects/opensecrets_to_postgres/local_files/fec_individual_contributions/.csv' DELIMITER ',' CSV HEADER;
116 |
--------------------------------------------------------------------------------
/fec_individual_contribution_files.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Solomon/opensecrets_to_postgres/57cc69c9c6a0d4718befda9f1181d35164e43d81/fec_individual_contribution_files.pyc
--------------------------------------------------------------------------------
/license.md:
--------------------------------------------------------------------------------
1 | 
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
2 |
--------------------------------------------------------------------------------
/make_files.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | echo "Combining Files"
4 |
5 | ruby combine_files.rb
6 |
7 | echo "Fixing Spacing Issues"
8 |
9 | sed -i '.original' -e ':a' -e 'N' -e '$!ba' -e 's/Oversight Bd\n/Oversight Bd/g' -e 's/Sciences\n/Sciences/g' combined_old_individual.txt
10 |
11 | echo "splitting combined individual contributions file"
12 |
13 | split -l 4000000 combined_old_individual.txt combined_old_individual_
14 |
15 |
--------------------------------------------------------------------------------
/raw_fec_files.py:
--------------------------------------------------------------------------------
1 | import os
2 | import glob
3 | import fileinput
4 | import sys
5 | import subprocess
6 |
7 | def unzip_committee_files():
8 | file_list = glob.glob('local_files/fec_committees/*.zip')
9 | for f in file_list:
10 | new_fileprefix = f.split('/')[2][:4]
11 | year = new_fileprefix[2:4]
12 | new_filename = new_fileprefix + '.txt'
13 | subprocess.call(['unzip', f])
14 | os.rename('cm.txt', 'local_files/fec_committees/' + new_filename)
15 |
16 | def add_cycle_to_files():
17 | file_list = glob.glob('local_files/fec_committees/*.txt')
18 | for f in file_list:
19 | year = f.split('/')[2][2:4]
20 | if int(year) < 30:
21 | year = '20' + year
22 | else:
23 | year = '19' + year
24 |
25 | for line in fileinput.input([f], inplace=True):
26 | sys.stdout.write('{c}|{l}'.format(c=year, l=line))
27 |
28 | def combine_committee_files():
29 | file_list = glob.glob('local_files/fec_committees/*.txt')
30 | with open('local_files/fec_committees/combined_fec_committees.csv', 'w') as outfile:
31 | for fname in file_list:
32 | with open(fname) as infile:
33 | outfile.write(infile.read())
34 |
35 | # Get table metadata here and add to the combined file: http://www.fec.gov/finance/disclosure/metadata/cm_header_file.csv
36 | # get rid of microsoft quotes: iconv -f cp1252 -t UTF-8 combined_fec_committees.csv > combined_new.csv
37 | # mv combined_new.csv combined_fec_committees.csv
38 | #sed -i '.original' 's/\x0//g' combined_fec_committees.csv
39 | # sed didn't completely work, go into vim and do the following:
40 | #Then search for NUL char with:
41 | #/000
42 | #(where is "control-v" key)
43 | #and I see the NUL chars highlighted.
44 |
45 | # run in2csv -d "|" -v combined_fec_committees.csv > fec_committees.csv
46 | # csvsql --db postgresql:///campaign_finance --table fec_committees --insert filename.csv
47 |
--------------------------------------------------------------------------------
/small_money_donors.sql:
--------------------------------------------------------------------------------
1 | SELECT s."CYCLE"
2 | , s."CAND_NAME"
3 | , c."CMTE_NM"
4 | , s."TTL_INDIV_CONTRIB"
5 |
6 | FROM fec_candidate_summary s
7 | JOIN fec_committees c ON c."CAND_ID" = s."CAND_ID" AND c."CYCLE" = s."CYCLE"
8 | WHERE "CAND_NAME" = 'CROWLEY, JOSEPH'
9 | ORDER BY s."CYCLE" DESC
10 |
11 |
12 | SELECT s."CYCLE" as cycle
13 | , s."CAND_NAME" as cand_name
14 | , c."CMTE_ID" as cmte_id
15 | , c."CMTE_NM" as cmte_nm
16 | , s."TTL_INDIV_CONTRIB" as total_contributions
17 | , SUM(i."TRANSACTION_AMT") as sum_contributions
18 | , COUNT(i."TRANSACTION_AMT") as total_transactions
19 | , COUNT(CASE WHEN i."MEMO_CD" IS NULL THEN i."SUB_ID" ELSE NULL END) as total_without_code
20 | , SUM(CASE WHEN i."MEMO_CD" IS NULL THEN i."TRANSACTION_AMT" ELSE NULL END) as total_without_code
21 | FROM fec_candidate_summary s
22 | JOIN fec_committees c ON c."CAND_ID" = s."CAND_ID" AND c."CYCLE" = s."CYCLE"
23 | JOIN fec_individual_contributions i ON i."CMTE_ID" = c."CMTE_ID" AND i."CYCLE" = c."CYCLE"
24 | WHERE "CAND_NAME" = 'CROWLEY, JOSEPH'
25 | GROUP BY cycle, cand_name, cmte_id, cmte_nm, total_contributions
26 | ORDER BY s."CYCLE" DESC
27 |
--------------------------------------------------------------------------------