├── .gitignore ├── 01-01-reproducible-data-analysis.md ├── 01-02-education-civil-rights-data.md ├── 01-04-cross-functional-teams.md ├── 01-05-investigating-hate.md ├── 01-06-self-managing-teams.md ├── 02-01-data-you-already-have.md ├── 02-02-evolving-live-coverage.md ├── 02-03-break-filter-bubble.md ├── 02-04-life-after-factfinder.md ├── 02-05-irs-nonprofit-data.md ├── 03-01-lat-map-maker.md ├── 03-02-disaster-data-money.md ├── 03-03-environmental-hazards.md ├── 03-04-guns.md ├── 03-05-entitled-to-a-spreadsheet.md ├── 03-06-data-culture.md ├── 04-01-sensor-journalism.md ├── 04-02-python-tests.md ├── 04-03-archiving-data-journalism.md ├── README.md └── lightning-talks.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store -------------------------------------------------------------------------------- /01-01-reproducible-data-analysis.md: -------------------------------------------------------------------------------- 1 | # How and why to make your data analysis reproducible 2 | 3 | **Tipsheet: [bitly.com/reproducible-data](http://www.bitly.com/reproducible-data)** 4 | 5 | * [Ryann Grochowski Jones](https://twitter.com/ryanngro) 6 | * [Hannah Recht](https://twitter.com/hannah_recht) 7 | * [Jeremy Singer-Vine](https://twitter.com/JSVine) 8 | * [Hannah Cushman](https://twitter.com/hancush) 9 | 10 | ##### Description 11 | 12 | You understand how you processed your data. Does your editor? Your reader? You, in six months? Without a replicable approach to extracting, transforming and loading data, we are often frustrated in our efforts to share or update our work. Join us for a panel discussion of reproducible data workflows. We’ll talk about why we use standardized processes for collecting, cleaning and analyzing data, and share practices that work for us. We’ll also discuss strategies for smart human intervention (i.e. reporting, logging and documentation) in automated workflows. 
13 | 14 | ## Notes 15 | 16 | ### What is Data Reproducibility? 17 | 18 | **Hannah Recht (HR):** "Whatever you're doing, someone else could reproduce it" whether that's getting the data, analysis, visualization or the story. If you're coding, you run your code and get the same thing every time. Good if you decide to change course halfway through or revisit your data later on. 19 | 20 | **Hannah Cushman (HC):** Doesn't necessarily mean fully automated. Even if one part is reproducible, it still makes it easier. 21 | 22 | **Jeremy Singer-Vine (JSV):** Inputs and outputs. Inputs should yield the same outputs. "What could I delete from my computer, hit one button and have it regenerate completely?" Reproducibility vs. transparency. 23 | 24 | ### What is an ETL Process? 25 | 26 | **HC:** Extract-transform-load. Extract = get the raw data. Transform = Doing stuff. Load = Publish or load data into news app. 27 | 28 | Principles: 29 | 30 | * Source data is lava. We don't touch it. 31 | * Process is deterministic: You'll always get the same thing. 32 | * Standard tool kit, standard process. We're all speaking the same language. 33 | * Version control 34 | * Kind code. Kind code is easier to read and understand for you and others. 35 | 36 | **JSV:** Series of assumptions and inputs, series of outputs. Could be lots of things — lots of data types. Be explicit about what the findings are. Uses Make (DataMade does too). Goal: One command that deletes all your outputs and runs all your code to reproduce the outputs. 37 | 38 | **HR:** Start thinking about reproducibility. First line of script is "Download file". Documenting decisions and processes. 39 | 40 | ### How does ETL process guide data exploration 41 | 42 | **HC:** Have an auditable trail of things you did to your data. "Freedom to forget". Helpful if your code gives you "sharable artifacts" to help editors, reporters, etc. even if it's not for publication. 
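The workflow the panel describes (source data is lava, deterministic transforms, one command that regenerates everything) might look like this minimal Python sketch. It is illustrative only, not the panelists' actual code; the URL, file paths and column names are invented:

```python
# Minimal reproducible ETL sketch: delete raw/ and output/, call run(),
# and byte-identical results come back. All names here are hypothetical.
import csv
import pathlib
import urllib.request

RAW = pathlib.Path("raw/source.csv")     # source data is lava: downloaded, never edited
OUT = pathlib.Path("output/clean.csv")

def extract():
    """The first step of the script is 'download the file'."""
    RAW.parent.mkdir(exist_ok=True)
    if not RAW.exists():
        urllib.request.urlretrieve("https://example.com/source.csv", RAW)

def transform(rows):
    """Deterministic cleaning: the same input always yields the same output."""
    cleaned = [{"county": r["county"].strip().title(), "value": int(r["value"])}
               for r in rows]
    return sorted(cleaned, key=lambda r: r["county"])  # stable ordering

def load(rows):
    """Write the cleaned table that the story or news app reads."""
    OUT.parent.mkdir(exist_ok=True)
    with OUT.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["county", "value"])
        writer.writeheader()
        writer.writerows(rows)

def run():
    """One command that regenerates every output from scratch."""
    extract()
    with RAW.open(newline="") as f:
        load(transform(csv.DictReader(f)))
```

A Makefile that chains steps like these gives the same guarantee at the project level, which is how the Make-based setups mentioned in the session work.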
43 | 44 | **JSV:** Radiating circle: Do it for myself, future self, collaborators, editors, for the public. Also adds accountability. All the other parts are for practical reasons; making it public is more about the principle. 45 | 46 | ### What can't be automated and how do you deal with that? 47 | 48 | **HC:** Reporting can't be automated. "There's no computer script in the world that can impart meaning on this subjective subject that someone else made." 49 | 50 | **HR:** Building your own datasets. Gathering the data sometimes can't be; once you've gathered it, it can. 51 | 52 | **JSV:** Anything manual should be part of the input. If there's a part in the middle you can't automate, think of it as two smaller reproducible projects with a step in the middle. 53 | 54 | ### Example of when reproducibility was valuable: 55 | 56 | **HC:** Gives you street cred. Helps you explain things more intelligently to other folks in the newsroom. 57 | 58 | **HR:** Saves a lot of work; Factfinder and punching a button hundreds of times or automating it. 59 | 60 | **JSV:** Hurricane Harvey-related industrial emissions. New data coming in all the time, wanted to make sure they had the most updated data. Human judgment: Records to ignore — science reporter identified things that had nothing to do with the hurricane. 61 | 62 | ### Favorite tasks to replicate: 63 | 64 | **HC:** SQL queries. 65 | 66 | **HR:** Standardizing county names and joining them to FIPS codes. 67 | 68 | **JSV:** Everything. Command to re-run all Jupyter notebooks and export the output. 69 | 70 | ### Hardest task to replicate: 71 | 72 | **HC:** Chicago lead analysis. Lots of data in PDFs in different formats. 73 | 74 | **HR:** Horrible data formats. 75 | 76 | **JSV:** Time. Some things take a very long time to run. Figure skating analysis, one cell could take 4-8 hours. 77 | 78 | ### Good entry point: 79 | 80 | **HC:** Take something you've already written and make it reproducible. 
81 | 82 | **HR:** Take code that's "sort of reproducible" and make it reproducible. More of a mindset than a technical skillset. 83 | 84 | **JSV:** Write a methodology. "The art of explaining". 85 | 86 | ## Questions: 87 | 88 | ### Reproducing your data when it's meant to be live 89 | 90 | **HC:** Build in a checking step. Notifies if something is wrong and doesn't push new data to production. 91 | 92 | ### Testing? 93 | 94 | **HR:** Definitely want to test throughout the process. Create logs. 95 | 96 | **Ryann Grochowski Jones (RGJ):** System that gives you alerts when stuff doesn't look the way it should. Airflow ETL system. 97 | 98 | ### How to use on a team? 99 | 100 | **HC:** Lots of documentation. Tutorials. 101 | 102 | **JSV:** Have some flexibility so it's fun. 103 | 104 | **RGJ:** Trainings. Show reporters why it matters. 105 | 106 | ### Dealing with uncertainty of when something might be a story 107 | 108 | **HR:** Do the basic stuff from the beginning, can add later. Assume that at some point it might go somewhere. 109 | 110 | 111 | ##### Speakers 112 | 113 | Ryann Grochowski Jones is the deputy editor for data at ProPublica. Previously, she was a data reporter at ProPublica and at Investigative Newsource in San Diego, California. She received her master’s degree from the University of Missouri School of Journalism, where she was a data librarian for IRE/NICAR. Ryann began her career as a municipal beat reporter for her hometown newspaper in Wilkes-Barre, Pennsylvania. [@ryanngro](https://twitter.com/ryanngro) 114 | 115 | Hannah Recht is a data journalist at Bloomberg News. She likes scraping obscure insurance filings and wrote an R package that accesses Census data. She previously worked at the Urban Institute as a researcher and data visualization developer. [@hannah_recht](https://twitter.com/hannah_recht) 116 | 117 | Jeremy Singer-Vine is the data editor at BuzzFeed News. 
He also publishes Data Is Plural, a weekly newsletter of useful/curious datasets. [www.jsvine.com](https://www.jsvine.com/) 118 | 119 | Hannah Cushman is a journalist turned hacker for public good. She arrived at DataMade, a civic technology company in Chicago, by way of The Associated Press. She believes in open information, empathy, and Dark Matter coffee. [@hancush](https://twitter.com/hancush) 120 | 121 | *Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3532/)* -------------------------------------------------------------------------------- /01-02-education-civil-rights-data.md: -------------------------------------------------------------------------------- 1 | # Education civil rights data: The good, the bad, the dirty (Diversity Track) 2 | 3 | [Slides](https://docs.google.com/presentation/d/15PJ9R-L3Qljh6zV6be0r3gJCU6hXIQeAUi9qvmI7O30/edit#slide=id.p) 4 | 5 | * Jennifer LaFleur 6 | * Alex Harwin 7 | * Kameel Stanley 8 | 9 | ##### Description 10 | 11 | Panelists with a range of expertise will discuss the federal civil rights dataset, how to make the most of it and avoid the pitfalls. We also will talk about how to use state data to tell stories about disparities in schools. This session will provide lots of story ideas, so come with questions! 12 | 13 | ## Notes 14 | 15 | ### Alex Harwin 16 | 17 | Data available for every school. What can you get from it? Special ed, arrests at schools, corporal punishment, teacher absences. 18 | 19 | Data is messy but powerful. Impact: RI Governor, White House Brief, other local and state level results. 20 | 21 | How do you work with it? Look up comparisons at school and district level. Larger analyses are more complicated and require feedback loops and teamwork. 22 | 23 | **Jennifer LaFleur (JLF):** Vetting, spot-checking this data is very important. 24 | 25 | ### Kameel 26 | 27 | Working on [We Live Here](http://www.welivehere.show/). 
Tim Lloyd started the process by getting the data from DESE. 28 | 29 | "When white kids act out they get kicked out of class, but the black kids get kicked out of school." Nut graf. 30 | 31 | Johnny — 7 years old, out of school for 38 days in 1 year. 32 | 33 | Data we got: Narrowed in on K-3 suspensions. Big difference between in-school suspensions (mostly given to white kids), out-of-school (mostly given to black students). 34 | 35 | DIY data: surveys of resources, counselors, bias trainings. 36 | 37 | Lessons: It had an impact. SLPS banned OSS for K-2, 20 other districts pledged to reduce or ban them. Different media, different approach: We created a database for people to look up their district. 38 | 39 | 40 | ### Tips 41 | 42 | Where to get the data: Office for Civil Rights, NCES, state departments of education 43 | 44 | Watch for data entry errors, look for impossible data, check against other datasets, verify with schools/districts if possible. 45 | 46 | New data points are coming out. 47 | 48 | ## Questions 49 | 50 | ### How can this data effect change? 51 | 52 | **Kameel Stanley (KS):** Data may not be the best way to effect change, but it can make some people pay attention. 53 | 54 | ##### Speakers 55 | 56 | Jennifer LaFleur is data editor at The Investigative Reporting Workshop and teaches at American University. She previously was a senior editor at Reveal/CIR, data editor at ProPublica, The Dallas Morning News, the San Jose Mercury News and the St. Louis Post-Dispatch. She was NICAR's founding training director and has won awards for her coverage of disability, legal and open government issues. 57 | 58 | Alex Harwin is a quantitative research analyst for the Education Week Research Center. She works on a wide variety of projects, from marquee annual reports such as Quality Counts to data-driven reporting in collaboration with the Education Week newsroom. She received her education at Stanford, and UT with degrees in Sociology and policy analysis. 
Areas of Focus: Policy analysis, government data analysis, and research communication. 59 | 60 | Kameel Stanley produces and co-hosts We Live Here, an award-winning podcast about race and class from St. Louis Public Radio and PRX. Previously, Kameel worked at the Tampa Bay Times, where she investigated racial disparities in policing and government. In her spare time these days, she runs a storytelling organization in St. Louis and a brunch club for women of color. She’s a Michigan native, a dog owner, a yogi and spaghetti enthusiast. [@cornandpotatoes](https://twitter.com/cornandpotatoes) 61 | 62 | *Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3585/)* -------------------------------------------------------------------------------- /01-04-cross-functional-teams.md: -------------------------------------------------------------------------------- 1 | # Building happy cross-functional teams 2 | 3 | * Becca Aaronson 4 | * Joe Germuska 5 | * Emily Ingram 6 | 7 | ##### Summary 8 | 9 | As more newsrooms adopt a product mindset, the culture clash between traditional editorial workflows and Agile development processes often comes to a head. Learn how tech-forward newsrooms are building cross-functional teams and the lessons they've learned along the way. 10 | 11 | ## Notes 12 | 13 | **Joe Germuska (JG):** Attitude used to be "Think of them as the photo desk." But as technology has progressed, you can't wait until the end. Everybody has to feel like an equal player. 14 | 15 | **Becca Aaronson (BA):** Everybody has different roles but you have to work together toward a common goal. How do we work efficiently but also collaboratively without stepping on each other's toes or running afoul of ethics issues? 16 | 17 | **JG:** Recognize it's uncharted territory and no one knows exactly where we're going. 18 | 19 | ### Common conflicts, communication issues and cultural issues that arise. 
20 | 21 | **JG:** Software development mindset, strong specifications, does the code pass tests? Book: Difficult Conversations. 22 | 23 | **BA:** Empathy is important. I may be making their job more difficult. 24 | 25 | **JG:** Hard to build that on a service desk type environment. 26 | 27 | ### How to build trust? 28 | 29 | **JG:** "Trench Warfare" helps. Consistent team routine. T-shirt: "Scrum Lunch Coffee". Not everyone has the bandwidth for that though. Make sure that people get together regularly. "Good occasions to get together that aren't about the work." The satisfaction of finishing something. 30 | 31 | **BA:** Doing something where you communicate with someone outside your circle is work. 32 | 33 | **JG:** Post-mortems after a project, not a blame game. 34 | 35 | **BA:** Good/Bad things. Recently added a "thankful" section. 36 | 37 | ## Questions 38 | 39 | ### Working with business side, marketing, etc.? 40 | 41 | **BA:** Less of a clear divide. We all want to engage with the audience. 42 | 43 | ### Dealing with stepping on each others' toes? 44 | 45 | **JG:** Looking at motivations, why are they doing that. Agree on principles. "These are our goals and our guiding principles." 46 | 47 | **BA:** Approach with empathy and figure out why is something frustrating to them. Figure out how task has value for them, or at least realize you asking a favor of them. 48 | 49 | Audience member: Conflict between shiny new thing and established code base. Have to compromise. 50 | 51 | **BA:** Help accommodate different teams. It all connects back to larger goals. 52 | 53 | ### What questions should we ask of new managers? 54 | 55 | **BA:** How do you manage conflict? 56 | 57 | **JG:** Uncover their affinity for things we're talking about. Sometimes direct isn't best. Examples are good. 58 | 59 | **BA:** How do you define success? 60 | 61 | ### Using markdown? 62 | 63 | **BA:** Figure out how much technical savvy you're going to require of people. 
Figure out what the actual problem you're trying to solve is. 64 | 65 | 66 | ### Misc. 67 | 68 | **BA:** Learn how to speak each other's languages. 69 | 70 | ### What to keep an eye out for when building a team? 71 | 72 | **BA:** Create a consistent relationship, not just one-time. 73 | 74 | **JG:** Or try it first before making a big commitment, like a "ride-along". 75 | 76 | **BA:** A medical-school match program, picking top candidates and mentors. 77 | 78 | Audience member: Have reporters who miss their deadlines spend time on the copy desk. 79 | 80 | ### Synchronizing vocab/methodologies of developers and editorial 81 | 82 | **BA:** Make your own language or find one that's in between. 83 | 84 | **JG:** Play games/run a simulation together. 85 | 86 | ### Editorial complaint that things are very rigid and formal 87 | 88 | **BA:** Word template, they can edit the questions if they want but they have to see the original one. Follow up in person. 89 | 90 | ##### Speakers 91 | 92 | Becca Aaronson is the first-ever product manager at The Texas Tribune. She manages the Tribune’s website redesign, coordinates cross-departmental projects and conducts user research to improve reader experience. She previously worked on the Tribune’s Data Visuals team as a developer and project manager, contributing to several award-winning investigative projects. [@becca_aa](https://twitter.com/becca_aa) 93 | 94 | Joe is the Chief Nerd at Northwestern University Knight Lab, a community of designers, developers, students, and educators working on experiments designed to push journalism into new spaces. He's also the project lead for CensusReporter.org, and an alum of the Chicago Tribune News Applications team. Once a week he gets up before dawn to host "Conference of the Birds," an eclectic music radio show on WNUR-FM. Ask him for Chicago restaurant recommendations! 
[@JoeGermuska](https://twitter.com/JoeGermuska) 95 | 96 | Emily Ingram is lead product manager at Chartbeat, where she builds tools to help publishers understand the nuances of reader behavior. Previously, she was a senior product manager at HuffPost and the Washington Post, where she started her career as a producer in the newsroom. 97 | 98 | *Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3539/)* -------------------------------------------------------------------------------- /01-05-investigating-hate.md: -------------------------------------------------------------------------------- 1 | # Investigating hate when the data isn’t there (Diversity Track) 2 | 3 | * Duaa Eldeib 4 | * Melissa Lewis 5 | * Ken Schwencke 6 | * Nadine Sebai 7 | 8 | ##### Description 9 | 10 | How can we report on victims of bias-motivated crimes when half of them don’t report the crimes to police, and when they do, police frequently fail to mark them down as such? Come listen to how we’ve done it (or tried to). 11 | 12 | ## Notes 13 | 14 | No reliable stats exist. FBI says around 6,000 per year. Bureau of Justice Statistics estimates up to 250,000. 15 | 16 | [Documenting Hate database by ProPublica](http://documentinghate.com) 17 | 18 | ### Nadine 19 | 20 | AJ+ Documenting Hate partnership put out a form and callout video. Used the database to find individual tips to create multimedia stories. Needed stories powerful enough to be told on social media. 21 | 22 | ### Duaa 23 | 24 | Compare local data to state data to federal data. Hate crime convictions — there's no centralized database. 25 | 26 | Drill down by location: college campuses, public transit. 27 | 28 | Tweeted she was covering hate crimes and got stories. FOIAed police department to confirm. 29 | 30 | Plea deals can drop hate crime charges. 31 | 32 | Don't have to wait to do huge all-encompassing story. Chip away at it. 
32 | 33 | ### Melissa 34 | 35 | Build a reputation to be someone to approach with the stories. 36 | 37 | The area isn't very racially diverse, which hinders reporting and recognition. 38 | 39 | ### Ken 40 | 41 | Why America fails at collecting hate crime statistics. 42 | 43 | FBI's master files are fixed-width and the dictionary is a typed, scanned document from the 90s. 44 | 45 | Easy impact. Ask for records from local agency and see if they match up with FBI reports. 46 | 47 | "Does this mean you've had no hate crimes, or that you don't track hate crimes reported to you?" 48 | 49 | Ask how they handle hate crimes. Do they do hate crimes training? How do they handle racist graffiti — does it get marked as a hate crime? 50 | 51 | ## Questions 52 | 53 | ### Getting people to talk to you — "We saw this in the database, please talk to me" 54 | 55 | **NS:** Sometimes trolls submit to the form. Have to fact check. Real people worry about being doxxed. Engagement team monitored comments. 56 | 57 | **DE:** Fear. "Complaining witness not in court." "When they're contacted by a reporter, it's real." 58 | 59 | **ML:** People feel like it's not worth reporting because it's happened to them a lot, they don't want to rehash it. 60 | 61 | **DE:** Can't make them feel pressured. The power is in their hands. 62 | 63 | ### Tips to help people open up. 64 | 65 | **DE:** Let them know you trust them. You're invested in this, not a story you'll check off your list and move on. Multiple interviews. Ask open-ended questions. "Tell me in your words." 66 | 67 | ### How can we tell stories of people not like us? 68 | 69 | **NS:** Understand your audience. Can collaborate with other newsrooms that have other audiences. 70 | 71 | **DE:** Look for some kind of connection. 72 | 73 | **ML:** How do you partner with other orgs? 74 | 75 | Rachel (ProPublica): Play to your strengths. If two newsrooms are interested in the same thing, matchmaking. 76 | 77 | ### How do you verify? 
79 | 80 | **NS:** Sometimes you have to go with your gut if there's no other way. Witnesses, police reports, etc. 81 | 82 | **KS:** Police were skeptical. Talked to other members of LGBT community and they were skeptical. Tried to get court records. 83 | 84 | ### Dealing with distrust of media 85 | 86 | **DE:** "But I'm here now, trying to tell this story." 87 | 88 | **NS:** When they're reporting it, it's not usually the first time it's happened. This gives people a platform to realize this isn't normal. 89 | 90 | ### Outside of the US 91 | 92 | **KS:** Not part of Documenting Hate project. Familiarity with laws helps. 93 | 94 | ### Other institutions you can benchmark reports against 95 | 96 | **KS:** ADL, SPLC, other similar groups. Unifying that data is hard. Universities. Dept. of Education. 97 | 98 | ### LGBTQ incarceration and sexual assault while incarcerated; interview techniques to let people know you believe them but that you need more info 99 | 100 | **DE:** Repeat the story in different ways. "For my publication…" 101 | 102 | **NS:** Blame my editor for everything. 103 | 104 | **KS:** Make a timeline, ask chronologically. 105 | 106 | **ML:** ProPublica "Unbelievable Story of Rape" — reporter sent email, talked through lawyer first, no surprises. "You're always free to go." Only a couple people, minimal gear. 107 | 108 | Audience: If you're FOIAing lots of police agencies and they say they don't have data, find corroboration in the news and send it to them. 109 | 110 | ##### Speakers 111 | 112 | Duaa Eldeib is an investigative reporter for ProPublica Illinois. Her work has examined the death of children in state care, the treatment of juveniles in adult court and police use of polygraphs in cases where suspects were wrongly convicted. She previously worked at the Chicago Tribune, where she and two colleagues were finalists for the Pulitzer Prize for Investigative Reporting in 2015. 
[@deldeib](https://twitter.com/deldeib) 113 | 114 | Melissa Lewis is the data editor and a developer at The Oregonian. She's a former software engineer and research scientist, and currently volunteers as an organizer for PyLadies Portland. She's an occasional technical reviewer for O'Reilly Media and contributor to The Recompiler. [@iff_or](https://twitter.com/iff_or) 115 | 116 | Ken Schwencke is a journalist and developer on ProPublica's news apps team, covering hate crimes and election administration. Previously, he worked on The New York Times’ interactive news team and the Los Angeles Times data desk. [@schwanksta](https://twitter.com/schwanksta) 117 | 118 | Nadine Sebai is a radio reporter in the S.F. Bay Area, working part-time at KQED News. In 2016, Nadine investigated a hepatitis C outbreak in Fremont, NE. The story won the SPJ Mark of Excellence Award and the Reva and David Logan Prize for Excellence in Investigative Reporting. In 2016, she was an Ida B. Wells fellow with the Investigative Fund. Prior to working in journalism, Nadine worked as an accountant and investigative analyst. [@NadineSebai](https://twitter.com/NadineSebai) 119 | 120 | *Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3560/)* -------------------------------------------------------------------------------- /01-06-self-managing-teams.md: -------------------------------------------------------------------------------- 1 | # Conversation: More bibles, fewer priests: Tools for running self-managing teams 2 | 3 | * Brian Boyer 4 | 5 | ##### Description 6 | 7 | A lot of managers see themselves at the center of the team’s universe — the linchpin, making the decisions every day. And that’s a great way to burn out, stifle the growth of your teammates and totally avoid thinking about the important stuff, like the future. So, forget that! Let’s talk about the tools and techniques you can use to make yourself non-essential. 
(This session is also good for non-managers who have bosses that are making poor choices.) 8 | 9 | ### Notes 10 | 11 | I don't want to be the single source of truth. I want to be able to go on vacation. 12 | 13 | We're not interchangeable but we need to be able to take a break. 14 | 15 | Having the opportunity to think about higher-level stuff. You can spend time planning. 16 | 17 | An exercise in self-control. Easiest to be the person who tells everybody else what to do. 18 | 19 | Roles, goals and rules. 20 | 21 | We have rules so we don't have to argue. 22 | 23 | #### Frameworks 24 | 25 | Daily scrum. Room: 5-30 minutes. Try keeping it to 5 minutes or less. Team of 50 can do it in less than 10. 26 | 27 | Set parameters for what you talk about: Yesterday, today, blockers and that's it. Be prepared for it. 28 | 29 | Slack channels can work. Sometimes followup is an issue. 30 | 31 | How much detail? Depends on context. Maybe have two if you've got a regular meeting and a significant project. 32 | 33 | If you have rules, you can call it out when someone is violating them. If you don't you just look like a jerk. 34 | 35 | Iteration review — weekly review. Like a scrum but with outside parties. 36 | 37 | Secrecy and investigations. Codenames. Multiple meetings. 38 | 39 | #### Roles 40 | 41 | "I don't know what my job is." 42 | 43 | Without a set of clearly defined roles, people step on each others' toes, ask why they weren't consulted. "You're allowed to have an opinion? I thought I was the boss of this." Job descriptions. 44 | 45 | Job descriptions might describe your job but they don't describe how people interact with each other. 46 | 47 | Responsibility matrix (RASCI). Across the top are people's names. Jobs down the side. What that person's responsibility is for that job at intersection. 48 | 49 | Responsible for != boss. 50 | 51 | A for Accountable. S for Support. C for Consulted. I for Informed. 
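One way to keep a responsibility matrix like this honest is to store it as plain data that anyone can query. A sketch in Python; the people, jobs and letter assignments below are made up, not from the session:

```python
# A RASCI matrix as a dict: jobs down the side, people across the top,
# one role letter at each intersection. Names and jobs are hypothetical.
RASCI = {
    # job:             {person: role letter}
    "data cleaning":   {"Ada": "R", "Grace": "S", "Sam": "I"},
    "fact checking":   {"Ada": "C", "Grace": "R", "Sam": "A"},
    "publishing":      {"Ada": "I", "Grace": "C", "Sam": "R"},
}

ROLES = {"R": "Responsible", "A": "Accountable",
         "S": "Support", "C": "Consulted", "I": "Informed"}

def role(person, job):
    """What is this person's relationship to this job?"""
    return ROLES[RASCI[job][person]]
```

Kept in version control, the matrix itself becomes one of the "bibles" the session title refers to: the answer to "who has a say about what" lives in a file, not in the manager's head.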
52 | 53 | Relationships among your team but also between other teams. 54 | 55 | If you're not in agreement about who has a say about what, you've got problems. 56 | 57 | Create a sense of co-responsibility. Teams should feel responsible to each other. No one's too important to do QA. 58 | 59 | Don't want to be teams of divas. 60 | 61 | #### Issues 62 | 63 | People feel like process is extra. Stylebook metaphor might be helpful. How do you create rules where life is easier when you have them. e.g. Checklists. Implemented process and gotten buy-in because they can explain why the process is there. Get people on the same page about the problem first. Transparency and visualizations. 64 | 65 | Strategies for getting other managers on board for making room for these new processes. Luck with being very visible about it. Keep it short and keep it planned. Don't waste people's time. Be open about whys and whats. Treating process with a design eye. Keep a cadence. "Scrum Lunch Coffee". 66 | 67 | #### Retrospectives 68 | 69 | After a project or for a defined period of time. Catalogue everything. 70 | 71 | 1. What went well. 72 | 2. What didn't work. What was hard. 73 | 3. What should we do differently next time? 74 | 75 | Write down on post-its for defined period (5 minutes). 76 | 77 | Come back to it later, pick a few of #3, talk about how you'd deal with it. That's what makes it actionable. 78 | 79 | How do you do it when you're distributed? Chat with a designated post-it writer. Mural app. 80 | 81 | "Yes, and" rule. 82 | 83 | #### Goals 84 | 85 | Pair praise and ask. Play to ego. 86 | 87 | "What are we trying to do as a group?" "What are we trying to do as a company?" 88 | 89 | OKRs: Objectives and Key Results. Objective: Let's build the best photo website for Chicago. Key Results: What is evidence that we did that? (views, revenue, etc. needs to be measurable) 90 | 91 | Look at the key results every week. What is our confidence we're going to hit it? 
92 | 93 | Send it to other teams so they know what they're doing. 94 | 95 | Book: "Radical Focus" 96 | 97 | #### How do you find a good manager? How do you tell a good manager? 98 | 99 | Give them a project and ask them how they would've solved that problem. 100 | 101 | How do you know if you're good at managing? One on one conversations with your reports. Regularly and predictably. Find someone else on the same level to talk to. 102 | 103 | "Decided I was jazzed about catalyzing work instead of doing it." Service leadership. Do the work and help people do their jobs well. 104 | 105 | "Delegate everything that's fun." 106 | 107 | "Give the credit but take the blame." 108 | 109 | Bringing it to newsrooms? 110 | 111 | Breaking news checklist. Have to trust that everyone is doing their job. And that they know what their job is. 112 | 113 | Covert operation. 114 | 115 | Brown bags/show and tell. 116 | 117 | We hold our dysfunction close. The news is not different. It just doesn't value good management. 118 | 119 | ##### Speakers 120 | 121 | Brian is the vice president of product and people at Spirited Media. Previously, he was the visuals editor at NPR, founded the news applications team at the Chicago Tribune, and was a happy intern at ProPublica. He was one of the first programmers to receive a Knight-funded scholarship to study journalism at Northwestern University. 
[@brianboyer](https://twitter.com/brianboyer) 122 | 123 | *Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3686/)* -------------------------------------------------------------------------------- /02-01-data-you-already-have.md: -------------------------------------------------------------------------------- 1 | # How to find reporting leads and publishable facts in text data you already have 2 | 3 | [Slides](http://www.bit.ly/nlp-car18) | [Examples](http://bit.ly/nlp-car18-examples) 4 | 5 | * Jeff Ernsthausen 6 | * Jeremy Merrill 7 | * Youyou Zhou 8 | 9 | ##### Description 10 | 11 | Let's discuss some published projects that have extracted useful, newsy information from big piles of text data — so you can use similar techniques. We'll walk you through real-world examples of every step of the process: gathering text data, dividing it into chunks the computer can understand, analyzing it with fancy or simple techniques and the challenges you'll face in analyzing, bulletproofing and presenting what you find. This session isn't quite a hands-on, but the panelists will discuss the tools, practical techniques and tricks they used to transform giant piles of text into publishable insights and reporting leads. These techniques are often called "natural language processing," but we're going to keep it practical: no obscure mathematical formulas, guaranteed! 12 | 13 | ## Notes 14 | 15 | What kinds of insights can you get? What sorts of text do you have (or can get)? 16 | 17 | ### Example insights 18 | 19 | * Patterns across documents — which documents are most similar to documents you already know are interesting. 
20 | * Outliers — What's distinctive about some documents compared to others 21 | * Extract meaning — topic/sentiment 22 | 23 | Example: Extract motivation and mentions of killer's race from 141 hours of TV coverage 24 | 25 | Example: Database of more than 100,000 disciplinary documents of doctor misconduct, model to tag them and statistically estimate likelihood. 26 | 27 | Example: Press releases of government representatives — similar to other legislators, distinctive topics, policy priorities. 28 | 29 | ### Pipeline (How) 30 | 31 | 1. Getting the data 32 | 2. Dividing it up 33 | 3. Analyzing 34 | 4. Bulletproofing 35 | 5. Presentation 36 | 37 | ### 1. Getting data is harder than it sounds 38 | 39 | Some sources: 40 | 41 | * Readily available (Speeches, academics, libraries) 42 | * APIs 43 | * Scraping 44 | * Speech to text 45 | * FOIA 46 | 47 | ### 2. Dividing it up 48 | 49 | * Cleaning 50 | * OCR 51 | * Filtering stuff out like documents in other languages 52 | * Lowercasing, punctuation, stripping HTML, bylines, etc. 53 | * Tokenization (words -> columns) 54 | * "this is not comprehensible to the computer" -> ["this", "is", "not", "comprehensible", "to", "the", "computer"] 55 | * Stemming/lemmatization & part-of-speech tagging 56 | * Remove stopwords and meaningless words 57 | 58 | DocumentCloud 59 | 60 | ### 3. Analysis 61 | 62 | Counting words 63 | TF-IDF (term frequency-inverse document frequency) 64 | Keyword 65 | Clustering 66 | Sentiment 67 | Vectorization 68 | 69 | Cleaning data and removing stop words = senator's most common words were his last name and "previous_article" 70 | 71 | Vectorization = give it lots of documents, it'll figure out which words appear in similar contexts 72 | 73 | Can ask analogies like "what is the Republican version of what the Democrats call an estate tax". 74 | 75 | ### 4. Bulletproofing 76 | 77 | Bulletproofing doctor's harassment reports: Periodically check whether predictions were useful.
Randomly select documents with low scores and read those. Be aware of whether you've unintentionally biased your algorithm. False negatives are OK, false positives are unacceptable. 78 | 79 | Beware of external factors: One stylebook requires "spokesperson", one requires "spokesman/spokeswoman". 80 | 81 | ### 5. Presenting and visualization 82 | 83 | Bar charts, bubbles, heat maps. Small multiples. 84 | 85 | No word clouds. (exception: Street name visualization) 86 | 87 | ## Questions 88 | 89 | Story planning process: Incremental process. 90 | 91 | People start coming to you with every pile of text. Use the easiest tool that will get the job done (DocumentCloud). 92 | 93 | Initial loading step: figure out what you actually need from the data. When dealing with large amounts of data, just moving it around can take a long time. Paring it down can be helpful. 94 | 95 | What do you want to do with this next: JM - Integrate it into search engine (e.g. search for estate tax, also get stuff related to death tax). 96 | 97 | ##### Speakers 98 | 99 | Jeff Ernsthausen is a data reporter at the Atlanta Journal-Constitution. He previously interned at The Nation and Harper's Magazine. 100 | 101 | Jeremy is a news apps developer at ProPublica. He likes scraping data that's hard to get, maps and public records. He lives in Atlanta, Georgia. He works on a variety of open-source newsroom tools like Tabula, Stevedore and FOIA Lawya. 102 | 103 | Youyou Zhou is a visual journalist at Quartz. She digs into data, writing stories, designing visuals and building interactives out of it. Youyou has a keen interest in the global transfer of knowledge and text data. She is a Mizzou alum and has previously built interactives and election apps for The Associated Press.
[@zhoyoyo](https://twitter.com/zhoyoyo) 104 | 105 | *Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3545/)* -------------------------------------------------------------------------------- /02-02-evolving-live-coverage.md: -------------------------------------------------------------------------------- 1 | # Evolving forms and the future of live coverage 2 | 3 | [Slides](https://docs.google.com/presentation/d/1gQH5wycI20sj909AdSNjk1qUTpfWtT3n57mVpMBNWuw/) 4 | 5 | * Hamilton Boardman 6 | * Tiff Fehr 7 | * Tyler Fisher 8 | 9 | ##### Description 10 | 11 | Breaking news formats have evolved and so have reader devices and preferences. Both The New York Times and The Guardian — through its Mobile Innovation Lab — have broadened their breaking news formats and experimented with new forms. In this session, we’ll review our current story forms and discuss the pros and cons, then share lessons from early experiments from the future of live coverage. 12 | 13 | ## Notes 14 | 15 | Agenda: Tour the classics, newer forms. Discuss the future. Intra-panel questions. Audience questions. 16 | 17 | Goals: Shared vocabulary. Learn spectrum of pros/cons per forms. Dispel myths. Ground new ideas with next steps. 18 | 19 | ### Audience evolution, myths and realities 20 | 21 | Realities: 22 | 23 | * Shifting to mobile (except business junkies) 24 | * Rich native app UX expectations 25 | * Skimming, tl;dr 26 | 27 | Myths: 28 | 29 | * Folks are news junkies (newsroom folks are, but others aren't) "News is one of many" of the alerts on their screen. Also sub-topics in news like sports, entertainment, etc. 30 | * Drop-by traffic converts into long-term readers 31 | * Do we support late-comers? 
(We often lose the summary) 32 | 33 | **Hamilton Boardman (HB):** Survey found large number of latecomers would go to Wikipedia to get caught up 34 | 35 | ### The classics 36 | 37 | Articles and liveblogs 38 | 39 | Articles are time-tested, but time-consuming. Familiar to readers and comprehensive. But tl;dr, have they already read this (or this update?) "mentally diffing". For reporters, well-known, cohesive but takes the time to write it and there are expectations about effort. 40 | 41 | Liveblogs are recent trend but attention-demanding. Learned expectations, up-to-date and feels urgent for readers, but hard to orient, oversaturation and things get buried quickly. For reporters, it's fast-paced and you can have varied depth, but you need to "feed the beast" and cohesiveness is hard. 42 | 43 | ### Newer forms 44 | 45 | * Briefings 46 | * Chats 47 | * Chatbots 48 | * Notifications 49 | * Augmented 50 | * 'Smarticles' 51 | * Mobile-focused experiences 52 | 53 | ##### Live chats 54 | 55 | Reverse-chron vs. forward-chron (chat). 56 | 57 | Pros: expert analysis and debate, cons: transcript of cable-news talking heads 58 | 59 | For readers: very up-to-date, second-screen experience, asked to join (sort of). Cons, tl;dr 60 | 61 | For reporters: brevity, second-screen banter, slack comfort (perhaps). But they need to feed the beast. Editing questions. Threading comments and having a "host" is tough. 62 | 63 | Politico's slack -> Live chat thing is open source. Can transform slack data into anything. 64 | 65 | ### Live briefing 66 | 67 | Removes sense of urgency from live-blog (feed-the-beast notion). "What's important right now"? Kind of like an article, but updated a lot. Ordered by importance rather than recency, but unlike an article it doesn't need to be a narrative. 68 | 69 | Good: Daily digest and big moments. Bad: can turn into a live blog. 
70 | 71 | Doesn't demand consistent attention and helps readers catch up, but can be an ambiguously ordered collection of stuff. 72 | 73 | Writing is brief, but things still get buried and editors can struggle with it. 74 | 75 | ### More new forms 76 | 77 | Data-driven events: Sports, voting, awards shows 78 | 79 | ### State of the Union — Politico 80 | 81 | Transcript and annotations 82 | 83 | Every annotation is in context. Transcription becomes a resource. But long events make for long transcripts, hard to keep track of new info. 84 | 85 | For the tech side, it breaks down into components well and the data already exists. But getting structure from the Google Doc is hard and transcripts are messy. 86 | 87 | ### Notifications 88 | 89 | Re-think of what notifications should do. Instead of driving people to coverage, it is coverage. 90 | 91 | More engagement sometimes. 92 | 93 | But repeated alerts can be irritating. Can be cold and distant. Currently low-fi, but improving. 94 | 95 | ### Augmented podcasting 96 | 97 | Expand on player limitations, notifications per timestamp, audience on platform. 98 | 99 | ### Smarticles and chatbots 100 | 101 | Expand on format limitations, notifications based on place in story, audience on platform. Sending incremental changes. 102 | 103 | ## Questions 104 | 105 | **Tyler Fisher (TyF):** Live briefings writing style? 106 | 107 | **HB:** There's no tool. Might be helpful to have one. Often a bottleneck on the editor. "If you've got anything, send it to us!" 108 | 109 | Me: Long-running but variations in urgency (like Ferguson) 110 | 111 | **HB:** Daily briefings, good for daily but not catch-up. People will still go to Wikipedia 112 | 113 | Audience: Can and should be doing? 114 | 115 | **Tiff Fehr (TiF):** Think about audience, don't presume everyone wants to know. 116 | 117 | **HB:** Some apps let you dial up or down the number of notifications. Let people set those preferences easily.
118 | 119 | **TiF:** Be good stewards with people's expectations and attention. 120 | 121 | **TyF:** What about users who don't want to be a part of breaking news? 122 | 123 | Audience: Timing on notifications, browser notifications 124 | 125 | **HB:** Try to be aware of time of day as far as after 10 p.m. or before 6 a.m. 126 | 127 | **TiF:** Target different readerships; coasts, international. Browser notifications might be useful page-specific vs. site-specific. Very easy to opt-out and tied with privacy/security so it's hard to ask people to change it. 128 | 129 | 130 | ##### Speakers 131 | 132 | Hamilton Boardman ([@nytham](https://twitter.com/nytham)) is a senior editor at The New York Times, currently serving as deputy Washington editor for digital. He has worked for nearly a decade on The Times's news desk as an editor on the digital home page and print front page and as a coordinator of live and breaking news coverage. 133 | 134 | Alastair is a developer at the Guardian Mobile Innovation Lab, where he experiments in new forms of news coverage on both the web and in native apps. [@\_alastair](https://twitter.com/_alastair) 135 | 136 | Tiff Fehr ([@tiffehr](https://twitter.com/tiffehr)) is an assistant editor on the Interactive Desk of The New York Times. She leads development on The Times' live coverage toolset and explores new ideas in breaking news storyforms with newsroom collaborators. 137 | 138 | Tyler is a news applications developer at POLITICO on its new interactives team. He previously worked as a news applications developer on the NPR Visuals Team and as an undergraduate fellow at the Northwestern University Knight Lab. 
139 | 140 | *Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3546/)* -------------------------------------------------------------------------------- /02-03-break-filter-bubble.md: -------------------------------------------------------------------------------- 1 | # Data journalism that breaks the filter bubble 2 | 3 | [Slides](https://docs.google.com/presentation/d/1Tjsg506bcb1xORpZkj_ydYnFCmfQkZsgNUOE4Qr-ElM/edit#slide=id.p) 4 | 5 | * Eva Constantaras 6 | * Adriana Gallardo 7 | * Anjeanette Damon 8 | 9 | ##### Description 10 | 11 | Too often, data journalism falls into the trap of preaching to the converted — informing elite, liberal, white audiences on issues they already understand pretty well. This panel will look at three case studies of how journalists harnessed data, innovative storytelling and audience engagement to expose injustices faced by marginalized communities and bring those communities into the policy debate. 12 | 13 | ## Notes 14 | 15 | How can we produce journalism about marginalized communities when they're not in the room? 16 | 17 | ### Lost Mothers series 18 | 19 | [Tipsheet](https://docs.google.com/document/d/15ZRYLjsMLFTEnrOM2mdGxPouLDqu7Q2mQyz_tnTyWOQ/edit) 20 | 21 | Adriana Gallardo 22 | 23 | Maternal harm and deaths in the US. About 800 women die each year in the U.S., and about 50,000 nearly die. 60 percent are preventable. Black mothers die at 3-4X the rate of white mothers. Last mandated to be reported in 2010. 24 | 25 | Needed basic data: Name, cause of death or near death, date, states, prior births, type of delivery. Looked for conversations, but it's treated as a private tragedy, not a public health crisis. Crowdsourcing. 26 | 27 | Spent two months developing the right form. Specific about intent (not interested in difficult pregnancies or births, for example). Questions changed based on conditional answers.
28 | 29 | Three audiences: I almost died, I know someone who died, I know someone who almost died. 30 | 31 | Majority were self-reported. 32 | 33 | Had to be conscious of tone. 34 | 35 | Did all this in tandem with traditional reporting. 36 | 37 | Also searched out Facebook groups — anything that touched maternal health at all. 38 | 39 | Shared the form widely. 40 | 41 | But then followed up. Constant communication, can't be just the form. 42 | 43 | ### Death behind bars 44 | 45 | [Tipsheet](https://drive.google.com/file/d/0B-dTRqkrNLbcbDBUeEV1V2tpQi1sbVhDNTdjalBjWkkwU1Nv/view) 46 | 47 | Anjeanette Damon 48 | 49 | "The dataset that we used for this is not hugely robust." Telling the story beyond the data: this is a story about people. Data formed the basis for the story but wasn't the story. 50 | 51 | Mostly vulnerable communities, so they don't have a voice or access and their stories were either overlooked or ridiculed. 52 | 53 | Jail officials seemed sympathetic but didn't feel like they should be held accountable. 54 | 55 | Family members became targets too. 56 | 57 | Story started with anonymous tip: three suicides in the past month. 58 | 59 | Requested 10 years of data for basic info, then requested more data points later. 60 | 61 | Found big uptick after new sheriff, rate about 5x the national average. 62 | 63 | But had to figure out why. Found out only one death had been investigated by an outside agency. 64 | 65 | Use of video for explainers, interviews and annotated raw video. 66 | 67 | Case studies/profiles. 68 | 69 | Combating cynicism: Told human stories, described in detail, this could be you or someone you know, the data helped bolster the story. 70 | 71 | Impact: Lots of calls, advocates, law enforcement (for training purposes), sheriff changed policies and death rate fell.
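A comparison like "about 5x the national average" above comes down to simple arithmetic once you have the death counts and the jail's average daily population. A minimal sketch — all figures below are invented for illustration, not the jail's actual numbers, and the per-100,000-ADP metric follows the convention federal jail-mortality reports use:

```python
def jail_mortality_rate(deaths: int, avg_daily_population: float, years: float) -> float:
    """Annualized deaths per 100,000 average daily population (ADP)."""
    return deaths / (avg_daily_population * years) * 100_000

# Invented figures for illustration only.
local_rate = jail_mortality_rate(deaths=13, avg_daily_population=1_100, years=2)
national_rate = 140.0  # placeholder national benchmark, per 100,000 ADP

print(f"{local_rate:.0f} per 100k ADP, {local_rate / national_rate:.1f}x the benchmark")
```

The point of normalizing by ADP and years is that raw death counts aren't comparable across jails of different sizes or request windows of different lengths.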
72 | 73 | 74 | ### Data Journalism where everything is terrible 75 | 76 | [Tipsheet](https://docs.google.com/document/d/1qxY9GY9uzBUjYAfue1rGrqPPo6w8XwtPC4Buqi1jPU8/edit) 77 | 78 | Eva Constantaras 79 | 80 | Corruption is not news. Inequality is not a story. Theft is not a story. 81 | 82 | If we can count it, if we can measure it, we can fix it. 83 | 84 | Process: 85 | 86 | * Background 87 | * Hypotheses 88 | * Questions 89 | * Analysis 90 | * Story Structure 91 | * Interviews 92 | * Visualization 93 | * Impact metrics 94 | 95 | Start with the headline story. 96 | 97 | Background. Has someone else written this? Can we get to the bottom of this? 98 | 99 | Hypotheses. Why is this happening? Tips: can be proven or disproven with data, specific about what's measured, is it measuring problem, causes, impact and solutions, is it important to the public? Common mistakes: too simple, too broad, too narrow, can't be proved, already proven and common knowledge. 100 | 101 | Questions: Problem/Impact/Cause/Solution categories. 102 | 103 | Impact: "News you can use". How can someone act on this info? 104 | 105 | 106 | ##### Speakers 107 | 108 | Eva Constantaras is a data journalist who specializes in building data journalism teams in developing countries. These teams have reported from across Latin America, Asia and East Africa on topics ranging from displacement and kidnapping by organized crime networks to extractive industries and public health. As a Google Scholar and a Fulbright Fellow, she developed a course for investigative and data journalism in high-risk environments. [@evaconstantaras](https://twitter.com/evaconstantaras) 109 | 110 | Adriana Gallardo is an engagement reporter at ProPublica. This means she works on investigative series to fuel the reporting process with communities. Last year, she led engagement and reported for the Lost Mothers series, which illuminated a national disgrace: the U.S.
has the worst rate of maternal deaths in the developed world, and up to 60 percent of those deaths are preventable. Prior to ProPublica, she oversaw a series of 15 projects at NPR member stations and traveled the country with StoryCorps. In her hometown Chicago, she spent over a decade working as a journalist and radio producer. 111 | 112 | Anjeanette Damon is the Reno Gazette Journal's government watchdog reporter. Damon has been covering communities in Nevada for two decades for both the Gazette Journal and the Las Vegas Sun. During her career, Damon has covered the police beat, the city hall beat and state and national politics. Damon has a journalism degree from the University of Nevada, Reno and a master in public administration from the Harvard Kennedy School of Government. 113 | 114 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3548/)_ -------------------------------------------------------------------------------- /02-04-life-after-factfinder.md: -------------------------------------------------------------------------------- 1 | # Life after FactFinder 2 | 3 | [Tipsheet](https://www.dropbox.com/s/sihtg7l2b6mxbb3/RonCampbellFactFindertipsheet.pdf?dl=0) 4 | 5 | * Ronald Campbell 6 | * Paul Overberg 7 | * Ally Burleson-Gibson 8 | 9 | ##### Description 10 | 11 | Just when you had finally learned American FactFinder’s many foibles, the Census Bureau is shutting the site down. We’ll introduce you to its successor, data.census.gov, take you on a test drive and dish some tricks and secrets we’ve discovered. The new site will become the main source for census information in June. 12 | 13 | ## Notes 14 | 15 | **Ally Burleson-Gibson (ABG):** How are we going to move to a single-search approach? 16 | 17 | Want to replace FactFinder — it doesn't have all the data in it. 18 | 19 | Data, software, access to API. 20 | 21 | Ron Campbell 22 | 23 | "Always read the footnotes." 
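The "access to API" piece above is worth a concrete look: data.census.gov sits on top of the Census Bureau's API at api.census.gov, which returns each table as a JSON array whose first row is the header. A small sketch of reshaping that into records — the payload below is a pasted-in stand-in rather than a live request, and the variable choice (`B01001_001E`, ACS total population) is just an example:

```python
import json

# Stand-in for an api.census.gov response: the first row is the header,
# the rest are data rows. Every value comes back as a string.
payload = json.loads("""
[["NAME", "B01001_001E", "state"],
 ["Alabama", "4850771", "01"],
 ["Alaska", "738565", "02"]]
""")

header, *rows = payload
records = [dict(zip(header, row)) for row in rows]

for rec in records:
    print(rec["NAME"], int(rec["B01001_001E"]))
```

A live request would look something like `https://api.census.gov/data/<year>/acs/acs5?get=NAME,B01001_001E&for=state:*` — the year and dataset path vary, so check the API's documentation for the table you need.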
24 | 25 | How far back will we go (1970s)? 26 | 27 | **ABG:** TBD. 28 | 29 | 30 | ##### Speakers 31 | 32 | Ronald Campbell is data editor for the NBC Owned Television Stations. He previously created the computer-assisted reporting program at the Orange County Register. He has won the IRE Award, the Loeb Award and placed in the Philip Meyer Award. He lives in Orange County, CA, with his wife, kids and cat. When not getting frustrated with databases he gets frustrated rock-climbing. [@campbellronaldw](https://twitter.com/campbellronaldw) 33 | 34 | Paul Overberg is a data reporter at the Wall Street Journal and a member of its investigative team. He focuses on economic and demographic stories but helps reporters working on many subjects. He previously worked at USA TODAY, where he worked on projects that won the Philip Meyer Award for Precision Journalism and the National Headliner Award. [@poverberg](https://twitter.com/poverberg) 35 | 36 | Ally Burleson-Gibson has worked with the Census Bureau since 2012, first as a data dissemination specialist (DDS) and now as part of the communications team for the Center for Enterprise Dissemination Services and Consumer Innovation (CEDSCI). Ally provides presentations and training on the Bureau’s project to streamline users’ access to Census Bureau data on Census.gov, and gathers user feedback for an intuitive, customer-focused data dissemination experience. 
37 | 38 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3574/)_ -------------------------------------------------------------------------------- /02-05-irs-nonprofit-data.md: -------------------------------------------------------------------------------- 1 | # Tips for harnessing new IRS nonprofit data 2 | 3 | [Slides](http://bit.ly/2C9DuQV) | [Tipsheet](http://bit.ly/NICAR18-IRS) 4 | 5 | * Andrea Fuller 6 | * Todd Wallack 7 | 8 | ##### Description 9 | 10 | Did you know the IRS recently started uploading most nonprofits' annual financial filings to the web in electronic format? We'll give you an overview of how to download the filings and extract key fields into a database. We'll also cover some of the tricks and pitfalls in working with the new IRS data. 11 | 12 | This session will be most useful if: You are either familiar with basic programming or can track down a programmer later to help. 13 | 14 | ## Notes 15 | 16 | **Todd Wallack (TW):** Lots of info in 990s. 17 | 18 | Now they're available electronically. More than 2 million filings. 19 | 20 | Not everything is there: 21 | 22 | * Orgs that file by paper 23 | * Tiny groups that file 990-N 24 | * Groups that don't file at all (churches, etc.) 25 | * Forms that haven't been processed (takes a few months) 26 | * Forms processed before 2011 27 | 28 | Format: XML 29 | 30 | There are index files for each year, JSON and CSV. They're slightly different. 31 | 32 | Approaches to downloading the data: 33 | 34 | **TW:** Download all index files, download XML files from indexes, loop through XML files to grab key info, save data in new CSV files. (on GitHub) 35 | 36 | **Andrea Fuller (AF):** Download JSON, parse into SQL table, get URLs and download. 37 | 38 | ##### Speakers 39 | 40 | Andrea Fuller is an investigative reporter for The Wall Street Journal in New York City where she specializes in computer-assisted reporting.
She is a North Carolina native and previously worked for Gannett Digital, The Center for Public Integrity, and The Chronicle of Higher Education. [@anfuller](https://twitter.com/anfuller) 41 | 42 | Todd Wallack is a data journalist and investigative reporter for the Boston Globe’s Spotlight team. He has won national awards for his work on public records and been a finalist for the Pulitzer Prize. Prior to joining the Globe in 2007, he worked for the San Francisco Chronicle, Boston Herald, and Dayton Daily News. [@twallack](https://twitter.com/twallack) 43 | 44 | 45 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3593/)_ -------------------------------------------------------------------------------- /03-01-lat-map-maker.md: -------------------------------------------------------------------------------- 1 | # Introducing the L.A. Times Map Maker: Make maps faster 2 | 3 | [GitHub](http://github.com/datadesk/web-map-maker) 4 | 5 | * Jon Schleuss 6 | 7 | ##### Description 8 | 9 | The Los Angeles Times' Map Maker was created and released to help journalists make locator and other maps faster. Whether it's a quick web locator map or a more detailed map of, say, all the rides at Disneyland, the Map Maker allows you to download both images and vector files to create better maps faster. Jon Schleuss from the Times will demo how to install, use and customize Map Maker for your newsroom. 10 | 11 | ## Notes 12 | 13 | Basic purpose: Make locator maps quickly. 14 | 15 | Set up config.js and config.yaml. Uses Nextzen (Mapzen successor). 16 | 17 | Can get coordinates from Google Maps if you're not geolocating. 18 | 19 | Text is large so it scales on mobile. 20 | 21 | Can drag bottom corner for custom sizes. 22 | 23 | Can upload basic GeoJSON. 24 | 25 | Layer palette to turn on and off different layers. 26 | 27 | Add custom labels. 28 | 29 | Download as PNG or SVG. 30 | 31 | RGB to CMYK converter.
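For the "upload basic GeoJSON" step above, a minimal FeatureCollection with one labeled point is enough to test with. This is plain, standard GeoJSON; the coordinates are illustrative, and whether the tool reads a `name` property for labels is an assumption — check the project's README. Note that GeoJSON puts longitude before latitude:

```python
import json

# Minimal GeoJSON: one point feature with a label property.
# Coordinates are illustrative (downtown Los Angeles); order is [lon, lat].
locator = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-118.2437, 34.0522]},
        "properties": {"name": "Los Angeles Times"},  # label field is an assumption
    }],
}

print(json.dumps(locator, indent=2))
```

Save the output as a `.geojson` file and upload it; swapped coordinates (lat first) are the most common reason a point lands in the wrong hemisphere.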
32 | 33 | Use jQuery in the console to change imported JSON. 34 | 35 | 36 | 37 | ##### Speakers 38 | 39 | Jon Schleuss is a data reporter, mapmaker and graphics artist at the Los Angeles Times. He enjoys the challenge of telling stories through maps: Los Angeles' homeless population, Girl Scout cookies, California's precinct election results, housing costs and more. When he's not on a deadline he's improving the Times' mapmaking tools. [@gaufre](https://twitter.com/gaufre) 40 | 41 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3604/)_ -------------------------------------------------------------------------------- /03-02-disaster-data-money.md: -------------------------------------------------------------------------------- 1 | # Data of disasters: Following the money 2 | 3 | * Lee Zurik 4 | * Matt Dempsey 5 | * Omaya Sosa 6 | 7 | ##### Description 8 | 9 | Natural disasters such as hurricanes, floods and wildfires caused record damage to U.S. cities in 2017. These panelists — veterans of some of the worst storms in history, Hurricanes Harvey, Maria and Katrina — will cover resources to help you dig into the problems left in the disaster's wake, including disaster relief efforts, using databases and mapping to show the extent of damage in certain areas and neighborhoods, and how to follow the money. 10 | 11 | ## Notes 12 | 13 | ### Matt Dempsey 14 | 15 | [Slides](https://docs.google.com/presentation/d/1ShWU96ghA7Ht_sDSpgYZnLkjlBOahTt4M5z8UTpAMZs/edit#slide=id.p) 16 | 17 | Natural disasters are tests for your newsroom. How well does it know your community? What are the vulnerable points? 18 | 19 | You maybe can't predict when they're going to happen, but you know they're going to happen.
20 | 21 | #### Flooding 22 | 23 | * Floodplain shapefiles 24 | * dam conditions, ACOE ratings 25 | * NFIP claims (by blockgroup/county/community) 26 | * buyout data 27 | * rain data from NOAA 28 | * Number of shelters/shelter plan 29 | 30 | #### Fires 31 | 32 | * Shapefiles of previous fires 33 | * WUI shapefiles from University of Wisconsin (wildland-urban interface) 34 | * Raster maps to see changes in vegetation 35 | * Check building codes to see allowed development, compare with Firewise guidelines 36 | 37 | #### Earthquakes 38 | 39 | * Lots of maps (fault, hazard, landslide) 40 | * Realtime feeds 41 | * Historical data 42 | 43 | #### Hurricanes 44 | 45 | * Historical path data 46 | * Historical strength and stats 47 | * Evacuation plans and procedures 48 | 49 | #### Tornadoes 50 | 51 | * Historical trends and locations 52 | * Siren locations, repairs, tests, usage plan 53 | * Building codes 54 | 55 | #### Chemical release/explosion 56 | 57 | * Local planning committee 58 | * Tier II chemical inventories 59 | * ECHO (epa data search) 60 | * OSHA 61 | * Rtk.net.RMP "Right to know network" Risk management plan data. 62 | 63 | #### Blizzards 64 | 65 | * Plows 66 | * Plowing plans 67 | 68 | #### General prep 69 | 70 | * OEM 71 | * Assessor data 72 | * Building codes 73 | * Disaster plans 74 | * Academic studies 75 | 76 | --- 77 | 78 | ### Investigating a disaster when nothing works 79 | 80 | Omaya Sosa 81 | 82 | No power, water, internet, cellphone, food, ports and airports closed, almost no media outlets (1 radio station), most roads blocked, govt. collapsed, no official data. 83 | 84 | Local journalists were also victims. 85 | 86 | Temporary newsroom. 87 | 88 | Back to basics + creative use of technology = great, high-impact, necessary stories. 89 | 90 | "We had to move on the ground everywhere" to interview people. No way to make a phone call. 91 | 92 | How did it start? Common sense. 16 deaths being reported by government didn't make sense. 
Interviewed two doctors who had 9 casualties in 1 day. 93 | 94 | On the ground sources. Some official sources, much later, not very useful. 95 | 96 | Other sources: Missing persons reports, radio, community leaders, social media. 97 | 98 | Took picture of picture on cellphone because no way to send it. 99 | 100 | Started with basic spreadsheet. Added to it as much as possible. 101 | 102 | Were able to prove the government wrong with data. 103 | 104 | Online form for people to report deaths they knew about. 105 | 106 | Get to know your community. Get to know the details of your systems. Don't lose perspective, the data isn't the story. Marry data with reality (deaths listed in hospitals when people died at home). Humanize the data. 107 | 108 | --- 109 | 110 | ### Data of disasters 111 | 112 | Lee Zurik 113 | 114 | Long-term, after disasters. The bigger the disaster, the longer the money's going to be spent. 115 | 116 | * Check registers before and after 117 | * Salary and overtime 118 | * Business corporations 119 | * Campaign finance (also before/after) 120 | 121 | What I'm looking for: 122 | 123 | * Who's making the most money 124 | * Do campaign contributions = contracts? 125 | 126 | Data only tells some of the story. Will lead to other documents, invoices, build your own dataset. 127 | 128 | Case study — Plaquemines Parish Schools 129 | 130 | Took years to rebuild after Katrina. Used check register to build Pivot Table, found most money went to one contractor, requested invoices. Construction management company. 131 | 132 | Invoice said they spent 200 hours in a month to maintain project files. 133 | 134 | Federal procurement data system has reports on disasters. 135 | 136 | --- 137 | 138 | ## Questions 139 | 140 | Community Block Grant money goes from federal to state, how to keep track of it after it goes to the state? 141 | 142 | FEMA appeals drags out? 143 | 144 | Check register? 145 | 146 | Fast moving disaster, good working relationships? 
147 | 148 | Data on how disasters affect undocumented or other off-the-grid people? 149 | 150 | ###### Speakers 151 | 152 | Lee Zurik is an IRE Board Member and Evening Anchor and Chief Investigative Reporter at WVUE-TV in New Orleans. He also serves as Director of Investigations for Raycom Media. He's been honored with local and national awards including the Peabody, duPont-Columbia, and IRE. Before Hurricane Katrina, Lee was a sports anchor. He taught himself to be an investigative reporter by reading IRE resources (books and tipsheets) and attending nine IRE conferences. [@leezurik](https://twitter.com/leezurik) 153 | 154 | Matt Dempsey is the data reporter for the Houston Chronicle. Matt previously worked for the Arizona Republic and Atlanta Journal-Constitution. He has worked on projects involving wildfires, state pensions, and the chemical industry. His passion for public records frequently leads to disclosure of data from all levels of government. His series Chemical Breakdown won the 2016 IRE Innovation award and the National Press Foundation's "Feddie" award. [@mizzousundevil](https://twitter.com/mizzousundevil) 155 | 156 | 157 | Omaya Sosa is an award winning journalist, entrepreneur, and adventurer with 20 years of experience. She is co-Founder of Puerto Rico’s Center for Investigative Journalism. Her recent work on the underreported death toll of hurricane Maria has been republished and quoted by more than a dozen media outlets. Omaya is also co-Founder of NotiCel.com digital news outlet sold in 2016. Before her digital media life she worked at El Nuevo Día newspaper and radio news station Red 96. 
[@omayasosa](https://twitter.com/omayasosa) 158 | 159 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3596/)_ -------------------------------------------------------------------------------- /03-03-environmental-hazards.md: -------------------------------------------------------------------------------- 1 | # Uncovering environmental hazards faced by urban children 2 | 3 | * Chris Zubak-Skees 4 | * Molly Peterson 5 | * Dylan Purcell 6 | 7 | ##### Summary 8 | 9 | Decaying, pollution-choked schools, old homes with lead paint, toxic soil left behind by shuttered factories and even urban heat islands — all environmental dangers faced by children. This panel will show how to uncover these lurking dangers in your own communities by analyzing often-overlooked data sources and, when data is lacking, doing your own testing. 10 | 11 | ## Notes 12 | 13 | ### Schools near roads 14 | 15 | Chris Zubak-Skees 16 | 17 | "Can we find all the places where schools are next to busy roads?" 18 | 19 | Ultrafine particles spike near highways. 20 | 21 | Wanted to nationalize the story and bring it out of academia. 22 | 23 | FHA Office of Highway Policy Administration. NCES school dataset. 24 | 25 | NCES school data takes addresses, geocodes them and manually corrects some of the results, and that's all. Wrong about 16 percent of the time. 26 | 27 | Threshold of 500 means being off by a few hundred feet makes your analysis bad. 28 | 29 | Things they tried: 30 | 31 | Loading all the parcel shapefiles in the country and matching them to school locations. How do you figure out which parcel is the school? 32 | 33 | Retrieving Google's database of school points of interest and matching them to NCES data. Both of those are messy. 34 | 35 | Neural networks to identify things that look like schools. Terrapattern. A lot of schools don't look like schools anymore, e.g. charter schools. 36 | 37 | Ran addresses through the Google geocoder instead.
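With a 500-foot threshold, the core check is a distance between a school point and the nearest point on a road — which is why a 16 percent geocoding error rate wrecks the analysis. A haversine sketch in plain Python (the coordinates are made up; a real analysis would measure to the road's full geometry with a GIS library, not to a single point):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_FT = 20_902_231  # mean Earth radius (~6,371 km) in feet

def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two lat/lon points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_FT * asin(sqrt(a))

# Illustrative points: a school and a nearby point on a highway edge.
school = (34.0522, -118.2437)
road = (34.0530, -118.2437)  # ~0.0008 degrees of latitude away

d = haversine_ft(*school, *road)
print(round(d), d < 500)  # distance in feet, and whether it trips the threshold
```

At this scale, 0.0008 degrees of latitude is roughly 290 feet — so a geocode that's off by a few hundred feet flips schools in and out of the threshold, exactly the problem described above.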
38 | 39 | ### Urban heat & children's health risk 40 | 41 | Molly Peterson 42 | 43 | Wanted to know how the people most vulnerable to heat were affected by it. 44 | 45 | ISeeChange, crowdsourced climate data. 46 | 47 | Project design/sensor design. Arduino. 48 | 49 | Cost around $60 each. 50 | 51 | Indoor sensors got temp and humidity every 5 minutes, compared with outside data from nearby airport. 52 | 53 | Also published and mapped ER visits for heat-related illness. 54 | 55 | Existing data don't capture and existing policies don't address what people are actually going through. 56 | 57 | Community engagement project: "Where are the hottest and coolest places?" Collected text message data. 58 | 59 | Failures: There's no single prescribed outcome. 60 | 61 | Scientists offered lots of feedback. 62 | 63 | Renters can use this data in rental disputes. 64 | 65 | ### Toxic City 66 | 67 | [Tipsheet](https://www.dropbox.com/s/q5um23esr707coz/Dylan%20Purcell%20tipsheet.pdf?dl=0) 68 | 69 | Dylan Purcell 70 | 71 | Lead paint 72 | 73 | Lead paint in rental homes. Only about 500 of 2,700 kids were helped by city health departments. 74 | 75 | The most they could get was a ZIP code breakdown. Philadelphia started "lead court" where they'd bring landlords in. Slap on the wrist, but a roadmap to addresses. 76 | 77 | Tainted soil 78 | 79 | No database of toxic soil exists, so they dug up dirt and made one. 500 samples from 114 locations. 3/4 of properties had hazardous lead. 80 | 81 | Found locations of former smelters. 82 | 83 | Old news clips helped. 84 | 85 | Boom in housing stirred up dirt. 86 | 87 | Construction workers weren't taking basic steps like watering down dirt. 88 | 89 | Used satellite imagery to look at it over time. 90 | 91 | After story, the state tested. 92 | 93 | Children at risk in school 94 | 95 | Mostly lead paint and asbestos. Some mold and pests. Drinking water concerns. Work orders often get delayed. 96 | 97 | ### Questions 98 | 99 | Lead risk score?
Do other cities have Lead Court? 100 | 101 | How do you determine threshold for emissions? 102 | 103 | ##### Speakers 104 | 105 | Chris Zubak-Skees leads a small team of computational journalists as data editor at the Center for Public Integrity. He was previously the Center's developer, doing analysis and interactive journalism with code. He has been part of teams that have won Meyer, Loeb, Goldsmith and Malofiej awards. 106 | 107 | Molly Peterson ([@Mollydacious](https://twitter.com/Mollydacious)) reports on climate and environment for public media and print, including High Country News, NPR, CodeSwitch, KQED, & PRI’s The World. In 2009, while at Southern California Public Radio, she was an IRE finalist in radio for a project investigating faulty pumps in New Orleans. Her latest project documented extreme heat in LA’s San Fernando Valley. She has worked for ISeeChange, a citizen climate observation platform funded in part by NASA. 108 | 109 | Dylan Purcell is a data reporter on the Inquirer’s investigative team. He has uncovered low conviction rates for violent crimes, widespread cheating on state tests, and the high rate of newborn deaths after heart surgery at a for-profit hospital. He was a member of the reporting team that won a Pulitzer and IRE award for examining violence in Philadelphia’s public schools. Recently, he’s focused on the dangers of lead exposure faced by urban children. 
[@dylancpurcell](https://twitter.com/dylancpurcell) 110 | 111 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3693/)_ -------------------------------------------------------------------------------- /03-04-guns.md: -------------------------------------------------------------------------------- 1 | # Shoot us straight: Correctly using data and docs on guns 2 | 3 | [Tipsheet](http://bit.ly/NICAR18-guns) | [Slides](http://bit.ly/NICAR18-gunSlides) 4 | 5 | * Nick Penzenstadler 6 | * Matt Drange 7 | * Kim Smith 8 | 9 | ##### Description 10 | 11 | ShotSpotter? Federal Firearms Licenses? Trace data? We’ll help you sort out what data and documents you should routinely gather for reporting on guns, where you’re wasting your time and when it’s time to build from the ground up. We’ll cover common pitfalls and how to accurately cover the gun industry, firearm violence and regulatory agencies. You’ll leave with: sample data on guns, tips on finding sources and tips on reliable data streams. 12 | 13 | ## Notes 14 | 15 | ### Chicago Crime Lab 16 | 17 | Kim Smith 18 | 19 | Helps out all of Chicago 20 | 21 | Gun trace report 2017 22 | 23 | Filling in missing data: 24 | 25 | ATF trace data: 1st retail sale. 26 | 27 | Data on transfers: Ethnographic interviews; jail survey. 28 | 29 | Recovered gun: Police admin data 30 | 31 | Did prison/jail surveys of gun offenders. 32 | 33 | How do prohibited possessors acquire guns? 34 | 35 | Not gun shows, internet or theft, according to their data. Caveat: That only applies to the person convicted; intermediaries may have used those channels. 36 | 37 | Gun violence in Chicago 2016 report 38 | 39 | "Synthetic control" to figure out if gun violence interventions are working.
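The synthetic-control idea: build a weighted mix of comparison cities whose pre-intervention trend matches the treated city, then read the post-intervention gap between the real city and its synthetic twin as the estimated effect. A toy sketch with made-up numbers; real synthetic control constrains the weights to be nonnegative and sum to 1, which this plain least-squares version skips:

```python
import numpy as np

# Made-up monthly shooting counts: rows = months, cols = two comparison cities.
controls_pre = np.array([
    [100.0, 80.0],
    [110.0, 90.0],
    [120.0, 100.0],
    [115.0, 95.0],
])
# Treated city happens to track an even mix of the controls before the intervention.
treated_pre = controls_pre @ np.array([0.5, 0.5])

# Fit weights so the weighted controls reproduce the treated city's pre-period.
w, *_ = np.linalg.lstsq(controls_pre, treated_pre, rcond=None)

# Post-intervention: controls continue their trend, treated city drops.
controls_post = np.array([[125.0, 105.0], [130.0, 110.0]])
treated_post = np.array([95.0, 100.0])

synthetic_post = controls_post @ w
estimated_effect = treated_post - synthetic_post  # negative = fewer shootings than expected
```

With these numbers the fitted weights come out to an even split and the synthetic city predicts 115 and 120 shootings, so the gap reads as 20 fewer shootings per month than expected.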
40 | 41 | #### ShotSpotter 42 | 43 | Matt Drange 44 | 45 | Know what the agency can see on its end: WAV files and location 46 | 47 | Type,ID,Date,Time,Address,Round,CAD,Beat 48 | 49 | PD might say ShotSpotter owns the data, but ownership may not affect what's public 50 | 51 | #### FFLs: inspections, Tiahrt and revocation 52 | 53 | Nick Penzenstadler 54 | 55 | FFL lists available from ATF. 56 | 57 | From FOIA: Inspection history, violations, narrative, corrective action. 10,000 inspections in 2011, 71 revocations. 58 | 59 | Some FFLs get shut down, transfer their license, then reopen cleanly. 60 | 61 | Tiahrt amendment means trace data is hard to get. 62 | 63 | Background checks, NCIC and NICS 64 | 65 | #### Online gun sales 66 | 67 | Matt Drange 68 | 69 | Building your own dataset. 70 | 71 | Manually: Armslist or Facebook. 72 | 73 | What's the scope of your universe? Zero in. Be transparent with the limits of your data. 74 | 75 | "Sometimes getting a bunch of really bad examples is enough." 76 | 77 | Know the jargon before you start collecting data. (Searching "AR-15" on eBay won't get you much, but searching the barrel diameter might) 78 | 79 | #### Questions 80 | 81 | Data sources on gun violence 82 | 83 | **Kim Smith (KS):** CDC WISQARS is comprehensive. Non-fatal shootings hard to track over time. 84 | 85 | Gun Violence Archive 86 | 87 | Harassed for joining gun groups on Facebook? 88 | 89 | **Matt Drange (MD):** Yes. 90 | 91 | Demographic info on gun ownership? 92 | 93 | None, really. 94 | 95 | What makes a gun illegal? 96 | 97 | Someone having one who isn't supposed to. In Illinois, a FOID card is required. 98 | 99 | The Trace made gun tracking data they collected available. 100 | 101 | Gun buyback programs? 102 | 103 | Not much data. 104 | 105 | Spikes in gun sales after shootings? 106 | 107 | May be available at the county level. Probably only numbers though. 108 | 109 | What public records are available? 110 | 111 | Inspections of FFLs.
DOJ does it in California. 112 | 113 | 114 | ##### Speakers 115 | 116 | Nick Penzenstadler is a reporter on USA TODAY's investigative team based at the paper's Denver bureau. [@npenzenstadler](https://twitter.com/npenzenstadler) 117 | 118 | Matt Drange is a staff writer at Forbes magazine, where he reports on Donald Trump's business dealings. Before joining Forbes in 2016, Matt worked at The Center for Investigative Reporting, covering technology and guns. He joined IRE as a student in 2010. Matt's proposal to allow student members of IRE to vote for the Board of Directors was adopted by membership in 2015. [@mattdrange](https://twitter.com/mattdrange) or #FOIAFriday hashtag. 119 | 120 | Kim Smith is a Sr. Research Manager at the University of Chicago Crime Lab, where she manages the multi-city gun markets project, work done in partnership with affiliates in six major U.S. cities – Chicago, LA, Boston, NY, Baltimore, and New Orleans. Kim also provides implementation support to a Chicago Police Department initiative that brings together police officers and analysts from the Crime Lab to integrate crime intelligence, data analysis, and technology. 121 | 122 | 123 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3583/)_ -------------------------------------------------------------------------------- /03-05-entitled-to-a-spreadsheet.md: -------------------------------------------------------------------------------- 1 | # I'm entitled to a spreadsheet, dang it! 2 | 3 | * Steven Rich 4 | * Sarah Ryley 5 | * Annie Waldman 6 | 7 | #### Description 8 | 9 | Public records requests for data come with their own unique set of challenges — from agencies that insist their databases can only export documents, to those that will only send a fraction of the fields that are disclosable under the law. 
We’ll give you tips on writing rock-solid public records requests for data, and how to respond to the common excuses used to deny these requests. Topics will include: Common sources that prove the agency has a database and can export data; statute, case law and language to make it more likely you’ll receive data (versus a PDF, or even worse, scans of redacted printouts); how to negotiate the confidentiality issues that come with medical and educational data; and managing the mass FOIA request project. We’ll provide a sample data request, a few good war stories, and links to helpful resources. 10 | 11 | ## Notes 12 | 13 | ### Sarah Ryley 14 | 15 | [Slides](https://drive.google.com/file/d/1rernesalk-vWpI5TW16CUE2G6zQkRJG6/view) 16 | 17 | * Make sure it's not already online. 18 | * Treat everything you write as a legal document. 19 | * All communications should be clear, concise, professional and well-formatted — you're writing for both people with no data literacy as well as the data folks who will generate the request. 20 | * Follow up phone calls with emails summarizing the conversation so you have a record. 21 | * The request should preempt common objections. 22 | * Negotiation can strengthen your appeal. Asking for the data dictionary and record layout helps you make an informed effort to narrow your request. 23 | * Check the validity of claimed exemptions and push back on falsehoods. 24 | * Insist on a determination in writing. 25 | 26 | Acknowledge the existence of online data in a request. 27 | 28 | Common objections: 29 | 30 | * We don't keep it as data 31 | * Can only export 5 fields 32 | * Overly burdensome 33 | * Etc. 34 | 35 | Cite evidence that the data exist and can be exported. RFQ/Ps and contracts can detail the "current data environment". 36 | 37 | Documentation: 38 | 39 | Record layout, ERD, data dictionary, user manuals, report writing guides 40 | 41 | You can look at released records to see what exists. 42 | 43 | Tips for handling the "overly burdensome" objection. 44 | 45 | Data format does not limit disclosability.
46 | 47 | Make sure you get everything; frame follow-ups as missing records, not questions. 48 | 49 | ### Healthcare and ED data 50 | 51 | Annie Waldman 52 | 53 | [Slides](http://bit.ly/2p35dd4) | [Tipsheet](http://bit.ly/2FHxlwM) 54 | 55 | Main reasons for denial: exempt under other laws, privacy, HIPAA, FERPA, other delays 56 | 57 | #### Before: 58 | 59 | * Research 60 | * Be friendly 61 | * Find the internal data wizard 62 | * Ask first 63 | * Request an itemized cost estimate 64 | * Negotiations are crucial 65 | * Know the privacy restrictions 66 | * Research AG decisions 67 | 68 | #### Health data: 69 | 70 | Agencies have claimed that the (b)(6) exemption and HIPAA are essentially the same. Under (b)(6), a court has to do a balancing test. HIPAA doesn't require one. 71 | 72 | HIPAA applies to health care organizations, any org that bills/transmits health care data, and hybrid entities (schools, prisons, etc.). 73 | 74 | HIPAA lasts even after someone dies. 75 | 76 | It covers personally identifiable info. 77 | 78 | But you can still get some data. And people are entitled to their own records. 79 | 80 | HIPAA does allow for de-identified data. You can ask for personally identifiable data to be de-identified or given dummy IDs. 81 | 82 | You can get a limited or restricted-use data set with, e.g., ZIP codes. You will have to sign a data use agreement. 83 | 84 | If limited data sets are not enough, go through an institutional review board. 85 | 86 | #### Education data: 87 | 88 | FERPA 89 | 90 | You can still get lots of data; "directory data". 91 | 92 | FERPA also allows for de-identified data and limited-use data. 93 | 94 | ### Mass FOIA 95 | 96 | Steven Rich 97 | 98 | Same request to a bunch of the same kind of agency, or the same request to different types of agencies.
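The mass-FOIA mail-merge idea from these notes can be as little as a template filled in per agency. A sketch; the request language, statute names and agencies here are placeholders, not model legal text:

```python
from string import Template

# Placeholder request language; a real request should cite the relevant state statute.
REQUEST = Template("""Dear $agency,

Under $statute, I request a copy of the $records database
as a machine-readable export (CSV or similar), along with the
record layout and data dictionary.

Sincerely,
$reporter""")

agencies = [
    {"agency": "Springfield Police Department",
     "statute": "the Illinois Freedom of Information Act"},
    {"agency": "Shelbyville Police Department",
     "statute": "the Illinois Freedom of Information Act"},
]

# One letter per agency, with the shared fields filled in once.
letters = [
    REQUEST.substitute(a, records="use-of-force incident", reporter="A. Reporter")
    for a in agencies
]
```

The same loop can write each letter to a file or feed a tracking spreadsheet, which keeps the "know your deadlines" and "use a tracker" advice manageable at scale.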
99 | 100 | Fired/Rehired 101 | 102 | Law enforcement data: 103 | 104 | * Not a ton of standardization 105 | * Kept in different formats 106 | * Databases no one ever uses 107 | * Certain exemptions apply 108 | 109 | Approach: 110 | 111 | * Check the internet 112 | * Ask the agency if they'll turn it over 113 | * Familiarize yourself with each state's laws 114 | * File the request 115 | * Pester like crazy 116 | 117 | Understanding what you can get: 118 | 119 | * Every state has different exemptions _and_ legal precedent 120 | * Generally speaking there's discretion 121 | * Ask for things that are exempt 122 | 123 | Request all the things: 124 | 125 | * Request the same thing from everyone 126 | * Mass email (mail merge) or tailored requests with specific legal language 127 | * Automate if you can 128 | * Know your deadlines 129 | 130 | Fight for the things: 131 | 132 | * The request is a first step 133 | * Create alerts for deadlines 134 | * Pick up the phone 135 | * Use a spreadsheet (or tracker) 136 | * Be nice, then get mean 137 | * Push for the correct format 138 | 139 | If you don't get things: 140 | 141 | * Some battles are worth fighting 142 | * Others aren't 143 | * Always be appealing 144 | * Prioritize based on your time 145 | * Name and shame 146 | * Apply pressure 147 | 148 | Now what: 149 | 150 | * Standardize 151 | * Pick up the phone 152 | * Get the collection methodology 153 | * Understand that things can change 154 | * Don't force it 155 | 156 | Stuff you should look out for: 157 | 158 | * Exactly 65,536 records (the old Excel .xls row limit, a sign of truncation) 159 | * Changing definitions in the data 160 | * Changing fields in the data 161 | * Definitional issues in general 162 | * PDF data 163 | * Missing fields (is a missing field a caveat for one you are using?)
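The "65,536 records" warning above is checkable in one line: the old Excel .xls format caps a sheet at 65,536 rows, so receiving exactly that many records is a strong hint the export was truncated. A small sketch (the message wording is my own, not from the session):

```python
# Spreadsheet row limits that often show up as silent truncation points.
SUSPECT_COUNTS = {
    65_536: "Excel .xls row limit",
    1_048_576: "Excel .xlsx row limit",
}

def truncation_warning(record_count):
    """Return a warning string if a delivered dataset hits a known row limit."""
    reason = SUSPECT_COUNTS.get(record_count)
    if reason:
        return (f"Got exactly {record_count:,} records ({reason}); "
                "ask if the export was cut off.")
    return None
```

Run it against the record count of every delivery; a hit means it's time to pick up the phone and ask how the export was produced.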
164 | 165 | Aftermath: 166 | 167 | * Pick your targets 168 | * Don't always focus on outliers 169 | * Put everything in context 170 | * Check legal issues 171 | * Talk to experts 172 | * Make what data you can public 173 | 174 | ### Questions 175 | 176 | **A university hasn't responded (sexual assault complaints). What should I do?** 177 | 178 | Talk to an expert in the state, and see if there are deadlines. They may have reported it somewhere else. You can file a "constructive denial" appeal. Contact the state attorney general. 179 | 180 | **For data whose sharing is statutorily limited, can you get it by partnering?** 181 | 182 | You probably can't partner because of the data use agreement, but you may be able to get it de-identified. 183 | 184 | **If you disagree with legal precedents, what do you do?** 185 | 186 | Google the state public records law. The attorney general can help. 187 | 188 | **When should you appeal?** 189 | 190 | Appeal immediately. You can get denied because of deadlines. 191 | 192 | **When do you standardize?** 193 | 194 | Immediately. Don't wait for them to come back. 195 | 196 | **What about an agency that charges a lot of money and also gives the data to competitors?** 197 | 198 | Shame them. Technically there's nothing illegal about giving it to competitors. 199 | 200 | #### Speakers 201 | 202 | Steven Rich is the database editor for investigations at The Washington Post. He's worked on investigations probing the National Security Agency, tax lien sales, asset forfeiture, policing and college athletics. He has been a reporter on two teams awarded Pulitzer Prizes, in 2014 for Public Service and in 2016 for National Reporting. Steven is a graduate of Mizzou and Virginia Tech. He was elected to IRE’s Board of Directors in 2015. [@dataeditor](https://twitter.com/dataeditor) 203 | 204 | Sarah Ryley is an investigative reporter at The Trace, a non-profit news outlet that covers gun issues. Previously, she was the data editor at the New York Daily News, where her work triggered numerous criminal justice reforms.
Her series on the NYPD's abuse of eviction laws, done in partnership with ProPublica, was awarded the Pulitzer Prize for Public Service. She has also taught data journalism and investigative journalism at CUNY and The New School. [@MissRyley](https://twitter.com/MissRyley) 205 | 206 | Annie Waldman ([@AnnieWaldman](https://twitter.com/AnnieWaldman)) is a reporter at ProPublica, working on both data and education projects. She is based in Brooklyn. 207 | 208 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3572/)_ -------------------------------------------------------------------------------- /03-06-data-culture.md: -------------------------------------------------------------------------------- 1 | # Creating a data culture that lasts 2 | 3 | [Slides](http://bit.ly/dataculture_nicar18) 4 | 5 | * Annie Daniel 6 | * Tom Meagher 7 | * Mark Nichols 8 | 9 | ##### Description 10 | 11 | How can news organizations create a sustainable culture for data and interactive journalism? In some cases, when one dynamic, charismatic CAR leader leaves for a new job, the data team left behind suffers and struggles to rebuild. Get practical advice and tips on ways to build a data culture in your newsroom that's sustainable and independent of any one staffer. How can data teams make a lasting impact on newsroom leadership and culture? 12 | 13 | ## Notes 14 | 15 | How do we make sure we're not seen as expendable? 16 | 17 | About 42% of reporters use data regularly, about half of organizations have someone doing data. But if there's someone who does do data, they're probably the only one. 18 | 19 | Appetite for data training varies widely. Most people don't want to leave their newsrooms to learn. 20 | 21 | Who uses data? 
22 | 23 | * Beat or project reporters who learned skills on their own or through IRE 24 | * Designated data reporter/editor (singular) 25 | * Team of several journalists dedicated to data reporting/interactives/news apps 26 | 27 | ### If you're the only one, what can you do? 28 | 29 | **Mark Nichols (MN):** Be the data evangelist. Invite yourself to meetings. Create usable in-house databases. Brown-bag lunches. 30 | 31 | ### How has support for data changed? 32 | 33 | **MN:** There's more data available now. Journalists are interested in data earlier. 34 | 35 | ### What does the future look like? Is it necessary that we still have data reporters/editors, or will everyone be one? 36 | 37 | **Annie Daniel (AD):** It's possible that will happen; reporters are getting more comfortable doing basic things. That allows people with data skills to do more interesting things. 38 | 39 | **Tom Meagher (TM):** If there isn't someone with that title, it's easy to overlook it altogether. It isn't part of the day-to-day job for everyone yet. 40 | 41 | Audience: What are our aspirations for what data journalism is? If it's just getting basic numbers from spreadsheets, sure. But maybe we can go past that. 42 | 43 | **AW:** Reporters report on the data, we publish it. 44 | 45 | ### When it's viewed as a prestige team, how do we work better day to day? When someone leaves, who's responsible? 46 | 47 | **MN:** Make the data seem important to reporters and editors on every story. Editors are important in setting the culture. 48 | 49 | **TM:** Make your colleagues' jobs easier. 50 | 51 | **AD:** Hiring is important. Mentorship/intern programs. Let newer people bother you. 52 | 53 | ### Tools that fade away 54 | 55 | **AD:** Two kinds of people: people who adapt and people who fix it. User testing. They may not want to bother you with "It's broken." 56 | 57 | **MN:** Sometimes you have to be your own cheerleader. 58 | 59 | **TM:** Envy can be a very powerful marketing tool.
60 | 61 | ### Fight for your byline 62 | 63 | **AD:** Sometimes editors don't think about it; they're more in contact with the reporter. It probably isn't malicious. Consider writing a policy. 64 | 65 | ### What about double bylines on radio and broadcast? And distributed teams? Pushing deadlines. 66 | 67 | **MN:** An editor helps. Show the value of data to your story. 68 | 69 | **TM:** Data isn't magic, and editors need to understand what it can do and what its limitations are. 70 | 71 | ### How can you evolve? 72 | 73 | **AD:** At the end of the day we're reporters. Learning in public. Most of our job is calling people and asking really dumb questions. 74 | 75 | ### IRE/NICAR collective byline policy? 76 | 77 | Good idea. 78 | 79 | ### How important is the title? 80 | 81 | The title doesn't make you known. 82 | 83 | 84 | ##### Speakers 85 | 86 | Annie Daniel makes charts, maps and web apps at The Texas Tribune. Prior to joining the Tribune, she interned with the Washington Post’s graphics team creating charts and graphics for print and web. She graduated from UNC-Chapel Hill, where she studied journalism and political science. Annie lives in Austin, Texas, where she knits, bakes and reads science fiction. [@anieldaniel](https://twitter.com/anieldaniel) 87 | 88 | Tom Meagher is the deputy managing editor at The Marshall Project, where he leads a team of designers, developers, visual journalists and reporters covering the criminal justice system. He's part of the team behind "Klaxon," an open-source tool for monitoring the web for changes. A veteran reporter and editor, he previously led an interactive team for the Digital First Media newspaper chain and was the data editor at the Newark Star-Ledger. [@ultracasual](https://twitter.com/ultracasual) 89 | 90 | Mark Nichols is a data journalist on the national desk of USA TODAY.
He has worked as a data specialist for the digital reporting team at WCPO-TV in Cincinnati, OH, and was the computer-assisted reporting coordinator for The Indianapolis Star for nearly 20 years. [@nicholsmarkc](https://twitter.com/nicholsmarkc) 91 | 92 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3578/)_ -------------------------------------------------------------------------------- /04-01-sensor-journalism.md: -------------------------------------------------------------------------------- 1 | # Sensor journalism: How do we do it and what are the limits? 2 | 3 | [Slides](https://docs.google.com/presentation/d/1O8wpjEtVmnkb0RWtI0r626TPtJdqw11BLmaFYtydfaQ/edit#slide=id.p) 4 | 5 | * Michael Corey 6 | * Marianne Bouchart 7 | * Denise Lu 8 | * Kelly Calagna 9 | 10 | ##### Description 11 | 12 | Journalists are using a wide range of sensors to find stories, from spectrometers in space to cameras on drones to Raspberry Pis on the street. We'll talk about our processes, what we'd like to see more of, the challenges, and discuss the variety of ethical frameworks that may overlap and even contradict each other. 13 | 14 | ## Notes 15 | 16 | ### Marianne Bouchart 17 | 18 | Examples 19 | 20 | Breathe map from India uses the Air Quality Index 21 | 22 | Losing Ground by ProPublica 23 | 24 | Speeding cops by the Sun Sentinel 25 | 26 | Houston Chronicle air toxins 27 | 28 | Cicada Tracker by Radiolab 29 | 30 | Where do you get data? 31 | 32 | * Use existing data 33 | * Collect data by deploying sensors 34 | * Collect data from sensors deployed by the public 35 | 36 | A data journalist's microguide to environmental data 37 | 38 | World Resources Institute 39 | 40 | NASA Earth Data 41 | 42 | AQICN 43 | 44 | What kind of data is useful for what kind of stories? 45 | 46 | [Taxonomy of sensors](https://punkish.org/Taxonomy-of-Sensors?tag=science) 47 | 48 | What does the data look like? 49 | 50 | Most of the time, like any other type.
May require research to understand the format. 51 | 52 | A project to build an easy-to-use sensor toolkit for journalists. 53 | 54 | #### Environmental sensor journalism 55 | 56 | Kelly Calagna 57 | 58 | Desire to democratize data. 59 | 60 | EPA monitors are few and far between. They use algorithms to fill in the missing data. But it varies a lot based on factories, construction, geography. 61 | 62 | Making it accessible. Cost, energy, time, knowledge. 63 | 64 | * Arduino 65 | * LoRa radio 66 | * LiPo batteries and solar 67 | 68 | Deploying 69 | 70 | Homes, businesses or public land? Still figuring it out. 71 | 72 | Depends on the project. 73 | 74 | Collecting particulate matter (PM10 & PM2.5) with optical sensors, but testing other types of sensors. 75 | 76 | Things to consider: 77 | 78 | How does this affect people's lives? 79 | 80 | How do we ensure accuracy? 81 | 82 | What happens if we discover harmful concentrations? 83 | 84 | #### Catching crooks with science 85 | 86 | Michael Corey 87 | 88 | Top water users in Bel-Air. "Which one is it?" Had the ZIP code but that's it. 89 | 90 | Using other bands of satellite imagery to detect water usage. 91 | 92 | Two-step process. 93 | 94 | NDVI - measures how healthy plants are by measuring photosynthesis. A higher ratio of infrared to red light means healthy plants. 95 | 96 | Doesn't directly measure water though. 97 | 98 | "Tasseled cap" transformation. Can get a measure of soil moisture. 99 | 100 | Scatter plot of area and green/wet index. 101 | 102 | Other remote sensors: 103 | 104 | * Spectrometry 105 | * LIDAR 106 | * InSAR - radar interferometry (ground deformation) 107 | * GRACE - gravity 108 | 109 | #### Imagery in maps for news 110 | 111 | Denise Lu 112 | 113 | Resolution 114 | 115 | Different sources have different resolutions. "15m" means 1px = 15m x 15m. Free imagery usually has higher revisit rates for lower-res imagery.
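The NDVI measure from Michael Corey's remote-sensing notes above is just a normalized band ratio: (NIR - red) / (NIR + red), so healthy vegetation, which reflects strongly in near-infrared, scores close to 1. A sketch with made-up reflectance values:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red bands."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

# Made-up per-pixel reflectances: a lush lawn vs. bare dirt.
lawn = ndvi(nir=[0.5, 0.6], red=[0.05, 0.06])  # high: healthy, well-watered plants
dirt = ndvi(nir=[0.3, 0.3], red=[0.25, 0.28])  # low: sparse or stressed vegetation
```

His point that NDVI doesn't directly measure water is why the "tasseled cap" wetness index came next; NDVI alone only says the lawn is green.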
116 | 117 | Pansharpening 118 | 119 | Panchromatic sharpening: merging high-res single-band with low-res multi-band 120 | 121 | Leveraging data for news 122 | 123 | Las Vegas shooting 124 | 125 | Google Earth 3D 126 | 127 | EgyptAir Crash 128 | 129 | SRTM - shuttle/satellite radar topography mission elevation data (land only) 130 | 131 | Natural Disasters 132 | 133 | Hurricane Maria 134 | 135 | Slippy Maps with vector data from National Hurricane Center. Automatic updates instead of out-of-date static maps. 136 | 137 | Binary raster files for sea surface temperature data. 138 | 139 | Satellite imagery to show power loss before and after. 140 | 141 | DigitalGlobe (for profit, but provides data during breaking news) for up-to-date satellite imagery. 142 | 143 | Katrina anniversary looking at 1 block. Pictometry. 144 | 145 | Timelapse, GOES-16. 146 | 147 | Satellite data products, Copernicus. 148 | 149 | Conflict zones, Islamic State scorched earth tactics 150 | 151 | Battle for Mosul, satellite band recipes 152 | 153 | Out-of-reach areas, South China Sea, North Korea 154 | 155 | Science and environment 156 | 157 | Land use/land cover 158 | 159 | Historical imagery, Lake Erie algae bloom 160 | 161 | Hyperlocal satellite imagery, Larsen C ice shelf crack 162 | 163 | 164 | 165 | ##### Speakers 166 | 167 | Michael Corey is Reveal's acting data editor. He leads a team of data journalists who seek to distill large datasets into compelling and easily understandable stories using the tools of journalism, statistics and programming. His specialties include mapping, the U.S.-Mexico border, scientific data and working with remote sensing. He previously worked for the Des Moines Register and graduated from Drake University. [@mikejcorey](https://twitter.com/mikejcorey) 168 | 169 | Marianne Bouchart is a data journalist based in France. She is the manager of the Data Journalism Awards. 
She is also the founder of HEI-DA, a nonprofit organisation promoting news innovation, the future of data journalism and open data. She runs data journalism programmes in various regions around the world as well as HEI-DA's Sensor Journalism Toolkit project. She also created the Data Journalism Blog in 2011. 170 | 171 | Denise Lu makes maps, charts, data visualizations and other knickknacks. She is currently a graphics editor at The New York Times and was previously a graphics editor at The Washington Post. [@DeniseDSLu](https://twitter.com/DeniseDSLu) 172 | 173 | Kelly Calagna is a postgraduate fellow at Northwestern University's Knight Lab. She is part of a team that is developing an environmental sensor project, called SensorGrid. As an environmental journalist, Kelly has covered topics from sea level rise in Puerto Rico to climate change research on the Tibetan Plateau. Kelly earned her MSJ from the Medill School of Journalism and has a communications degree from UCLA. [@kellycalagna](https://twitter.com/kellycalagna) 174 | 175 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3549/)_ -------------------------------------------------------------------------------- /04-02-python-tests.md: -------------------------------------------------------------------------------- 1 | # Python: Writing tests for your code 2 | 3 | [Github Repo](https://github.com/DallasMorningNews/python-testing-101) 4 | 5 | * Andrew Chavez 6 | 7 | ##### Description 8 | 9 | Every programmer makes mistakes. Writing good tests can help you avoid making them in production. In this session, you will learn how to use Python's built-in tools to automate testing so you can sleep better at night. 10 | 11 | This session is good for: People who use Python regularly and want to improve their workflow. 12 | 13 | ## Notes 14 | 15 | Code that exercises your code. 16 | 17 | Code that ensures consistent behavior of code you've already written. 
18 | 19 | A gift for future developers who have to maintain your code. 20 | 21 | A blueprint for what your code is supposed to do. 22 | 23 | Cover things in tests that could keep you up at night. Things that interact with readers or staff. 24 | 25 | Regression testing: If something breaks, write a test to make sure it doesn't break again. 26 | 27 | Good tests test one narrow behavior. When the test fails, you know exactly what went wrong. 28 | 29 | In output: 30 | 31 | . = passed test 32 | E = error 33 | F = failed 34 | 35 | Coverage.py 36 | 37 | Makefile to integrate the command line into testing (e.g. check for the existence of other files) 38 | 39 | Testing things with randomness or other unpredictable behavior: 40 | 41 | E.g. a site being up. Is your code wrong or is the site down? 42 | 43 | Mocking tricks the code under test into getting a different thing than it otherwise would. 44 | 45 | Freezegun lets you freeze time. Other mocking libraries let you mock S3, etc. 46 | 47 | setUp, tearDown and class-level functions for test cases can be useful. 48 | 49 | Continuous integration with CircleCI 50 | 51 | Push to git, tests pass, deployed. 52 | 53 | ##### Speakers 54 | 55 | Andrew Chavez is a senior computational journalist on the data and interactives team at The Dallas Morning News and a lecturer at the journalism school at The University of Texas at Austin. 56 | 57 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3631/)_ -------------------------------------------------------------------------------- /04-03-archiving-data-journalism.md: -------------------------------------------------------------------------------- 1 | # Archiving data journalism 2 | 3 | * Katherine Boss 4 | * Meredith Broussard 5 | * Nora Paul 6 | * Ben Welsh 7 | 8 | ##### Description 9 | 10 | Remember that story you read online in 2005, the one with the cool Flash graphics?
How about that amazing interactive data visualization that you saw way back when, the one that made you want to level up your news nerd game? Good luck finding those stories today. Data journalism is disappearing from the web. 11 | 12 | Data journalism is more fragile than most people realize. Every time a news organization reorganizes its staff or updates its CMS or stops paying the bill for the data team’s servers, complex data journalism projects are lost. Conventional archiving methods, like the Internet Archive’s crawlers or the automated archiving feeds of companies like Lexis-Nexis, are no longer sufficient to capture projects that involve big data, databases, streaming data or interactive graphics. 13 | 14 | In this session, we’ll discuss why data journalism is the new digital ephemera, and we’ll explore the state of the art for archiving. We’ll talk about strategies data journalists can use to preserve their own work and how news organizations can better preserve their valuable digital assets. Finally, we’ll report on how journalists, librarians and scholars are thinking about future-proofing the news. 15 | 16 | ### Notes 17 | 18 | **Nora Paul (NP)** 19 | 20 | There's no one whose job it is to advocate for saving old news; if someone does, that's not all they're doing. 21 | 22 | News orgs have never been good at preserving the news. 23 | 24 | **Ben Welsh (BW)** 25 | 26 | Having the CMS be archive-ready is the best way, but we aren't (usually) CMS people. 27 | 28 | Stuff that's 5-10 years old is probably dead. 29 | 30 | The Five Commandments 31 | 32 | I. Thou shalt not make a mess and expect someone else to clean it up for you. 33 | II. Thou shalt publish as static files immediately or eventually. 34 | III. Thou shalt not depend on rando links. 35 | IV. Thou shalt version your CSS and base templates. 36 | V. Thou shalt see the big archives as a platform. 37 | 38 | **Katherine Boss (KB)** 39 | 40 | The Internet Archive is only searchable by date.
41 | 42 | The problem: Our stuff is dynamic, and libraries haven't figured out what to do with it. 43 | 44 | Maybe we can monetize our archives. 45 | 46 | Flash. :( 47 | 48 | Emulation might be a solution. 49 | 50 | For static objects, PDFs. But migration is not successful for dynamic projects. 51 | 52 | ReproZip, the reproducibility packer. 53 | 54 | **Meredith Broussard (MB)** 55 | 56 | Four recommendations for what you can do. 57 | 58 | This has to happen at the institutional level. Individuals should save their own work, but that's not enough. 59 | 60 | 1. Take a video. Walkthrough of your project. 61 | 2. Bake out. Static versions of dynamic pages. 62 | 3. Plan for the future. Sunset plan at time of launch. 63 | 4. Work with libraries, institutions and commercial archives. 64 | 65 | #### Questions 66 | 67 | These are human issues, not computational problems. What are those? 68 | 69 | **NP:** Difference between libraries and archives. Preservation v. access. 70 | 71 | **BW:** People underestimate the risk to their work until they lose something they care about. And people who do know and care don't know how they can make a difference. There needs to be "The Checklist" to make sure stuff is archive-ready. 72 | 73 | **MB:** We need digital archivists and librarians back in newsrooms. 74 | 75 | What are some resources that can help? 76 | 77 | * ReproZip 78 | * Dodging the Memory Hole 79 | * Internet Archive API tools 80 | 81 | How do you think about linkrot? 82 | 83 | Local files vs. CDNs for updates? 84 | 85 | Technical and design challenges of archiving?
She holds a bachelor’s in Journalism, a master’s in Library and Information Science, and a master’s in Media Studies. [@katy_boss](https://twitter.com/katy_boss) 90 | 91 | Meredith Broussard teaches data journalism at NYU's Arthur L. Carter Journalism Institute. Her current research focuses on artificial intelligence in investigative reporting, with a particular interest in using data analysis for social good. Her new book is "Artificial Unintelligence: How Computers Misunderstand the World." [@merbroussard](https://twitter.com/merbroussard) or [meredithbroussard.com](http://meredithbroussard.com) 92 | 93 | Nora Paul is co-author of Future-Proofing the News: Preserving the First Draft of History. She is the former director of the Minnesota Journalism Center at the University of Minnesota where she also taught classes on information strategies. Formerly at the Poynter Institute as a faculty member and at the Miami Herald where she ran the news research library. Now blissfully retired, but happy to share her perspective on the archiving panel. 94 | 95 | Ben Welsh is the editor of the Data Desk, a team of reporters and computer programmers in the Los Angeles Times newsroom. He is also an organizer of the California Civic Data Coalition, a network of journalists working to open public data, and the founder of PastPages, an open-source archive dedicated to better preserving digital news. 96 | 97 | _Description and speakers from [official schedule](https://www.ire.org/events-and-training/event/3189/3576/)_ -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # NICAR 2018 2 | 3 | Chicago, Ill. March 8-11, 2018 4 | 5 | Here are the sessions I went to at NICAR 2018. Each session title links to more detailed notes, including links to tipsheets and the session description on [ire.org](https://ire.org/).
(Pro tip: If you're an IRE member, the audio links will be posted on the sessions' pages at IRE several weeks after the conference). 6 | 7 | Also see the end of this list for more resources. 8 | 9 | ### Thursday 10 | 11 | #### [How and why to make your data analysis reproducible](01-01-reproducible-data-analysis.md) 12 | 13 | _Ryann Grochowski Jones, Hannah Recht, Jeremy Singer-Vine, Hannah Cushman_ 14 | 15 | I learned how making your data analysis pipeline a reproducible thing can save you. Reproducibility doesn't necessarily mean automated. And how to deal with those pesky steps you have to do by hand. 16 | 17 | #### [Education civil rights data: The good, the bad, the dirty (Diversity Track)](01-02-education-civil-rights-data.md) 18 | 19 | _Jennifer LaFleur, Alex Harwin, Kameel Stanley_ 20 | 21 | My colleague Kameel Stanley, along with Alex Harwin and Jennifer LaFleur, talked about what kinds of education civil rights data exist, how to get them and some pitfalls. Kameel talked about a project I helped with where we discovered very [high disparities in suspensions of students in Kindergarten through 3rd grade](http://www.welivehere.show/posts/2016/4/15/suspended-futures). 22 | 23 | #### Conversation: Learn from my fail 24 | 25 | _Saurabh Datar, Brent Jones, Maggie Lee_ 26 | 27 | No link here, because I helped lead this one along with Maggie Lee and Saurabh Datar. This well-attended conversation was about how we all fail and what we learn from it. Having the first few tales of fails come from the likes of the New York Times and the Washington Post made it clear that this really does happen to everyone and the important thing is what we can learn. 28 | 29 | #### [Building happy cross-functional teams](01-04-cross-functional-teams.md) 30 | 31 | _Becca Aaronson, Joe Germuska, Emily Ingram_ 32 | 33 | A follow-on from the previous therapy session, this one focused on the people around us.
We heard about how newsrooms are integrating data desks into their workflow and moving away from the service-desk mentality, as well as how to split up work, what to look for in a manager and how to deal with less-technical people. 34 | 35 | #### [Investigating hate when the data isn’t there (Diversity Track)](01-05-investigating-hate.md) 36 | 37 | _Duaa Eldeib, Melissa Lewis, Ken Schwencke, Nadine Sebai_ 38 | 39 | I learned about the ProPublica Documenting Hate database, and a few projects that it helped power. Also why hate crime data in the U.S. is terrible, and tips for interviewing people who have experienced bias crimes. 40 | 41 | #### [Conversation: More bibles, fewer priests: Tools for running self-managing teams](01-06-self-managing-teams.md) 42 | 43 | _Brian Boyer_ 44 | 45 | Brian Boyer, as usual, facilitated a fantastic discussion about team management and how teams can better manage themselves. We learned about "roles, goals and rules" for better management. 46 | 47 | ### Friday 48 | 49 | #### [How to find reporting leads and publishable facts in text data you already have](02-01-data-you-already-have.md) 50 | 51 | _Jeff Ernsthausen, Jeremy Merrill, Youyou Zhou_ 52 | 53 | I learned how to take piles of text data, find patterns across documents, spot outliers and extract meaning from them. As well as how to find some data in the first place, bulletproof your analysis and present text data. 54 | 55 | #### [Evolving forms and the future of live coverage](02-02-evolving-live-coverage.md) 56 | 57 | _Hamilton Boardman, Alastair Coote, Tiff Fehr, Tyler Fisher_ 58 | 59 | The panel explored current ways of doing live coverage, and then looked at a few experimental forms and their positives and negatives.
60 | 61 | #### [Data journalism that breaks the filter bubble](02-03-break-filter-bubble.md) 62 | 63 | _Eva Constantaras, Adriana Gallardo, Anjeanette Damon_ 64 | 65 | Through three case studies, we learned how to tell stories of marginalized or not-often-talked-about groups. We heard about the Lost Mothers series, about maternal harm and deaths in the U.S.; Death Behind Bars, about prisoners dying in jail; and how to do data journalism in developing countries. 66 | 67 | #### [Life after FactFinder](02-04-life-after-factfinder.md) 68 | 69 | _Ronald Campbell, Paul Overberg, Ally Burleson-Gibson_ 70 | 71 | An employee of the U.S. Census Bureau, Ally Burleson-Gibson, went into the belly of the beast and faced down a roomful of reporters to demo the soon-to-come updates to American FactFinder, the tool to access Census data online. 72 | 73 | #### [Tips for harnessing new IRS nonprofit data](02-05-irs-nonprofit-data.md) 74 | 75 | _Andrea Fuller, Todd Wallack_ 76 | 77 | Unfortunately I had to leave this one early (I hadn't seen my wife and kid since leaving for Chicago and they had time to FaceTime), but what I did get to hear of it was great. I've looked at this data before but it seemed inscrutable. Andrea and Todd explained the format and gave some tips for using it. 78 | 79 | #### [Lightning Talks](lightning-talks.md) 80 | 81 | Lightning talks are a series of five-minute talks given to the entire conference (i.e. there's nothing else scheduled during this time). This year we heard about 82 | 83 | * Sensor journalism when the EPA is cutting back — Kelly Calagna 84 | * Why non-US data sources look terrible in text processors — Jonathan Soma 85 | * Why good copy editors make good data journalists — Justin Myers 86 | * Why news nerds need more career paths in newsrooms — Matt Dempsey 87 | * Challenges facing immigrants working in the news industry — Kai Teoh 88 | * How a bunch of News Nerds helped the L.A.
Times form a union — Jon Schleuss and Anthony Pesce 89 | * What happens when the unexpected happens in the middle of your story — Allie Kanik and Kate Howard 90 | * How the Washington Post built its eclipse map with rubber bands — Denise Lu 91 | * Why alcohol and journalism don't _have_ to go together — Rachel Alexander 92 | * What building a dining room table can teach you about data journalism — Steven Rich 93 | 94 | ### Saturday 95 | 96 | #### [Introducing the L.A. Times Map Maker: Make maps faster](03-01-lat-map-maker.md) 97 | 98 | _Jon Schleuss_ 99 | 100 | Jon Schleuss demoed the open-source map maker from the LA Times. I've been looking for a good self-service mapmaking tool that's a bit more flexible than the one we currently use, and this one may be it. 101 | 102 | #### [Data of disasters: Following the money](03-02-disaster-data-money.md) 103 | 104 | _Lee Zurik, Matt Dempsey, Omaya Sosa_ 105 | 106 | Matt Dempsey gave an overview of types of disasters and some of the data that might be useful for each. Lee Zurik gave tips on following the money after a disaster, including one parish in Louisiana that was spending a ridiculous amount of money to rebuild schools. And Omaya Sosa talked about what it was like to do important, impactful journalism in Puerto Rico after Hurricane Maria. 107 | 108 | #### [Uncovering environmental hazards faced by urban children](03-03-environmental-hazards.md) 109 | 110 | _Chris Zubak-Skees, Molly Peterson, Dylan Purcell_ 111 | 112 | I learned about three projects to uncover environmental hazards faced by kids, particularly in cities: Schools near busy roads (air pollution), urban heat, and a project that looked at lead paint and tainted soil in Philadelphia. 
113 | 114 | #### [Shoot us straight: Correctly using data and docs on guns](03-04-guns.md) 115 | 116 | _Nick Penzenstadler, Matt Drange, Kim Smith_ 117 | 118 | I learned about data on guns, including the Chicago Crime Lab's work, how to get Shotspotter data, what data is and isn't available about gun dealers, and how to build your own dataset about online gun sales. 119 | 120 | #### [I'm entitled to a spreadsheet, dang it!](03-05-entitled-to-a-spreadsheet.md) 121 | 122 | _Steven Rich, Sarah Ryley, Annie Waldman_ 123 | 124 | This was a fun session about getting data from public agencies in formats we can use, including how to format records requests, what data is and isn't available because of HIPAA and FERPA, and how to manage large FOIA projects. 125 | 126 | #### [Creating a data culture that lasts](03-06-data-culture.md) 127 | 128 | _Annie Daniel, Tom Meagher, Mark Nichols_ 129 | 130 | In yet another therapy session, we talked about how to better integrate ourselves in the newsroom, get support for what we do and fight…for the right…to byyyyyyylines (sorry). 131 | 132 | ### Sunday 133 | 134 | #### [Sensor journalism: How do we do it and what are the limits?](04-01-sensor-journalism.md) 135 | 136 | _Michael Corey, Marianne Bouchart, Denise Lu, Kelly Calagna_ 137 | 138 | We learned about different types of sensor data that are available or in the works, a project called SensorGrid that's trying to democratize the collection of data, using sensor journalism to catch crooks, and using map imagery. 139 | 140 | #### [Python: Writing tests for your code](04-02-python-tests.md) 141 | 142 | _Andrew Chavez_ 143 | 144 | I learned about unit testing in Python. I've read about it before but not really been sure how to go about it. Now I know how to start. 
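To jot down that starting point, here's a minimal sketch in the spirit of the session (the `dollars` helper and its behavior are my own invented example, not from the talk): each test covers one narrow behavior, and one is a regression test pinned to a hypothetical past bug.

```python
import unittest


def dollars(value):
    """Format a number as a dollar amount: 1234.5 -> '$1,234.50'."""
    return "${:,.2f}".format(value)


class TestDollars(unittest.TestCase):
    # Good tests test one narrow behavior, so a failure points at one thing.
    def test_adds_thousands_separator(self):
        self.assertEqual(dollars(1234.5), "$1,234.50")

    def test_regression_zero_is_not_blank(self):
        # Regression test: once something breaks, pin it down so it
        # can't silently break again.
        self.assertEqual(dollars(0), "$0.00")
```

Run it with `python -m unittest`: each passing test prints a `.`, failures print `F` and errors print `E`.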
145 | 146 | #### [Archiving data journalism](04-03-archiving-data-journalism.md) 147 | 148 | _Katherine Boss, Meredith Broussard, Nora Paul, Ben Welsh_ 149 | 150 | Finally, a panel that began with Ben Welsh saying "This is going to be a dark panel," and included Meredith Broussard later saying, "Death comes to all of us." This was about why we're terrible at archiving data journalism and ways we might be better at it. 151 | 152 | ## More resources 153 | 154 | [NICAR Schedule (see all the sessions)](https://www.ire.org/events-and-training/event/3189/) 155 | 156 | [Conference Blog](https://ire.org/blog/car-conference-blog/) 157 | 158 | [IRE's repository of tipsheets and slides](https://www.ire.org/conferences/nicar18/tipsheets-and-links/) 159 | 160 | [Chrys Wu's repository of tipsheets, slides, tools, etc.](http://blog.chryswu.com/2018/01/23/nicar18-slides-links-tutorials/) -------------------------------------------------------------------------------- /lightning-talks.md: -------------------------------------------------------------------------------- 1 | # Lightning Talks 2 | 3 | ## Sensor Journalism in the Age of Trump’s EPA 4 | 5 | [Slides](https://drive.google.com/file/d/1Z0AxXGX46xUk9208TB14H7OIjVfUrODR/view) 6 | 7 | Proposed by: Kelly Calagna 8 | 9 | With significant EPA budget cuts and program rollbacks set in place by the Trump administration, the role of environmental journalists has become vital in watching over the health of the public and the planet. In this talk, I will share the concept of a sensor project being developed by Northwestern University’s Knight Lab that will enable journalists and citizen scientists to collect high-definition air quality data that can be used to uncover at-risk populations. 10 | 11 | ### Notes: 12 | 13 | Air pollution causes 1 in 9 deaths, mostly lung cancer. 14 | 15 | Killing ourselves and our climate. 16 | 17 | Administration is doing little to change it. Lowered standards and budget of EPA.
18 | 19 | What can journalists and citizen scientists do to fill the gaps? 20 | 21 | Sensor journalism. Can monitor pollutants, ozone, sound, etc. 22 | 23 | SensorGrid. Democratize environmental data. Each node open source, open architecture, low-cost (under $200). Currently does particulate matter. 24 | 25 | Not many sensors even in Chicago. Currently, algorithms calculate estimates. 26 | 27 | --- 28 | 29 | ## HELLO WORLD: How the English language failed international data journalism 30 | 31 | [Slides](http://jonathansoma.com/projects/charsets/) 32 | 33 | Proposed by: Jonathan Soma 34 | 35 | Why do non-US and accented data sources tend to look like ⍰⍰⍰⍰? Let's take a trip into the magical land of çharacter encodìng! Learn why Excel can't cope with the Champs-Élysées, how emoji saved the planet, and how to say adiós/さようなら/до свидания to these kinds of problems forever. 👋 36 | 37 | ### Notes: 38 | 39 | Why your international data looks like garbage 40 | 41 | Character encoding. 42 | 43 | US-ASCII just has letters, numbers. 44 | 45 | Latin-1 doubled it. 46 | 47 | ISO-8859-SOMETHING (lots of regional variants) 48 | 49 | You open a file and it's a bunch of numbers. Your computer has to interpret. There's no way to know just going in. 50 | 51 | Unicode 1.x - 16k characters, Unicode 2.x - 1.1m 52 | 53 | You can select different encodings in Excel and most programming packages. 54 | 55 | --- 56 | 57 | ## Good Copy Editors Make Good Data Journalists 58 | 59 | [Slides](https://docs.google.com/presentation/d/1_KlcTijrcHwQXVNwkPG072j7WboAvhtuB18H8_6UXSE/view#slide=id.p) 60 | 61 | Proposed by: Justin Myers 62 | 63 | Data journalists love to talk about precision. So do copy editors! I'll show you how thinking like a copy editor can help you find bugs, lock down your methodology and ensure your stories say exactly what you mean. 64 | 65 | ### Notes: 66 | 67 | Enjoyed copyediting most. 68 | 69 | It's not trendy but it's the truth. 70 | 71 | Both about precision. Say what we mean and mean what we say.
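A tiny illustration of that precision point in code (the numbers are invented): even a word like "average" hides an editorial decision.

```python
from statistics import mean, median

# Hypothetical raises at a hypothetical newsroom, in dollars; one big outlier.
raises = [500, 750, 800, 900, 12000]

print(mean(raises))    # supports "the average raise was nearly $3,000"
print(median(raises))  # supports "the typical raise was $800"
```

Both sentences are true of the same data; a copy editor's "which one do you mean?" is exactly the right question.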
"What do you mean by that?" in English and in code. 74 | 75 | Last line of defense against doublespeak. 76 | 77 | Punctuation. 78 | 79 | Everything reflects some sort of editorial decision. Copyeditors excel at thinking about these decisions and their ramifications. 80 | 81 | --- 82 | 83 | ## We Need Multiple Career Paths for News Nerds 84 | 85 | [Slides](https://medium.com/@mizzousundevil_7561/hi-my-name-is-matt-dempsey-and-im-the-data-editor-at-the-houston-chronicle-7891b509e0d7) 86 | 87 | Proposed by: Matt Dempsey 88 | 89 | Data journalists play an essential role in modern newsrooms. We request, clean up and crunch data for stories big and small. We build interactive experiences and data visualizations for important projects. We're the hotness. But what does advancing in journalism look like if you have data skills? Few of us get to become editors. Why? Why does our news judgement mean less than that of those who come up as writers first? Why is the only career path for most over on the digital or tech side of things? I'll make an argument that just like with bylines, data journalists deserve a seat at the management table too. 90 | 91 | ### Notes: 92 | 93 | Few of us are in leadership positions even though journalists are in huge demand. Either access or because we're not good candidates, but we're awesome so it's probably the access thing. 94 | 95 | Had to argue for bylines. Fight for title of editor. Still doesn't oversee team of reporters. 96 | 97 | "I was more respected when I was an intern." 98 | 99 | 30% of us feel our organizations don't value our work. 39% need editors who are qualified. 100 | 101 | Who better to understand some of our challenges? 102 | 103 | What can we do? 104 | 105 | Help current leaders understand this is a problem. 106 | 107 | Passionate people who want to take journalism to a higher level. 108 | 109 | We need more of us in positions of power. And support each other. 110 | 111 | Journalism needs us there.
112 | 113 | --- 114 | 115 | ## Immigrants working in the news industry 116 | 117 | [Slides](https://docs.google.com/presentation/d/1NYcufcRDX6YPuK7e65gsFyo3uqfjio3Oh52YfVPJbvA/edit) 118 | 119 | Proposed by: Kai Teoh 120 | 121 | I want to talk about immigrants working in the industry, what visa challenges they might face (that I've learned or experienced or am experiencing), and maybe focus a little on how others can be better supporters or allies to them. How "just get married" is really not a good suggestion, and can be really hurtful. How job departments and titles and classifications can matter a great deal more. How having a support network not just amongst peers, but also amongst supervisors/managers can mean the difference between being in the country or, well, not. 122 | 123 | ### Notes: 124 | 125 | Why should you care? Immigrants are people too. 126 | 127 | Common experiences: Job offers rescinded because they need a visa. Job in the product team but they have a degree in journalism. People have to leave the country because they missed a deadline. 128 | 129 | How can we do better? 130 | 131 | "Why don't you just get married?" That's kind of not the point. 132 | 133 | "I'd harbor you!" Yeah, no. 134 | 135 | If there's a union it can protect you. 136 | 137 | If you're hiring, announce the policy. Don't want to go through the process of interviewing and find out later it doesn't work. 138 | 139 | If you're a candidate, know the process. 140 | 141 | Amplify our voice. When we talk about it, it makes people uncomfortable. 142 | 143 | --- 144 | 145 | ## The news nerd guide to forming a union 146 | 147 | [Slides](https://docs.google.com/presentation/d/1azXdAK_RJVzSY2Lmxa3jycKXOoo5EOhkmucwE4KosNM/edit) 148 | 149 | Proposed by: Jon Schleuss and Anthony Pesce 150 | 151 | Data nerds Jon Schleuss and Anthony Pesce started the union at the L.A.
Learn how they relied on good data, used the company’s technology against it and organized the newsroom at a historically anti-union paper. 152 | 153 | ### Notes: 154 | 155 | 85% said "hell yes" 156 | 157 | Reported on ourselves. 158 | 159 | Data analysis on commutes, sent to publisher, published his response. 160 | 161 | Technical issues: Getting a list of all the employees. 162 | 163 | Protect your communication. The company was monitoring their comms. 164 | 165 | Built a website. Just go with squarespace. 166 | 167 | Logo. 168 | 169 | Had to email whole newsroom. Company blocked access to listserv, had to figure out way around. 170 | 171 | Predicted election. 172 | 173 | And they won. 174 | 175 | --- 176 | 177 | ## Hope for the best, prepare for the worst 178 | 179 | [Slides](https://docs.google.com/presentation/d/1xDJwf9Btpa4B8xCt--cBIa3xVItJSBpbRQYfa-qd76w/edit#slide=id.p) 180 | 181 | Proposed by: Alexandra Kanik and Kate Howard 182 | 183 | Part of publishing an investigation is anticipating how your audience will react, and being ready for it. But what about when the unexpected happens? In December 2017, we published a 5-part series about a state representative. In the middle of the roll out, he killed himself. We share what we learned about being prepared technologically, editorially, and emotionally for the unexpected. 184 | 185 | ### Notes: 186 | 187 | The Pope's Long Con 188 | 189 | It was going pretty well. We were getting a lot of attention and a lot of traffic. Then he killed himself. 190 | 191 | We planned for a Kentucky audience. How to figure out how micro aws server could not crash. In the end I just had to go to bed and pray that the website would stay up. 192 | 193 | Pjax — single page application, limited requests to server. Assets split between 2 servers — django on ec2 and media on s3. 194 | 195 | Expected that rep might step down or not step down. Did not expect this. 196 | 197 | But some of the conversations help them be prepared. 
198 | 199 | They'd done an extensive fact check. The story was right. Legal review. 200 | 201 | Had additional reporters on hand to cover fallout (funeral, etc.). 202 | 203 | Statement offered condolences but stood by story. Ignored trolls. 204 | 205 | Brought in counselor and security. Editor gave them credit card (beers and lunches). 206 | 207 | Make sure you're thinking about everyone on team, not just main writers. Keep doing journalism. 208 | 209 | --- 210 | 211 | ## I can’t believe it’s not georeferenced! How we made a scrolly eclipse map with rubber bands, screenshots and math. (to be presented with Armand Emamdjomeh) 212 | 213 | [Slides](https://docs.google.com/presentation/d/1nSzb-FjwmCFoI0lnPbiEH4F76xSUYw75TfyRJiTw4UA/edit?usp=sharing) 214 | 215 | Proposed by: Denise Lu 216 | 217 | No georeference? No problem. We had a what-if idea and put it together with nontraditional methods. What if you could scroll through the path of the Great American eclipse and look at it from a birds-eye view? ...What if it was one seamless image, from West Coast to East? ...What if you superimposed the umbra of the eclipse as you scrolled? ...What if the page could follow the eclipse in real time? The 30,000-pixel journey involves screenshots, rubber bands, some Photoshop and D3 magic and a lot of hacks, and was literally held together with Scotch tape at points. 218 | 219 | ### Notes: 220 | 221 | Step 0: "Wouldn't it be cool if?" 222 | 223 | 1. Shaded relief 224 | 2. Raster land use 225 | 3. Tone in Photoshop, export GeoTIFFs 226 | 4. Import to Google Earth. (still georeferenced at this point) 227 | 5. Vector data 228 | 6. Screenshot time = rubber bands (Why? Easiest way) 229 | 7. Photoshop merge, then fix by hand 230 | 8. Point of no return: Flatten. 231 | 232 | Post-Flattening (P.F.) 233 | 234 | Ai2HTML - 21 artboards!
235 | 236 | Custom-fit vector data 237 | 238 | 3000+ labels added by hand 239 | 240 | The Great American Eclipse Road Trip (GEART) 241 | 242 | Needs a minimap. 243 | 244 | Minimap, time and shadow shape all done using position on page. 245 | 246 | Progressive image loading. 247 | 248 | --- 249 | 250 | ## Real Talk: Alcohol, Journalism and What I Did About It 251 | 252 | [Slides](https://docs.google.com/presentation/d/19aJGbKu6mjJBURtMUqe2fsKSuenfq8RKeEj_mmQ5icg/edit#slide=id.g34c155cfec_0_1) 253 | 254 | Proposed by: Rachel Alexander 255 | 256 | Journalism isn’t the healthiest industry for someone with eight alcoholics in her family. Using my experience as an addiction reporter and beer enthusiast, I'll talk about how I realized I was using alcohol in unhealthy ways, how I cut back and what you can do if you'd like to drink less while doing a stressful job. 257 | 258 | ### Notes: 259 | 260 | We all have stressful jobs. What if we saw this as a problem instead of a fun quirky feature? 261 | 262 | Not moralizing. Just be mindful. What if heavy drinking weren't the default? 263 | 264 | Why change? You might be drinking more than you think. 265 | 266 | Used to think either you're an alcoholic or you're not. Turns out people are more complicated. 267 | 268 | Federal guidelines: 7 drinks per week for women, 14 for men. 269 | 270 | Geek out about it. Spreadsheet! Kind of like a reporting project. 271 | 272 | Didn't make it easier. I miss not having to think about it. But a bunch of good stuff was happening too! 273 | 274 | --- 275 | 276 | ## I designed and built a dining room table; here’s what it taught me about data 277 | 278 | [Slides](http://slides.com/stevenrich/woodendata#/) 279 | 280 | Proposed by: Steven Rich 281 | 282 | Working with wood is more like working with data than anything else I've ever done. It also taught me a lot I never would have considered otherwise.
Allow me to be the Nick Offerman of data and impart my experience to you, and I'll give you both a new perspective on data and a design for a functional extendable dining room table. 285 | 286 | ### Notes: 287 | 288 | Expectation is almost never reality. Something will go wrong. 289 | 290 | Don't start by building a table. 291 | 292 | Design for humans first. Think about usability from the beginning. 293 | 294 | There are a lot of ways to build a table. 295 | 296 | Don't reinvent the wheel. 297 | 298 | The type of join(t) matters. Butt joint = matching on names. 299 | 300 | Measure twice, cut once. 301 | 302 | Cut bigger and sand it down. 303 | 304 | Perfect fits should scare you. 305 | 306 | Fix problems as you see them. 307 | 308 | Use the right tools. 309 | 310 | You're going to get cut. Learn from it. 311 | 312 | Don't stain good wood. 313 | 314 | You need to think about what's missing. 315 | 316 | No one is a master; the best never stop learning. 317 | 318 | --------------------------------------------------------------------------------