├── .gitignore
├── README.md
├── data
│   ├── bush2008.txt
│   ├── clinton2000.txt
│   ├── english_stopwords.txt
│   ├── obama2016.txt
│   ├── tree.svg
│   ├── trump.txt
│   └── twitter_apple.csv
├── data_manipulation.ipynb
├── data_manipulation_filled.ipynb
├── descriptions
│   ├── intro_description.md
│   └── series_description.md
├── images
│   ├── anaconda-channels.gif
│   ├── anaconda-envs.gif
│   ├── anaconda-jupyter.gif
│   ├── anaconda-notebook.gif
│   └── anaconda-packages.gif
├── intro_to_jekyll_and_github_pages.md
├── intro_to_ml.ipynb
├── intro_to_ml_filled.ipynb
├── intro_to_nlp.ipynb
├── intro_to_nlp_filled.ipynb
├── intro_to_python.ipynb
├── intro_to_python_filled.ipynb
├── setup.ipynb
└── setup_jekyll_githubpages.md

/.gitignore:
--------------------------------------------------------------------------------
1 | .Python
2 | tmp
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Python for Humanities and Social Sciences
2 |
3 | This repository contains Jupyter notebooks for the series of workshops given by the [Center for Interdisciplinary Digital Research (CIDR)](http://library.stanford.edu/department/cidr), a unit of [Stanford University Libraries](http://library.stanford.edu/), and its associated partners, on Python-related topics tailored to the Humanities and Social Sciences.
4 |
5 | ## Introduction to Python [[Notebook](intro_to_python.ipynb) | [Solutions](intro_to_python_filled.ipynb)]
6 |
7 | This workshop covers basic Python syntax and project setup by teaching basic web scraping with [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/).
8 |
9 | ## Data Manipulation and Visualization with Python [[Notebook](data_manipulation.ipynb) | [Solutions](data_manipulation_filled.ipynb)]
10 |
11 | This workshop guides students through the fundamentals of data manipulation and visualization with [Pandas](http://pandas.pydata.org/), [matplotlib](https://matplotlib.org/), and [Seaborn](http://seaborn.pydata.org/).
12 |
13 | ## Natural Language Processing with Python [[Notebook](intro_to_nlp.ipynb) | [Solutions](intro_to_nlp_filled.ipynb)]
14 |
15 | This workshop teaches students natural language processing in Python, with topics such as tokenization, part-of-speech tagging, and sentiment analysis, using [TextBlob](https://textblob.readthedocs.io/en/dev/).
16 |
17 | ## Introduction to Machine Learning [[Notebook](intro_to_ml.ipynb) | [Solutions](intro_to_ml_filled.ipynb)]
18 |
19 | This workshop introduces the basic workflow of machine learning in Python using [scikit-learn](http://scikit-learn.org/stable/). It covers topics from feature engineering or feature learning to model evaluation and selection.
20 |
--------------------------------------------------------------------------------
/data/bush2008.txt:
--------------------------------------------------------------------------------
1 | AS FOUND ON https://millercenter.org/the-presidency/presidential-speeches/january-28-2008-state-union-address
2 | FOR USE IN ACADEMIC WORKSHOP
3 | Madam Speaker, Vice President Cheney, members of Congress, distinguished guests, and fellow citizens:
4 |
5 | Seven years have passed since I first stood before you at this rostrum. In that time, our country has been tested in ways none of us could have imagined. We faced hard decisions about peace and war, rising competition in the world economy, and the health and welfare of our citizens. These issues call for vigorous debate, and I think it's fair to say, we've answered the call.
Yet history will record that amid our differences, we acted with purpose, and together we showed the world the power and resilience of American self-government. 6 | 7 | All of us were sent to Washington to carry out the people's business. That is the purpose of this body. It is the meaning of our oath. It remains our charge to keep. 8 | 9 | The actions of the 110th Congress will affect the security and prosperity of our nation long after this session has ended. In this election year, let us show our fellow Americans that we recognize our responsibilities and are determined to meet them. Let us show them that Republicans and Democrats can compete for votes and cooperate for results at the same time. 10 | 11 | From expanding opportunity to protecting our country, we've made good progress. Yet we have unfinished business before us, and the American people expect us to get it done. 12 | 13 | In the work ahead, we must be guided by the philosophy that made our nation great. As Americans, we believe in the power of individuals to determine their destiny and shape the course of history. We believe that the most reliable guide for our country is the collective wisdom of ordinary citizens. And so in all we do, we must trust in the ability of free peoples to make wise decisions and empower them to improve their lives for their futures. 14 | 15 | To build a prosperous future, we must trust people with their own money and empower them to grow our economy. As we meet tonight, our economy is undergoing a period of uncertainty. America has added jobs for a record 52 straight months, but jobs are now growing at a slower pace. Wages are up, but so are prices for food and gas. Exports are rising, but the housing market has declined. At kitchen tables across our country, there is a concern about our economic future. 16 | 17 | In the long run, Americans can be confident about our economic growth. But in the short run, we can all see that that growth is slowing. 
So last week, my administration reached agreement with Speaker Pelosi and Republican Leader Boehner on a robust growth package that includes tax relief for individuals and families and incentives for business investment. The temptation will be to load up the bill. That would delay it or derail it, and neither option is acceptable. This is a good agreement that will keep our economy growing and our people working, and this Congress must pass it as soon as possible. 18 | 19 | We have other work to do on taxes. Unless Congress acts, most of the tax relief we've delivered over the past seven years will be taken away. Some in Washington argue that letting tax relief expire is not a tax increase. Try explaining that to 116 million American taxpayers who would see their taxes rise by an average of $1,800. Others have said they would personally be happy to pay higher taxes. I welcome their enthusiasm. I'm pleased to report that the IRS accepts both checks and money orders. 20 | 21 | Most Americans think their taxes are high enough. With all the other pressures on their finances, American families should not have to worry about their federal government taking a bigger bite out of their paychecks. There's only one way to eliminate this uncertainty: Make the tax relief permanent. And members of Congress should know, if any bill raises taxes reaches my desk, I will veto it. 22 | 23 | Just as we trust Americans with their own money, we need to earn their trust by spending their tax dollars wisely. Next week, I'll send you a budget that terminates or substantially reduces 151 wasteful or bloated programs, totaling more than $18 billion. The budget that I will submit will keep America on track for a surplus in 2012. American families have to balance their budgets; so should their government. 24 | 25 | The people's trust in their government is undermined by congressional earmarks, special interest projects that are often snuck in at the last minute, without discussion or debate. 
Last year, I asked you to voluntarily cut the number and cost of earmarks in half. I also asked you to stop slipping earmarks into committee reports that never even come to a vote. Unfortunately, neither goal was met. So this time, if you send me an appropriations bill that does not cut the number and cost of earmarks in half, I'll send it back to you with my veto. 26 | 27 | And tomorrow I will issue an executive order that directs federal agencies to ignore any future earmark that is not voted on by Congress. If these items are truly worth funding, Congress should debate them in the open and hold a public vote. 28 | 29 | Our shared responsibilities extend beyond matters of taxes and spending. On housing, we must trust Americans with the responsibility of homeownership and empower them to weather turbulent times in the housing market. My administration brought together the HOPE NOW Alliance, which is helping many struggling homeowners avoid foreclosure. And Congress can help even more. Tonight I ask you to pass legislation to reform Fannie Mae and Freddie Mac, modernize the Federal Housing Administration, and allow state housing agencies to issue tax-free bonds to help homeowners refinance their mortgages. These are difficult times for many American families, and by taking these steps, we can help more of them keep their homes. 30 | 31 | To build a future of quality health care, we must trust patients and doctors to make medical decisions and empower them with better information and better options. We share a common goal: making health care more affordable and accessible for all Americans. The best way to achieve that goal is by expanding consumer choice, not government control. So I have proposed ending the bias in the Tax Code against those who do not get their health insurance through their employer. This one reform would put private coverage within reach for millions, and I call on the Congress to pass it this year. 
32 | 33 | The Congress must also expand health savings accounts, create association health plans for small businesses, promote health information technology, and confront the epidemic of junk medical lawsuits. With all these steps, we will ensure that decisions about your medical care are made in the privacy of your doctor's office, not in the halls of Congress. 34 | 35 | On education, we must trust students to learn, if given the chance, and empower parents to demand results from our schools. In neighborhoods across our country, there are boys and girls with dreams, and a decent education is their only hope of achieving them. 36 | 37 | Six years ago, we came together to pass the No Child Left Behind Act, and today, no one can deny its results. Last year, fourth and eighth graders achieved the highest math scores on record. Reading scores are on the rise. African-American and Hispanic students posted all-time highs. Now we must work together to increase accountability, add flexibilities for states and districts, reduce the number of high school dropouts, provide extra help for struggling schools. 38 | 39 | Members of Congress, the No Child Left Behind Act is a bipartisan achievement. It is succeeding. And we owe it to America's children, their parents, and their teachers to strengthen this good law. 40 | 41 | We must also do more to help children when their schools do not measure up. Thanks to the DC Opportunity Scholarships you approved, more than 2,600 of the poorest children in our nation's capital have found new hope at a faith-based or other non-public school. Sadly, these schools are disappearing at an alarming rate in many of America's inner cities. So I will convene a White House summit aimed at strengthening these lifelines of learning. And to open the doors of these schools to more children, I ask you to support a new $300 million program called Pell Grants for Kids. 
We have seen how Pell Grants help low-income college students realize their full potential. Together we've expanded the size and reach of these grants. Now let us apply the same spirit to help liberate poor children trapped in failing public schools. 42 | 43 | On trade, we must trust American workers to compete with anyone in the world and empower them by opening up new markets overseas. Today, our economic growth increasingly depends on our ability to sell American goods and crops and services all over the world. So we're working to break down barriers to trade and investment wherever we can. We're working for a successful Doha round of trade talks, and we must complete a good agreement this year. At the same time, we're pursuing opportunities to open up new markets by passing free trade agreements. 44 | 45 | I thank the Congress for approving a good agreement with Peru. And now I ask you to approve agreements with Colombia and Panama and South Korea. Many products from these nations now enter America duty free, yet many of our products face steep tariffs in their markets. These agreements will level the playing field. They will give us better access to nearly 100 million customers. They will support good jobs for the finest workers in the world, those whose products say "Made in the USA." 46 | 47 | These agreements also promote America's strategic interests. The first agreement that will come before you is with Colombia, a friend of America that is confronting violence and terror and fighting drug traffickers. If we fail to pass this agreement, we will embolden the purveyors of false populism in our hemisphere. So we must come together, pass this agreement, and show our neighbors in the region that democracy leads to a better life. 48 | 49 | Trade brings better jobs and better choices and better prices. Yet for some Americans, trade can mean losing a job, and the federal government has a responsibility to help. 
I ask Congress to reauthorize and reform trade adjustment assistance so we can help these displaced workers learn new skills and find new jobs. 50 | 51 | To build a future of energy security, we must trust in the creative genius of American researchers and entrepreneurs and empower them to pioneer a new generation of clean energy technology. Our security, our prosperity, and our environment all require reducing our dependence on oil. 52 | 53 | Last year, I asked you to pass legislation to reduce oil consumption over the next decade, and you responded. Together we should take the next steps. Let us fund new technologies that can generate coal power while capturing carbon emissions. Let us increase the use of renewable power and emissions-free nuclear power. Let us continue investing in advanced battery technology and renewable fuels to power the cars and trucks of the future. Let us create a new international clean technology fund, which will help developing nations like India and China make a greater use of clean energy sources. And let us complete an international agreement that has the potential to slow, stop, and eventually reverse the growth of greenhouse gases. 54 | 55 | This agreement will be effective only if it includes commitments by every major economy and gives none a free ride. The United States is committed to strengthening our energy security and confronting global climate change. And the best way to meet these goals is for America to continue leading the way toward the development of cleaner and more energy efficient technology. 56 | 57 | To keep America competitive into the future, we must trust in the skill of our scientists and engineers and empower them to pursue the breakthroughs of tomorrow. Last year, Congress passed legislation supporting the American Competitiveness Initiative, but never followed through with the funding. This funding is essential to keeping our scientific edge. 
So I ask Congress to double federal support for critical basic research in the physical sciences and ensure America remains the most dynamic nation on Earth. 58 | 59 | On matters of life and science, we must trust in the innovative spirit of medical researchers and empower them to discover new treatments while respecting moral boundaries. In November, we witnessed a landmark achievement when scientists discovered a way to reprogram adult skin cells to act like embryonic stem cells. This breakthrough has the potential to move us beyond the divisive debates of the past by extending the frontiers of medicine without the destruction of human life. 60 | 61 | So we're expanding funding for this type of ethical medical research. And as we explore promising avenues of research, we must also ensure that all life is treated with the dignity it deserves. And so I call on Congress to pass legislation that bans unethical practices, such as the buying, selling, patenting, or cloning of human life. 62 | 63 | On matters of justice, we must trust in the wisdom of our Founders and empower judges who understand that the Constitution means what it says. I've submitted judicial nominees who will rule by the letter of the law, not the whim of the gavel. Many of these nominees are being unfairly delayed. They are worthy of confirmation, and the Senate should give each of them a prompt up-or-down vote. 64 | 65 | In communities across our land, we must trust in the good heart of the American people and empower them to serve their neighbors in need. Over the past seven years, more of our fellow citizens have discovered that the pursuit of happiness leads to the path of service. Americans have volunteered in record numbers. Charitable donations are higher than ever. Faith-based groups are bringing hope to pockets of despair, with newfound support from the federal government. 
And to help guarantee equal treatment of faith-based organizations when they compete for federal funds, I ask you to permanently extend charitable choice. 66 | 67 | Tonight the armies of compassion continue the march to a new day in the gulf coast. America honors the strength and resilience of the people of this region. We reaffirm our pledge to help them build stronger and better than before. And tonight I'm pleased to announce that in April, we will host this year's North American Summit of Canada, Mexico, and the United States in the great city of New Orleans. 68 | 69 | There are two other pressing challenges that I've raised repeatedly before this body and that this body has failed to address: entitlement spending and immigration. Every member in this chamber knows that spending on entitlement programs like Social Security, Medicare, and Medicaid is growing faster than we can afford. We all know the painful choices ahead if America stays on this path: massive tax increases, sudden and drastic cuts in benefits, or crippling deficits. I've laid out proposals to reform these programs. Now I ask members of Congress to offer your proposals and come up with a bipartisan solution to save these vital programs for our children and our grandchildren. 70 | 71 | The other pressing challenge is immigration. America needs to secure our borders, and with your help, my administration is taking steps to do so. We're increasing worksite enforcement, deploying fences and advanced technologies to stop illegal crossings. We've effectively ended the policy of catch-and-release at the border, and by the end of this year, we will have doubled the number of Border Patrol agents. Yet we also need to acknowledge that we will never fully secure our border until we create a lawful way for foreign workers to come here and support our economy. This will take pressure off the border and allow law enforcement to concentrate on those who mean us harm. 
72 | 73 | We must also find a sensible and humane way to deal with people here illegally. Illegal immigration is complicated, but it can be resolved. And it must be resolved in a way that upholds both our laws and our highest ideals. 74 | 75 | This is the business of our nation here at home. Yet building a prosperous future for our citizens also depends on confronting enemies abroad and advancing liberty in troubled regions of the world. 76 | 77 | Our foreign policy is based on a clear premise: We trust that people, when given the chance, will choose a future of freedom and peace. In the last seven years, we have witnessed stirring moments in the history of liberty. We've seen citizens in Georgia and Ukraine stand up for their right to free and fair elections. We've seen people in Lebanon take to the streets to demand their independence. We've seen Afghans emerge from the tyranny of the Taliban and choose a new President and a new Parliament. We've seen jubilant Iraqis holding up ink-stained fingers and celebrating their freedom. These images of liberty have inspired us. 78 | 79 | In the past seven years, we've also seen the images that have sobered us. We've watched throngs of mourners in Lebanon and Pakistan carrying the caskets of beloved leaders taken by the assassin's hand. We've seen wedding guests in blood-soaked finery staggering from a hotel in Jordan, Afghans and Iraqis blown up in mosques and markets, and trains in London and Madrid ripped apart by bombs. On a clear September day, we saw thousands of our fellow citizens taken from us in an instant. These horrific images serve as a grim reminder: The advance of liberty is opposed by terrorists and extremists, evil men who despise freedom, despise America, and aim to subject millions to their violent rule. 80 | 81 | Since 9/11, we have taken the fight to these terrorists and extremists. We will stay on the offense; we will keep up the pressure; and we will deliver justice to our enemies. 
82 | 83 | We are engaged in the defining ideological struggle of the 21st century. The terrorists oppose every principle of humanity and decency that we hold dear. Yet in this war on terror, there is one thing we and our enemies agree on: In the long run, men and women who are free to determine their own destinies will reject terror and refuse to live in tyranny. And that is why the terrorists are fighting to deny this choice to the people in Lebanon, Iraq, Afghanistan, Pakistan, and the Palestinian Territories. And that is why, for the security of America and the peace of the world, we are spreading the hope of freedom. 84 | 85 | In Afghanistan, America, our 25 NATO allies, and 15 partner nations are helping the Afghan people defend their freedom and rebuild their country. Thanks to the courage of these military and civilian personnel, a nation that was once a safe haven for Al Qaeda is now a young democracy where boys and girls are going to school, new roads and hospitals are being built, and people are looking to the future with new hope. These successes must continue, so we're adding 3,200 marines to our forces in Afghanistan, where they will fight the terrorists and train the Afghan Army and police. Defeating the Taliban and Al Qaeda is critical to our security, and I thank the Congress for supporting America's vital mission in Afghanistan. 86 | 87 | In Iraq, the terrorists and extremists are fighting to deny a proud people their liberty and fighting to establish safe havens for attacks across the world. One year ago, our enemies were succeeding in their efforts to plunge Iraq into chaos. So we reviewed our strategy and changed course. We launched a surge of American forces into Iraq. We gave our troops a new mission: Work with the Iraqi forces to protect the Iraqi people; pursue the enemy in its strongholds; and deny the terrorists sanctuary anywhere in the country. 88 | 89 | The Iraqi people quickly realized that something dramatic had happened. 
Those who had worried that America was preparing to abandon them instead saw tens of thousands of American forces flowing into their country. They saw our forces moving into neighborhoods, clearing out the terrorists, and staying behind to ensure the enemy did not return. And they saw our troops, along with Provincial Reconstruction Teams that include Foreign Service officers and other skilled public servants, coming in to ensure that improved security was followed by improvements in daily life. Our military and civilians in Iraq are performing with courage and distinction, and they have the gratitude of our whole nation. 90 | 91 | The Iraqis launched a surge of their own. In the fall of 2006, Sunni tribal leaders grew tired of Al Qaeda's brutality, started a popular uprising called the "Anbar Awakening." Over the past year, similar movements have spread across the country. And today, the grassroots surge includes more than 80,000 Iraqi citizens who are fighting the terrorists. The government in Baghdad has stepped forward as well, adding more than 100,000 new Iraqi soldiers and police during the past year. 92 | 93 | While the enemy is still dangerous and more work remains, the American and Iraqi surges have achieved results few of us could have imagined just one year ago. When we met last year, many said that containing the violence was impossible. A year later, high-profile terrorist attacks are down, civilian deaths are down, sectarian killings are down. 94 | 95 | When we met last year, militia extremists—some armed and trained by Iran—were wreaking havoc in large areas of Iraq. A year later, coalition and Iraqi forces have killed or captured hundreds of militia fighters. And Iraqis of all backgrounds increasingly realize that defeating these militia fighters is critical to the future of their country. 
96 | 97 | When we met last year, Al Qaeda had sanctuaries in many areas of Iraq, and their leaders had just offered American forces safe passage out of the country. Today, it is Al Qaeda that is searching for safe passage. They have been driven from many of the strongholds they once held. And over the past year, we've captured or killed thousands of extremists in Iraq, including hundreds of key Al Qaeda leaders and operatives. 98 | 99 | Last month, Osama bin Laden released a tape in which he railed against Iraqi tribal leaders who have turned on Al Qaeda and admitted that coalition forces are growing stronger in Iraq. Ladies and gentlemen, some may deny the surge is working, but among the terrorists there is no doubt. Al Qaeda is on the run in Iraq, and this enemy will be defeated. 100 | 101 | When we met last year, our troop levels in Iraq were on the rise. Today, because of the progress just described, we are implementing a policy of return on success, and the surge forces we sent to Iraq are beginning to come home. 102 | 103 | This progress is a credit to the valor of our troops and the brilliance of their commanders. This evening I want to speak directly to our men and women on the frontlines. Soldiers and sailors, airmen, marines, and coast guardsmen: In the past year, you have done everything we've asked of you and more. Our nation is grateful for your courage. We are proud of your accomplishments. And tonight in this hallowed chamber, with the American people as our witness, we make you a solemn pledge: In the fight ahead, you will have all you need to protect our nation. And I ask Congress to meet its responsibilities to these brave men and women by fully funding our troops. 104 | 105 | Our enemies in Iraq have been hit hard. They are not yet defeated, and we can still expect tough fighting ahead. Our objective in the coming year is to sustain and build on the gains we made in 2007 while transitioning to the next phase of our strategy. 
American troops are shifting from leading operations to partnering with Iraqi forces and, eventually, to a protective overwatch mission. As part of this transition, one Army brigade combat team and one Marine expeditionary unit have already come home and will not be replaced. In the coming months, four additional brigades and two Marine battalions will follow suit. Taken together, this means more than 20,000 of our troops are coming home. 106 | 107 | Any further drawdown of U.S. troops will be based on conditions in Iraq and the recommendations of our commanders. General Petraeus has warned that too fast a drawdown could result in, quote, "the disintegration of the Iraqi security forces, Al Qaeda-Iraq regaining lost ground, and a marked increase in violence." Members of Congress, having come so far and achieved so much, we must not allow this to happen. 108 | 109 | In the coming year, we will work with Iraqi leaders as they build on the progress they're making toward political reconciliation. At the local level, Sunnis, Shi'a, and Kurds are beginning to come together to reclaim their communities and rebuild their lives. Progress in the provinces must be matched by progress in Baghdad. We're seeing some encouraging signs. The national government is sharing oil revenues with the provinces. The Parliament recently passed both a pension law and de-Ba'athification reform. They're now debating a provincial powers law. The Iraqis still have a distance to travel, but after decades of dictatorship and the pain of sectarian violence, reconciliation is taking place, and the Iraqi people are taking control of their future. 110 | 111 | The mission in Iraq has been difficult and trying for our nation. But it is in the vital interest of the United States that we succeed. A free Iraq will deny Al Qaeda a safe haven. A free Iraq will show millions across the Middle East that a future of liberty is possible. 
A free Iraq will be a friend of America, a partner in fighting terror, and a source of stability in a dangerous part of the world. 112 | 113 | By contrast, a failed Iraq would embolden the extremists, strengthen Iran, and give terrorists a base from which to launch new attacks on our friends, our allies, and our homeland. The enemy has made its intentions clear. At a time when the momentum seemed to favor them, Al Qaeda's top commander in Iraq declared that they will not rest until they have attacked us here in Washington. My fellow Americans, we will not rest either. We will not rest until this enemy has been defeated. We must do the difficult work today so that years from now, people will look back and say that this generation rose to the moment, prevailed in a tough fight, and left behind a more hopeful region and a safer America. 114 | 115 | We're also standing against the forces of extremism in the Holy Land, where we have new cause for hope. Palestinians have elected a President who recognizes that confronting terror is essential to achieving a state where his people can live in dignity and at peace with Israel. Israelis have leaders who recognize that a peaceful, democratic Palestinian state will be a source of lasting security. This month in Ramallah and Jerusalem, I assured leaders from both sides that America will do, and I will do, everything we can to help them achieve a peace agreement that defines a Palestinian state by the end of this year. The time has come for a Holy Land where a democratic Israel and a democratic Palestine live side by side in peace. 116 | 117 | We're also standing against the forces of extremism embodied by the regime in Tehran. Iran's rulers oppress a good and talented people. And wherever freedom advances in the Middle East, it seems the Iranian regime is there to oppose it. 
Iran is funding and training militia groups in Iraq, supporting Hizballah terrorists in Lebanon, and backing Hamas efforts to undermine peace in the Holy Land. Tehran is also developing ballistic missiles of increasing range and continues to develop its capability to enrich uranium, which could be used to create a nuclear weapon. 118 | 119 | Our message to the people of Iran is clear: We have no quarrel with you. We respect your traditions and your history. We look forward to the day when you have your freedom. Our message to the leaders of Iran is also clear: Verifiably suspend your nuclear enrichment so negotiations can begin. And to rejoin the community of nations, come clean about your nuclear intentions and past actions, stop your oppression at home, cease your support for terror abroad. But above all, know this: America will confront those who threaten our troops; we will stand by our allies; and we will defend our vital interests in the Persian Gulf. 120 | 121 | On the homefront, we will continue to take every lawful and effective measure to protect our country. This is our most solemn duty. We are grateful that there has not been another attack on our soil since 9/11. This is not for the lack of desire or effort on the part of the enemy. In the past six years, we've stopped numerous attacks, including a plot to fly a plane into the tallest building in Los Angeles and another to blow up passenger jets bound for America over the Atlantic. Dedicated men and women in our government toil day and night to stop the terrorists from carrying out their plans. These good citizens are saving American lives, and everyone in this chamber owes them our thanks. 122 | 123 | And we owe them something more; we owe them the tools they need to keep our people safe. And one of the most important tools we can give them is the ability to monitor terrorist communications. 
To protect America, we need to know who the terrorists are talking to, what they are saying, and what they're planning. Last year, Congress passed legislation to help us do that. Unfortunately, Congress set the legislation to expire on February 1. That means if you don't act by Friday, our ability to track terrorist threats would be weakened and our citizens will be in greater danger. Congress must ensure the flow of vital intelligence is not disrupted. Congress must pass liability protection for companies believed to have assisted in the efforts to defend America. We've had ample time for debate. The time to act is now. 124 | 125 | Protecting our nation from the dangers of a new century requires more than good intelligence and a strong military. It also requires changing the conditions that breed resentment and allow extremists to prey on despair. So America is using its influence to build a freer, more hopeful, and more compassionate world. This is a reflection of our national interests; it is the calling of our conscience. 126 | 127 | America opposes genocide in Sudan. We support freedom in countries from Cuba and Zimbabwe to Belarus and Burma. 128 | 129 | America is leading the fight against global poverty with strong education initiatives and humanitarian assistance. We've also changed the way we deliver aid by launching the Millennium Challenge Account. This program strengthens democracy, transparency, and the rule of law in developing nations, and I ask you to fully fund this important initiative. 130 | 131 | America is leading the fight against global hunger. Today, more than half the world's food aid comes from the United States. And tonight I ask Congress to support an innovative proposal to provide food assistance by purchasing crops directly from farmers in the developing world, so we can build up local agriculture and help break the cycle of famine. 132 | 133 | America is leading the fight against disease. 
With your help, we're working to cut by half the number of malaria-related deaths in 15 African nations. And our Emergency Plan for AIDS Relief is treating 1.4 million people. We can bring healing and hope to many more. So I ask you to maintain the principles that have changed behavior and made this program a success. And I call on you to double our initial commitment to fighting HIV/AIDS by approving an additional $30 billion over the next five years. 134 | 135 | America is a force for hope in the world because we are a compassionate people, and some of the most compassionate Americans are those who have stepped forward to protect us. We must keep faith with all who have risked life and limb so that we might live in freedom and peace. Over the past seven years, we've increased funding for veterans by more than 95 percent. And as we increase funding, we must also reform our veterans system to meet the needs of a new war and a new generation. I call on Congress to enact the reforms recommended by Senator Bob Dole and Secretary Donna Shalala, so we can improve the system of care for our wounded warriors and help them build lives of hope and promise and dignity. 136 | 137 | Our military families also sacrifice for America. They endure sleepless nights and the daily struggle of providing for children while a loved one is serving far from home. We have a responsibility to provide for them. So I ask you to join me in expanding their access to child care, creating new hiring preferences for military spouses across the federal government, and allowing our troops to transfer their unused education benefits to their spouses or children. Our military families serve our nation; they inspire our nation; and tonight our nation honors them. 138 | 139 | The strength—the secret of our strength, the miracle of America is that our greatness lies not in our government, but in the spirit and determination of our people. 
When the federal convention met in Philadelphia in 1787, our nation was bound by the Articles of Confederation, which began with the words, "We the undersigned delegates." When Gouverneur Morris was asked to draft the preamble to our new Constitution, he offered an important revision and opened with words that changed the course of our nation and the history of the world: "We the people." 140 | 141 | By trusting the people, our Founders wagered that a great and noble nation could be built on the liberty that resides in the hearts of all men and women. By trusting the people, succeeding generations transformed our fragile young democracy into the most powerful nation on Earth and a beacon of hope for millions. And so long as we continue to trust the people, our nation will prosper, our liberty will be secure, and the state of our Union will remain strong. 142 | 143 | So tonight, with confidence in freedom's power and trust in the people, let us set forth to do their business. God bless America. 
144 | -------------------------------------------------------------------------------- /data/english_stopwords.txt: -------------------------------------------------------------------------------- 1 | i 2 | me 3 | my 4 | myself 5 | we 6 | our 7 | ours 8 | ourselves 9 | you 10 | your 11 | yours 12 | yourself 13 | yourselves 14 | he 15 | him 16 | his 17 | himself 18 | she 19 | her 20 | hers 21 | herself 22 | it 23 | its 24 | itself 25 | they 26 | them 27 | their 28 | theirs 29 | themselves 30 | what 31 | which 32 | who 33 | whom 34 | this 35 | that 36 | these 37 | those 38 | am 39 | is 40 | are 41 | was 42 | were 43 | be 44 | been 45 | being 46 | have 47 | has 48 | had 49 | having 50 | do 51 | does 52 | did 53 | doing 54 | a 55 | an 56 | the 57 | and 58 | but 59 | if 60 | or 61 | because 62 | as 63 | until 64 | while 65 | of 66 | at 67 | by 68 | for 69 | with 70 | about 71 | against 72 | between 73 | into 74 | through 75 | during 76 | before 77 | after 78 | above 79 | below 80 | to 81 | from 82 | up 83 | down 84 | in 85 | out 86 | on 87 | off 88 | over 89 | under 90 | again 91 | further 92 | then 93 | once 94 | here 95 | there 96 | when 97 | where 98 | why 99 | how 100 | all 101 | any 102 | both 103 | each 104 | few 105 | more 106 | most 107 | other 108 | some 109 | such 110 | no 111 | nor 112 | not 113 | only 114 | own 115 | same 116 | so 117 | than 118 | too 119 | very 120 | s 121 | t 122 | can 123 | will 124 | just 125 | don 126 | should 127 | now 128 | -------------------------------------------------------------------------------- /data/obama2016.txt: -------------------------------------------------------------------------------- 1 | AS FOUND ON https://millercenter.org/the-presidency/presidential-speeches/january-12-2016-2016-state-union-address 2 | FOR USE IN ACADEMIC WORKSHOP 3 | Mr. Speaker, Mr. Vice President, Members of Congress, my fellow Americans: 4 | 5 | Tonight marks the eighth year that I’ve come here to report on the State of the Union. 
And for this final one, I’m going to try to make it a little shorter. (Applause.) I know some of you are antsy to get back to Iowa. (Laughter.) I've been there. I'll be shaking hands afterwards if you want some tips. (Laughter.) 6 | 7 | And I understand that because it’s an election season, expectations for what we will achieve this year are low. But, Mr. Speaker, I appreciate the constructive approach that you and the other leaders took at the end of last year to pass a budget and make tax cuts permanent for working families. So I hope we can work together this year on some bipartisan priorities like criminal justice reform -- (applause) -- and helping people who are battling prescription drug abuse and heroin abuse. (Applause.) So, who knows, we might surprise the cynics again. 8 | 9 | But tonight, I want to go easy on the traditional list of proposals for the year ahead. Don’t worry, I’ve got plenty, from helping students learn to write computer code to personalizing medical treatments for patients. And I will keep pushing for progress on the work that I believe still needs to be done. Fixing a broken immigration system. (Applause.) Protecting our kids from gun violence. (Applause.) Equal pay for equal work. (Applause.) Paid leave. (Applause.) Raising the minimum wage. (Applause.) All these things still matter to hardworking families. They’re still the right thing to do. And I won't let up until they get done. 10 | 11 | But for my final address to this chamber, I don’t want to just talk about next year. I want to focus on the next five years, the next 10 years, and beyond. I want to focus on our future. 12 | 13 | We live in a time of extraordinary change -- change that’s reshaping the way we live, the way we work, our planet, our place in the world. It’s change that promises amazing medical breakthroughs, but also economic disruptions that strain working families. 
It promises education for girls in the most remote villages, but also connects terrorists plotting an ocean away. It’s change that can broaden opportunity, or widen inequality. And whether we like it or not, the pace of this change will only accelerate. 14 | 15 | America has been through big changes before -- wars and depression, the influx of new immigrants, workers fighting for a fair deal, movements to expand civil rights. Each time, there have been those who told us to fear the future; who claimed we could slam the brakes on change; who promised to restore past glory if we just got some group or idea that was threatening America under control. And each time, we overcame those fears. We did not, in the words of Lincoln, adhere to the “dogmas of the quiet past.” Instead we thought anew, and acted anew. We made change work for us, always extending America’s promise outward, to the next frontier, to more people. And because we did -- because we saw opportunity where others saw only peril -- we emerged stronger and better than before. 16 | 17 | What was true then can be true now. Our unique strengths as a nation -- our optimism and work ethic, our spirit of discovery, our diversity, our commitment to rule of law -- these things give us everything we need to ensure prosperity and security for generations to come. 18 | 19 | In fact, it’s that spirit that made the progress of these past seven years possible. It’s how we recovered from the worst economic crisis in generations. It’s how we reformed our health care system, and reinvented our energy sector; how we delivered more care and benefits to our troops and veterans, and how we secured the freedom in every state to marry the person we love. 20 | 21 | But such progress is not inevitable. It’s the result of choices we make together. And we face such choices right now. Will we respond to the changes of our time with fear, turning inward as a nation, turning against each other as a people? 
Or will we face the future with confidence in who we are, in what we stand for, in the incredible things that we can do together? 22 | 23 | So let’s talk about the future, and four big questions that I believe we as a country have to answer -- regardless of who the next President is, or who controls the next Congress. 24 | 25 | First, how do we give everyone a fair shot at opportunity and security in this new economy? (Applause.) 26 | 27 | Second, how do we make technology work for us, and not against us -- especially when it comes to solving urgent challenges like climate change? (Applause.) 28 | 29 | Third, how do we keep America safe and lead the world without becoming its policeman? (Applause.) 30 | 31 | And finally, how can we make our politics reflect what’s best in us, and not what’s worst? 32 | 33 | Let me start with the economy, and a basic fact: The United States of America, right now, has the strongest, most durable economy in the world. (Applause.) We’re in the middle of the longest streak of private sector job creation in history. (Applause.) More than 14 million new jobs, the strongest two years of job growth since the ‘90s, an unemployment rate cut in half. Our auto industry just had its best year ever. (Applause.) That's just part of a manufacturing surge that's created nearly 900,000 new jobs in the past six years. And we’ve done all this while cutting our deficits by almost three-quarters. (Applause.) 34 | 35 | Anyone claiming that America’s economy is in decline is peddling fiction. (Applause.) Now, what is true -- and the reason that a lot of Americans feel anxious -- is that the economy has been changing in profound ways, changes that started long before the Great Recession hit; changes that have not let up. 36 | 37 | Today, technology doesn’t just replace jobs on the assembly line, but any job where work can be automated. Companies in a global economy can locate anywhere, and they face tougher competition. 
As a result, workers have less leverage for a raise. Companies have less loyalty to their communities. And more and more wealth and income is concentrated at the very top. 38 | 39 | All these trends have squeezed workers, even when they have jobs; even when the economy is growing. It’s made it harder for a hardworking family to pull itself out of poverty, harder for young people to start their careers, tougher for workers to retire when they want to. And although none of these trends are unique to America, they do offend our uniquely American belief that everybody who works hard should get a fair shot. 40 | 41 | For the past seven years, our goal has been a growing economy that works also better for everybody. We’ve made progress. But we need to make more. And despite all the political arguments that we’ve had these past few years, there are actually some areas where Americans broadly agree. 42 | 43 | We agree that real opportunity requires every American to get the education and training they need to land a good-paying job. The bipartisan reform of No Child Left Behind was an important start, and together, we’ve increased early childhood education, lifted high school graduation rates to new highs, boosted graduates in fields like engineering. In the coming years, we should build on that progress, by providing Pre-K for all and -- (applause) -- offering every student the hands-on computer science and math classes that make them job-ready on day one. We should recruit and support more great teachers for our kids. (Applause.) 44 | 45 | And we have to make college affordable for every American. (Applause.) No hardworking student should be stuck in the red. We’ve already reduced student loan payments to 10 percent of a borrower’s income. And that's good. But now, we’ve actually got to cut the cost of college. (Applause.) 
Providing two years of community college at no cost for every responsible student is one of the best ways to do that, and I’m going to keep fighting to get that started this year. (Applause.) It's the right thing to do. (Applause.) 46 | 47 | But a great education isn’t all we need in this new economy. We also need benefits and protections that provide a basic measure of security. It’s not too much of a stretch to say that some of the only people in America who are going to work the same job, in the same place, with a health and retirement package for 30 years are sitting in this chamber. (Laughter.) For everyone else, especially folks in their 40s and 50s, saving for retirement or bouncing back from job loss has gotten a lot tougher. Americans understand that at some point in their careers, in this new economy, they may have to retool and they may have to retrain. But they shouldn’t lose what they’ve already worked so hard to build in the process. 48 | 49 | That’s why Social Security and Medicare are more important than ever. We shouldn’t weaken them; we should strengthen them. (Applause.) And for Americans short of retirement, basic benefits should be just as mobile as everything else is today. That, by the way, is what the Affordable Care Act is all about. It’s about filling the gaps in employer-based care so that when you lose a job, or you go back to school, or you strike out and launch that new business, you’ll still have coverage. Nearly 18 million people have gained coverage so far. (Applause.) And in the process, health care inflation has slowed. And our businesses have created jobs every single month since it became law. 50 | 51 | Now, I’m guessing we won’t agree on health care anytime soon. (Applause.) A little applause right there. (Laughter.) Just a guess. But there should be other ways parties can work together to improve economic security. 
Say a hardworking American loses his job -- we shouldn’t just make sure that he can get unemployment insurance; we should make sure that program encourages him to retrain for a business that’s ready to hire him. If that new job doesn’t pay as much, there should be a system of wage insurance in place so that he can still pay his bills. And even if he’s going from job to job, he should still be able to save for retirement and take his savings with him. That’s the way we make the new economy work better for everybody. 52 | 53 | I also know Speaker Ryan has talked about his interest in tackling poverty. America is about giving everybody willing to work a chance, a hand up. And I’d welcome a serious discussion about strategies we can all support, like expanding tax cuts for low-income workers who don't have children. (Applause.) 54 | 55 | But there are some areas where we just have to be honest -- it has been difficult to find agreement over the last seven years. And a lot of them fall under the category of what role the government should play in making sure the system’s not rigged in favor of the wealthiest and biggest corporations. (Applause.) And it's an honest disagreement, and the American people have a choice to make. 56 | 57 | I believe a thriving private sector is the lifeblood of our economy. I think there are outdated regulations that need to be changed. There is red tape that needs to be cut. (Applause.) There you go! Yes! (Applause.) But after years now of record corporate profits, working families won’t get more opportunity or bigger paychecks just by letting big banks or big oil or hedge funds make their own rules at everybody else’s expense. (Applause.) Middle-class families are not going to feel more secure because we allowed attacks on collective bargaining to go unanswered. Food Stamp recipients did not cause the financial crisis; recklessness on Wall Street did. (Applause.) 
Immigrants aren’t the principal reason wages haven’t gone up; those decisions are made in the boardrooms that all too often put quarterly earnings over long-term returns. It’s sure not the average family watching tonight that avoids paying taxes through offshore accounts. (Applause.) 58 | 59 | The point is, I believe that in this new economy, workers and start-ups and small businesses need more of a voice, not less. The rules should work for them. (Applause.) And I'm not alone in this. This year I plan to lift up the many businesses who’ve figured out that doing right by their workers or their customers or their communities ends up being good for their shareholders. (Applause.) And I want to spread those best practices across America. That's part of a brighter future. (Applause.) 60 | 61 | In fact, it turns out many of our best corporate citizens are also our most creative. And this brings me to the second big question we as a country have to answer: How do we reignite that spirit of innovation to meet our biggest challenges? 62 | 63 | Sixty years ago, when the Russians beat us into space, we didn’t deny Sputnik was up there. (Laughter.) We didn’t argue about the science, or shrink our research and development budget. We built a space program almost overnight. And 12 years later, we were walking on the moon. (Applause.) 64 | 65 | Now, that spirit of discovery is in our DNA. America is Thomas Edison and the Wright Brothers and George Washington Carver. America is Grace Hopper and Katherine Johnson and Sally Ride. America is every immigrant and entrepreneur from Boston to Austin to Silicon Valley, racing to shape a better world. (Applause.) That's who we are. 66 | 67 | And over the past seven years, we’ve nurtured that spirit. We’ve protected an open Internet, and taken bold new steps to get more students and low-income Americans online. (Applause.) 
We’ve launched next-generation manufacturing hubs, and online tools that give an entrepreneur everything he or she needs to start a business in a single day. But we can do so much more. 68 | 69 | Last year, Vice President Biden said that with a new moonshot, America can cure cancer. Last month, he worked with this Congress to give scientists at the National Institutes of Health the strongest resources that they’ve had in over a decade. (Applause.) So tonight, I’m announcing a new national effort to get it done. And because he’s gone to the mat for all of us on so many issues over the past 40 years, I’m putting Joe in charge of Mission Control. (Applause.) For the loved ones we’ve all lost, for the families that we can still save, let’s make America the country that cures cancer once and for all. (Applause.) 70 | 71 | Medical research is critical. We need the same level of commitment when it comes to developing clean energy sources. (Applause.) Look, if anybody still wants to dispute the science around climate change, have at it. You will be pretty lonely, because you’ll be debating our military, most of America’s business leaders, the majority of the American people, almost the entire scientific community, and 200 nations around the world who agree it’s a problem and intend to solve it. (Applause.) 72 | 73 | But even if -- even if the planet wasn’t at stake, even if 2014 wasn’t the warmest year on record -- until 2015 turned out to be even hotter -- why would we want to pass up the chance for American businesses to produce and sell the energy of the future? (Applause.) 74 | 75 | Listen, seven years ago, we made the single biggest investment in clean energy in our history. Here are the results. In fields from Iowa to Texas, wind power is now cheaper than dirtier, conventional power. 
On rooftops from Arizona to New York, solar is saving Americans tens of millions of dollars a year on their energy bills, and employs more Americans than coal -- in jobs that pay better than average. We’re taking steps to give homeowners the freedom to generate and store their own energy -- something, by the way, that environmentalists and Tea Partiers have teamed up to support. And meanwhile, we’ve cut our imports of foreign oil by nearly 60 percent, and cut carbon pollution more than any other country on Earth. (Applause.) 76 | 77 | Gas under two bucks a gallon ain’t bad, either. (Applause.) 78 | 79 | Now we’ve got to accelerate the transition away from old, dirtier energy sources. Rather than subsidize the past, we should invest in the future -- especially in communities that rely on fossil fuels. We do them no favor when we don't show them where the trends are going. That’s why I’m going to push to change the way we manage our oil and coal resources, so that they better reflect the costs they impose on taxpayers and our planet. And that way, we put money back into those communities, and put tens of thousands of Americans to work building a 21st century transportation system. (Applause.) 80 | 81 | Now, none of this is going to happen overnight. And, yes, there are plenty of entrenched interests who want to protect the status quo. But the jobs we’ll create, the money we’ll save, the planet we’ll preserve -- that is the kind of future our kids and our grandkids deserve. And it's within our grasp. 82 | 83 | Climate change is just one of many issues where our security is linked to the rest of the world. And that’s why the third big question that we have to answer together is how to keep America safe and strong without either isolating ourselves or trying to nation-build everywhere there’s a problem. 84 | 85 | I told you earlier all the talk of America’s economic decline is political hot air. 
Well, so is all the rhetoric you hear about our enemies getting stronger and America getting weaker. Let me tell you something. The United States of America is the most powerful nation on Earth. Period. (Applause.) Period. It’s not even close. It's not even close. (Applause.) It's not even close. We spend more on our military than the next eight nations combined. Our troops are the finest fighting force in the history of the world. (Applause.) No nation attacks us directly, or our allies, because they know that’s the path to ruin. Surveys show our standing around the world is higher than when I was elected to this office, and when it comes to every important international issue, people of the world do not look to Beijing or Moscow to lead -- they call us. (Applause.) 86 | 87 | I mean, it's useful to level the set here, because when we don't, we don't make good decisions. 88 | 89 | Now, as someone who begins every day with an intelligence briefing, I know this is a dangerous time. But that’s not primarily because of some looming superpower out there, and certainly not because of diminished American strength. In today’s world, we’re threatened less by evil empires and more by failing states. 90 | 91 | The Middle East is going through a transformation that will play out for a generation, rooted in conflicts that date back millennia. Economic headwinds are blowing in from a Chinese economy that is in significant transition. Even as their economy severely contracts, Russia is pouring resources in to prop up Ukraine and Syria -- client states that they saw slipping away from their orbit. And the international system we built after World War II is now struggling to keep pace with this new reality. 92 | 93 | It’s up to us, the United States of America, to help remake that system. And to do that well it means that we’ve got to set priorities. 94 | 95 | Priority number one is protecting the American people and going after terrorist networks. (Applause.) 
Both al Qaeda and now ISIL pose a direct threat to our people, because in today’s world, even a handful of terrorists who place no value on human life, including their own, can do a lot of damage. They use the Internet to poison the minds of individuals inside our country. Their actions undermine and destabilize our allies. We have to take them out. 96 | 97 | But as we focus on destroying ISIL, over-the-top claims that this is World War III just play into their hands. Masses of fighters on the back of pickup trucks, twisted souls plotting in apartments or garages -- they pose an enormous danger to civilians; they have to be stopped. But they do not threaten our national existence. (Applause.) That is the story ISIL wants to tell. That’s the kind of propaganda they use to recruit. We don’t need to build them up to show that we’re serious, and we sure don't need to push away vital allies in this fight by echoing the lie that ISIL is somehow representative of one of the world’s largest religions. (Applause.) We just need to call them what they are -- killers and fanatics who have to be rooted out, hunted down, and destroyed. (Applause.) 98 | 99 | And that’s exactly what we’re doing. For more than a year, America has led a coalition of more than 60 countries to cut off ISIL’s financing, disrupt their plots, stop the flow of terrorist fighters, and stamp out their vicious ideology. With nearly 10,000 air strikes, we’re taking out their leadership, their oil, their training camps, their weapons. We’re training, arming, and supporting forces who are steadily reclaiming territory in Iraq and Syria. 100 | 101 | If this Congress is serious about winning this war, and wants to send a message to our troops and the world, authorize the use of military force against ISIL. Take a vote. (Applause.) Take a vote. But the American people should know that with or without congressional action, ISIL will learn the same lessons as terrorists before them. 
If you doubt America’s commitment -- or mine -- to see that justice is done, just ask Osama bin Laden. (Applause.) Ask the leader of al Qaeda in Yemen, who was taken out last year, or the perpetrator of the Benghazi attacks, who sits in a prison cell. When you come after Americans, we go after you. (Applause.) And it may take time, but we have long memories, and our reach has no limits. (Applause.) 102 | 103 | Our foreign policy has to be focused on the threat from ISIL and al Qaeda, but it can’t stop there. For even without ISIL, even without al Qaeda, instability will continue for decades in many parts of the world -- in the Middle East, in Afghanistan, parts of Pakistan, in parts of Central America, in Africa, and Asia. Some of these places may become safe havens for new terrorist networks. Others will just fall victim to ethnic conflict, or famine, feeding the next wave of refugees. The world will look to us to help solve these problems, and our answer needs to be more than tough talk or calls to carpet-bomb civilians. That may work as a TV sound bite, but it doesn’t pass muster on the world stage. 104 | 105 | We also can’t try to take over and rebuild every country that falls into crisis, even if it's done with the best of intentions. (Applause.) That’s not leadership; that’s a recipe for quagmire, spilling American blood and treasure that ultimately will weaken us. It’s the lesson of Vietnam; it's the lesson of Iraq -- and we should have learned it by now. (Applause.) 106 | 107 | Fortunately, there is a smarter approach, a patient and disciplined strategy that uses every element of our national power. It says America will always act, alone if necessary, to protect our people and our allies; but on issues of global concern, we will mobilize the world to work with us, and make sure other countries pull their own weight. 
108 | 109 | That’s our approach to conflicts like Syria, where we’re partnering with local forces and leading international efforts to help that broken society pursue a lasting peace. 110 | 111 | That’s why we built a global coalition, with sanctions and principled diplomacy, to prevent a nuclear-armed Iran. And as we speak, Iran has rolled back its nuclear program, shipped out its uranium stockpile, and the world has avoided another war. (Applause.) 112 | 113 | That’s how we stopped the spread of Ebola in West Africa. (Applause.) Our military, our doctors, our development workers -- they were heroic; they set up the platform that then allowed other countries to join in behind us and stamp out that epidemic. Hundreds of thousands, maybe a couple million lives were saved. 114 | 115 | That’s how we forged a Trans-Pacific Partnership to open markets, and protect workers and the environment, and advance American leadership in Asia. It cuts 18,000 taxes on products made in America, which will then support more good jobs here in America. With TPP, China does not set the rules in that region; we do. You want to show our strength in this new century? Approve this agreement. Give us the tools to enforce it. It's the right thing to do. (Applause.) 116 | 117 | Let me give you another example. Fifty years of isolating Cuba had failed to promote democracy, and set us back in Latin America. That’s why we restored diplomatic relations -- (applause) -- opened the door to travel and commerce, positioned ourselves to improve the lives of the Cuban people. (Applause.) So if you want to consolidate our leadership and credibility in the hemisphere, recognize that the Cold War is over -- lift the embargo. (Applause.) 118 | 119 | The point is American leadership in the 21st century is not a choice between ignoring the rest of the world -- except when we kill terrorists -- or occupying and rebuilding whatever society is unraveling. 
Leadership means a wise application of military power, and rallying the world behind causes that are right. It means seeing our foreign assistance as a part of our national security, not something separate, not charity. 120 | 121 | When we lead nearly 200 nations to the most ambitious agreement in history to fight climate change, yes, that helps vulnerable countries, but it also protects our kids. When we help Ukraine defend its democracy, or Colombia resolve a decades-long war, that strengthens the international order we depend on. When we help African countries feed their people and care for the sick -- (applause) -- it's the right thing to do, and it prevents the next pandemic from reaching our shores. Right now, we’re on track to end the scourge of HIV/AIDS. That's within our grasp. (Applause.) And we have the chance to accomplish the same thing with malaria -- something I’ll be pushing this Congress to fund this year. (Applause.) 122 | 123 | That's American strength. That's American leadership. And that kind of leadership depends on the power of our example. That’s why I will keep working to shut down the prison at Guantanamo. (Applause.) It is expensive, it is unnecessary, and it only serves as a recruitment brochure for our enemies. (Applause.) There’s a better way. (Applause.) 124 | 125 | And that’s why we need to reject any politics -- any politics -- that targets people because of race or religion. (Applause.) Let me just say this. This is not a matter of political correctness. This is a matter of understanding just what it is that makes us strong. The world respects us not just for our arsenal; it respects us for our diversity, and our openness, and the way we respect every faith. 
126 | 127 | His Holiness, Pope Francis, told this body from the very spot that I'm standing on tonight that “to imitate the hatred and violence of tyrants and murderers is the best way to take their place.” When politicians insult Muslims, whether abroad or our fellow citizens, when a mosque is vandalized, or a kid is called names, that doesn’t make us safer. That’s not telling it like it is. It’s just wrong. (Applause.) It diminishes us in the eyes of the world. It makes it harder to achieve our goals. It betrays who we are as a country. (Applause.) 128 | 129 | “We the People.” Our Constitution begins with those three simple words, words we’ve come to recognize mean all the people, not just some; words that insist we rise and fall together, and that's how we might perfect our Union. And that brings me to the fourth, and maybe the most important thing that I want to say tonight. 130 | 131 | The future we want -- all of us want -- opportunity and security for our families, a rising standard of living, a sustainable, peaceful planet for our kids -- all that is within our reach. But it will only happen if we work together. It will only happen if we can have rational, constructive debates. It will only happen if we fix our politics. 132 | 133 | A better politics doesn’t mean we have to agree on everything. This is a big country -- different regions, different attitudes, different interests. That’s one of our strengths, too. Our Founders distributed power between states and branches of government, and expected us to argue, just as they did, fiercely, over the size and shape of government, over commerce and foreign relations, over the meaning of liberty and the imperatives of security. 134 | 135 | But democracy does require basic bonds of trust between its citizens. It doesn’t work if we think the people who disagree with us are all motivated by malice. It doesn’t work if we think that our political opponents are unpatriotic or trying to weaken America. 
Democracy grinds to a halt without a willingness to compromise, or when even basic facts are contested, or when we listen only to those who agree with us. Our public life withers when only the most extreme voices get all the attention. And most of all, democracy breaks down when the average person feels their voice doesn’t matter; that the system is rigged in favor of the rich or the powerful or some special interest. 136 | 137 | Too many Americans feel that way right now. It’s one of the few regrets of my presidency -- that the rancor and suspicion between the parties has gotten worse instead of better. I have no doubt a president with the gifts of Lincoln or Roosevelt might have better bridged the divide, and I guarantee I’ll keep trying to be better so long as I hold this office. 138 | 139 | But, my fellow Americans, this cannot be my task -- or any President’s -- alone. There are a whole lot of folks in this chamber, good people who would like to see more cooperation, would like to see a more elevated debate in Washington, but feel trapped by the imperatives of getting elected, by the noise coming out of your base. I know; you’ve told me. It's the worst-kept secret in Washington. And a lot of you aren't enjoying being trapped in that kind of rancor. 140 | 141 | But that means if we want a better politics -- and I'm addressing the American people now -- if we want a better politics, it’s not enough just to change a congressman or change a senator or even change a President. We have to change the system to reflect our better selves. I think we've got to end the practice of drawing our congressional districts so that politicians can pick their voters, and not the other way around. (Applause.) Let a bipartisan group do it. (Applause.) 142 | 143 | We have to reduce the influence of money in our politics, so that a handful of families or hidden interests can’t bankroll our elections. (Applause.) 
And if our existing approach to campaign finance reform can’t pass muster in the courts, we need to work together to find a real solution -- because it's a problem. And most of you don't like raising money. I know; I've done it. (Applause.) We’ve got to make it easier to vote, not harder. (Applause.) We need to modernize it for the way we live now. (Applause.) This is America: We want to make it easier for people to participate. And over the course of this year, I intend to travel the country to push for reforms that do just that. 144 | 145 | But I can’t do these things on my own. (Applause.) Changes in our political process -- in not just who gets elected, but how they get elected -- that will only happen when the American people demand it. It depends on you. That’s what’s meant by a government of, by, and for the people. 146 | 147 | What I’m suggesting is hard. It’s a lot easier to be cynical; to accept that change is not possible, and politics is hopeless, and the problem is all the folks who are elected don't care, and to believe that our voices and actions don’t matter. But if we give up now, then we forsake a better future. Those with money and power will gain greater control over the decisions that could send a young soldier to war, or allow another economic disaster, or roll back the equal rights and voting rights that generations of Americans have fought, even died, to secure. And then, as frustration grows, there will be voices urging us to fall back into our respective tribes, to scapegoat fellow citizens who don’t look like us, or pray like us, or vote like we do, or share the same background. 148 | 149 | We can’t afford to go down that path. It won’t deliver the economy we want. It will not produce the security we want. But most of all, it contradicts everything that makes us the envy of the world. 
150 | 151 | So, my fellow Americans, whatever you may believe, whether you prefer one party or no party, whether you supported my agenda or fought as hard as you could against it -- our collective futures depends on your willingness to uphold your duties as a citizen. To vote. To speak out. To stand up for others, especially the weak, especially the vulnerable, knowing that each of us is only here because somebody, somewhere, stood up for us. (Applause.) We need every American to stay active in our public life -- and not just during election time -- so that our public life reflects the goodness and the decency that I see in the American people every single day. 152 | 153 | It is not easy. Our brand of democracy is hard. But I can promise that a little over a year from now, when I no longer hold this office, I will be right there with you as a citizen, inspired by those voices of fairness and vision, of grit and good humor and kindness that helped America travel so far. Voices that help us see ourselves not, first and foremost, as black or white, or Asian or Latino, not as gay or straight, immigrant or native born, not as Democrat or Republican, but as Americans first, bound by a common creed. Voices Dr. King believed would have the final word -- voices of unarmed truth and unconditional love. 154 | 155 | And they’re out there, those voices. They don’t get a lot of attention; they don't seek a lot of fanfare; but they’re busy doing the work this country needs doing. I see them everywhere I travel in this incredible country of ours. I see you, the American people. And in your daily acts of citizenship, I see our future unfolding. 156 | 157 | I see it in the worker on the assembly line who clocked extra shifts to keep his company open, and the boss who pays him higher wages instead of laying him off. 158 | 159 | I see it in the Dreamer who stays up late to finish her science project, and the teacher who comes in early because he knows she might someday cure a disease. 
160 | 161 | I see it in the American who served his time, and made mistakes as a child but now is dreaming of starting over -- and I see it in the business owner who gives him that second chance. The protester determined to prove that justice matters -- and the young cop walking the beat, treating everybody with respect, doing the brave, quiet work of keeping us safe. (Applause.) 162 | 163 | I see it in the soldier who gives almost everything to save his brothers, the nurse who tends to him till he can run a marathon, the community that lines up to cheer him on. 164 | 165 | It’s the son who finds the courage to come out as who he is, and the father whose love for that son overrides everything he’s been taught. (Applause.) 166 | 167 | I see it in the elderly woman who will wait in line to cast her vote as long as she has to; the new citizen who casts his vote for the first time; the volunteers at the polls who believe every vote should count -- because each of them in different ways know how much that precious right is worth. 168 | 169 | That's the America I know. That’s the country we love. Clear-eyed. Big-hearted. Undaunted by challenge. Optimistic that unarmed truth and unconditional love will have the final word. (Applause.) That’s what makes me so hopeful about our future. I believe in change because I believe in you, the American people. 170 | 171 | And that’s why I stand here confident as I have ever been that the State of our Union is strong. (Applause.) 172 | 173 | Thank you, God bless you. God bless the United States of America. 174 | -------------------------------------------------------------------------------- /data/tree.svg: -------------------------------------------------------------------------------- 1 | 2 | [Dependency-parse diagram (SVG). Tokens with POS tags: Mr. (NNP) Speaker, (NNP) Mr. Vice President, (NNP) members (NNS) of (IN) Congress, (NNP) honored (VBD) guests, (NNS) my (PRP$) fellow (JJ) Americans: (NNPS) [space] (SP) We (PRP) are (VBP) fortunate (JJ) to (TO) be (VB) alive (JJ) at (IN) this moment (NN) in (IN) history. (NN) Arc labels: compound, npadvmod, nsubj, nsubj, prep, pobj, dobj, poss, amod, appos, nsubj, relcl, acomp, aux, xcomp, acomp, prep, pobj, prep, pobj.]
-------------------------------------------------------------------------------- /data/trump.txt: -------------------------------------------------------------------------------- 1 | AS FOUND ON http://www.cnn.com/2017/02/28/politics/donald-trump-speech-transcript-full-text/ 2 | AND TO BE USED FOR ACADEMIC WORKSHOP 3 | Mr. Speaker, Mr. Vice President, Members of Congress, the First Lady of the United States, and Citizens of America: 4 | Tonight, as we mark the conclusion of our celebration of Black History Month, we are reminded of our Nation's path toward civil rights and the work that still remains. Recent threats targeting Jewish Community Centers and vandalism of Jewish cemeteries, as well as last week's shooting in Kansas City, remind us that while we may be a Nation divided on policies, we are a country that stands united in condemning hate and evil in all its forms. 5 | Each American generation passes the torch of truth, liberty and justice --- in an unbroken chain all the way down to the present. 6 | That torch is now in our hands. And we will use it to light up the world. I am here tonight to deliver a message of unity and strength, and it is a message deeply delivered from my heart. 7 | A new chapter of American Greatness is now beginning. 8 | A new national pride is sweeping across our Nation. 9 | And a new surge of optimism is placing impossible dreams firmly within our grasp. 10 | What we are witnessing today is the Renewal of the American Spirit. 11 | Our allies will find that America is once again ready to lead. 12 | All the nations of the world -- friend or foe -- will find that America is strong, America is proud, and America is free.
13 | In 9 years, the United States will celebrate the 250th anniversary of our founding -- 250 years since the day we declared our Independence. 14 | It will be one of the great milestones in the history of the world. 15 | But what will America look like as we reach our 250th year? What kind of country will we leave for our children? 16 | I will not allow the mistakes of recent decades past to define the course of our future. 17 | For too long, we've watched our middle class shrink as we've exported our jobs and wealth to foreign countries. 18 | We've financed and built one global project after another, but ignored the fates of our children in the inner cities of Chicago, Baltimore, Detroit -- and so many other places throughout our land. 19 | We've defended the borders of other nations, while leaving our own borders wide open, for anyone to cross -- and for drugs to pour in at a now unprecedented rate. 20 | And we've spent trillions of dollars overseas, while our infrastructure at home has so badly crumbled. 21 | Then, in 2016, the earth shifted beneath our feet. The rebellion started as a quiet protest, spoken by families of all colors and creeds --- families who just wanted a fair shot for their children, and a fair hearing for their concerns. 22 | But then the quiet voices became a loud chorus -- as thousands of citizens now spoke out together, from cities small and large, all across our country. 23 | Finally, the chorus became an earthquake -- and the people turned out by the tens of millions, and they were all united by one very simple, but crucial demand, that America must put its own citizens first ... because only then, can we truly MAKE AMERICA GREAT AGAIN. 24 | Dying industries will come roaring back to life. Heroic veterans will get the care they so desperately need. 25 | Our military will be given the resources its brave warriors so richly deserve. 
26 | Crumbling infrastructure will be replaced with new roads, bridges, tunnels, airports and railways gleaming across our beautiful land. 27 | Our terrible drug epidemic will slow down and ultimately, stop. 28 | And our neglected inner cities will see a rebirth of hope, safety, and opportunity. 29 | Above all else, we will keep our promises to the American people. 30 | It's been a little over a month since my inauguration, and I want to take this moment to update the Nation on the progress I've made in keeping those promises. 31 | Since my election, Ford, Fiat-Chrysler, General Motors, Sprint, Softbank, Lockheed, Intel, Walmart, and many others, have announced that they will invest billions of dollars in the United States and will create tens of thousands of new American jobs. 32 | The stock market has gained almost three trillion dollars in value since the election on November 8th, a record. We've saved taxpayers hundreds of millions of dollars by bringing down the price of the fantastic new F-35 jet fighter, and will be saving billions more dollars on contracts all across our Government. We have placed a hiring freeze on non-military and non-essential Federal workers. 33 | We have begun to drain the swamp of government corruption by imposing a 5 year ban on lobbying by executive branch officials --- and a lifetime ban on becoming lobbyists for a foreign government. 34 | We have undertaken a historic effort to massively reduce job‑crushing regulations, creating a deregulation task force inside of every Government agency; imposing a new rule which mandates that for every 1 new regulation, 2 old regulations must be eliminated; and stopping a regulation that threatens the future and livelihoods of our great coal miners. 35 | We have cleared the way for the construction of the Keystone and Dakota Access Pipelines -- thereby creating tens of thousands of jobs -- and I've issued a new directive that new American pipelines be made with American steel. 
36 | We have withdrawn the United States from the job-killing Trans-Pacific Partnership. 37 | With the help of Prime Minister Justin Trudeau, we have formed a Council with our neighbors in Canada to help ensure that women entrepreneurs have access to the networks, markets and capital they need to start a business and live out their financial dreams. 38 | To protect our citizens, I have directed the Department of Justice to form a Task Force on Reducing Violent Crime. 39 | I have further ordered the Departments of Homeland Security and Justice, along with the Department of State and the Director of National Intelligence, to coordinate an aggressive strategy to dismantle the criminal cartels that have spread across our Nation. 40 | We will stop the drugs from pouring into our country and poisoning our youth -- and we will expand treatment for those who have become so badly addicted. 41 | At the same time, my Administration has answered the pleas of the American people for immigration enforcement and border security. By finally enforcing our immigration laws, we will raise wages, help the unemployed, save billions of dollars, and make our communities safer for everyone. We want all Americans to succeed --- but that can't happen in an environment of lawless chaos. We must restore integrity and the rule of law to our borders. 42 | For that reason, we will soon begin the construction of a great wall along our southern border. It will be started ahead of schedule and, when finished, it will be a very effective weapon against drugs and crime. 43 | As we speak, we are removing gang members, drug dealers and criminals that threaten our communities and prey on our citizens. Bad ones are going out as I speak tonight and as I have promised. 
44 | To any in Congress who do not believe we should enforce our laws, I would ask you this question: what would you say to the American family that loses their jobs, their income, or a loved one, because America refused to uphold its laws and defend its borders? 45 | Our obligation is to serve, protect, and defend the citizens of the United States. We are also taking strong measures to protect our Nation from Radical Islamic Terrorism. 46 | According to data provided by the Department of Justice, the vast majority of individuals convicted for terrorism-related offenses since 9/11 came here from outside of our country. We have seen the attacks at home --- from Boston to San Bernardino to the Pentagon and yes, even the World Trade Center. 47 | We have seen the attacks in France, in Belgium, in Germany and all over the world. 48 | It is not compassionate, but reckless, to allow uncontrolled entry from places where proper vetting cannot occur. Those given the high honor of admission to the United States should support this country and love its people and its values. 49 | We cannot allow a beachhead of terrorism to form inside America -- we cannot allow our Nation to become a sanctuary for extremists. 50 | That is why my Administration has been working on improved vetting procedures, and we will shortly take new steps to keep our Nation safe -- and to keep out those who would do us harm. 51 | As promised, I directed the Department of Defense to develop a plan to demolish and destroy ISIS -- a network of lawless savages that have slaughtered Muslims and Christians, and men, women, and children of all faiths and beliefs. We will work with our allies, including our friends and allies in the Muslim world, to extinguish this vile enemy from our planet. 52 | I have also imposed new sanctions on entities and individuals who support Iran's ballistic missile program, and reaffirmed our unbreakable alliance with the State of Israel. 
53 | Finally, I have kept my promise to appoint a Justice to the United States Supreme Court -- from my list of 20 judges -- who will defend our Constitution. I am honored to have Maureen Scalia with us in the gallery tonight. Her late, great husband, Antonin Scalia, will forever be a symbol of American justice. To fill his seat, we have chosen Judge Neil Gorsuch, a man of incredible skill, and deep devotion to the law. He was confirmed unanimously to the Court of Appeals, and I am asking the Senate to swiftly approve his nomination. 54 | Tonight, as I outline the next steps we must take as a country, we must honestly acknowledge the circumstances we inherited. 55 | Ninety-four million Americans are out of the labor force. 56 | Over 43 million people are now living in poverty, and over 43 million Americans are on food stamps. 57 | More than 1 in 5 people in their prime working years are not working. 58 | We have the worst financial recovery in 65 years. 59 | In the last 8 years, the past Administration has put on more new debt than nearly all other Presidents combined. 60 | We've lost more than one-fourth of our manufacturing jobs since NAFTA was approved, and we've lost 60,000 factories since China joined the World Trade Organization in 2001. 61 | Our trade deficit in goods with the world last year was nearly $800 billion dollars. 62 | And overseas, we have inherited a series of tragic foreign policy disasters. 63 | Solving these, and so many other pressing problems, will require us to work past the differences of party. It will require us to tap into the American spirit that has overcome every challenge throughout our long and storied history. 64 | But to accomplish our goals at home and abroad, we must restart the engine of the American economy -- making it easier for companies to do business in the United States, and much harder for companies to leave. 65 | Right now, American companies are taxed at one of the highest rates anywhere in the world. 
66 | My economic team is developing historic tax reform that will reduce the tax rate on our companies so they can compete and thrive anywhere and with anyone. At the same time, we will provide massive tax relief for the middle class. 67 | We must create a level playing field for American companies and workers. 68 | Currently, when we ship products out of America, many other countries make us pay very high tariffs and taxes -- but when foreign companies ship their products into America, we charge them almost nothing. 69 | I just met with officials and workers from a great American company, Harley-Davidson. In fact, they proudly displayed five of their magnificent motorcycles, made in the USA, on the front lawn of the White House. 70 | At our meeting, I asked them, how are you doing, how is business? They said that it's good. I asked them further how they are doing with other countries, mainly international sales. They told me -- without even complaining because they have been mistreated for so long that they have become used to it -- that it is very hard to do business with other countries because they tax our goods at such a high rate. They said that in one case another country taxed their motorcycles at 100 percent. 71 | They weren't even asking for change. But I am. 72 | I believe strongly in free trade but it also has to be FAIR TRADE. 73 | The first Republican President, Abraham Lincoln, warned that the "abandonment of the protective policy by the American Government [will] produce want and ruin among our people." 74 | Lincoln was right -- and it is time we heeded his words. I am not going to let America and its great companies and workers, be taken advantage of anymore. 75 | I am going to bring back millions of jobs. Protecting our workers also means reforming our system of legal immigration. The current, outdated system depresses wages for our poorest workers, and puts great pressure on taxpayers. 
76 | Nations around the world, like Canada, Australia and many others --- have a merit-based immigration system. It is a basic principle that those seeking to enter a country ought to be able to support themselves financially. Yet, in America, we do not enforce this rule, straining the very public resources that our poorest citizens rely upon. According to the National Academy of Sciences, our current immigration system costs America's taxpayers many billions of dollars a year. 77 | Switching away from this current system of lower-skilled immigration, and instead adopting a merit-based system, will have many benefits: it will save countless dollars, raise workers' wages, and help struggling families --- including immigrant families --- enter the middle class. 78 | I believe that real and positive immigration reform is possible, as long as we focus on the following goals: to improve jobs and wages for Americans, to strengthen our nation's security, and to restore respect for our laws. 79 | If we are guided by the well-being of American citizens then I believe Republicans and Democrats can work together to achieve an outcome that has eluded our country for decades. 80 | Another Republican President, Dwight D. Eisenhower, initiated the last truly great national infrastructure program --- the building of the interstate highway system. The time has come for a new program of national rebuilding. 81 | America has spent approximately six trillion dollars in the Middle East, all this while our infrastructure at home is crumbling. With this six trillion dollars we could have rebuilt our country --- twice. And maybe even three times if we had people who had the ability to negotiate. 82 | To launch our national rebuilding, I will be asking the Congress to approve legislation that produces a $1 trillion investment in the infrastructure of the United States -- financed through both public and private capital --- creating millions of new jobs. 
83 | This effort will be guided by two core principles: Buy American, and Hire American. 84 | Tonight, I am also calling on this Congress to repeal and replace Obamacare with reforms that expand choice, increase access, lower costs, and at the same time, provide better Healthcare. 85 | Mandating every American to buy government-approved health insurance was never the right solution for America. The way to make health insurance available to everyone is to lower the cost of health insurance, and that is what we will do. 86 | Obamacare premiums nationwide have increased by double and triple digits. As an example, Arizona went up 116 percent last year alone. Governor Matt Bevin of Kentucky just said Obamacare is failing in his State -- it is unsustainable and collapsing. 87 | One third of counties have only one insurer on the exchanges --- leaving many Americans with no choice at all. 88 | Remember when you were told that you could keep your doctor, and keep your plan? 89 | We now know that all of those promises have been broken. 90 | Obamacare is collapsing --- and we must act decisively to protect all Americans. Action is not a choice --- it is a necessity. 91 | So I am calling on all Democrats and Republicans in the Congress to work with us to save Americans from this imploding Obamacare disaster. 92 | Here are the principles that should guide the Congress as we move to create a better healthcare system for all Americans: 93 | First, we should ensure that Americans with pre-existing conditions have access to coverage, and that we have a stable transition for Americans currently enrolled in the healthcare exchanges. 94 | Secondly, we should help Americans purchase their own coverage, through the use of tax credits and expanded Health Savings Accounts --- but it must be the plan they want, not the plan forced on them by the Government. 
95 | Thirdly, we should give our great State Governors the resources and flexibility they need with Medicaid to make sure no one is left out. 96 | Fourthly, we should implement legal reforms that protect patients and doctors from unnecessary costs that drive up the price of insurance -- and work to bring down the artificially high price of drugs and bring them down immediately. 97 | Finally, the time has come to give Americans the freedom to purchase health insurance across State lines --- creating a truly competitive national marketplace that will bring cost way down and provide far better care. 98 | Everything that is broken in our country can be fixed. Every problem can be solved. And every hurting family can find healing, and hope. 99 | Our citizens deserve this, and so much more --- so why not join forces to finally get it done? On this and so many other things, Democrats and Republicans should get together and unite for the good of our country, and for the good of the American people. 100 | My administration wants to work with members in both parties to make childcare accessible and affordable, to help ensure new parents have paid family leave, to invest in women's health, and to promote clean air and clear water, and to rebuild our military and our infrastructure. 101 | True love for our people requires us to find common ground, to advance the common good, and to cooperate on behalf of every American child who deserves a brighter future. 102 | An incredible young woman is with us this evening who should serve as an inspiration to us all. 103 | Today is Rare Disease day, and joining us in the gallery is a Rare Disease Survivor, Megan Crowley. Megan was diagnosed with Pompe Disease, a rare and serious illness, when she was 15 months old. She was not expected to live past 5. 104 | On receiving this news, Megan's dad, John, fought with everything he had to save the life of his precious child. 
He founded a company to look for a cure, and helped develop the drug that saved Megan's life. Today she is 20 years old -- and a sophomore at Notre Dame. 105 | Megan's story is about the unbounded power of a father's love for a daughter. 106 | But our slow and burdensome approval process at the Food and Drug Administration keeps too many advances, like the one that saved Megan's life, from reaching those in need. 107 | If we slash the restraints, not just at the FDA but across our Government, then we will be blessed with far more miracles like Megan. 108 | In fact, our children will grow up in a Nation of miracles. 109 | But to achieve this future, we must enrich the mind --- and the souls --- of every American child. 110 | Education is the civil rights issue of our time. 111 | I am calling upon Members of both parties to pass an education bill that funds school choice for disadvantaged youth, including millions of African-American and Latino children. These families should be free to choose the public, private, charter, magnet, religious or home school that is right for them. 112 | Joining us tonight in the gallery is a remarkable woman, Denisha Merriweather. As a young girl, Denisha struggled in school and failed third grade twice. But then she was able to enroll in a private center for learning, with the help of a tax credit scholarship program. Today, she is the first in her family to graduate, not just from high school, but from college. Later this year she will get her masters degree in social work. 113 | We want all children to be able to break the cycle of poverty just like Denisha. 114 | But to break the cycle of poverty, we must also break the cycle of violence. 115 | The murder rate in 2015 experienced its largest single-year increase in nearly half a century. 116 | In Chicago, more than 4,000 people were shot last year alone --- and the murder rate so far this year has been even higher. 117 | This is not acceptable in our society. 
118 | Every American child should be able to grow up in a safe community, to attend a great school, and to have access to a high-paying job. 119 | But to create this future, we must work with --- not against --- the men and women of law enforcement. 120 | We must build bridges of cooperation and trust --- not drive the wedge of disunity and division. 121 | Police and sheriffs are members of our community. They are friends and neighbors, they are mothers and fathers, sons and daughters -- and they leave behind loved ones every day who worry whether or not they'll come home safe and sound. 122 | We must support the incredible men and women of law enforcement. 123 | And we must support the victims of crime. 124 | I have ordered the Department of Homeland Security to create an office to serve American Victims. The office is called VOICE --- Victims Of Immigration Crime Engagement. We are providing a voice to those who have been ignored by our media, and silenced by special interests. 125 | Joining us in the audience tonight are four very brave Americans whose government failed them. 126 | Their names are Jamiel Shaw, Susan Oliver, Jenna Oliver, and Jessica Davis. 127 | Jamiel's 17-year-old son was viciously murdered by an illegal immigrant gang member, who had just been released from prison. Jamiel Shaw Jr. was an incredible young man, with unlimited potential who was getting ready to go to college where he would have excelled as a great quarterback. But he never got the chance. His father, who is in the audience tonight, has become a good friend of mine. 128 | Also with us are Susan Oliver and Jessica Davis. Their husbands --- Deputy Sheriff Danny Oliver and Detective Michael Davis --- were slain in the line of duty in California. They were pillars of their community. These brave men were viciously gunned down by an illegal immigrant with a criminal record and two prior deportations. 129 | Sitting with Susan is her daughter, Jenna. 
Jenna: I want you to know that your father was a hero, and that tonight you have the love of an entire country supporting you and praying for you. 130 | To Jamiel, Jenna, Susan and Jessica: I want you to know --- we will never stop fighting for justice. Your loved ones will never be forgotten, we will always honor their memory. 131 | Finally, to keep America Safe we must provide the men and women of the United States military with the tools they need to prevent war and --- if they must --- to fight and to win. 132 | I am sending the Congress a budget that rebuilds the military, eliminates the Defense sequester, and calls for one of the largest increases in national defense spending in American history. 133 | My budget will also increase funding for our veterans. 134 | Our veterans have delivered for this Nation --- and now we must deliver for them. 135 | The challenges we face as a Nation are great. But our people are even greater. 136 | And none are greater or braver than those who fight for America in uniform. 137 | We are blessed to be joined tonight by Carryn Owens, the widow of a U.S. Navy Special Operator, Senior Chief William "Ryan" Owens. Ryan died as he lived: a warrior, and a hero --- battling against terrorism and securing our Nation. 138 | I just spoke to General Mattis, who reconfirmed that, and I quote, "Ryan was a part of a highly successful raid that generated large amounts of vital intelligence that will lead to many more victories in the future against our enemies." Ryan's legacy is etched into eternity. For as the Bible teaches us, there is no greater act of love than to lay down one's life for one's friends. Ryan laid down his life for his friends, for his country, and for our freedom --- we will never forget him. 139 | To those allies who wonder what kind of friend America will be, look no further than the heroes who wear our uniform. 140 | Our foreign policy calls for a direct, robust and meaningful engagement with the world. 
It is American leadership based on vital security interests that we share with our allies across the globe. 141 | We strongly support NATO, an alliance forged through the bonds of two World Wars that dethroned fascism, and a Cold War that defeated communism. 142 | But our partners must meet their financial obligations. 143 | And now, based on our very strong and frank discussions, they are beginning to do just that. 144 | We expect our partners, whether in NATO, in the Middle East, or the Pacific --- to take a direct and meaningful role in both strategic and military operations, and pay their fair share of the cost. 145 | We will respect historic institutions, but we will also respect the sovereign rights of nations. 146 | Free nations are the best vehicle for expressing the will of the people --- and America respects the right of all nations to chart their own path. My job is not to represent the world. My job is to represent the United States of America. But we know that America is better off, when there is less conflict -- not more. 147 | We must learn from the mistakes of the past --- we have seen the war and destruction that have raged across our world. 148 | The only long-term solution for these humanitarian disasters is to create the conditions where displaced persons can safely return home and begin the long process of rebuilding. 149 | America is willing to find new friends, and to forge new partnerships, where shared interests align. We want harmony and stability, not war and conflict. 150 | We want peace, wherever peace can be found. America is friends today with former enemies. Some of our closest allies, decades ago, fought on the opposite side of these World Wars. This history should give us all faith in the possibilities for a better world. 151 | Hopefully, the 250th year for America will see a world that is more peaceful, more just and more free. 
152 | On our 100th anniversary, in 1876, citizens from across our Nation came to Philadelphia to celebrate America's centennial. At that celebration, the country's builders and artists and inventors showed off their creations. 153 | Alexander Graham Bell displayed his telephone for the first time. 154 | Remington unveiled the first typewriter. An early attempt was made at electric light. 155 | Thomas Edison showed an automatic telegraph and an electric pen. 156 | Imagine the wonders our country could know in America's 250th year. 157 | Think of the marvels we can achieve if we simply set free the dreams of our people. 158 | Cures to illnesses that have always plagued us are not too much to hope. 159 | American footprints on distant worlds are not too big a dream. 160 | Millions lifted from welfare to work is not too much to expect. 161 | And streets where mothers are safe from fear -- schools where children learn in peace -- and jobs where Americans prosper and grow -- are not too much to ask. 162 | When we have all of this, we will have made America greater than ever before. For all Americans. 163 | This is our vision. This is our mission. 164 | But we can only get there together. 165 | We are one people, with one destiny. 166 | We all bleed the same blood. 167 | We all salute the same flag. 168 | And we are all made by the same God. 169 | And when we fulfill this vision; when we celebrate our 250 years of glorious freedom, we will look back on tonight as when this new chapter of American Greatness began. 170 | The time for small thinking is over. The time for trivial fights is behind us. 171 | We just need the courage to share the dreams that fill our hearts. 172 | The bravery to express the hopes that stir our souls. 173 | And the confidence to turn those hopes and dreams to action. 
174 | From now on, America will be empowered by our aspirations, not burdened by our fears --- 175 | inspired by the future, not bound by the failures of the past --- 176 | and guided by our vision, not blinded by our doubts. 177 | I am asking all citizens to embrace this Renewal of the American Spirit. I am asking all members of Congress to join me in dreaming big, and bold and daring things for our country. And I am asking everyone watching tonight to seize this moment and -- 178 | Believe in yourselves. 179 | Believe in your future. 180 | And believe, once more, in America. 181 | Thank you, God bless you, and God Bless these United States. 182 | -------------------------------------------------------------------------------- /data_manipulation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "

Python for the Humanities and Social Sciences
Data Manipulation

\n", 8 | "\n", 9 | "## Info\n", 10 | "- Scott Bailey (CIDR), *scottbailey@stanford.edu*\n", 11 | "- Javier de la Rosa (CIDR), *versae@stanford.edu*\n", 12 | "- Ashley Jester (CIDR/SSDS), *ajester@stanford.edu*\n", 13 | "\n", 14 | "## Goal\n", 15 | "By the end of our workshop today, we hope you'll be able to load data into a Pandas `DataFrame`, perform basic cleaning and analysis, and visualize relevant aspects of a dataset. We will work with a dataset of tweets collected during the release of the Apple Watch.\n", 16 | "\n", 17 | "## Topics\n", 18 | "- Pandas Series and DataFrame\n", 19 | "- Loading data in, null and missing data\n", 20 | "- Describing data\n", 21 | "- Column manipulation\n", 22 | "- String manipulation\n", 23 | "- Split-Apply-Combine\n", 24 | "- Plotting:\n", 25 | "    - Basic charts (line, bar, pie)\n", 26 | "    - Histograms\n", 27 | "    - Scatter plots\n", 28 | "    - Boxplots, violinplots\n", 29 | "\n", 30 | "## Setup and packages we need in our environment\n", 31 | "We'll be using Anaconda with Jupyter Notebooks for this workshop. For setting up both, please see the [setup](setup.ipynb).\n", 32 | "\n", 33 | "For this workshop, we'll need an environment with the following packages:\n", 34 | "- `matplotlib`\n", 35 | "- `pandas`\n", 36 | "- `requests`\n", 37 | "- `seaborn`, available in the `conda-forge` channel" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "## Pandas\n", 45 | "\n", 46 | "From Jake Vanderplas' book [**Python Data Science Handbook**](http://shop.oreilly.com/product/0636920034919.do) (from which some code excerpts are used in this workshop):\n", 47 | "\n", 48 | "> Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a `DataFrame`. `DataFrame`s are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data.
As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs." 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": { 55 | "collapsed": true 56 | }, 57 | "outputs": [], 58 | "source": [ 59 | "import numpy as np # np becomes the namespace of numpy\n", 60 | "import pandas as pd\n", 61 | "import requests\n", 62 | "\n", 63 | "# Set some options\n", 64 | "pd.set_option('display.max_columns', 20)\n", 65 | "pd.set_option('display.max_rows', 10)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "There are three main data structures in Pandas: `Series`, `DataFrame`, and `Index`. Pandas has very decent [documentation](http://pandas.pydata.org/pandas-docs/stable/), and in Jupyter the help for any function or method can be shown by appending a `?` to its name and running the cell." 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": null, 78 | "metadata": { 79 | "collapsed": true 80 | }, 81 | "outputs": [], 82 | "source": [ 83 | "# For example\n", 84 | "pd.isnull?"
85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "## Data I/O" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "Pandas provides methods to read and write data in CSV, Excel, HDF, or even JSON format.\n", 99 | "\n", 100 | "For example, click on the following URL, which points to a CSV file containing Twitter data collected during the release of the Apple Watch: http://bit.ly/python_workshop_data" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "# Pandas can fetch data from a URL\n", 110 | "pd.read_csv(\"http://bit.ly/python_workshop_data\")" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "Let's save the previous data to a local file." 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "collapsed": true 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "with open(\"twitter.csv\", \"wb\") as file:\n", 129 | "    file.write(requests.get(\"http://bit.ly/python_workshop_data\").content)" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "pd.read_csv(\"twitter.csv\")" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "Let's reload the CSV, this time specifying an index column." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "df = pd.read_csv(\"twitter.csv\", index_col=\"created_at\")\n", 155 | "df" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "Now we can save the indexed data in any format supported by Pandas." 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | 
"execution_count": null, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "df.to_csv(\"twitter_indexed.csv\", encoding=\"utf8\")" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "## `DataFrame` and `Series`\n", 179 | "\n", 180 | "A `DataFrame` is a two-dimensional array with both flexible row indices and flexible column names. It can be seen as a generalization of a two-dimensional NumPy array, or a specialization of a dictionary in which each column name maps to a `Series` of column data. A `Series` is a one-dimensional array of indexed data. It can be seem as a specialized dictionary or a generalized NumPy array.\n", 181 | "\n", 182 | "A `DataFrame` is made up of `Series` in a similar way in which a table is made up of columns. The only restriction os that ach column must be of the same data type." 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": null, 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "df = pd.read_csv(\"twitter.csv\")\n", 192 | "df.columns" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "Accessing columns can be done using the dot notation, `df.column_name`, or the dictionary notation, `df[\"column_name\"]`." 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "df[\"urls\"]" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": null, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "df.urls" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "`DataFrame`s can be sliced to extract just a set of the columns you are interested in. We just pass in a list of the columns we need to the slice and get a `DataFrame` back." 
225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "df[[\"urls\", \"text\"]]" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "All `DataFrame`s are indexed. If an index is not explicitly provided, Pandas will assign one, giving each row a consecutive number. `Series` and slices keep these indices, which makes further operations such as merging or column manipulation possible.\n", 241 | "\n", 242 | "`DataFrames` are designed to operate at the column level, not the row level. However, a subset of rows can easily be displayed using a slice, as with any Python list." 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "df[10:15]" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "df.urls[10:15]" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "df[[\"urls\"]][10:15]" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "You can even access individual rows, and combine row and column selection."
277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "df[[\"urls\", \"text\"]].loc[2:5] # label-based: rows labeled 2 through 5, end included" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": null, 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "df[[\"urls\", \"text\"]].iloc[2:5] # position-based: rows at positions 2 to 4, end excluded" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": null, 300 | "metadata": {}, 301 | "outputs": [], 302 | "source": [ 303 | "df.loc[2:5, [\"urls\", \"text\"]] # rows and columns at once (.ix was removed in pandas 1.0)" 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "metadata": {}, 309 | "source": [ 310 | "
\n", 311 | "

\n", 312 | "Activity\n", 313 | "

\n", 314 | "

\n", 315 | "Given the `DataFrame` defined above, write an expression to extract a `DataFrame` with the columns `text`, `user_screen_name`, `user_name`, `user_lang`, and `hashtags`. Show only the first 5 rows of it.\n", 316 | "
\n", 317 | "\n", 318 | "

\n", 319 | "
" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "metadata": {}, 326 | "outputs": [], 327 | "source": [ 328 | "# Write here your solution" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "## Indexing and Expressions" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "Operations performed using a column or `Series` are broadcast to each of the elements contained." 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "metadata": {}, 349 | "outputs": [], 350 | "source": [ 351 | "df[\"id\"] * 2" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "\"@\" + df[\"user_name\"] + \": \" + df[\"text\"]" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": null, 366 | "metadata": {}, 367 | "outputs": [], 368 | "source": [ 369 | "df[\"id\"] > 0" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": {}, 375 | "source": [ 376 | "Which allows for a more advanced and useful indexing as you can pass in an expression to a `DataFrame` to select content." 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": null, 382 | "metadata": {}, 383 | "outputs": [], 384 | "source": [ 385 | "df[df[\"id\"] > 575043732472528896]" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": {}, 391 | "source": [ 392 | "Basically any expression that evaluates to a `Series` of `True` and `False` values and share the index can be used. And conditions can be put together using logical operators for \"and\", `&`, \"or\", `|`, and \"not\", `~`, making the filter more precise and expressive." 
393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": null, 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [ 401 | "df[(df[\"id\"] > 575043732472528896) & (df[\"user_mentions\"].map(len, na_action=\"ignore\") > 5)] # len(df[...]) alone would return the number of rows" 402 | ] 403 | }, 404 | { 405 | "cell_type": "markdown", 406 | "metadata": {}, 407 | "source": [ 408 | "Some string operations are also available at the column level on the `.str` attribute of `Series`." 409 | ] 410 | }, 411 | { 412 | "cell_type": "code", 413 | "execution_count": null, 414 | "metadata": { 415 | "scrolled": true 416 | }, 417 | "outputs": [], 418 | "source": [ 419 | "df[\"urls\"].str.split()" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "So the previous selection could also be written as:" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [ 435 | "df[(df[\"id\"] > 575043732472528896) & (df[\"user_mentions\"].str.len() > 5)]" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "
\n", 443 | "

\n", 444 | "Activity\n", 445 | "

\n", 446 | "

\n", 447 | "Given the `states` `DataFrame` defined below, write an expression to calculate the population density of each state.\n", 448 | "
\n", 449 | "* **Hint**: Population density is defined as the number of people per unit of area.*\n", 450 | "

\n", 451 | "
" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": null, 457 | "metadata": {}, 458 | "outputs": [], 459 | "source": [ 460 | "population_dict = {'California': 38332521, 'Texas': 26448193, 'New York': 19651127,\n", 461 | " 'Florida': 19552860, 'Illinois': 12882135}\n", 462 | "area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,\n", 463 | " 'Florida': 170312, 'Illinois': 149995} # these are in km²\n", 464 | "states = pd.DataFrame({'population': population_dict, 'area': area_dict})\n", 465 | "states" 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "execution_count": null, 471 | "metadata": {}, 472 | "outputs": [], 473 | "source": [ 474 | "# Write your code here" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "## Manipulation" 482 | ] 483 | }, 484 | { 485 | "cell_type": "markdown", 486 | "metadata": {}, 487 | "source": [ 488 | "The fundamental way of manipulating the contents of `DataFrame` columns is by using the `apply()` method, which allows to call a user defined function to each of the elements in the `Series`. Unlike the `.str` attribute, `apply()` is a general way of transforming values." 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": null, 494 | "metadata": {}, 495 | "outputs": [], 496 | "source": [ 497 | "def count_links(text):\n", 498 | " links = text.split(\",\")\n", 499 | " count = len(links)\n", 500 | " return count\n", 501 | "\n", 502 | "df[\"urls\"].apply(count_links) # urls are separated by comma" 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "However our naive `count_links` function does not know how to handle missing data. We could ignore those values by dropping the `NaN`, which is the Pandas way of saying missing data, or by cleaning our dataset on import time." 
510 | ] 511 | }, 512 | { 513 | "cell_type": "code", 514 | "execution_count": null, 515 | "metadata": {}, 516 | "outputs": [], 517 | "source": [ 518 | "df[\"urls\"].dropna().apply(count_links)" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [ 525 | "Cleaning the data at the beginning, at import time, and for the whole `DataFrame` is usually a good idea, since it makes working with the data more consistent and less prone to error.\n", 526 | "\n", 527 | "This also saves us the hassle of dropping `NaN`s every time. In our case we will:\n", 528 | "- Filter out some columns we are not interested in\n", 529 | "- Specify an index for the `DataFrame`\n", 530 | "- Provide data types for some columns\n", 531 | "- Parse dates as Python `datetime` for columns containing dates as strings\n", 532 | "- Replace `NaN` values with empty strings in string columns\n", 533 | "\n", 534 | "And then show the first 5 rows, this time using the `head()` method." 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "execution_count": null, 540 | "metadata": {}, 541 | "outputs": [], 542 | "source": [ 543 | "columns = [\n", 544 | " \"created_at\", \"id\",\n", 545 | " \"text\", \"lang\", \"possibly_sensitive\", \"user_screen_name\",\n", 546 | " \"hashtags\", \"media\", \"symbols\", \"urls\",\n", 547 | " \"place\", \"country\"] # columns we want\n", 548 | "index_column = \"created_at\"\n", 549 | "column_types = {\n", 550 | " \"id\": int,\n", 551 | " \"possibly_sensitive\": bool,\n", 552 | " \"lat\": float,\n", 553 | " \"lon\": float,\n", 554 | "}\n", 555 | "fill_nans = {\n", 556 | " 'country': '',\n", 557 | " 'hashtags': '',\n", 558 | " 'lang': '',\n", 559 | " 'media': '',\n", 560 | " 'place': '',\n", 561 | " 'symbols': '',\n", 562 | " 'text': '',\n", 563 | " 'urls': '',\n", 564 | " 'user_lang': '',\n", 565 | " 'user_location': '',\n", 566 | " 'user_name': '',\n", 567 | " 'user_screen_name': ''\n", 568 | "}\n", 569 | "date_columns = [\"created_at\"]\n", 570 | 
"df = pd.read_csv(\"twitter.csv\",\n", 571 | " parse_dates=date_columns,\n", 572 | " index_col=index_column,\n", 573 | " usecols=columns,\n", 574 | " dtype=column_types).fillna(value=fill_nans)\n", 575 | "df.head(5)" 576 | ] 577 | }, 578 | { 579 | "cell_type": "markdown", 580 | "metadata": {}, 581 | "source": [ 582 | "Now, our `count_links` should work just fine." 583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": null, 588 | "metadata": {}, 589 | "outputs": [], 590 | "source": [ 591 | "df[\"urls\"].apply(count_links)" 592 | ] 593 | }, 594 | { 595 | "cell_type": "markdown", 596 | "metadata": {}, 597 | "source": [ 598 | "And since the result of `appply()` is another `Series`, we can even create a new column with the it to enrich a `DataFrame`." 599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": null, 604 | "metadata": {}, 605 | "outputs": [], 606 | "source": [ 607 | "df[\"urls_count\"] = df[\"urls\"].apply(count_links)\n", 608 | "df[[\"urls\", \"urls_count\"]]" 609 | ] 610 | }, 611 | { 612 | "cell_type": "markdown", 613 | "metadata": {}, 614 | "source": [ 615 | "If we now wanted to know the distribution or histogram of the number of links, we could use the `.value_counts()` method of `Series`." 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": null, 621 | "metadata": {}, 622 | "outputs": [], 623 | "source": [ 624 | "df[\"urls_count\"].value_counts()" 625 | ] 626 | }, 627 | { 628 | "cell_type": "markdown", 629 | "metadata": {}, 630 | "source": [ 631 | "
\n", 632 | "

\n", 633 | "Activity\n", 634 | "

\n", 635 | "

\n", 636 | "Given the Twitter `DataFrame`, add a new column `length` with the length of the `text`, and show the tweets with exactly 140 characters.\n", 637 | "
\n", 638 | "

\n", 639 | "
" 640 | ] 641 | }, 642 | { 643 | "cell_type": "code", 644 | "execution_count": null, 645 | "metadata": {}, 646 | "outputs": [], 647 | "source": [ 648 | "# Write your code here\n", 649 | "df[\"length\"] = df[\"text\"].apply(...)\n", 650 | "df[...][[\"text\"]]" 651 | ] 652 | }, 653 | { 654 | "cell_type": "markdown", 655 | "metadata": {}, 656 | "source": [ 657 | "`Series` also have some handy functions to compute basic statistics, like the sum or the mean. For example, given the new column created above, let's compute the average lenght of the tweets." 658 | ] 659 | }, 660 | { 661 | "cell_type": "code", 662 | "execution_count": null, 663 | "metadata": {}, 664 | "outputs": [], 665 | "source": [ 666 | "df[\"length\"].mean()" 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": {}, 672 | "source": [ 673 | "### Grouping data\n", 674 | "\n", 675 | "But what about the most tweeted language? Or the most prolific user? For this kind of operations we need to use what is called the [Split-Apply-Combine](https://www.jstatsoft.org/article/view/v040i01/v40i01.pdf) approach:\n", 676 | "- *Split* up a dataset\n", 677 | "- *Apply* a function to each piece\n", 678 | "- *Combine* all the pieces back together\n", 679 | "\n", 680 | "
\n", 681 | " \"Split-Apply-Combine\"\n", 682 | "
* Split-Apply-Combine - Source: [Software Carpentry](https://software-carpentry.org/lessons/). *
\n", 683 | "
\n", 684 | "\n", 685 | "In Pandas this can take the form of a `.groupby()` (split) operation followed by an `.aggregate()` (apply) function. Aggregate functions are like `apply()`, but they operate at the group level. Combining is done automatically for us by Pandas." 686 | ] 687 | }, 688 | { 689 | "cell_type": "code", 690 | "execution_count": null, 691 | "metadata": {}, 692 | "outputs": [], 693 | "source": [ 694 | "df.groupby(\"lang\")" 695 | ] 696 | }, 697 | { 698 | "cell_type": "code", 699 | "execution_count": null, 700 | "metadata": {}, 701 | "outputs": [], 702 | "source": [ 703 | "df.groupby(\"lang\")[[\"text\"]] # no computation is made yet!" 704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "execution_count": null, 709 | "metadata": {}, 710 | "outputs": [], 711 | "source": [ 712 | "def count_nonzero(items):\n", 713 | "    total = 0\n", 714 | "    for item in items:\n", 715 | "        if item != 0:\n", 716 | "            total += 1\n", 717 | "    return total\n", 718 | "\n", 719 | "df.groupby(\"lang\")[[\"text\"]].aggregate(count_nonzero)" 720 | ] 721 | }, 722 | { 723 | "cell_type": "markdown", 724 | "metadata": {}, 725 | "source": [ 726 | "`DataFrames` can be sorted by the values of one or more columns, in either ascending or descending order." 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": null, 732 | "metadata": {}, 733 | "outputs": [], 734 | "source": [ 735 | "aggregated = df.groupby(\"lang\")[[\"text\"]].aggregate(count_nonzero)\n", 736 | "aggregated.sort_values(\"text\", ascending=False)" 737 | ] 738 | }, 739 | { 740 | "cell_type": "markdown", 741 | "metadata": {}, 742 | "source": [ 743 | "However, for more complex groupings, such as grouping by several columns at once, creating a pivot table can be more useful."
744 | ] 745 | }, 746 | { 747 | "cell_type": "code", 748 | "execution_count": null, 749 | "metadata": {}, 750 | "outputs": [], 751 | "source": [ 752 | "df.pivot_table(\n", 753 | " index=[\"lang\", \"user_screen_name\"],\n", 754 | " values=[\"text\"],\n", 755 | " aggfunc=count_nonzero\n", 756 | ").sort_values(\"text\", ascending=False)" 757 | ] 758 | }, 759 | { 760 | "cell_type": "markdown", 761 | "metadata": {}, 762 | "source": [ 763 | "
\n", 764 | "

\n", 765 | "Activity\n", 766 | "

\n", 767 | "

\n", 768 | "Given the twitter `DataFrame`, show the most popular retweet written in English.\n", 769 | "
\n", 770 | "* **Hint**: In our dataset, retweets are tweets that start with \"RT @\".*\n", 771 | "

\n", 772 | "
" 773 | ] 774 | }, 775 | { 776 | "cell_type": "code", 777 | "execution_count": null, 778 | "metadata": {}, 779 | "outputs": [], 780 | "source": [ 781 | "# Write your code here" 782 | ] 783 | }, 784 | { 785 | "cell_type": "markdown", 786 | "metadata": {}, 787 | "source": [ 788 | "## Visualization" 789 | ] 790 | }, 791 | { 792 | "cell_type": "markdown", 793 | "metadata": {}, 794 | "source": [ 795 | "Pandas also provides some utilities to create basic plots just by calling `plot()` on a `Series` or `DataFrame`. But first we need to tell Jupyter that we are going to plot some charts using the plotting library matplotlib." 796 | ] 797 | }, 798 | { 799 | "cell_type": "code", 800 | "execution_count": null, 801 | "metadata": {}, 802 | "outputs": [], 803 | "source": [ 804 | "# enables inline plotting in Jupyter using matplotlib\n", 805 | "%matplotlib inline\n", 806 | "import matplotlib.pyplot as plt" 807 | ] 808 | }, 809 | { 810 | "cell_type": "code", 811 | "execution_count": null, 812 | "metadata": {}, 813 | "outputs": [], 814 | "source": [ 815 | "df.groupby(\"lang\")[[\"lang\"]].aggregate(count_nonzero).plot()" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": {}, 821 | "source": [ 822 | "Each time you call `plot()` an `Axes` object is returned, and Jupyter knows how to paint those. `Axes` objects are objects of the underlying `matplotlib` library for plotting in Python, and as such, lots of different options can be given to customize the aspect." 
823 | ] 824 | }, 825 | { 826 | "cell_type": "code", 827 | "execution_count": null, 828 | "metadata": {}, 829 | "outputs": [], 830 | "source": [ 831 | "ax = df.groupby(\"lang\")[[\"lang\"]].aggregate(count_nonzero).plot(\n", 832 | " kind=\"bar\",\n", 833 | " figsize=(15, 5),\n", 834 | " title=\"# Tweets per Language\",\n", 835 | " legend=None\n", 836 | ")\n", 837 | "ax.set_ylabel(\"# Tweets\")\n", 838 | "ax.set_xlabel(\"Language\")" 839 | ] 840 | }, 841 | { 842 | "cell_type": "markdown", 843 | "metadata": {}, 844 | "source": [ 845 | "`Axes` can also be created empty using `matplotlib` and then filled with content." 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": null, 851 | "metadata": {}, 852 | "outputs": [], 853 | "source": [ 854 | "fig, ax = plt.subplots(1, figsize=(15, 5))\n", 855 | "df.groupby(\"lang\")[[\"lang\"]].aggregate(count_nonzero).plot(ax=ax,\n", 856 | " kind=\"bar\",\n", 857 | " title=\"# Tweets per Language\",\n", 858 | " legend=None\n", 859 | ")\n", 860 | "ax.set_ylabel(\"# Tweets\")\n", 861 | "ax.set_xlabel(\"Language\")" 862 | ] 863 | }, 864 | { 865 | "cell_type": "markdown", 866 | "metadata": {}, 867 | "source": [ 868 | "There are other styles available as well."
869 | ] 870 | }, 871 | { 872 | "cell_type": "code", 873 | "execution_count": null, 874 | "metadata": {}, 875 | "outputs": [], 876 | "source": [ 877 | "plt.style.available" 878 | ] 879 | }, 880 | { 881 | "cell_type": "code", 882 | "execution_count": null, 883 | "metadata": {}, 884 | "outputs": [], 885 | "source": [ 886 | "with plt.style.context('ggplot'):\n", 887 | "    df.groupby(\"lang\")[[\"lang\"]].aggregate(count_nonzero).plot()" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": null, 893 | "metadata": {}, 894 | "outputs": [], 895 | "source": [ 896 | "# Even a special one for XKCD!\n", 897 | "with plt.xkcd():\n", 898 | "    df.groupby(\"lang\")[[\"lang\"]].aggregate(count_nonzero).plot()" 899 | ] 900 | }, 901 | { 902 | "cell_type": "markdown", 903 | "metadata": {}, 904 | "source": [ 905 | "`seaborn`, a convenience wrapper around `matplotlib`, changes the default style when imported in versions before 0.8 (later versions only do so after calling `sns.set()`). The change can easily be reverted by setting the style to `classic` using `plt.style.use(\"classic\")`." 906 | ] 907 | }, 908 | { 909 | "cell_type": "code", 910 | "execution_count": null, 911 | "metadata": {}, 912 | "outputs": [], 913 | "source": [ 914 | "import seaborn as sns\n", 915 | "df.groupby(\"lang\")[[\"lang\"]].aggregate(count_nonzero).plot()" 916 | ] 917 | }, 918 | { 919 | "cell_type": "markdown", 920 | "metadata": {}, 921 | "source": [ 922 | "Let's create a histogram of the lengths of tweets." 923 | ] 924 | }, 925 | { 926 | "cell_type": "code", 927 | "execution_count": null, 928 | "metadata": {}, 929 | "outputs": [], 930 | "source": [ 931 | "fig, ax = plt.subplots(1, figsize=(15, 5))\n", 932 | "df[\"length\"].hist(ax=ax, bins=15, density=True, color='lightseagreen') # density replaces the removed normed argument\n", 933 | "df[\"length\"].plot(ax=ax, kind='kde', xlim=(0, 150), style='r--')\n", 934 | "ax.set_title(\"Histogram of lengths of tweets\")" 935 | ] 936 | }, 937 | { 938 | "cell_type": "markdown", 939 | "metadata": {}, 940 | "source": [ 941 | "Boxplots are available by default."
942 | ] 943 | }, 944 | { 945 | "cell_type": "code", 946 | "execution_count": null, 947 | "metadata": {}, 948 | "outputs": [], 949 | "source": [ 950 | "fig, ax = plt.subplots(1, figsize=(8, 6))\n", 951 | "df.boxplot(column=\"length\", grid=False, ax=ax)" 952 | ] 953 | }, 954 | { 955 | "cell_type": "markdown", 956 | "metadata": {}, 957 | "source": [ 958 | "Violin plots, in contrast, are available through `seaborn`." 959 | ] 960 | }, 961 | { 962 | "cell_type": "code", 963 | "execution_count": null, 964 | "metadata": {}, 965 | "outputs": [], 966 | "source": [ 967 | "fig, ax = plt.subplots(1, figsize=(8, 6))\n", 968 | "sns.violinplot(y=df[\"length\"], grid=False, ax=ax)" 969 | ] 970 | }, 971 | { 972 | "cell_type": "markdown", 973 | "metadata": {}, 974 | "source": [ 975 | "
\n", 976 | "

\n", 977 | "Activity\n", 978 | "

\n", 979 | "

\n", 980 | "Given the Twitter `DataFrame`, let's try to find out visually whether there is any relationship between the length of a tweet and the number of hashtags it uses.\n", 981 | "

\n", 982 | "
" 983 | ] 984 | }, 985 | { 986 | "cell_type": "code", 987 | "execution_count": null, 988 | "metadata": {}, 989 | "outputs": [], 990 | "source": [ 991 | "# Write your code here\n", 992 | "df[\"hashtags_count\"] = \n", 993 | "\n", 994 | "fig, ax = plt.subplots(1, figsize=(15, 5))\n", 995 | "ax.scatter(x=..., y=...)\n", 996 | "ax.set_ylabel(\"Length\")\n", 997 | "ax.set_xlabel(\"# Hashtags\")\n", 998 | "ax.set_title(\"Tweet length by number of hashtags\")" 999 | ] 1000 | } 1001 | ], 1002 | "metadata": { 1003 | "anaconda-cloud": {}, 1004 | "kernelspec": { 1005 | "display_name": "Python 3", 1006 | "language": "python", 1007 | "name": "python3" 1008 | }, 1009 | "language_info": { 1010 | "codemirror_mode": { 1011 | "name": "ipython", 1012 | "version": 3 1013 | }, 1014 | "file_extension": ".py", 1015 | "mimetype": "text/x-python", 1016 | "name": "python", 1017 | "nbconvert_exporter": "python", 1018 | "pygments_lexer": "ipython3", 1019 | "version": "3.5.3" 1020 | } 1021 | }, 1022 | "nbformat": 4, 1023 | "nbformat_minor": 1 1024 | } 1025 | -------------------------------------------------------------------------------- /descriptions/intro_description.md: -------------------------------------------------------------------------------- 1 | # Intro to Python for Humanities and Social Sciences 2 | 3 | The objective of this workshop is to introduce students to the Python programming language and several libraries particularly useful for the humanities and social sciences. In learning basic Python syntax, we'll also learn to scrape information from the web and parse it for varied uses. 4 | 5 | Preparation: Please install [Anaconda with Python 3.5](https://www.continuum.io/downloads). If you need help with the installation, please arrive 20 minutes early and we're happy to help you. 
6 | -------------------------------------------------------------------------------- /descriptions/series_description.md: -------------------------------------------------------------------------------- 1 | # Python for the Humanities and Social Sciences 2 | 3 | This three-workshop series will introduce students to Python with special attention to patterns and libraries particularly useful in the humanities and social sciences. 4 | 5 | ## Intro to Python 6 | 7 | The first workshop will cover basic Python syntax and project set up through teaching students fundamentals of web scraping. 8 | 9 | ## Data Manipulation with Python 10 | 11 | The second workshop will guide students through fundamentals of data manipulation and visualization with [Pandas](http://pandas.pydata.org/) and [Seaborn](http://seaborn.pydata.org/). 12 | 13 | ## Natural Language Processing with Python 14 | 15 | The third workshop will teach students natural language processing in Python, with topics such as tokenization, part-of-speech tagging, and sentiment analysis. 
16 | -------------------------------------------------------------------------------- /images/anaconda-channels.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sul-cidr/python_workshops/f7c625b5b1c36e9811afa692c72eff8e536821d3/images/anaconda-channels.gif -------------------------------------------------------------------------------- /images/anaconda-envs.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sul-cidr/python_workshops/f7c625b5b1c36e9811afa692c72eff8e536821d3/images/anaconda-envs.gif -------------------------------------------------------------------------------- /images/anaconda-jupyter.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sul-cidr/python_workshops/f7c625b5b1c36e9811afa692c72eff8e536821d3/images/anaconda-jupyter.gif -------------------------------------------------------------------------------- /images/anaconda-notebook.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sul-cidr/python_workshops/f7c625b5b1c36e9811afa692c72eff8e536821d3/images/anaconda-notebook.gif -------------------------------------------------------------------------------- /images/anaconda-packages.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sul-cidr/python_workshops/f7c625b5b1c36e9811afa692c72eff8e536821d3/images/anaconda-packages.gif -------------------------------------------------------------------------------- /intro_to_jekyll_and_github_pages.md: -------------------------------------------------------------------------------- 1 | # Building Websites with Jekyll and Github Pages 2 | 3 | Center for Interdisciplinary Digital Research 4 | 5 | - Scott Bailey - scottbailey[at]stanford.edu 6 | - Javier de la Rosa - 
versae[at]stanford.edu 7 | - Ashley Jester - ajester[at]stanford.edu 8 | 9 | ## Goals 10 | 11 | By the end of this workshop, we hope that through building a sample website and deploying it to Github Pages, you will understand why you would use Jekyll vs a content management system like Wordpress, how to build a website with Jekyll, and how to deploy a Jekyll website to Github for free hosting. 12 | 13 | ## Setup 14 | 15 | For the requirements and setup process, see [https://github.com/sul-cidr/python_workshops/blob/master/setup_jekyll_githubpages.md](https://github.com/sul-cidr/python_workshops/blob/master/setup_jekyll_githubpages.md). 16 | 17 | 18 | ## What are Jekyll and Github Pages? 19 | 20 | - What is Jekyll? 21 | - Static site generator built with Ruby; uses the Liquid template language. 22 | - Static site generators vs dynamic websites (CMSs like Wordpress and other web applications) 23 | - What is Git? 24 | - Version control 25 | - Shell vs GUI 26 | - Github 27 | - Hosting service for code repositories, built on the Git version control system. 28 | - Also useful for non-code things. 29 | - Github Pages 30 | - Github will build and serve, for free, websites whose code lives on its platform. It serves regular HTML/CSS, and by default it also builds Jekyll sites. 31 | - User/Organization vs Project Pages 32 | - Limitations: only a restricted set of plugins and themes is allowed. 33 | 34 | ## Create a repository on Github for your project 35 | 36 | - Sign in to Github in your browser 37 | - Create a repository. 38 | - Click "Set up in Desktop" to have Github automatically open Github Desktop, add the repo as a project, and clone it to your computer. You'll have a chance to choose where to copy the project to your local hard drive at this stage. I like to create a single folder or directory where I keep all of my projects that I use with git, simply called `projects`. 
39 | - Key concepts: repository, Github 40 | 41 | ## Create a Jekyll website in the folder for your repository on your local machine 42 | 43 | - We want to create a new branch called `gh-pages` that will house the code for our website. We're using a separate branch so that we can keep using the `master` branch for the code or files of the project itself. 44 | - In Github Desktop, click "File", then "New branch." Name it `gh-pages` and base it off `master`. 45 | - In the shell, inside the local directory of your repository, run `jekyll new .`. This scaffolds a new website with the Jekyll framework. 46 | - Key concepts: git branches, Jekyll scaffold 47 | 48 | ## Building the website locally 49 | 50 | - In your shell, still within the repository on your local machine, run `bundle exec jekyll serve`. This will build the website and serve it from a local server on your machine. You can then open up the link it shows you in your browser to see the website. This is typically [http://localhost:4000/](http://localhost:4000/). 51 | - Command break down: 52 | - `bundle exec` 53 | - `jekyll serve` 54 | - You should see a straightforwardly themed website with a title/header, navigation, a list of posts (currently just one), and a footer with several pieces of information, including how to edit the content. 55 | - Key concept: building jekyll locally 56 | 57 | ## Exploring the core components of Jekyll 58 | 59 | - In order to explore what is in these new files, let's open the repository in a text editor. 60 | - In your text editor you should be able to see a list of the files and folders that are in your repository. We'll go through them one by one now. I also always recommend reading the Jekyll [documentation](https://jekyllrb.com/docs/structure/) that explains each of these files or folders. 
61 | - Tour through: `Gemfile`, `_config.yml`, `index.md`, `about.md`, `_posts`, `_site` 62 | - Key concepts: Ruby gems, separation of content and appearance, markdown 63 | 64 | ## Creating and Editing your site's content 65 | 66 | - Let's start by editing `_config.yml`. Within your text editor, open that file and edit several pieces of information, such as the title and description of the site. NOTE: for most changes, while `jekyll serve` is running, it will automatically rebuild the site. However, if you make a change to the config file, you will need to manually restart the process. 67 | - Next, let's create a new post. We'll do this by duplicating the existing post in the `_posts/` directory, then editing the filename, the YAML front-matter, and content. 68 | - Next, let's create a static page. Similarly to the post, we'll duplicate the existing `about.md` page, rename the file, edit the YAML front-matter, and then the content. 69 | - Key concepts: YAML front-matter, posts, static pages, explicit configuration 70 | 71 | ## Modifying your site's appearance 72 | 73 | - Themes are now installed via Ruby gem rather than as part of the core Jekyll install. Let's take a look at the files and folders within the standard theme, [minima](https://github.com/jekyll/minima). After that, we'll learn how to override the theme you have installed in order to customize it. 74 | - Tour through `_includes`, `_layouts`, `_sass`, `assets` 75 | - As an example of overriding some of the theme, let's override part of the footer. In your local repository, create a new folder called `_includes`. Inside of that folder, create a file called `footer.html`. This is a **partial**, a reusable piece of code that we can include in the website. 76 | - Within your new `footer.html`, just put in some text and then reload your page in the browser to see the change. 
77 | - Next, to look at how we'd actually just modify the theme, let's copy the raw html from the minima theme footer partial into our own footer partial file from [here](https://raw.githubusercontent.com/jekyll/minima/master/_includes/footer.html). If you reload in the browser, you'll see the footer from the theme. Back in your text editor, let's look through the code and remove line 5, which begins with an `h2` tag. This will remove the header from the footer. Then, reload the page in the browser to see your change. NOTE: pay attention to the Liquid language here and how it draws information out of the config file (`site.variable`). 78 | - Switching entire themes: 79 | - Github Pages supports only a limited number of themes, which you can find [here](https://pages.github.com/themes/). 80 | - We want to be able to see what our website would look like locally before we switch our theme in our published site, so let's make sure we have those themes installed locally. In your editor, open `Gemfile`. Jekyll gives us some nice instructions in this file to get set up for Github Pages. Per its instructions, comment out the line with `jekyll`, and uncomment the line with `github-pages`. After you save, return to your shell and run `bundle`. If it gives you an error, try running `bundle update` to get the most recent gems. After this has run, you can look at `Gemfile.lock` and you should see `jekyll-theme-*` entries there. This means that the theme gems are installed and available now. 81 | - In our `_config.yml` file, find "theme", and change the value from "minima" to "jekyll-theme-midnight". Go back to your shell and run `bundle exec jekyll serve`. You'll notice that the site builds, but it gives you several build warnings about layouts being requested that don't exist. 
This is because the minima theme includes layouts called "post", "page", and "home", while the midnight theme only includes a "default" layout (check [here](https://github.com/pages-themes/midnight/tree/master/_layouts)). To fix this, let's go into each file we get a warning about and change the layout in the YAML front-matter to "default." After you do that, try to build/serve the site again and check it in your browser. 82 | - You'll notice another issue now: there is no navigation in the header to your static pages, such as "About". In the minima theme, there is a header partial in the `_includes` folder that creates a nav link for each page ([see the code](https://github.com/jekyll/minima/blob/master/_includes/header.html#L22-L27)). The midnight theme doesn't have any partials, as you can see from the lack of an `_includes` folder. It does hardcode its brief navigation group in the default layout file ([see the code](https://github.com/pages-themes/midnight/blob/master/_layouts/default.html#L21-L28)). There are a couple of potential ways to fix this. One would be to create a `_layouts` folder in your local repo, create a `default.html` file, copy the layout code from the midnight theme into your own default layout, and replace the nav code within that with the nav code from the minima theme, linked above. In general, you'll follow a similar pattern of overriding a theme partial or layout any time you're modifying the theme you've chosen. 83 | - Key concepts: themes, overriding themes, partials, layouts 84 | 85 | ## Deploying your site to Github Pages 86 | 87 | - In Github Desktop, if you click on your project in the left sidebar, you should now see a list of files that have been changed. You can click on each file to see the new/changed content in each. 88 | - In the left sidebar of Github Desktop, in the "Summary" field, type a short message that explains the change you're making. For an initial commit, I often just put in "Initial commit." 
You could be a bit clearer and write something like "Scaffolds Jekyll website with configuration changes." If you wanted to go into more detail you could write something further, or use bullet points, in the "Description" field. Do pay attention to which branch you're committing your work to. Github Desktop shows the current branch in the top middle of the application pane. 89 | - After you hit commit, make sure to hit the "Publish" button in the top right. This will push your code up to Github so that the remote repository has all the changes. If you go back to the repository on Github in your browser and switch to the `gh-pages` branch, you should now see your code. 90 | - We now need to tell Github that we want to use the `gh-pages` branch as the source for Github Pages to build a website. In your browser, on the Github page for your repo, click "Settings". Scroll down to the section called "Github Pages". Usually, if a `gh-pages` branch exists, Github knows to set that as the source for Github Pages. If `gh-pages` isn't selected under "Source", go ahead and select it. In this same section, Github should tell you the URL for your site, which should be `githubusername.github.io/reponame`. If there are any problems building your site, Github will list the problems here and likely send you an email. 91 | - Key concepts to explain: adding files, commits, pushing to remote 92 | 93 | ## Workflow 94 | 95 | - Once you've set up your local and remote repos, and added them into Github Desktop, there is a fairly consistent workflow you can use to update your site. 96 | - In Github Desktop, in the upper right corner, click "Fetch/Pull Origin" to get changes from the remote repository. If you're the only person working on your site, your local and remote repositories should already be in sync. However, it's good practice to always fetch/pull from the remote repo before doing any work, since the habit pays off in collaborative projects. 
97 | - In your shell, run `bundle exec jekyll serve` to build, serve, and watch your website. 98 | - In your text editor, make any changes you wish. 99 | - Check out your website locally in your browser, probably at [http://localhost:4000/](http://localhost:4000/). 100 | - Once you're happy with your changes, review them in Github Desktop. When ready, write a commit message in the bottom left of Github Desktop. Hit the "Commit" button, making sure to commit to the `gh-pages` branch. 101 | - Click "Push Origin" in the upper right of Github Desktop to push your changes to the remote repository. Github Pages will now rebuild your site with your changes. 102 | - Check that it worked fine at your published site. Sometimes it takes a few minutes for Github to rebuild your page, so be patient. 103 | 104 | ## Helpful links 105 | - https://pages.github.com/ 106 | - https://jekyllrb.com/ 107 | - https://shopify.github.io/liquid/ 108 | - http://programminghistorian.org/lessons/building-static-sites-with-jekyll-github-pages 109 | - Minimal critical editions: https://github.com/elotroalex/ed 110 | -------------------------------------------------------------------------------- /intro_to_python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Intro to Python and Web Scraping\n", 8 | "\n", 9 | "## Info\n", 10 | "- Scott Bailey (CIDR), *scottbailey@stanford.edu*\n", 11 | "- Javier de la Rosa (CIDR), *versae@stanford.edu*\n", 12 | "- Ashley Jester (CIDR/SSDS), *ajester@stanford.edu*\n", 13 | "\n", 14 | "## Goal\n", 15 | "\n", 16 | "By the end of our workshop today, we hope you'll understand basic syntax in Python for variables, functions, and control flow. We also hope you'll know enough about the process of web scraping and some standard packages in Python to successfully scrape information off of a basic, well-formatted web site. 
\n", 17 | "\n", 18 | "## Topics\n", 19 | "- Imports\n", 20 | "- Variables and types/structures (String, Int, List)\n", 21 | "- Functions\n", 22 | "- Control flow\n", 23 | "- Web scraping with Requests and BeautifulSoup\n", 24 | "- Writing text to a file\n", 25 | "\n", 26 | "## Setup and packages we need in our environment\n", 27 | "We'll be using Anaconda with Jupyter Notebooks for this workshop. For setting up both, please see https://github.com/sul-cidr/python_workshops/blob/master/setup.ipynb\n", 28 | "\n", 29 | "For this workshop, we'll need an environment with the following packages:\n", 30 | "- requests\n", 31 | "- beautifulsoup4 (and html5lib, the parser passed to BeautifulSoup in the scraping cells)" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "## Imports\n", 39 | "- At the top of your script/file, do imports. \n", 40 | "- Import whole module\n", 41 | "- Import part of a module" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": { 48 | "collapsed": true 49 | }, 50 | "outputs": [], 51 | "source": [ 52 | "from bs4 import BeautifulSoup\n", 53 | "import os\n", 54 | "import requests" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": {}, 60 | "source": [ 61 | "## Types and variables" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": { 68 | "collapsed": false 69 | }, 70 | "outputs": [], 71 | "source": [ 72 | "# Strings\n", 73 | "greeting = \"Hello, I'm Scott. 
It's a pleasure to meet you.\"\n", 74 | "# After you run this cell, note the difference between printing out in Jupyter and getting the\n", 75 | "# output from the last line of the cell\n", 76 | "print(greeting)\n", 77 | "greeting" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": { 84 | "collapsed": false 85 | }, 86 | "outputs": [], 87 | "source": [ 88 | "# Find a letter by index\n", 89 | "greeting[0]" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": { 96 | "collapsed": false 97 | }, 98 | "outputs": [], 99 | "source": [ 100 | "# Get the length of a string\n", 101 | "len(greeting)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": { 108 | "collapsed": false 109 | }, 110 | "outputs": [], 111 | "source": [ 112 | "# Count spaces in the string\n", 113 | "greeting.count(' ')" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": null, 119 | "metadata": { 120 | "collapsed": false 121 | }, 122 | "outputs": [], 123 | "source": [ 124 | "# Slice to get the first 3 characters\n", 125 | "greeting[0:3]" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": { 132 | "collapsed": false 133 | }, 134 | "outputs": [], 135 | "source": [ 136 | "# Get the last three characters\n", 137 | "greeting[-3:]" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": { 144 | "collapsed": false 145 | }, 146 | "outputs": [], 147 | "source": [ 148 | "# Replace hello with goodbye\n", 149 | "greeting.replace(\"Hello\", \"Goodbye\")" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": null, 155 | "metadata": { 156 | "collapsed": false 157 | }, 158 | "outputs": [], 159 | "source": [ 160 | "# String concatenation\n", 161 | "\"Hello\" + \"World\"" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 
167 | "metadata": { 168 | "collapsed": false 169 | }, 170 | "outputs": [], 171 | "source": [ 172 | "# Numbers\n", 173 | "# Integer and floats\n", 174 | "first_num = 10\n", 175 | "second_num = 5.467\n", 176 | "print(type(first_num), type(second_num))" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": { 183 | "collapsed": false 184 | }, 185 | "outputs": [], 186 | "source": [ 187 | "# Addition\n", 188 | "1 + 5" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": { 195 | "collapsed": false 196 | }, 197 | "outputs": [], 198 | "source": [ 199 | "# Division\n", 200 | "10 / 2" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": { 207 | "collapsed": false 208 | }, 209 | "outputs": [], 210 | "source": [ 211 | "# Multiplication\n", 212 | "5 * 2" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": { 219 | "collapsed": false 220 | }, 221 | "outputs": [], 222 | "source": [ 223 | "# Lists\n", 224 | "drinks = ['coffee', 'tea', 'water']\n", 225 | "drinks" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": { 232 | "collapsed": false 233 | }, 234 | "outputs": [], 235 | "source": [ 236 | "# Python allows you to create lists of different types\n", 237 | "mixed = [2, 'hello', 10.5, 'here is a sentence']\n", 238 | "mixed" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "metadata": { 245 | "collapsed": false 246 | }, 247 | "outputs": [], 248 | "source": [ 249 | "# Get item by index\n", 250 | "drinks[2]" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": { 257 | "collapsed": false 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "# Add an item to the end of the list\n", 262 | "drinks.append('juice')\n", 263 | "drinks" 264 | ] 265 | }, 266 | { 267 | 
"cell_type": "code", 268 | "execution_count": null, 269 | "metadata": { 270 | "collapsed": false 271 | }, 272 | "outputs": [], 273 | "source": [ 274 | "# Splitting a string - note the type of the output\n", 275 | "greeting_words = greeting.split(' ')\n", 276 | "greeting_words" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": null, 282 | "metadata": { 283 | "collapsed": false 284 | }, 285 | "outputs": [], 286 | "source": [ 287 | "# Joining a list of strings \n", 288 | "' '.join(greeting_words)" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "There are plenty of other data types and structures that we aren't going to use today, such as: sets, dictionaries, tuples, and so forth. " 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "## Functions\n", 303 | "\n", 304 | "At the most basic level, functions are chunks of reusable code" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": { 311 | "collapsed": false 312 | }, 313 | "outputs": [], 314 | "source": [ 315 | "# Define a function\n", 316 | "def add(x, y):\n", 317 | " return x + y\n", 318 | "\n", 319 | "add(1, 2)\n" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": null, 325 | "metadata": { 326 | "collapsed": false 327 | }, 328 | "outputs": [], 329 | "source": [ 330 | "def combine_arrays(array1, array2):\n", 331 | " new_list = array1 + array2\n", 332 | " return new_list\n", 333 | "\n", 334 | "first = ['hello', 2]\n", 335 | "second = ['1', 10]\n", 336 | "new = combine_arrays(first, second)\n", 337 | "new" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "
\n", 345 | "

\n", 346 | "Activity\n", 347 | "

\n", 348 | "

\n", 349 | "In the cell below, experiment with the add function defined above. What happens if you put in two strings? A string and an integer? A list and a string?\n", 350 | "

\n", 351 | "
" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": { 358 | "collapsed": true 359 | }, 360 | "outputs": [], 361 | "source": [ 362 | "# Experiment with using different and mixed variable types with add(x, y)" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "
\n", 370 | "

\n", 371 | "Activity\n", 372 | "

\n", 373 | "

\n", 374 | "\n", 375 | "Pig latin is a language game where you take the first letter of a word, move it to the back of the word, then add '-ay' at the end. For example, 'pig latin' would be 'igpay atinlay' and 'python' would turn into 'ythonpay'.\n", 376 | "\n", 377 | "In the cell below, write a function that takes a string, lowercases it, and returns the pig latin translation of the word. You'll need to use slicing and string concatenation to make this work. \n", 378 | "

\n", 379 | "
" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": { 386 | "collapsed": false 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "def pig_latinize(word):\n", 391 | " ...\n", 392 | " return ...\n" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "## Control flow " 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": { 406 | "collapsed": false 407 | }, 408 | "outputs": [], 409 | "source": [ 410 | "# IF\n", 411 | "name = \"Bob\"\n", 412 | "\n", 413 | "if name == \"Scott\":\n", 414 | " print(\"Hi Scott!\")\n", 415 | "else:\n", 416 | " print(\"Who are you?\")" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": null, 422 | "metadata": { 423 | "collapsed": false 424 | }, 425 | "outputs": [], 426 | "source": [ 427 | "# You can use control flow with functions\n", 428 | "# Also, you can if, else if, and else to specify more than one condition\n", 429 | "name = \"John\"\n", 430 | "\n", 431 | "def say_hello(name):\n", 432 | " return \"Hello \" + name + \"!\"\n", 433 | "\n", 434 | "if (name == \"Bob\"):\n", 435 | " message = say_hello(\"Bob\")\n", 436 | " print(message)\n", 437 | "elif (name == \"Scott\"):\n", 438 | " message = say_hello(\"Scott\")\n", 439 | " print(message)\n", 440 | "else:\n", 441 | " print(\"Who are you?\")" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": null, 447 | "metadata": { 448 | "collapsed": false 449 | }, 450 | "outputs": [], 451 | "source": [ 452 | "# FOR loops let you iterate over a list or other iterable object\n", 453 | "names = [\"Stu\", \"Scott\", \"Javier\", \"Ashley\"]\n", 454 | "for name in names:\n", 455 | " print(name, len(name))" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": null, 461 | "metadata": { 462 | "collapsed": false 463 | }, 464 | "outputs": [], 465 | "source": [ 466 | "# You can combine 
types of control flow\n", 467 | "for name in names[:3]:\n", 468 | " if len(name) > 5:\n", 469 | " print(name)" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": null, 475 | "metadata": { 476 | "collapsed": false 477 | }, 478 | "outputs": [], 479 | "source": [ 480 | "def add_one(num):\n", 481 | " return num + 1\n", 482 | "\n", 483 | "nums = [1, 2, 3, 4]\n", 484 | "plus = []\n", 485 | "for num in nums:\n", 486 | " plus.append(add_one(num))\n", 487 | "plus" 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": null, 493 | "metadata": { 494 | "collapsed": false 495 | }, 496 | "outputs": [], 497 | "source": [ 498 | "# ADVANCED: List Comprehensions\n", 499 | "# List comprehensions are a \"pythonic\" way of building lists in a compact manner\n", 500 | "\n", 501 | "added = [add_one(num) for num in nums]\n", 502 | "added" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": null, 508 | "metadata": { 509 | "collapsed": false 510 | }, 511 | "outputs": [], 512 | "source": [ 513 | "long_names = [name.lower() for name in names[:3] if len(name) > 5]\n", 514 | "long_names" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": {}, 520 | "source": [ 521 | "
\n", 522 | "

\n", 523 | "Activity\n", 524 | "

\n", 525 | "

\n", 526 | "In the cell below, write a function that loops over a list and returns a new list where all the strings have been replaced with their pig latin translations. \n", 527 | "\n", 528 | "For example, if your list is `['hello', 5, 'world']` your output should be `['ellohay', 5, 'orldway']`.\n", 529 | "\n", 530 | "Feel free to reuse the pig latinizer you wrote above. You'll also need to think about checking the type of each item in the list. \n", 531 | "

\n", 532 | "
" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": null, 538 | "metadata": { 539 | "collapsed": true 540 | }, 541 | "outputs": [], 542 | "source": [ 543 | "def pig_latinize_list(items):\n", 544 | " ...\n", 545 | " return ..." 546 | ] 547 | }, 548 | { 549 | "cell_type": "markdown", 550 | "metadata": {}, 551 | "source": [ 552 | "## Web scraping with Requests and Beautiful Soup" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "### Scraping text" 560 | ] 561 | }, 562 | { 563 | "cell_type": "code", 564 | "execution_count": null, 565 | "metadata": { 566 | "collapsed": false 567 | }, 568 | "outputs": [], 569 | "source": [ 570 | "# We'll use the requests library to carry out an HTTP request on the url\n", 571 | "# Then use BeautifulSoup to parse the HTML\n", 572 | "url = \"https://en.wikipedia.org/wiki/Stanford_University\"\n", 573 | "page = requests.get(url)\n", 574 | "soup = BeautifulSoup(page.text, \"html5lib\")\n", 575 | "soup" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": null, 581 | "metadata": { 582 | "collapsed": false 583 | }, 584 | "outputs": [], 585 | "source": [ 586 | "# We can use the find method to specify an HTML element to find,\n", 587 | "# and pass attributes such as class or id to find specific elements\n", 588 | "# Find only returns the first element found\n", 589 | "hatnote = soup.find('div', {'class': 'hatnote'})\n", 590 | "hatnote" 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": null, 596 | "metadata": { 597 | "collapsed": false 598 | }, 599 | "outputs": [], 600 | "source": [ 601 | "# The get_text method pulls just the text from a chunk of HTML\n", 602 | "hat_text = hatnote.get_text()\n", 603 | "hat_text" 604 | ] 605 | }, 606 | { 607 | "cell_type": "code", 608 | "execution_count": null, 609 | "metadata": { 610 | "collapsed": false 611 | }, 612 | "outputs": [], 613 | "source": [ 614 | "# Within a chunk of HTML 
we've found, we can use find again to find another HTML element\n", 615 | "main_text_area = soup.find('div', {'class': 'mw-content-ltr'})\n", 616 | "main_text = main_text_area.find('p')\n", 617 | "main_text.get_text()" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": null, 623 | "metadata": { 624 | "collapsed": false 625 | }, 626 | "outputs": [], 627 | "source": [ 628 | "# We can use find_all to find every instance of an HTML element\n", 629 | "# find_all returns an object we can iterate over\n", 630 | "paragraphs = soup.find_all('p')\n", 631 | "type(paragraphs)" 632 | ] 633 | }, 634 | { 635 | "cell_type": "code", 636 | "execution_count": null, 637 | "metadata": { 638 | "collapsed": false, 639 | "scrolled": true 640 | }, 641 | "outputs": [], 642 | "source": [ 643 | "for para in paragraphs:\n", 644 | " print(para.get_text())" 645 | ] 646 | }, 647 | { 648 | "cell_type": "markdown", 649 | "metadata": { 650 | "collapsed": true 651 | }, 652 | "source": [ 653 | "### Another text scraping example\n", 654 | "\n", 655 | "Let's create a list of urls for the chapters of A Byte of Python, iterate over the first few, and get the content of each page.\n", 656 | "\n", 657 | "A Byte of Python is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, allowing us to copy the book, distribute it, transmit it, remix it and so forth. 
" 658 | ] 659 | }, 660 | { 661 | "cell_type": "code", 662 | "execution_count": null, 663 | "metadata": { 664 | "collapsed": true 665 | }, 666 | "outputs": [], 667 | "source": [ 668 | "url = \"https://python.swaroopch.com/\"\n", 669 | "page = requests.get(url)\n", 670 | "soup = BeautifulSoup(page.text, \"html5lib\")" 671 | ] 672 | }, 673 | { 674 | "cell_type": "code", 675 | "execution_count": null, 676 | "metadata": { 677 | "collapsed": false 678 | }, 679 | "outputs": [], 680 | "source": [ 681 | "# We can chain together methods to find one element, then find all instances of another\n", 682 | "# element within that HTML block\n", 683 | "chapters = soup.find('nav').find_all('a')\n", 684 | "chapters" 685 | ] 686 | }, 687 | { 688 | "cell_type": "code", 689 | "execution_count": null, 690 | "metadata": { 691 | "collapsed": false 692 | }, 693 | "outputs": [], 694 | "source": [ 695 | "# We can use square brackets to access the value of an attribute, such as the href of a link\n", 696 | "for a in chapters:\n", 697 | " print(a['href'])" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": null, 703 | "metadata": { 704 | "collapsed": false 705 | }, 706 | "outputs": [], 707 | "source": [ 708 | "# Since the href didn't give us a full url, we use a function to build one\n", 709 | "def create_url(url):\n", 710 | " return 'https://python.swaroopch.com/' + url\n", 711 | "\n", 712 | "# Then use a list comprehension to create a list of full urls of chapters\n", 713 | "chapter_links = [create_url(a['href']) for a in chapters[2:-1]]\n", 714 | "chapter_links" 715 | ] 716 | }, 717 | { 718 | "cell_type": "code", 719 | "execution_count": null, 720 | "metadata": { 721 | "collapsed": false 722 | }, 723 | "outputs": [], 724 | "source": [ 725 | "# We've used this chunk of code several times, so let's make it a function that specifically\n", 726 | "# gets the text from a chapter page\n", 727 | "def get_page_text(url):\n", 728 | " page = requests.get(url)\n", 729 | " soup 
= BeautifulSoup(page.text, \"html5lib\")\n", 730 | " return soup.find('section', {'class': 'markdown-section'}).get_text()\n", 731 | "\n", 732 | "for url in chapter_links:\n", 733 | " print(get_page_text(url))\n" 734 | ] 735 | }, 736 | { 737 | "cell_type": "markdown", 738 | "metadata": {}, 739 | "source": [ 740 | "### Writing text to a file\n", 741 | "However you prefer, create a directory/folder named 'chapters' at the same level as the file for this notebook. " 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "execution_count": null, 747 | "metadata": { 748 | "collapsed": false 749 | }, 750 | "outputs": [], 751 | "source": [ 752 | "# In the below functions, I've put in docstrings, which let you document the purpose of a \n", 753 | "# function, its parameters, and what it returns \n", 754 | "\n", 755 | "# The first function breaks apart a filename, builds a path including a directory,\n", 756 | "# then puts the right file extension at the end\n", 757 | "def create_filename(name, dirname):\n", 758 | " \"\"\"\n", 759 | " Builds a filename\n", 760 | " \n", 761 | " Args:\n", 762 | " name (string) - the name of the file to be written\n", 763 | " dirname (string) - the name of the directory to contain the files\n", 764 | " \n", 765 | " Returns:\n", 766 | " filename (string) - path to the file\n", 767 | " \"\"\"\n", 768 | " chunks = name.split('.')\n", 769 | " filename = os.path.join(dirname, chunks[0] + '.txt')\n", 770 | " return filename\n", 771 | "\n", 772 | "def create_url(url):\n", 773 | " \"\"\"\n", 774 | " Takes a final chunk of a url and creates a full url\n", 775 | " \n", 776 | " Args:\n", 777 | " url (string) - the url with file extension, e.g. 
'dedication.html'\n", 778 | " \n", 779 | " Returns a full url (string)\n", 780 | " \"\"\"\n", 781 | " return 'https://python.swaroopch.com/' + url\n", 782 | "\n", 783 | "def get_page_text(url):\n", 784 | " \"\"\"\n", 785 | " Pulls html from the url, creates a beautiful soup object, and gets the text from the page\n", 786 | " \n", 787 | " Args:\n", 788 | " url (string) - the url for the page from which you want text\n", 789 | " \n", 790 | " Returns the text (string) from the page\n", 791 | " \"\"\"\n", 792 | " page = requests.get(url)\n", 793 | " soup = BeautifulSoup(page.text, \"html5lib\")\n", 794 | " return soup.find('section', {'class': 'markdown-section'}).get_text()\n", 795 | "\n", 796 | "# Iterate over the chapter links, create a filename for each, get the text for each, \n", 797 | "# then write it to a local file in the chapters directory\n", 798 | "for a in chapters[2:-1]:\n", 799 | " filename = create_filename(a['href'], 'chapters')\n", 800 | " text = get_page_text(create_url(a['href']))\n", 801 | " with open(filename, 'w') as f:\n", 802 | " f.write(text)" 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "metadata": {}, 808 | "source": [ 809 | "
\n", 810 | "

\n", 811 | "Activity\n", 812 | "

\n", 813 | "

\n", 814 | "One common use of web scraping is to gather content for analysis, such as sentiment analysis. You may have seen data-driven journalists offer sentiment analysis of political content. We won't do the analysis, but let's practice scraping headlines from a news site.

\n", 815 | "

In the cell below, scrape the article titles from ProPublica's page on the current presidential administration - https://www.propublica.org/trump-administration/. You'll need to look at the html code of the page to locate the right markup to find. Think about first finding all the articles, then iterating over those to find each title. You can either just print out each title, or put it into a list.

\n", 816 | "

Always check any given site's terms of use, content policy, and robots.txt file before scraping it. In this case, content has a Creative Commons license and the robots.txt file seems to allow a robot to hit pages for categories of articles.
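
One way to check a robots.txt file from Python is the standard library's `urllib.robotparser`. The sketch below parses a made-up robots.txt so that it runs without a network call. The rules and paths are hypothetical and are not ProPublica's actual file, and `is_allowed` is a helper name introduced only for this illustration. For a real site you would point the parser at the site's robots.txt with `set_url` and `read` instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for illustration;
# a real site's rules will differ
sample_robots_txt = """
User-agent: *
Disallow: /search
Allow: /
"""

# Parse the rules from a list of lines rather than fetching them
parser = RobotFileParser()
parser.parse(sample_robots_txt.splitlines())

def is_allowed(path, user_agent='*'):
    """Returns True if the parsed rules let user_agent fetch path"""
    return parser.can_fetch(user_agent, path)

print(is_allowed('/articles/'))
print(is_allowed('/search/results'))
```

A robots.txt check only covers what crawlers may fetch; the site's terms of use and content license still need to be read separately.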

\n", 817 | "
" 818 | ] 819 | }, 820 | { 821 | "cell_type": "code", 822 | "execution_count": null, 823 | "metadata": { 824 | "collapsed": true 825 | }, 826 | "outputs": [], 827 | "source": [] 828 | } 829 | ], 830 | "metadata": { 831 | "anaconda-cloud": {}, 832 | "kernelspec": { 833 | "display_name": "Python [default]", 834 | "language": "python", 835 | "name": "python3" 836 | }, 837 | "language_info": { 838 | "codemirror_mode": { 839 | "name": "ipython", 840 | "version": 3 841 | }, 842 | "file_extension": ".py", 843 | "mimetype": "text/x-python", 844 | "name": "python", 845 | "nbconvert_exporter": "python", 846 | "pygments_lexer": "ipython3", 847 | "version": "3.5.2" 848 | } 849 | }, 850 | "nbformat": 4, 851 | "nbformat_minor": 1 852 | } 853 | -------------------------------------------------------------------------------- /setup.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Setup\n", 8 | "\n", 9 | "In this series of workshops we will be using [Jupyter Notebook](http://jupyter.org/) as our Integrated Development Environment (IDE), and the [Anaconda](https://www.continuum.io/) distribution of Python.\n", 10 | "\n", 11 | "## Anaconda\n", 12 | "\n", 13 | "Anaconda is a software application that packages together different Python versions and a package manager. There are other options out there, like using your distribution's Python version, or [`pyenv`](https://github.com/pyenv/pyenv), but we won't be covering the setup for those. Therefore, the first thing to do is to download and install the [Anaconda Navigator](https://www.continuum.io/downloads) graphical installer for Python 3.5 (or higher).\n", 14 | "\n", 15 | "\n", 16 | "## Jupyter Notebooks\n", 17 | "> The Jupyter Notebook App is a server-client application that allows editing and running “notebook“ documents via a web browser [...] 
In addition to displaying/editing/running notebook documents, the Jupyter Notebook App has a “Dashboard” (Notebook Dashboard), a “control panel” showing local files and allowing to open notebook documents or shutting down their kernels.\n", 18 | ">\n", 19 | "> — [Jupyter/IPython Notebook Quick Start Guide](http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html)\n", 20 | "\n", 21 | "In Jupyter, a kernel is the engine in which commands are run. In our case, the engine runs Python, specifically the IPython kernel.\n", 22 | "\n", 23 | "Jupyter lets you write and evaluate code at a granular level without constantly re-running scripts or relying on a lot of print debugging. It also allows mixing Markdown and HTML into your notebook, which makes it a great way of presenting code and data analysis.\n", 24 | "\n", 25 | "To get started, go to the official [Jupyter documentation](http://test-jupyter.readthedocs.io/en/latest/index.html) or the [starter guide](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/), or take a look at [how to run code](https://github.com/jupyter/notebook/blob/master/docs/source/examples/Notebook/Running%20Code.ipynb) or [other interesting notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks). There are even [video tutorials available](https://www.youtube.com/watch?v=HW29067qVWk).\n", 26 | "\n", 27 | "## Virtualenvs\n", 28 | "\n", 29 | "In the Python world, a `virtualenv`, virtual environment, or just environment, is a way of managing isolated environments so each project can have its own dependencies without conflicts with other projects.\n", 30 | "\n", 31 | "> A Virtual Environment is a tool to keep the dependencies required by different projects in separate places, by creating virtual Python environments for them. 
It solves the “Project X depends on version 1.x but, Project Y needs 4.x” dilemma, and keeps your global site-packages directory clean and manageable.\n", 32 | ">\n", 33 | "> — [The Hitchhiker’s Guide to Python!](http://python-guide-pt-br.readthedocs.io/en/latest/dev/virtualenvs/)\n", 34 | "\n", 35 | "Anaconda has its own environment and package manager (`conda`), lets you easily set Python versions, and comes with many of the standard packages used in scientific computing. It also provides the concept of channels so you can easily install packages that are maintained by other people. We will be covering the instructions for setting up your Python environment using Anaconda.\n", 36 | "\n", 37 | "\n", 38 | "## Setting up Jupyter using Anaconda in the command line\n", 39 | "\n", 40 | "To set up an environment, in your shell or terminal, run (`$` means a shell command):\n", 41 | "\n", 42 | "`\n", 43 | "$ conda create -n name_of_your_environment python=x.y package1 package2\n", 44 | "`\n", 45 | "\n", 46 | "This creates an environment named `name_of_your_environment`, sets its Python version to `x.y`, and installs the packages `package1` and `package2` into the environment.\n", 47 | "\n", 48 | "For example:\n", 49 | "\n", 50 | "`\n", 51 | "$ conda create -n data python=3.5 jupyter requests\n", 52 | "`\n", 53 | "\n", 54 | "This installs `jupyter` and `requests` in a `data` virtualenv using Python 3.5.\n", 55 | "\n", 56 | "After you create the environment, run\n", 57 | "\n", 58 | "`\n", 59 | "$ source activate name_of_your_environment\n", 60 | "`\n", 61 | "\n", 62 | "Or\n", 63 | "\n", 64 | "`\n", 65 | "$ activate name_of_your_environment\n", 66 | "`\n", 67 | "\n", 68 | "Use the first command on OSX and the second on Windows to activate the environment.\n", 69 | "\n", 70 | "Once you have a virtual environment running, just install `jupyter` and run `jupyter notebook` from the location where you want to store your notebook.\n", 71 | "\n", 72 | "`\n", 73 | "$ jupyter notebook\n", 
74 | "`\n", 75 | "\n", 76 | "Then go to http://localhost:8888/ in your browser.\n", 77 | "\n", 78 | "If you need to install packages from other channels, just add `-c channel_name` to the `install` command:\n", 79 | "\n", 80 | "`\n", 81 | "$ conda install -n name_of_your_environment -c channel_name package_maintained_by_someone_else\n", 82 | "`\n", 83 | "\n", 84 | "We will often use the channel `conda-forge` for packages that are more up to date than, or missing from, the official Anaconda repository.\n", 85 | "\n", 86 | "## Setting up Jupyter using Anaconda Navigator\n", 87 | "\n", 88 | "First, launch the Anaconda-Navigator interface, go to \"Environments,\" create a new one with Python 3.5 (or higher), and click on it.\n", 89 | "\n", 90 | "![Environment creation](images/anaconda-envs.gif \"Environment creation\")\n", 91 | "\n", 92 | "Go to channels and add a new one called `conda-forge`, then click on \"Update channels.\"\n", 93 | "\n", 94 | "![Channels](images/anaconda-channels.gif \"Channels\")\n", 95 | "\n", 96 | "Now go to the \"Installed\" packages and select \"Not installed.\" Look for \"jupyter\" in the \"Search Packages\" box and mark it to install. Repeat the process for any other package you need, such as \"requests.\" Then click on \"Apply,\" and you should see a list of the packages before they are installed. Click \"OK.\"\n", 97 | "\n", 98 | "![Packages](images/anaconda-packages.gif \"Packages\")\n", 99 | "\n", 100 | "Once installation is finished, click on the green triangle and select \"Open with Jupyter Notebook.\" A terminal window should pop up while Jupyter loads. After a few seconds you should see the main interface of Jupyter showing the current directory contents.\n", 101 | "\n", 102 | "![Launching Jupyter](images/anaconda-jupyter.gif \"Launching Jupyter\")\n", 103 | "\n", 104 | "Navigate to where you want your notebooks stored, and then click on \"New\" and select \"Python 3.5\". 
Now you should be ready to start writing Python code in a new and clean notebook.\n", 105 | "\n", 106 | "![Launching a Notebook](images/anaconda-notebook.gif \"Launching a Notebook\")\n", 107 | "\n", 108 | "\n", 109 | "## Final remarks\n", 110 | "\n", 111 | "Now you should have all you need to start coding in Python with Anaconda and Jupyter Notebook. We still recommend keeping your code under version control, preferably with a tool like `git`, to make it easier to review history and to avoid accidental code loss.\n", 112 | "\n", 113 | "Happy coding!" 114 | ] 115 | } 116 | ], 117 | "metadata": { 118 | "anaconda-cloud": {}, 119 | "kernelspec": { 120 | "display_name": "Python 3", 121 | "language": "python", 122 | "name": "python3" 123 | }, 124 | "language_info": { 125 | "codemirror_mode": { 126 | "name": "ipython", 127 | "version": 3 128 | }, 129 | "file_extension": ".py", 130 | "mimetype": "text/x-python", 131 | "name": "python", 132 | "nbconvert_exporter": "python", 133 | "pygments_lexer": "ipython3", 134 | "version": "3.5.3" 135 | } 136 | }, 137 | "nbformat": 4, 138 | "nbformat_minor": 2 139 | } 140 | -------------------------------------------------------------------------------- /setup_jekyll_githubpages.md: -------------------------------------------------------------------------------- 1 | # Setup for Jekyll and Github Pages 2 | 3 | ## Requirements 4 | - Github account 5 | - Github Desktop or facility with using git in a shell 6 | - Ruby, with Jekyll and Bundler gems installed 7 | - A Bash shell. This is included in MacOS and Linux, and in the newest 64bit Windows. For older versions of Windows, you can download [Git for Windows](https://git-for-windows.github.io/) to get Git Bash. 8 | - A text editor (Atom is a great choice: [Atom](https://atom.io/)) 9 | 10 | ## Create a Github account 11 | 12 | From the [Github home page](https://github.com/), sign up for a Github account. You only need a free account. 
For this workshop, your website will end up with a url like, username.github.io/projectname, so pick your username wisely. You can also add custom domains for your URL, but we won't be covering that in this workshop. 13 | 14 | ## Download Github Desktop 15 | 16 | [https://desktop.github.com/](https://desktop.github.com/) 17 | 18 | After you've downloaded this, open up the application and sign-in with your Github account. 19 | 20 | ## Download a Bash shell if necessary 21 | 22 | If you are on a Mac or Linux machine, you should already have a Bash shell available. 23 | 24 | If you have a Windows machine with 64bit Windows 10, you might have a Bash shell available. You can turn it on with these [instructions](https://msdn.microsoft.com/en-us/commandline/wsl/install_guide). 25 | 26 | For any Windows user, you can also download Git for Windows, which will include Git Bash, a Bash shell for Windows. Go [here](https://git-for-windows.github.io/). 27 | 28 | ## Download Ruby and the Bundler and Jekyll gems 29 | 30 | There are a number of great tutorials on Jekyll out there, some of which nicely go through the setup on Windows machines, which is, unfortunately, more complicated. Rather than recreate already solid instructions, we'll use the instructions for installing Jekyll and everything necessary for it from the Programming Historian: [here](http://programminghistorian.org/lessons/building-static-sites-with-jekyll-github-pages#section2). Please read and do all the instructions in "Installing dependencies." There are specific instructions for Mac/Linux and Windows. 31 | 32 | ## Download a text editor 33 | 34 | There are a lot of text editors out there, and your OS might have one already installed. I like to recommend Atom, though, which is made by Github and has a lot of great options out of the box. See [Atom](https://atom.io/). Note that a text editor in this context is a plain text editor, which means that Microsoft Word or Apple Pages will *not* work. 
35 | --------------------------------------------------------------------------------