` parent we can do `div > p` for example.\n",
1120 | "\n",
1121 | "For a full overview I recommend checking this page: \n",
1122 | "https://www.w3schools.com/cssref/css_selectors.asp"
1123 | ]
1124 | },
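{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a quick sketch of some common selector patterns using the `res` object from earlier (the counts are whatever the page happens to contain):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# All <div> elements, elements by class, an element by id, and direct children\n",
"print(len(res.html.find('div')))\n",
"print(len(res.html.find('.en_session')))\n",
"print(len(res.html.find('#en_proceedings')))\n",
"print(len(res.html.find('div > a')))"
]
},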
1125 | {
1126 | "cell_type": "markdown",
1127 | "metadata": {},
1128 | "source": [
1129 | "### *A pragmatic way to generate the right `css selector`*\n",
1130 | "\n",
1131 | "If you are unfamiliar with programming websites then it might be hard to wrap your head around CSS selectors. \n",
1132 | "Fortunately, there are tools out there that can make it very easy to generate the css selector that you need! \n",
1133 | "\n",
1134 | "***Option 1:*** \n",
1135 | "\n",
1136 | "If you want just one element you can use the build-in Chrome DevTools (Firefox has something similar). \n",
1137 | "You achieve this by right clicking on the element you want and then click `\"inspect\"`, this should bring up the Dev console. \n",
1138 | "\n",
1139 | "If you then right click on the element you want to extract, you can have DevTools generate a `css selector`:\n",
1140 | "\n",
1141 | "

\n",
1142 | "\n",
1143 | "\n",
1144 | "This will result in the following `css selector`:\n",
1145 | "\n",
1146 | "`#en_proceedings > div:nth-child(1) > div.en_session_title > a`\n",
1147 | "\n",
1148 | "***Option 2:***\n",
1149 | "\n",
1150 | "The above can be limiting if you want to select multiple elements. \n",
1151 | "An other option that makes this easier is to use an awesome Chrome extension called `SelectorGadget`. \n",
1152 | "\n",
1153 | "You can install it here: \n",
1154 | "https://chrome.google.com/webstore/detail/selectorgadget/mhjhnkcfbdhnjickkkdbjoemdmbfginb\n",
1155 | "\n",
1156 | "\n",
1157 | "There is more information available here as well: \n",
1158 | "http://selectorgadget.com/\n",
1159 | "\n",
1160 | "With this extension you can simply highlight what do / do not want to select and it will generate the `css selector` that you need. For example, if we want all the titles:\n",
1161 | "\n",
1162 | "

\n",
1163 | "\n",
1164 | "\n",
1165 | "This yields the following `css selector`: \n",
1166 | "\n",
1167 | "`'.en_session_title a'`\n",
1168 | "\n",
1169 | "\n",
1170 | "*Note:* The number between brackets after 'Clear' indicates the number of elements selected."
1171 | ]
1172 | },
1173 | {
1174 | "cell_type": "markdown",
1175 | "metadata": {},
1176 | "source": [
1177 | "##
CSS Selectors with `Requests-HTML`:"
1178 | ]
1179 | },
1180 | {
1181 | "cell_type": "markdown",
1182 | "metadata": {},
1183 | "source": [
1184 | "### Generate a list of all titles"
1185 | ]
1186 | },
1187 | {
1188 | "cell_type": "code",
1189 | "execution_count": 46,
1190 | "metadata": {},
1191 | "outputs": [],
1192 | "source": [
1193 | "title_elements = res.html.find('.en_session_title a')"
1194 | ]
1195 | },
1196 | {
1197 | "cell_type": "code",
1198 | "execution_count": 47,
1199 | "metadata": {},
1200 | "outputs": [
1201 | {
1202 | "data": {
1203 | "text/plain": [
1204 | "48"
1205 | ]
1206 | },
1207 | "execution_count": 47,
1208 | "metadata": {},
1209 | "output_type": "execute_result"
1210 | }
1211 | ],
1212 | "source": [
1213 | "len(title_elements)"
1214 | ]
1215 | },
1216 | {
1217 | "cell_type": "markdown",
1218 | "metadata": {},
1219 | "source": [
1220 | "#### Get text of first element:"
1221 | ]
1222 | },
1223 | {
1224 | "cell_type": "code",
1225 | "execution_count": 48,
1226 | "metadata": {},
1227 | "outputs": [
1228 | {
1229 | "data": {
1230 | "text/plain": [
1231 | "'Containerizing notebooks for serverless execution (sponsored by AWS)'"
1232 | ]
1233 | },
1234 | "execution_count": 48,
1235 | "metadata": {},
1236 | "output_type": "execute_result"
1237 | }
1238 | ],
1239 | "source": [
1240 | "title_elements[0].text"
1241 | ]
1242 | },
1243 | {
1244 | "cell_type": "markdown",
1245 | "metadata": {},
1246 | "source": [
1247 | "*Note:* if you are only interested in the first (or only) object you can add `first=True` to `res.html.find()` and it will only return one result"
1248 | ]
1249 | },
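{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: with first=True, .find() returns a single Element (or None) instead of a list\n",
"res.html.find('.en_session_title a', first=True).text"
]
},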
1250 | {
1251 | "cell_type": "markdown",
1252 | "metadata": {},
1253 | "source": [
1254 | "#### Get text of all elements:"
1255 | ]
1256 | },
1257 | {
1258 | "cell_type": "code",
1259 | "execution_count": 49,
1260 | "metadata": {},
1261 | "outputs": [
1262 | {
1263 | "data": {
1264 | "text/plain": [
1265 | "['Containerizing notebooks for serverless execution (sponsored by AWS)',\n",
1266 | " 'Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks',\n",
1267 | " 'All the cool kids are doing it; maybe we should too? Jupyter, gravitational waves, and the LIGO and Virgo Scientific Collaborations']"
1268 | ]
1269 | },
1270 | "execution_count": 49,
1271 | "metadata": {},
1272 | "output_type": "execute_result"
1273 | }
1274 | ],
1275 | "source": [
1276 | "[element.text for element in title_elements][:3]"
1277 | ]
1278 | },
1279 | {
1280 | "cell_type": "markdown",
1281 | "metadata": {},
1282 | "source": [
1283 | "### Extract the hyperlink that leads to the talk page"
1284 | ]
1285 | },
1286 | {
1287 | "cell_type": "markdown",
1288 | "metadata": {},
1289 | "source": [
1290 | "Above we extract the text, but we can also add `.attrs` to access any attributes of the element:"
1291 | ]
1292 | },
1293 | {
1294 | "cell_type": "code",
1295 | "execution_count": 50,
1296 | "metadata": {},
1297 | "outputs": [
1298 | {
1299 | "data": {
1300 | "text/plain": [
1301 | "{'href': '/jupyter/jup-ny/public/schedule/detail/71980'}"
1302 | ]
1303 | },
1304 | "execution_count": 50,
1305 | "metadata": {},
1306 | "output_type": "execute_result"
1307 | }
1308 | ],
1309 | "source": [
1310 | "title_elements[0].attrs"
1311 | ]
1312 | },
1313 | {
1314 | "cell_type": "markdown",
1315 | "metadata": {},
1316 | "source": [
1317 | "As you can see, there is a `href` attribute with the url. \n",
1318 | "So we can create a list with both the text and the url:"
1319 | ]
1320 | },
1321 | {
1322 | "cell_type": "code",
1323 | "execution_count": 51,
1324 | "metadata": {},
1325 | "outputs": [],
1326 | "source": [
1327 | "talks = []\n",
1328 | "for element in title_elements:\n",
1329 | " talks.append((element.text, \n",
1330 | " element.attrs['href']))"
1331 | ]
1332 | },
1333 | {
1334 | "cell_type": "code",
1335 | "execution_count": 52,
1336 | "metadata": {},
1337 | "outputs": [
1338 | {
1339 | "data": {
1340 | "text/plain": [
1341 | "[('Containerizing notebooks for serverless execution (sponsored by AWS)',\n",
1342 | " '/jupyter/jup-ny/public/schedule/detail/71980'),\n",
1343 | " ('Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks',\n",
1344 | " '/jupyter/jup-ny/public/schedule/detail/68407'),\n",
1345 | " ('All the cool kids are doing it; maybe we should too? Jupyter, gravitational waves, and the LIGO and Virgo Scientific Collaborations',\n",
1346 | " '/jupyter/jup-ny/public/schedule/detail/71345')]"
1347 | ]
1348 | },
1349 | "execution_count": 52,
1350 | "metadata": {},
1351 | "output_type": "execute_result"
1352 | }
1353 | ],
1354 | "source": [
1355 | "talks[:3]"
1356 | ]
1357 | },
1358 | {
1359 | "cell_type": "markdown",
1360 | "metadata": {},
1361 | "source": [
1362 | "### Extract the title, hyperlink, description, and authors for each talk"
1363 | ]
1364 | },
1365 | {
1366 | "cell_type": "markdown",
1367 | "metadata": {},
1368 | "source": [
1369 | "We can use the above approach and do also get a list of all the authors and the descriptions. \n",
1370 | "It, however, becomes a little bit tricky to combine everything given that one talk might have multiple authors. \n",
1371 | "\n",
1372 | "To deal with this (common) problem it is best to loop over each talk element separately and only then extract the information for that talk, that way it is easy to keep everything linked to a specific talk. \n",
1373 | "\n",
1374 | "If we look in the Chrome DevTools element viewer, we can observe that each talk is a separate `
` with the `en_session` class:\n",
1375 | "\n",
1376 | "

"
1377 | ]
1378 | },
1379 | {
1380 | "cell_type": "markdown",
1381 | "metadata": {},
1382 | "source": [
1383 | "We first select all the `divs` with the `en_session` class that have a parent with `en_proceedings` as id:"
1384 | ]
1385 | },
1386 | {
1387 | "cell_type": "code",
1388 | "execution_count": 53,
1389 | "metadata": {},
1390 | "outputs": [
1391 | {
1392 | "data": {
1393 | "text/plain": [
1394 | "[
,\n",
1395 | " ,\n",
1396 | " ]"
1397 | ]
1398 | },
1399 | "execution_count": 53,
1400 | "metadata": {},
1401 | "output_type": "execute_result"
1402 | }
1403 | ],
1404 | "source": [
1405 | "talk_elements = res.html.find('#en_proceedings > .en_session')\n",
1406 | "talk_elements[:3]"
1407 | ]
1408 | },
1409 | {
1410 | "cell_type": "markdown",
1411 | "metadata": {},
1412 | "source": [
1413 | "Now we can loop over each of these elements and extract the information we want:"
1414 | ]
1415 | },
1416 | {
1417 | "cell_type": "code",
1418 | "execution_count": 54,
1419 | "metadata": {},
1420 | "outputs": [],
1421 | "source": [
1422 | "talk_details = []\n",
1423 | "for talk in talk_elements:\n",
1424 | " title = talk.find('.en_session_title a', first=True).text\n",
1425 | " href = talk.find('.en_session_title a', first=True).attrs['href']\n",
1426 | " description = talk.find('.en_session_description', first=True).text.strip()\n",
1427 | " speakers = [speaker.text for speaker in talk.find('.speaker_names > a')]\n",
1428 | " talk_details.append((title, href, description, speakers))"
1429 | ]
1430 | },
1431 | {
1432 | "cell_type": "markdown",
1433 | "metadata": {},
1434 | "source": [
1435 | "For the sake of the example, below a prettified inspection of the data we gathered:"
1436 | ]
1437 | },
1438 | {
1439 | "cell_type": "code",
1440 | "execution_count": 56,
1441 | "metadata": {},
1442 | "outputs": [
1443 | {
1444 | "name": "stdout",
1445 | "output_type": "stream",
1446 | "text": [
1447 | "The title is: Containerizing notebooks for serverless execution (sponsored by AWS)\n",
1448 | "Speakers: ['Kevin McCormick', 'Vladimir Zhukov'] \n",
1449 | "\n",
1450 | "Description: \n",
1451 | " Kevin McCormick explains the story of two approaches which were used internally at AWS to accelerate new ML algorithm development, and easily package Jupyter notebooks for scheduled execution, by creating custom Jupyter kernels that automatically create Docker containers, and dispatch them to either a distributed training service or job execution environment. \n",
1452 | "\n",
1453 | "For details see: https://conferences.oreilly.com//jupyter/jup-ny/public/schedule/detail/71980\n",
1454 | "---------------------------------------------------------------------------------------------------- \n",
1455 | "\n",
1456 | "The title is: Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks\n",
1457 | "Speakers: ['Matt Brems'] \n",
1458 | "\n",
1459 | "Description: \n",
1460 | " Missing data plagues nearly every data science problem. Often, people just drop or ignore missing data. However, this usually ends up with bad results. Matt Brems explains how bad dropping or ignoring missing data can be and teaches you how to handle missing data the right way by leveraging Jupyter notebooks to properly reweight or impute your data. \n",
1461 | "\n",
1462 | "For details see: https://conferences.oreilly.com//jupyter/jup-ny/public/schedule/detail/68407\n",
1463 | "---------------------------------------------------------------------------------------------------- \n",
1464 | "\n",
1465 | "The title is: All the cool kids are doing it; maybe we should too? Jupyter, gravitational waves, and the LIGO and Virgo Scientific Collaborations\n",
1466 | "Speakers: ['Will M Farr'] \n",
1467 | "\n",
1468 | "Description: \n",
1469 | " Will Farr shares examples of Jupyter use within the LIGO and Virgo Scientific Collaborations and offers lessons about the (many) advantages and (few) disadvantages of Jupyter for large, global scientific collaborations. Along the way, Will speculates on Jupyter's future role in gravitational wave astronomy. \n",
1470 | "\n",
1471 | "For details see: https://conferences.oreilly.com//jupyter/jup-ny/public/schedule/detail/71345\n",
1472 | "---------------------------------------------------------------------------------------------------- \n",
1473 | "\n"
1474 | ]
1475 | }
1476 | ],
1477 | "source": [
1478 | "for title, href, description, speakers in talk_details[:3]:\n",
1479 | " print('The title is: ', title)\n",
1480 | " print('Speakers: ', speakers, '\\n')\n",
1481 | " print('Description: \\n', description, '\\n')\n",
1482 | " print('For details see: ', 'https://conferences.oreilly.com/' + href)\n",
1483 | " print('-'*100, '\\n')"
1484 | ]
1485 | },
1486 | {
1487 | "cell_type": "markdown",
1488 | "metadata": {},
1489 | "source": [
1490 | "## CSS Selectors with `LXML`:"
1491 | ]
1492 | },
1493 | {
1494 | "cell_type": "markdown",
1495 | "metadata": {},
1496 | "source": [
1497 | "**Note:** In order to use css selectors with LXML you might have to install `cssselect` by running this in your command prompt: \n",
1498 | "`pip install cssselect`"
1499 | ]
1500 | },
1501 | {
1502 | "cell_type": "markdown",
1503 | "metadata": {},
1504 | "source": [
1505 | "### Generate a list of all titles:"
1506 | ]
1507 | },
1508 | {
1509 | "cell_type": "markdown",
1510 | "metadata": {},
1511 | "source": [
1512 | "We can use the css selector that we generated earlier with the SelectorGadget extension:"
1513 | ]
1514 | },
1515 | {
1516 | "cell_type": "code",
1517 | "execution_count": 57,
1518 | "metadata": {},
1519 | "outputs": [],
1520 | "source": [
1521 | "title_elements = tree.cssselect('.en_session_title a')"
1522 | ]
1523 | },
1524 | {
1525 | "cell_type": "code",
1526 | "execution_count": 58,
1527 | "metadata": {},
1528 | "outputs": [
1529 | {
1530 | "data": {
1531 | "text/plain": [
1532 | "48"
1533 | ]
1534 | },
1535 | "execution_count": 58,
1536 | "metadata": {},
1537 | "output_type": "execute_result"
1538 | }
1539 | ],
1540 | "source": [
1541 | "len(title_elements)"
1542 | ]
1543 | },
1544 | {
1545 | "cell_type": "markdown",
1546 | "metadata": {},
1547 | "source": [
1548 | "If we select the first title element we see that it doesn't return the text:"
1549 | ]
1550 | },
1551 | {
1552 | "cell_type": "code",
1553 | "execution_count": 59,
1554 | "metadata": {},
1555 | "outputs": [
1556 | {
1557 | "data": {
1558 | "text/plain": [
1559 | ""
1560 | ]
1561 | },
1562 | "execution_count": 59,
1563 | "metadata": {},
1564 | "output_type": "execute_result"
1565 | }
1566 | ],
1567 | "source": [
1568 | "title_elements[0]"
1569 | ]
1570 | },
1571 | {
1572 | "cell_type": "markdown",
1573 | "metadata": {},
1574 | "source": [
1575 | "In order to extract the text we have to add `.text` to the end:"
1576 | ]
1577 | },
1578 | {
1579 | "cell_type": "code",
1580 | "execution_count": 60,
1581 | "metadata": {},
1582 | "outputs": [
1583 | {
1584 | "data": {
1585 | "text/plain": [
1586 | "' Containerizing notebooks for serverless execution (sponsored by AWS)'"
1587 | ]
1588 | },
1589 | "execution_count": 60,
1590 | "metadata": {},
1591 | "output_type": "execute_result"
1592 | }
1593 | ],
1594 | "source": [
1595 | "title_elements[0].text"
1596 | ]
1597 | },
1598 | {
1599 | "cell_type": "markdown",
1600 | "metadata": {},
1601 | "source": [
1602 | "We can do this for all titles to get a list with all the title texts:"
1603 | ]
1604 | },
1605 | {
1606 | "cell_type": "code",
1607 | "execution_count": 61,
1608 | "metadata": {},
1609 | "outputs": [
1610 | {
1611 | "data": {
1612 | "text/plain": [
1613 | "[' Containerizing notebooks for serverless execution (sponsored by AWS)',\n",
1614 | " 'Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks',\n",
1615 | " 'All the cool kids are doing it; maybe we should too? Jupyter, gravitational waves, and the LIGO and Virgo Scientific Collaborations']"
1616 | ]
1617 | },
1618 | "execution_count": 61,
1619 | "metadata": {},
1620 | "output_type": "execute_result"
1621 | }
1622 | ],
1623 | "source": [
1624 | "title_texts = [x.text for x in title_elements]\n",
1625 | "title_texts[:3]"
1626 | ]
1627 | },
1628 | {
1629 | "cell_type": "markdown",
1630 | "metadata": {},
1631 | "source": [
1632 | "### Extract the hyperlink that leads to the talk page"
1633 | ]
1634 | },
1635 | {
1636 | "cell_type": "markdown",
1637 | "metadata": {},
1638 | "source": [
1639 | "Above we extract the text, but we can also add `.attrib` to access any attributes of the element:"
1640 | ]
1641 | },
1642 | {
1643 | "cell_type": "code",
1644 | "execution_count": 62,
1645 | "metadata": {},
1646 | "outputs": [
1647 | {
1648 | "data": {
1649 | "text/plain": [
1650 | "{'href': '/jupyter/jup-ny/public/schedule/detail/71980'}"
1651 | ]
1652 | },
1653 | "execution_count": 62,
1654 | "metadata": {},
1655 | "output_type": "execute_result"
1656 | }
1657 | ],
1658 | "source": [
1659 | "title_elements[0].attrib"
1660 | ]
1661 | },
1662 | {
1663 | "cell_type": "markdown",
1664 | "metadata": {},
1665 | "source": [
1666 | "As you can see, there is a `href` attribute with the url. \n",
1667 | "So we can create a list with both the text and the url:"
1668 | ]
1669 | },
1670 | {
1671 | "cell_type": "code",
1672 | "execution_count": 63,
1673 | "metadata": {},
1674 | "outputs": [],
1675 | "source": [
1676 | "talks = []\n",
1677 | "for element in title_elements:\n",
1678 | " talks.append((element.text, \n",
1679 | " element.attrib['href']))"
1680 | ]
1681 | },
1682 | {
1683 | "cell_type": "code",
1684 | "execution_count": 64,
1685 | "metadata": {},
1686 | "outputs": [
1687 | {
1688 | "data": {
1689 | "text/plain": [
1690 | "[(' Containerizing notebooks for serverless execution (sponsored by AWS)',\n",
1691 | " '/jupyter/jup-ny/public/schedule/detail/71980'),\n",
1692 | " ('Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks',\n",
1693 | " '/jupyter/jup-ny/public/schedule/detail/68407'),\n",
1694 | " ('All the cool kids are doing it; maybe we should too? Jupyter, gravitational waves, and the LIGO and Virgo Scientific Collaborations',\n",
1695 | " '/jupyter/jup-ny/public/schedule/detail/71345')]"
1696 | ]
1697 | },
1698 | "execution_count": 64,
1699 | "metadata": {},
1700 | "output_type": "execute_result"
1701 | }
1702 | ],
1703 | "source": [
1704 | "talks[:3]"
1705 | ]
1706 | },
1707 | {
1708 | "cell_type": "markdown",
1709 | "metadata": {},
1710 | "source": [
1711 | "### Extract the title, hyperlink, description, and authors for each talk"
1712 | ]
1713 | },
1714 | {
1715 | "cell_type": "markdown",
1716 | "metadata": {},
1717 | "source": [
1718 | "We can use the above approach and do also get a list of all the authors and the descriptions. \n",
1719 | "It, however, becomes a little bit tricky to combine everything given that one talk might have multiple authors. \n",
1720 | "\n",
1721 | "To deal with this (common) problem it is best to loop over each talk element separately and only then extract the information for that talk, that way it is easy to keep everything linked to a specific talk. \n",
1722 | "\n",
1723 | "If we look in the Chrome DevTools element viewer, we can observe that each talk is a separate `` with the `en_session` class:\n",
1724 | "\n",
1725 | "

"
1726 | ]
1727 | },
1728 | {
1729 | "cell_type": "markdown",
1730 | "metadata": {},
1731 | "source": [
1732 | "We first select all the `divs` with the `en_session` class that have a parent with `en_proceedings` as id:"
1733 | ]
1734 | },
1735 | {
1736 | "cell_type": "code",
1737 | "execution_count": 65,
1738 | "metadata": {},
1739 | "outputs": [
1740 | {
1741 | "data": {
1742 | "text/plain": [
1743 | "[
,\n",
1744 | " ,\n",
1745 | " ]"
1746 | ]
1747 | },
1748 | "execution_count": 65,
1749 | "metadata": {},
1750 | "output_type": "execute_result"
1751 | }
1752 | ],
1753 | "source": [
1754 | "talk_elements = tree.cssselect('#en_proceedings > .en_session')\n",
1755 | "talk_elements[:3]"
1756 | ]
1757 | },
1758 | {
1759 | "cell_type": "markdown",
1760 | "metadata": {},
1761 | "source": [
1762 | "Now we can loop over each of these elements and extract the information we want:"
1763 | ]
1764 | },
1765 | {
1766 | "cell_type": "code",
1767 | "execution_count": 66,
1768 | "metadata": {},
1769 | "outputs": [],
1770 | "source": [
1771 | "talk_details = []\n",
1772 | "for talk in talk_elements:\n",
1773 | " title = talk.cssselect('.en_session_title a')[0].text\n",
1774 | " href = talk.cssselect('.en_session_title a')[0].attrib['href']\n",
1775 | " description = talk.cssselect('.en_session_description')[0].text.strip()\n",
1776 | " speakers = [speaker.text for speaker in talk.cssselect('.speaker_names > a')]\n",
1777 | " talk_details.append((title, href, description, speakers))"
1778 | ]
1779 | },
1780 | {
1781 | "cell_type": "markdown",
1782 | "metadata": {},
1783 | "source": [
1784 | "For the sake of the example, below a prettified inspection of the data we gathered:"
1785 | ]
1786 | },
1787 | {
1788 | "cell_type": "code",
1789 | "execution_count": 68,
1790 | "metadata": {},
1791 | "outputs": [
1792 | {
1793 | "name": "stdout",
1794 | "output_type": "stream",
1795 | "text": [
1796 | "The title is: Containerizing notebooks for serverless execution (sponsored by AWS)\n",
1797 | "Speakers: ['Kevin McCormick', 'Vladimir Zhukov'] \n",
1798 | "\n",
1799 | "Description: \n",
1800 | " Kevin McCormick explains the story of two approaches which were used internally at AWS to accelerate new ML algorithm development, and easily package Jupyter notebooks for scheduled execution, by creating custom Jupyter kernels that automatically create Docker containers, and dispatch them to either a distributed training service or job execution environment. \n",
1801 | "\n",
1802 | "For details see: https://conferences.oreilly.com//jupyter/jup-ny/public/schedule/detail/71980\n",
1803 | "---------------------------------------------------------------------------------------------------- \n",
1804 | "\n",
1805 | "The title is: Advanced data science, part 2: Five ways to handle missing data in Jupyter notebooks\n",
1806 | "Speakers: ['Matt Brems'] \n",
1807 | "\n",
1808 | "Description: \n",
1809 | " Missing data plagues nearly every data science problem. Often, people just drop or ignore missing data. However, this usually ends up with bad results. Matt Brems explains how bad dropping or ignoring missing data can be and teaches you how to handle missing data the right way by leveraging Jupyter notebooks to properly reweight or impute your data. \n",
1810 | "\n",
1811 | "For details see: https://conferences.oreilly.com//jupyter/jup-ny/public/schedule/detail/68407\n",
1812 | "---------------------------------------------------------------------------------------------------- \n",
1813 | "\n",
1814 | "The title is: All the cool kids are doing it; maybe we should too? Jupyter, gravitational waves, and the LIGO and Virgo Scientific Collaborations\n",
1815 | "Speakers: ['Will M Farr'] \n",
1816 | "\n",
1817 | "Description: \n",
1818 | " Will Farr shares examples of Jupyter use within the LIGO and Virgo Scientific Collaborations and offers lessons about the (many) advantages and (few) disadvantages of Jupyter for large, global scientific collaborations. Along the way, Will speculates on Jupyter's future role in gravitational wave astronomy. \n",
1819 | "\n",
1820 | "For details see: https://conferences.oreilly.com//jupyter/jup-ny/public/schedule/detail/71345\n",
1821 | "---------------------------------------------------------------------------------------------------- \n",
1822 | "\n"
1823 | ]
1824 | }
1825 | ],
1826 | "source": [
1827 | "for title, href, description, speakers in talk_details[:3]:\n",
1828 | " print('The title is: ', title)\n",
1829 | " print('Speakers: ', speakers, '\\n')\n",
1830 | " print('Description: \\n', description, '\\n')\n",
1831 | " print('For details see: ', 'https://conferences.oreilly.com/' + href)\n",
1832 | " print('-'*100, '\\n')\n",
1833 | " "
1834 | ]
1835 | },
1836 | {
1837 | "cell_type": "markdown",
1838 | "metadata": {},
1839 | "source": [
1840 | "## Extract data from Javascript heavy websites (Headless browsers / Selenium) [(to top)](#toc)"
1841 | ]
1842 | },
1843 | {
1844 | "cell_type": "markdown",
1845 | "metadata": {},
1846 | "source": [
1847 | "A lot of websites nowadays use Javascript elements that are difficult (or impossible) to crawl using `requests`.\n",
1848 | "\n",
1849 | "In these scenarios we can use an alternative method where we have Python interact with a browser that is capable of handling Javascript elements. \n",
1850 | "\n",
1851 | "There are essentially two ways to do this:\n",
1852 | "\n",
1853 | "1. Use a so-called `headless automated browsing` package that runs in the background (you don't see the browser).\n",
1854 | "2. Use the `Selenium Webdriver` to control a browser like Chrome (you do see the browser)."
1855 | ]
1856 | },
1857 | {
1858 | "cell_type": "markdown",
1859 | "metadata": {},
1860 | "source": [
1861 | "## Headless automated browsing"
1862 | ]
1863 | },
1864 | {
1865 | "cell_type": "markdown",
1866 | "metadata": {},
1867 | "source": [
1868 | "The goal of headless browser automation is to interact with a browser that is in the background (i.e. has no user interface). \n",
1869 | "They essentially render a website the same way a normal browser would, but they are more lightweight due to not having to spend resources on the user interface. \n",
1870 | "\n",
1871 | "There are many packages available: https://github.com/dhamaniasad/HeadlessBrowsers \n",
1872 | "\n",
1873 | "**The easiest solution is to use the `requests-html` package with `r.html.render()`, see here: [requests-html: javascript support](https://github.com/kennethreitz/requests-html#javascript-support)**\n",
1874 | "\n",
1875 | "Alternatives:\n",
1876 | "\n",
1877 | "1. Ghost.py (http://jeanphix.me/Ghost.py/)\n",
1878 | "2. Dryscrape (https://dryscrape.readthedocs.io/en/latest/)\n",
1879 | "3. Splinter (http://splinter.readthedocs.io/en/latest/index.html?highlight=headless)\n",
1880 | "\n",
1881 | "Setting up headless browsers can be tricky and they can also be hard to debug (given that they run in the background)"
1882 | ]
1883 | },
1884 | {
1885 | "cell_type": "markdown",
1886 | "metadata": {},
1887 | "source": [
1888 | "#### Example using `requests-html`"
1889 | ]
1890 | },
1891 | {
1892 | "cell_type": "markdown",
1893 | "metadata": {},
1894 | "source": [
1895 | "*Note:* if you get an error you might have to run `pyppeteer-install` in your terminal to install Chromium ."
1896 | ]
1897 | },
1898 | {
1899 | "cell_type": "code",
1900 | "execution_count": 1,
1901 | "metadata": {},
1902 | "outputs": [],
1903 | "source": [
1904 | "import requests_html"
1905 | ]
1906 | },
1907 | {
1908 | "cell_type": "code",
1909 | "execution_count": 6,
1910 | "metadata": {},
1911 | "outputs": [
1912 | {
1913 | "name": "stdout",
1914 | "output_type": "stream",
1915 | "text": [
1916 | "Financial Accounting\n",
1917 | "Management Accounting\n",
1918 | "Computer Science\n",
1919 | "Data Engineering\n"
1920 | ]
1921 | }
1922 | ],
1923 | "source": [
1924 | "asession = requests_html.AsyncHTMLSession()\n",
1925 | "URL = 'https://www.tiesdekok.com'\n",
1926 | "r = await asession.get(URL)\n",
1927 | "await r.html.arender()\n",
1928 | "for element in r.html.find('.ul-interests > li'):\n",
1929 | " print(element.text)"
1930 | ]
1931 | },
1932 | {
1933 | "cell_type": "markdown",
1934 | "metadata": {},
1935 | "source": [
1936 | "## Selenium"
1937 | ]
1938 | },
1939 | {
1940 | "cell_type": "markdown",
1941 | "metadata": {},
1942 | "source": [
1943 | "The `Selenium WebDriver` allows to control a browser, this essentially automates / simulates a normal user interacting with the browser. \n",
1944 | "One of the most common ways to use the `Selenium WebDriver` is through the Python language bindings. \n",
1945 | "\n",
1946 | "Combining `Selenium` with Python makes it very easy to automate web browser interaction, allowing you to scrape essentially every webpage imaginable!\n",
1947 | "\n",
1948 | "**Note: if you can use `requests` + `LXML` then this is always preferred as it is much faster compared to using Selenium.**\n",
1949 | "\n",
1950 | "The package page for the Selenium Python bindings is here: https://pypi.python.org/pypi/selenium\n",
1951 | "\n",
1952 | "If you run below it will install both `selenium` and the `selenium Python bindings`:\n",
1953 | "> pip install selenium\n",
1954 | "\n",
1955 | "You will also need to install a driver to interface with a browser of your preference, I personally use the `ChromeDriver` to interact with the Chrome browser: \n",
1956 | "https://sites.google.com/a/chromium.org/chromedriver/downloads"
1957 | ]
1958 | },
1959 | {
1960 | "cell_type": "markdown",
1961 | "metadata": {},
1962 | "source": [
1963 | "## Quick demonstration"
1964 | ]
1965 | },
1966 | {
1967 | "cell_type": "markdown",
1968 | "metadata": {},
1969 | "source": [
1970 | "### Set up selenium"
1971 | ]
1972 | },
1973 | {
1974 | "cell_type": "code",
1975 | "execution_count": 8,
1976 | "metadata": {},
1977 | "outputs": [],
1978 | "source": [
1979 | "import selenium, os\n",
1980 | "from selenium import webdriver"
1981 | ]
1982 | },
1983 | {
1984 | "cell_type": "markdown",
1985 | "metadata": {},
1986 | "source": [
1987 | "Often `selenium` cannot automatically find the `ChromeDriver` so it helps to find the location it is installed and point `selenium` to it. \n",
1988 | "In my case it is here:"
1989 | ]
1990 | },
1991 | {
1992 | "cell_type": "code",
1993 | "execution_count": 13,
1994 | "metadata": {},
1995 | "outputs": [],
1996 | "source": [
1997 | "CHROME = r\"C:\\chromedriver83.exe\"\n",
1998 | "os.environ [\"webdriver.chrome.driver\" ] = CHROME"
1999 | ]
2000 | },
2001 | {
2002 | "cell_type": "markdown",
2003 | "metadata": {},
2004 | "source": [
2005 | "### Start a selenium session"
2006 | ]
2007 | },
2008 | {
2009 | "cell_type": "code",
2010 | "execution_count": 14,
2011 | "metadata": {},
2012 | "outputs": [],
2013 | "source": [
2014 | "driver = webdriver.Chrome(CHROME)"
2015 | ]
2016 | },
2017 | {
2018 | "cell_type": "markdown",
2019 | "metadata": {},
2020 | "source": [
2021 | "After executing `driver = webdriver.Chrome(CHROME)` you should see a chrome window pop-up, this is the window that you can control with Python!"
2022 | ]
2023 | },
2024 | {
2025 | "cell_type": "markdown",
2026 | "metadata": {},
2027 | "source": [
2028 | "### Load a page"
2029 | ]
2030 | },
2031 | {
2032 | "cell_type": "markdown",
2033 | "metadata": {},
2034 | "source": [
2035 | "Let's say we want to extract something from the Yahoo Finance page for Tesla (TSLA): \n",
2036 | "https://finance.yahoo.com/quote/TSLA/"
2037 | ]
2038 | },
2039 | {
2040 | "cell_type": "code",
2041 | "execution_count": 15,
2042 | "metadata": {},
2043 | "outputs": [],
2044 | "source": [
2045 | "Tesla_URL = r'https://finance.yahoo.com/quote/TSLA/'"
2046 | ]
2047 | },
2048 | {
2049 | "cell_type": "code",
2050 | "execution_count": 16,
2051 | "metadata": {},
2052 | "outputs": [],
2053 | "source": [
2054 | "driver.get(Tesla_URL)"
2055 | ]
2056 | },
2057 | {
2058 | "cell_type": "markdown",
2059 | "metadata": {},
2060 | "source": [
2061 | "If you open the Chrome window you should see that it now loaded the URL we gave it."
2062 | ]
2063 | },
2064 | {
2065 | "cell_type": "markdown",
2066 | "metadata": {},
2067 | "source": [
2068 | "### Navigate"
2069 | ]
2070 | },
2071 | {
2072 | "cell_type": "markdown",
2073 | "metadata": {},
2074 | "source": [
2075 | "You can select an element multiple ways (most frequent ones):\n",
2076 | "\n",
2077 | "> driver.find_element_by_name() \n",
2078 | "> driver.find_element_by_id() \n",
2079 | "> driver.find_element_by_class_name() \n",
2080 | "> driver.find_element_by_css_selector() \n",
2081 | "> driver.find_element_by_tag_name() \n"
2082 | ]
2083 | },
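{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch (the selector below is a hypothetical example; it depends on the page you loaded):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: select an element via a CSS selector and read its text\n",
"# ('#quote-header-info' is a hypothetical selector, for illustration only)\n",
"element = driver.find_element_by_css_selector('#quote-header-info')\n",
"element.text"
]
},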
2084 | {
2085 | "cell_type": "markdown",
2086 | "metadata": {},
2087 | "source": [
2088 | "Let's say we want to extract some values from the \"earnings\" interactive figure on the right side:\n",
2089 | "\n",
2090 | "
"
2091 | ]
2092 | },
2093 | {
2094 | "cell_type": "markdown",
2095 | "metadata": {},
2096 | "source": [
2097 | "This would be near-impossible using `requests` as it would simply not load the element, it only loads in an actual browser. \n",
2098 | "\n",
2099 | "We could extract this data in two ways:\n",
2100 | "\n",
2101 | "1. Programming Selenium to mouse-over the element we want, and use CSS selectors to extract the values from the mouse-over window.\n",
2102 | "2. Use the console to interact with the underlying Javascript data directly.\n",
2103 | "\n",
2104 | "The second method is far more convenient than the first so I will demonstrate that:"
2105 | ]
2106 | },
2107 | {
2108 | "cell_type": "markdown",
2109 | "metadata": {},
2110 | "source": [
2111 | "### Retrieve data from Javascript directly\n",
2112 | "We can use a neat trick to find out which Javascript variable holds a certain value that we are looking for: \n",
2113 | "https://stackoverflow.com/questions/26796873/find-which-variable-holds-a-value-using-chrome-devtools\n",
2114 | "\n",
2115 | "After pasting the provided function into the dev console we can run `globalSearch(App, '-1.82')` in the Chrome Dev Console to get:\n",
2116 | "\n",
2117 | "> App.main.context.dispatcher.stores.QuoteSummaryStore.earnings.earningsChart.quarterly[3].estimate.fmt\n",
2118 | "\n",
2119 | "This is all the information that we need to extract all the data points:"
2120 | ]
2121 | },
2122 | {
2123 | "cell_type": "code",
2124 | "execution_count": 17,
2125 | "metadata": {},
2126 | "outputs": [],
2127 | "source": [
2128 | "script = 'App.main.context.dispatcher.stores.QuoteSummaryStore.earnings.earningsChart.quarterly'"
2129 | ]
2130 | },
2131 | {
2132 | "cell_type": "code",
2133 | "execution_count": 18,
2134 | "metadata": {},
2135 | "outputs": [],
2136 | "source": [
2137 | "quarterly_values = driver.execute_script('return {}'.format(script))"
2138 | ]
2139 | },
2140 | {
2141 | "cell_type": "markdown",
2142 | "metadata": {},
2143 | "source": [
2144 | "*Note:* I add `return` in the beginning to get a JSON response. "
2145 | ]
2146 | },
2147 | {
2148 | "cell_type": "code",
2149 | "execution_count": 19,
2150 | "metadata": {},
2151 | "outputs": [
2152 | {
2153 | "data": {
2154 | "text/plain": [
2155 | "[{'actual': {'fmt': '-1.12', 'raw': -1.12},\n",
2156 | " 'date': '2Q2019',\n",
2157 | " 'estimate': {'fmt': '-0.36', 'raw': -0.36}},\n",
2158 | " {'actual': {'fmt': '1.86', 'raw': 1.86},\n",
2159 | " 'date': '3Q2019',\n",
2160 | " 'estimate': {'fmt': '-0.42', 'raw': -0.42}},\n",
2161 | " {'actual': {'fmt': '2.06', 'raw': 2.06},\n",
2162 | " 'date': '4Q2019',\n",
2163 | " 'estimate': {'fmt': '1.72', 'raw': 1.72}},\n",
2164 | " {'actual': {'fmt': '1.14', 'raw': 1.14},\n",
2165 | " 'date': '1Q2020',\n",
2166 | " 'estimate': {'fmt': '-0.25', 'raw': -0.25}}]"
2167 | ]
2168 | },
2169 | "execution_count": 19,
2170 | "metadata": {},
2171 | "output_type": "execute_result"
2172 | }
2173 | ],
2174 | "source": [
2175 | "quarterly_values"
2176 | ]
2177 | },
2178 | {
2179 | "cell_type": "markdown",
2180 | "metadata": {},
2181 | "source": [
2182 | "Using `driver.execute_script()` is essentially the programmatical way of executing it in the dev console: \n",
2183 | "\n",
2184 | "\n",
2185 | "
"
2186 | ]
2187 | },
2188 | {
2189 | "cell_type": "markdown",
2190 | "metadata": {},
2191 | "source": [
2192 | "If you are not familiar with Javascript and programming for the web then this might be very hard to wrap you head around, but if you are serious about web-scraping these kinds of tricks can save you days of work. "
2193 | ]
2194 | },
2195 | {
2196 | "cell_type": "markdown",
2197 | "metadata": {},
2198 | "source": [
2199 | "### Close driver"
2200 | ]
2201 | },
2202 | {
2203 | "cell_type": "code",
2204 | "execution_count": 20,
2205 | "metadata": {},
2206 | "outputs": [],
2207 | "source": [
2208 | "driver.close()"
2209 | ]
2210 | },
2211 | {
2212 | "cell_type": "markdown",
2213 | "metadata": {},
2214 | "source": [
2215 | "## Web crawling with Scrapy"
2216 | ]
2217 | },
2218 | {
2219 | "cell_type": "markdown",
2220 | "metadata": {},
2221 | "source": [
2222 | "In the examples above we always provide the URL directly. \n",
2223 | "We could program a loop (with any of the above methods) that takes a URL from the page and then goes to that page and extracts another URL, etc. \n",
2224 | "\n",
2225 | "This tends to get confusing pretty fast, if you really want to create a crawler you might be better of to look into the `scrapy` package. \n",
2226 | "\n",
2227 | "`Scrapy` allows you to create a `spider` that basically 'walks' through webpages and crawls the information. \n",
2228 | "\n",
2229 | "In my experience you don't need this for 95% of our use-cases, but feel free to try it out: http://scrapy.org/"
2230 | ]
2231 | }
2232 | ],
2233 | "metadata": {
2234 | "kernelspec": {
2235 | "display_name": "Python 3",
2236 | "language": "python",
2237 | "name": "python3"
2238 | },
2239 | "language_info": {
2240 | "codemirror_mode": {
2241 | "name": "ipython",
2242 | "version": 3
2243 | },
2244 | "file_extension": ".py",
2245 | "mimetype": "text/x-python",
2246 | "name": "python",
2247 | "nbconvert_exporter": "python",
2248 | "pygments_lexer": "ipython3",
2249 | "version": "3.7.6"
2250 | }
2251 | },
2252 | "nbformat": 4,
2253 | "nbformat_minor": 4
2254 | }
2255 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 | Want to learn how to use Python for (Social Science) Research?
14 | This repository has everything that you need to get started!
15 | Author: Ties de Kok (Personal Page)
16 |
17 |
18 | ## Table of contents
19 |
20 | * [Introduction](#introduction)
21 | * [Who is this repository for?](#audience)
22 | * [How to use this repository?](#howtouse)
23 | * [Getting your Python setup ready](#setup)
24 | * [Installing Anaconda](#anacondainstall)
25 | * [Setting up Conda Environment](#setupenv)
26 | * [Using Python](#usingpython)
27 | * [Jupyter Notebook/Lab](#jupyter)
28 | * [Installing packages](#packages)
29 | * [Tutorial Notebooks](#notebooks)
30 | * [Exercises](#exercises)
31 | * [Code along](#codealong)
32 | * [Binder](#binder)
33 | * [Local installation](#clonerepo)
34 | * [Questions?](#questions)
35 | * [License](#license)
36 | * [Special thanks](#specialthanks)
37 |
38 | ## Introduction
39 |
40 | The goal of this GitHub page is to provide you with everything you need to get started with Python for actual research projects.
41 |
42 | ### Who is this repository for?
43 |
44 | The topics and techniques demonstrated in this repository are primarily oriented towards empirical research projects in fields such as Accounting, Finance, Marketing, Political Science, and other Social Sciences.
45 |
46 | However, many of the basics are also perfectly applicable if you are looking to use Python for any other type of Data Science!
47 |
48 | ### How to use this repository?
49 |
50 | This repository is written to facilitate learning by doing
51 |
52 | **If you are starting from scratch I recommend the following:**
53 |
54 | 1. Familiarize yourself with the [`Getting your Python setup ready`](#setup) and [`Using Python`](#usingpython) sections below
55 | 2. Check the [`Code along!`](#codealong) section to make sure that you can interactively use the Jupyter Notebooks
56 | 3. Work through the [`0_python_basics.ipynb`](0_python_basics.ipynb) notebook and try to get a basic grasp on the Python syntax
57 | 4. Do the "Basic Python tasks" part of the [`exercises.ipynb`](exercises.ipynb) notebook
58 | 5. Work through the [`1_opening_files.ipynb`](1_opening_files.ipynb), [`2_handling_data.ipynb`](2_handling_data.ipynb), and [`3_visualizing_data.ipynb`](3_visualizing_data.ipynb) notebooks.
59 | **Note:** the [`2_handling_data.ipynb`](2_handling_data.ipynb) notebook is very comprehensive, feel free to skip the more advanced parts at first.
60 | 6. Do the "Data handling tasks (+ some plotting)" part of the [`exercises.ipynb`](exercises.ipynb) notebook
61 |
62 | If you are interested in web-scraping:
63 |
64 | 7. Work through the [`4_web_scraping.ipynb`](4_web_scraping.ipynb) notebook
65 | 8. Do the "Web scraping" part of the [`exercises.ipynb`](exercises.ipynb) notebook
66 |
67 | If you are interested in Natural Language Processing with Python:
68 |
69 | 9. Take a look at my [Python NLP tutorial repository + notebook](https://github.com/TiesdeKok/Python_NLP_Tutorial)
70 |
71 | **If you are already familiar with the Python basics:**
72 |
73 | Use the notebooks provided in this repository selectively depending on the types of problems that you try to solve with Python.
74 |
75 | Everything in the notebooks is purposely sectioned by the task description. So if you, for example, are looking to merge two Pandas dataframes together, you can use the `Combining dataframes` section of the [`2_handling_data.ipynb`](2_handling_data.ipynb) notebook as a starting point.
76 |
77 |
78 | ## Getting your Python setup ready
79 |
80 | There are multiple ways to get your Python environment set up. To keep things simple I will only provide you with what I believe to be the best and easiest way to get started: the Anaconda distribution + a conda environment.
81 |
82 | ### Anaconda Distribution
83 |
84 | The Anaconda Distribution bundles Python with a large collection of Python packages from the (data) science Python eco-system.
85 |
86 | By installing the Anaconda Distribution you essentially obtain everything you need to get started with Python for Research!
87 |
88 | ### Step 1: Install Anaconda
89 |
90 | 1. Go to [anaconda.com/download/](https://www.anaconda.com/download/)
91 | 2. Download the **Python 3.x version** installer
92 | 3. Install Anaconda.
93 | * It is worth taking note of the installation directory in case you ever need to find it again.
94 | 4. Check if the installation works by launching a command prompt (terminal) and type `python`, it should say Anaconda at the top.
95 | * On Windows I recommend using the `Anaconda Prompt`
96 |
97 | *Note:* Anaconda also comes with `Anaconda Navigator`; I haven't personally used it much yet, but it might be convenient.
98 |
99 | ### Step 2: Set up the LearnPythonforResearch environment
100 |
101 | 1. Make sure you've cloned/downloaded this repository: [Clone repository](#clonerepo)
102 | 2. `cd` (i.e. Change) to the folder where you extracted the ZIP file
103 | for example: `cd "C:\Files\Work\Project_1"`
104 | *Note:* if you are changing to a folder on another drive you might have to also switch drives by typing, for example, `E:`
105 | 3. Run the following command `conda env create -f environment.yml`
106 | 4. Activate the environment with: `conda activate LearnPythonforResearch`
107 |
108 | A full list of all the packages used is provided in the `environment.yml` file.
109 |
110 | ### Python 3 vs Python 2?
111 |
112 | Python 3.x is the newer and superior version over Python 2.7, so I strongly recommend using Python 3.x whenever possible. There is no reason to use Python 2.7, unless you are forced to work with old Python 2.7 code.
113 |
114 | ## Using Python
115 |
116 | **Basic methods:**
117 |
118 | The native way to run Python code is by saving the code to a file with the ".py" extension and executing it from the console / terminal:
119 |
120 | ```python code.py```
121 |
122 | Alternatively, you can run some quick code by starting a python or ipython interactive console by typing either `python` or `ipython` in your console / terminal.
123 |
124 | ### Jupyter Notebook/Lab
125 |
126 | The above is, however, not very convenient for research purposes as we desire easy interactivity and good documentation options.
127 | Fortunately, the awesome **Jupyter Notebooks** provide a great alternative way of using Python for research purposes.
128 |
129 | [Jupyter](http://jupyter.org/) comes pre-installed with the Anaconda distribution so you should have everything already installed and ready to go.
130 |
131 | ***Note on Jupyter Lab***
132 |
133 | > **JupyterLab 1.0: Jupyter’s Next-Generation Notebook Interface**
134 | JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible: configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning. JupyterLab is extensible and modular: write plugins that add new components and integrate with existing ones.
135 |
136 | Jupyter Lab is an additional interface layer that extends the functionality of Jupyter Notebooks which are the primary way you interact with Python code.
137 |
138 | ***What is the Jupyter Notebook?***
139 |
140 | From the [Jupyter](http://jupyter.org/) website:
141 | > The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
142 |
143 | In other words, the Jupyter Notebook allows you to program Python code straight from your browser!
144 |
145 | ***How does the Jupyter Notebook/Lab work in the background?***
146 |
147 | The diagram below sums up the basic components of Jupyter:
148 |
149 | *(diagram: browser ↔ Jupyter Notebook ↔ Jupyter Server ↔ kernel)*
150 |
151 | At the heart there is the *Jupyter Server* that handles everything, the *Jupyter Notebook* which is accessed and used through your browser, and the *kernel* that executes the code. We will be focusing on the natively included *Python Kernel* but Jupyter is language agnostic so you can also use it with other languages/software such as 'R'.
152 |
153 | It is worth noting that in most cases you will be running the `Jupyter Server` on your own computer and will connect to it locally in your browser (i.e. you don't need to be connected to the internet). However, it is also possible to run the Jupyter Server on a different computer, for example a high performance computation server in the cloud, and connect to it over the internet.
154 |
155 | ***How to start a Jupyter Notebook/Lab?***
156 |
157 | The primary method that I would recommend to start a Jupyter Notebook/Lab is to use the command line (terminal) directly:
158 |
159 | 1. Open your command prompt / terminal (on Windows I recommend the Anaconda Prompt)
160 | 2. Activate the right environment with `conda activate LearnPythonForResearch`
161 | 3. `cd` (i.e. Change) to the desired starting directory
162 | for example: `cd "C:\Files\Work\Project_1"`
163 | *Note:* if you are changing to a folder on another drive you might have to also switch drives by typing, for example, `E:`
164 | 4. Start the Jupyter Notebook/Lab server by typing: `jupyter notebook` or `jupyter lab`
165 |
166 | This should automatically open up the corresponding Jupyter Notebook/Lab in your default browser.
167 | You can also manually go to the Jupyter Notebook/Lab by going to `localhost:8888` with your browser. (You might be asked for a password, which you can find in the terminal window where the Jupyter server is running.)
168 |
169 | ***How to close a Jupyter Server?***
170 |
171 | If you want to close down the Jupyter Server: open up the command prompt window that runs the server and press `CTRL + C` twice.
172 | Make sure that you have saved any open Jupyter Notebooks!
173 |
174 | ***How to use the Jupyter Notebook?***
175 |
176 | *Some shortcuts are worth mentioning for reference purposes:*
177 |
178 | `command mode` --> enable by pressing `esc`
179 | `edit mode` --> enable by pressing `enter`
180 |
181 | | `command mode` |`edit mode` | `both modes`
182 | |--- |--- |---
183 | | `Y` : cell to code | `Tab` : code completion or indent | `Shift-Enter` : run cell, select below
184 | | `M` : cell to markdown | `Shift-Tab` : tooltip | `Ctrl-Enter` : run cell
185 | | `A` : insert cell above | `Ctrl-A` : select all |
186 | | `B` : insert cell below | `Ctrl-Z` : undo |
187 | | `X`: cut selected cell |
188 |
189 |
190 | ### Installing Packages
191 |
192 | The Python eco-system consists of many packages and modules that people have programmed and made available for everyone to use.
193 | These packages/modules are one of the things that makes Python so useful.
194 |
195 | Some packages are natively included with Python and Anaconda, but anything not included needs to be installed before you can import it.
196 | I will discuss the three primary methods of installing packages:
197 |
198 | **Method 1:** use `pip`
199 |
200 | > Many packages are available on the "Python Package Index" (i.e. "PyPI"): [https://pypi.python.org/pypi](https://pypi.python.org/pypi)
201 | >
202 | > You can install packages that are on "PyPI" by using the `pip` command:
203 | >
204 | > Example, install the `requests` package: run `pip install requests` in your command line / terminal (not in the Jupyter Notebook!).
205 | >
206 | > To uninstall you can use `pip uninstall` and to upgrade an existing package you can add the `-U` flag (`pip install -U requests`)
207 |
208 | **Method 2:** use `conda`
209 |
210 | >Sometimes when you try something with `pip` you get a compile error (especially on Windows). You can try to fix this by configuring the right compiler, but most of the time it is easier to install it directly via Anaconda as these packages come pre-compiled. For example:
211 | >
212 | >`conda install scipy`
213 | >
214 | >Full documentation is here: [Conda documentation](https://conda.io/docs/user-guide/tasks/manage-pkgs.html)
215 |
216 | **Method 3:** install directly using the `setup.py` file
217 |
218 | >Sometimes a package is not on pypi and conda (you often find these packages on GitHub). Follow these steps to install those:
219 | >
220 | >1. Download the folder with all the files (if archived, make sure to unpack the folder)
221 | >2. Open your command prompt (terminal) and `cd` to the folder you just downloaded
222 | >3. Type: `python setup.py install`
223 |
224 | ## Tutorial Notebooks
225 |
226 | This repository covers the following topics:
227 |
228 | * [`0_python_basics.ipynb`](0_python_basics.ipynb): Basics of the Python syntax
229 | * [`1_opening_files.ipynb`](1_opening_files.ipynb): Examples on how to open TXT, CSV, Excel, Stata, Sas, JSON, and HDF files.
230 | * [`2_handling_data.ipynb`](2_handling_data.ipynb): A comprehensive overview on how to use the `Pandas` library for data wrangling.
231 | * [`3_visualizing_data.ipynb`](3_visualizing_data.ipynb): Examples on how to generate visualizations with Python.
232 | * [`4_web_scraping.ipynb`](4_web_scraping.ipynb): A comprehensive overview on how to use `Requests`, `Requests-html`, and `Selenium` for APIs and web scraping.
233 |
234 | Additionally, if you are interested in Natural Language Processing I have a notebook for that as well:
235 | * [`NLP_Notebook`](https://nbviewer.jupyter.org/github/TiesdeKok/Python_NLP_Tutorial/blob/master/NLP_Notebook.ipynb): Basics of Natural Language Processing with Python
236 |
237 | ## Exercises
238 |
239 | I have provided several tasks / exercises that you can try to solve in the [`exercises.ipynb`](exercises.ipynb) notebook.
240 |
241 | **Note:** To avoid the "oh, that looks easy!" trap I have not uploaded the exercises notebook with examples answers.
242 | *Feel free to email me for the answer keys once you are done!*
243 |
244 | ## Code along!
245 |
246 | You can code along in two ways:
247 |
248 | ### Option 1: use Binder
249 |
250 | If you want to experiment with the code in a live environment you can also use `binder`.
251 |
252 | Binder allows you to create a live environment, based on a GitHub repository, where you can execute code just as if you were on your own computer; it is very awesome!
253 |
254 | Click on the button below to launch binder:
255 |
256 | *(Binder launch badge)*
257 |
258 | **Note: you could use binder to complete the exercises but your work will not be saved!!**
259 |
260 | ### Option 2: Set up local Python setup
261 |
262 | You can essentially "download" the contents of this repository by cloning the repository.
263 |
264 | You can do this by clicking the "Clone or download" button and then "Download ZIP":
265 |
266 | *(screenshot: the "Clone or download" → "Download ZIP" menu on GitHub)*
267 |
268 | After you have downloaded and extracted the ZIP file into a folder you can follow these steps to set up your environment:
269 |
270 | 1. [Installing Anaconda](#anacondainstall)
271 | 2. [Setting up Conda Environment](#setupenv)
272 |
273 | ## Questions?
274 |
275 | If you have questions or experience problems please use the `issues` tab of this repository.
276 |
277 | ## License
278 |
279 | [MIT](LICENSE) - Ties de Kok - 2020
280 |
281 | ## Special Thanks
282 |
283 | https://github.com/teles/array-mixer for having an awesome readme that I used as a template.
284 |
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: LearnPythonforResearch
2 | channels:
3 | - conda-forge
4 | - defaults
5 | dependencies:
6 | - python=3.7
7 | - pip
8 | - jupyterlab
9 | - numpy
10 | - pandas
11 | - matplotlib
12 | - ipywidgets
13 | - requests
14 | - xlrd
15 | - openpyxl
16 | - seaborn
17 | - bokeh
18 | - hickle
19 | - lxml
20 | - cssselect
21 | - PyTables
22 | - plotnine
23 | - plotly
24 | - selenium
25 | - tqdm
26 | - pip:
27 | - qgrid
28 | - requests-html
--------------------------------------------------------------------------------
/example_data/auto_df.csv:
--------------------------------------------------------------------------------
1 | ;make;price;mpg;rep78;headroom;trunk;weight;length;turn;displacement;gear_ratio;foreign
2 | 0;AMC Concord;4099;22;3.0;2.5;11;2930;186;40;121;3.57999992371;Domestic
3 | 1;AMC Pacer;4749;17;3.0;3.0;11;3350;173;40;258;2.52999997139;Domestic
4 | 2;AMC Spirit;3799;22;;3.0;12;2640;168;35;121;3.07999992371;Domestic
5 | 3;Buick Century;4816;20;3.0;4.5;16;3250;196;40;196;2.93000006676;Domestic
6 | 4;Buick Electra;7827;15;4.0;4.0;20;4080;222;43;350;2.41000008583;Domestic
7 | 5;Buick LeSabre;5788;18;3.0;4.0;21;3670;218;43;231;2.73000001907;Domestic
8 | 6;Buick Opel;4453;26;;3.0;10;2230;170;34;304;2.86999988556;Domestic
9 | 7;Buick Regal;5189;20;3.0;2.0;16;3280;200;42;196;2.93000006676;Domestic
10 | 8;Buick Riviera;10372;16;3.0;3.5;17;3880;207;43;231;2.93000006676;Domestic
11 | 9;Buick Skylark;4082;19;3.0;3.5;13;3400;200;42;231;3.07999992371;Domestic
12 | 10;Cad. Deville;11385;14;3.0;4.0;20;4330;221;44;425;2.27999997139;Domestic
13 | 11;Cad. Eldorado;14500;14;2.0;3.5;16;3900;204;43;350;2.19000005722;Domestic
14 | 12;Cad. Seville;15906;21;3.0;3.0;13;4290;204;45;350;2.24000000954;Domestic
15 | 13;Chev. Chevette;3299;29;3.0;2.5;9;2110;163;34;231;2.93000006676;Domestic
16 | 14;Chev. Impala;5705;16;4.0;4.0;20;3690;212;43;250;2.55999994278;Domestic
17 | 15;Chev. Malibu;4504;22;3.0;3.5;17;3180;193;31;200;2.73000001907;Domestic
18 | 16;Chev. Monte Carlo;5104;22;2.0;2.0;16;3220;200;41;200;2.73000001907;Domestic
19 | 17;Chev. Monza;3667;24;2.0;2.0;7;2750;179;40;151;2.73000001907;Domestic
20 | 18;Chev. Nova;3955;19;3.0;3.5;13;3430;197;43;250;2.55999994278;Domestic
21 | 19;Dodge Colt;3984;30;5.0;2.0;8;2120;163;35;98;3.53999996185;Domestic
22 | 20;Dodge Diplomat;4010;18;2.0;4.0;17;3600;206;46;318;2.47000002861;Domestic
23 | 21;Dodge Magnum;5886;16;2.0;4.0;17;3600;206;46;318;2.47000002861;Domestic
24 | 22;Dodge St. Regis;6342;17;2.0;4.5;21;3740;220;46;225;2.94000005722;Domestic
25 | 23;Ford Fiesta;4389;28;4.0;1.5;9;1800;147;33;98;3.15000009537;Domestic
26 | 24;Ford Mustang;4187;21;3.0;2.0;10;2650;179;43;140;3.07999992371;Domestic
27 | 25;Linc. Continental;11497;12;3.0;3.5;22;4840;233;51;400;2.47000002861;Domestic
28 | 26;Linc. Mark V;13594;12;3.0;2.5;18;4720;230;48;400;2.47000002861;Domestic
29 | 27;Linc. Versailles;13466;14;3.0;3.5;15;3830;201;41;302;2.47000002861;Domestic
30 | 28;Merc. Bobcat;3829;22;4.0;3.0;9;2580;169;39;140;2.73000001907;Domestic
31 | 29;Merc. Cougar;5379;14;4.0;3.5;16;4060;221;48;302;2.75;Domestic
32 | 30;Merc. Marquis;6165;15;3.0;3.5;23;3720;212;44;302;2.25999999046;Domestic
33 | 31;Merc. Monarch;4516;18;3.0;3.0;15;3370;198;41;250;2.43000006676;Domestic
34 | 32;Merc. XR-7;6303;14;4.0;3.0;16;4130;217;45;302;2.75;Domestic
35 | 33;Merc. Zephyr;3291;20;3.0;3.5;17;2830;195;43;140;3.07999992371;Domestic
36 | 34;Olds 98;8814;21;4.0;4.0;20;4060;220;43;350;2.41000008583;Domestic
37 | 35;Olds Cutl Supr;5172;19;3.0;2.0;16;3310;198;42;231;2.93000006676;Domestic
38 | 36;Olds Cutlass;4733;19;3.0;4.5;16;3300;198;42;231;2.93000006676;Domestic
39 | 37;Olds Delta 88;4890;18;4.0;4.0;20;3690;218;42;231;2.73000001907;Domestic
40 | 38;Olds Omega;4181;19;3.0;4.5;14;3370;200;43;231;3.07999992371;Domestic
41 | 39;Olds Starfire;4195;24;1.0;2.0;10;2730;180;40;151;2.73000001907;Domestic
42 | 40;Olds Toronado;10371;16;3.0;3.5;17;4030;206;43;350;2.41000008583;Domestic
43 | 41;Plym. Arrow;4647;28;3.0;2.0;11;3260;170;37;156;3.04999995232;Domestic
44 | 42;Plym. Champ;4425;34;5.0;2.5;11;1800;157;37;86;2.97000002861;Domestic
45 | 43;Plym. Horizon;4482;25;3.0;4.0;17;2200;165;36;105;3.36999988556;Domestic
46 | 44;Plym. Sapporo;6486;26;;1.5;8;2520;182;38;119;3.53999996185;Domestic
47 | 45;Plym. Volare;4060;18;2.0;5.0;16;3330;201;44;225;3.23000001907;Domestic
48 | 46;Pont. Catalina;5798;18;4.0;4.0;20;3700;214;42;231;2.73000001907;Domestic
49 | 47;Pont. Firebird;4934;18;1.0;1.5;7;3470;198;42;231;3.07999992371;Domestic
50 | 48;Pont. Grand Prix;5222;19;3.0;2.0;16;3210;201;45;231;2.93000006676;Domestic
51 | 49;Pont. Le Mans;4723;19;3.0;3.5;17;3200;199;40;231;2.93000006676;Domestic
52 | 50;Pont. Phoenix;4424;19;;3.5;13;3420;203;43;231;3.07999992371;Domestic
53 | 51;Pont. Sunbird;4172;24;2.0;2.0;7;2690;179;41;151;2.73000001907;Domestic
54 | 52;Audi 5000;9690;17;5.0;3.0;15;2830;189;37;131;3.20000004768;Foreign
55 | 53;Audi Fox;6295;23;3.0;2.5;11;2070;174;36;97;3.70000004768;Foreign
56 | 54;BMW 320i;9735;25;4.0;2.5;12;2650;177;34;121;3.6400001049;Foreign
57 | 55;Datsun 200;6229;23;4.0;1.5;6;2370;170;35;119;3.8900001049;Foreign
58 | 56;Datsun 210;4589;35;5.0;2.0;8;2020;165;32;85;3.70000004768;Foreign
59 | 57;Datsun 510;5079;24;4.0;2.5;8;2280;170;34;119;3.53999996185;Foreign
60 | 58;Datsun 810;8129;21;4.0;2.5;8;2750;184;38;146;3.54999995232;Foreign
61 | 59;Fiat Strada;4296;21;3.0;2.5;16;2130;161;36;105;3.36999988556;Foreign
62 | 60;Honda Accord;5799;25;5.0;3.0;10;2240;172;36;107;3.04999995232;Foreign
63 | 61;Honda Civic;4499;28;4.0;2.5;5;1760;149;34;91;3.29999995232;Foreign
64 | 62;Mazda GLC;3995;30;4.0;3.5;11;1980;154;33;86;3.73000001907;Foreign
65 | 63;Peugeot 604;12990;14;;3.5;14;3420;192;38;163;3.57999992371;Foreign
66 | 64;Renault Le Car;3895;26;3.0;3.0;10;1830;142;34;79;3.72000002861;Foreign
67 | 65;Subaru;3798;35;5.0;2.5;11;2050;164;36;97;3.80999994278;Foreign
68 | 66;Toyota Celica;5899;18;5.0;2.5;14;2410;174;36;134;3.05999994278;Foreign
69 | 67;Toyota Corolla;3748;31;5.0;3.0;9;2200;165;35;97;3.21000003815;Foreign
70 | 68;Toyota Corona;5719;18;5.0;2.0;11;2670;175;36;134;3.04999995232;Foreign
71 | 69;VW Dasher;7140;23;4.0;2.5;12;2160;172;36;97;3.74000000954;Foreign
72 | 70;VW Diesel;5397;41;5.0;3.0;15;2040;155;35;90;3.77999997139;Foreign
73 | 71;VW Rabbit;4697;25;4.0;3.0;15;1930;155;35;89;3.77999997139;Foreign
74 | 72;VW Scirocco;6850;25;4.0;2.0;16;1990;156;36;97;3.77999997139;Foreign
75 | 73;Volvo 260;11995;17;5.0;2.5;14;3170;193;37;163;2.98000001907;Foreign
76 |
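Note: auto_df.csv is semicolon-delimited and its first column is an unnamed row index. A minimal sketch of loading it with pandas (assuming the working directory is the repository root):

    import pandas as pd

    # Semicolon-separated values; the first, unnamed column is the row index.
    auto_df = pd.read_csv('example_data/auto_df.csv', sep=';', index_col=0)
    print(auto_df.shape)  # expected: (74, 12)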
--------------------------------------------------------------------------------
/example_data/csv_sample.csv:
--------------------------------------------------------------------------------
1 | ,Unnamed: 0,Unnamed: 0.1,foreign,make,price,weight
2 | 0,0,1,Domestic,AMCPacer,4749,3350
3 | 1,1,2,Domestic,AMCSpirit,3799,2640
4 | 2,2,3,Domestic,BuickCentury,4816,3250
5 | 3,3,6,Domestic,BuickOpel,4453,2230
6 | 4,4,7,Domestic,BuickRegal,5189,3280
7 | 5,5,8,Domestic,BuickRiviera,10372,3880
8 | 6,6,9,Domestic,BuickSkylark,4082,3400
9 | 7,7,14,Domestic,Chev.Impala,5705,3690
10 | 8,8,21,Domestic,DodgeMagnum,5886,3600
11 | 9,9,23,Domestic,FordFiesta,4389,1800
12 | 10,10,24,Domestic,FordMustang,4187,2650
13 | 11,11,30,Domestic,Merc.Marquis,6165,3720
14 | 12,12,31,Domestic,Merc.Monarch,4516,3370
15 | 13,13,33,Domestic,Merc.Zephyr,3291,2830
16 | 14,14,37,Domestic,OldsDelta88,4890,3690
17 | 15,15,38,Domestic,OldsOmega,4181,3370
18 | 16,16,43,Domestic,Plym.Horizon,4482,2200
19 | 17,17,48,Domestic,Pont.GrandPrix,5222,3210
20 | 18,18,50,Domestic,Pont.Phoenix,4424,3420
21 | 19,19,51,Domestic,Pont.Sunbird,4172,2690
22 | 20,20,53,Foreign,AudiFox,6295,2070
23 | 21,21,56,Foreign,Datsun210,4589,2020
24 | 22,22,57,Foreign,Datsun510,5079,2280
25 | 23,23,66,Foreign,ToyotaCelica,5899,2410
26 | 24,24,70,Foreign,VWDiesel,5397,2040
27 |
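Note: the 'Unnamed: 0' and 'Unnamed: 0.1' columns are leftovers from earlier save/load round trips in which the index was written out as a regular column. A sketch of reading the file cleanly (path relative to the repo root assumed):

    import pandas as pd

    # Use the first (unnamed) column as the index and drop the stale index columns.
    sample_df = pd.read_csv('example_data/csv_sample.csv', index_col=0)
    sample_df = sample_df.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'])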
--------------------------------------------------------------------------------
/example_data/dd_example.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/example_data/dd_example.h5
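Note: dd_example.h5 is a binary HDF5 file, linked above rather than inlined. Judging by the name it holds a dask DataFrame; a hedged sketch of opening it, where the key '/data' is an assumption (inspect the store for the real key):

    import dask.dataframe as dd

    # The HDF key below is hypothetical; dask needs the key of the stored table.
    ddf = dd.read_hdf('example_data/dd_example.h5', key='/data')
    print(ddf.head())  # computes only the first partition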
--------------------------------------------------------------------------------
/example_data/excel_sample.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/example_data/excel_sample.xlsx
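Note: the Excel sample is linked as a binary file. One way to open it with pandas (an Excel engine such as openpyxl is assumed to be installed):

    import pandas as pd

    # Reads the first sheet by default.
    excel_df = pd.read_excel('example_data/excel_sample.xlsx')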
--------------------------------------------------------------------------------
/example_data/hdf_sample.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/example_data/hdf_sample.h5
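Note: a sketch of opening the HDF sample with pandas (PyTables assumed installed). When the store contains a single object, pandas can infer the key:

    import pandas as pd

    # No key argument needed if the store holds exactly one object.
    hdf_df = pd.read_hdf('example_data/hdf_sample.h5')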
--------------------------------------------------------------------------------
/example_data/hkl_example.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/example_data/hkl_example.h5
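Note: going by the name, hkl_example.h5 is a hickle dump (an HDF5-backed pickle replacement); a hedged sketch of loading it:

    import hickle as hkl

    # hickle serializes Python objects to HDF5; load() restores the object.
    obj = hkl.load('example_data/hkl_example.h5')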
--------------------------------------------------------------------------------
/example_data/json_sample.json:
--------------------------------------------------------------------------------
1 | {"foreign": {"1": "Domestic", "14": "Domestic", "2": "Domestic", "21": "Domestic", "23": "Domestic", "24": "Domestic", "3": "Domestic", "30": "Domestic", "31": "Domestic", "33": "Domestic", "37": "Domestic", "38": "Domestic", "43": "Domestic", "48": "Domestic", "50": "Domestic", "51": "Domestic", "53": "Foreign", "56": "Foreign", "57": "Foreign", "6": "Domestic", "66": "Foreign", "7": "Domestic", "70": "Foreign", "8": "Domestic", "9": "Domestic"}, "make": {"1": "AMCPacer", "14": "Chev.Impala", "2": "AMCSpirit", "21": "DodgeMagnum", "23": "FordFiesta", "24": "FordMustang", "3": "BuickCentury", "30": "Merc.Marquis", "31": "Merc.Monarch", "33": "Merc.Zephyr", "37": "OldsDelta88", "38": "OldsOmega", "43": "Plym.Horizon", "48": "Pont.GrandPrix", "50": "Pont.Phoenix", "51": "Pont.Sunbird", "53": "AudiFox", "56": "Datsun210", "57": "Datsun510", "6": "BuickOpel", "66": "ToyotaCelica", "7": "BuickRegal", "70": "VWDiesel", "8": "BuickRiviera", "9": "BuickSkylark"}, "price": {"1": 4749, "14": 5705, "2": 3799, "21": 5886, "23": 4389, "24": 4187, "3": 4816, "30": 6165, "31": 4516, "33": 3291, "37": 4890, "38": 4181, "43": 4482, "48": 5222, "50": 4424, "51": 4172, "53": 6295, "56": 4589, "57": 5079, "6": 4453, "66": 5899, "7": 5189, "70": 5397, "8": 10372, "9": 4082}, "weight": {"1": 3350, "14": 3690, "2": 2640, "21": 3600, "23": 1800, "24": 2650, "3": 3250, "30": 3720, "31": 3370, "33": 2830, "37": 3690, "38": 3370, "43": 2200, "48": 3210, "50": 3420, "51": 2690, "53": 2070, "56": 2020, "57": 2280, "6": 2230, "66": 2410, "7": 3280, "70": 2040, "8": 3880, "9": 3400}}
--------------------------------------------------------------------------------
/example_data/stata_sample.dta:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/example_data/stata_sample.dta
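Note: pandas reads Stata .dta files natively, so the linked sample can be opened without Stata:

    import pandas as pd

    stata_df = pd.read_stata('example_data/stata_sample.dta')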
--------------------------------------------------------------------------------
/example_data/text_sample.txt:
--------------------------------------------------------------------------------
1 | Learning Python is great.
2 | Good luck!
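Note: the plain-text sample needs nothing beyond the standard library:

    # The context manager closes the file automatically.
    with open('example_data/text_sample.txt', encoding='utf-8') as f:
        text = f.read()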
--------------------------------------------------------------------------------
/images/API_screenshot.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/API_screenshot.PNG
--------------------------------------------------------------------------------
/images/CSSInspector_screenshot.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/CSSInspector_screenshot.PNG
--------------------------------------------------------------------------------
/images/CloneRepo.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/CloneRepo.PNG
--------------------------------------------------------------------------------
/images/DIV_screenshot.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/DIV_screenshot.PNG
--------------------------------------------------------------------------------
/images/DevTools_screenshot.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/DevTools_screenshot.PNG
--------------------------------------------------------------------------------
/images/Earnings_console_screenshot.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/Earnings_console_screenshot.PNG
--------------------------------------------------------------------------------
/images/Earnings_graph_screenshot.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/Earnings_graph_screenshot.PNG
--------------------------------------------------------------------------------
/images/SSRN_screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/SSRN_screenshot.png
--------------------------------------------------------------------------------
/images/bannerimage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/bannerimage.png
--------------------------------------------------------------------------------
/images/jupyterimage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TiesdeKok/LearnPythonforResearch/3777625b646336a67e4c9d23b270ddeff8e58854/images/jupyterimage.png
--------------------------------------------------------------------------------
/postBuild:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # Binder (repo2docker) post-build hook: installs the JupyterLab extensions
4 | # the notebooks rely on (ipywidgets/qgrid and the Plotly renderers).
5 | jupyter labextension install @jupyter-widgets/jupyterlab-manager qgrid
6 | jupyter labextension install jupyterlab-plotly@4.8.1
7 | jupyter labextension install @jupyter-widgets/jupyterlab-manager plotlywidget@4.8.1
--------------------------------------------------------------------------------