├── .gitignore
├── API.md
├── README.md
├── functions
│   └── events
│       └── slack
│           └── command
│               ├── list.js
│               └── scrape.js
├── images
│   ├── DeveloperMode.png
│   ├── Example.png
│   ├── LinkNew.png
│   ├── Name.png
│   ├── Navigate.png
│   ├── Oauth.png
│   ├── SlackAppCrawlerSample.png
│   ├── StandardApp.png
│   ├── custom.png
│   ├── deploy.png
│   ├── edit.png
│   ├── link.png
│   ├── linked.png
│   ├── list.png
│   ├── list2.png
│   ├── listexample.png
│   └── release.png
├── package.json
└── stdlib.json

/.gitignore:
--------------------------------------------------------------------------------
node_modules/
env.json
.DS_Store

--------------------------------------------------------------------------------
/API.md:
--------------------------------------------------------------------------------
# API Documentation

Use this file to explain a little bit about your project's API ahead
of individual endpoint details. It will be available to end-users of your
API endpoints and will display on your project's API documentation page.

Usage examples and additional information around calling your project's
API belong here.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# README
[](https://deploy.stdlib.com/)

# Slack App Website Scraper

In the [last tutorial](https://github.com/JanethL/WebScraper), we learned how to use [crawler.api](https://stdlib.com/@crawler/lib/query) on Standard Library to scrape websites using CSS selectors. As an example, we scraped the front page of [The Economist](https://www.economist.com/) for titles and their respective URLs.

In this guide, we will learn how to retrieve our scraped data and send it to Slack. We'll set up a Slack app that scrapes websites for links when you run a Slack slash command and posts the results to a Slack channel, like this:

# Table of Contents

1. [Installation](#installation)
1. [Test Your Workflow](#test-your-workflow)
1. [How It Works](#how-it-works)
1. [Making Changes](#making-changes)
   1. [via Web Browser](#via-web-browser)
   1. [via Command Line](#via-command-line)
1. [Shipping to Production](#shipping-to-production)
1. [Support](#support)
1. [Acknowledgements](#acknowledgements)

# Installation

Click the deploy from Autocode button below to quickly set up your project on Autocode.

[](https://deploy.stdlib.com/)

You will be prompted to sign in or create a **FREE** account. If you already have a Standard Library account, click **Already Registered** and sign in using your Standard Library credentials.

Give your project a unique name and select **Start API Project from Github**.

Autocode automatically sets up a project scaffold to save your project as an API endpoint, but it hasn’t been deployed yet.

To deploy your API to the cloud, first navigate through the `functions/events/slack/command/` folders and select the `scrape.js` file.

Select the red **1 Account Required** button, which will prompt you to link a Slack account.

If you’ve built Slack apps with Standard Library before, you’ll see your existing Slack accounts; you can also select **Link New Resource** to link a new Slack app.

Select **Install Standard Library App**.

You should see an OAuth popup that looks something like this:

Select **Allow**. You'll have the option to customize your Slack app with a name and image.

Select **Finish**. The green checkmarks confirm that you’ve linked your accounts correctly. Click **Finished Linking**.

To deploy your API to the cloud, select **Deploy API** in the bottom-left of the file manager.

# Test Your Workflow

You’re all done. Try it out!
Your Slack app is now available in the Slack workspace you authorized it for. It should respond to `/cmd scrape` followed by a URL and a CSS selector, as shown in the screenshot:

I've included an additional command that lists a few websites along with the selectors that retrieve their links.

Just type `/cmd list`, and your app should respond with the following message:

# How It Works

When you submit `/cmd scrape https://techcrunch.com/ a.post-block__title__link` (or any URL followed by its respective selector) in Slack’s message box, a webhook is triggered. The webhook, built and hosted on [Standard Library](https://stdlib.com), checks the input, retrieves information about the channel and the user who invoked the command, and then makes a request to [crawler.api](https://stdlib.com/@crawler/lib/query), which returns a JSON payload with the results of the query.

Our webhook then builds a single Slack message listing every link it found and posts it to the channel where the command was invoked.

```javascript
const lib = require('lib')({token: process.env.STDLIB_SECRET_TOKEN});
/**
 * An HTTP endpoint that acts as a webhook for Slack command event
 * @param {object} event
 * @returns {object} result Your return value
 */
module.exports = async (event) => {
  // Store API Responses
  const result = {slack: {}, crawler: {}};

  if ((event.text || '').split(/\s+/).length != 2) {
    return lib.slack.channels['@0.6.6'].messages.create({
      channel: `#${event.channel_id}`,
      text: `${event.text} has wrong format. `
    });
  }

  console.log(`Running [Slack → Retrieve Channel, DM, or Group DM by id]...`);
  result.slack.channel = await lib.slack.conversations['@0.2.5'].info({
    id: `${event.channel_id}`
  });
  console.log(`Running [Slack → Retrieve a User]...`);
  result.slack.user = await lib.slack.users['@0.3.32'].retrieve({
    user: `${event.user_id}`
  });

  console.log(`Running [Crawler → Query (scrape) a provided URL based on CSS selectors]...`);
  result.crawler.pageData = await lib.crawler.query['@0.0.1'].selectors({
    url: event.text.split(/\s+/)[0],
    userAgent: `stdlib/crawler/query`,
    includeMetadata: false,
    selectorQueries: [
      {
        'selector': event.text.split(/\s+/)[1],
        'resolver': `attr`,
        'attr': 'href'
      }
    ]
  });
  let text = `Here are the links that we found for ${event.text.split(/\s+/)[0]}\n \n ${result.crawler.pageData.queryResults[0].map((r) => {
    if (r.attr.startsWith('http://') || r.attr.startsWith('https://') || r.attr.startsWith('//')) {
      return r.attr;
    } else {
      return result.crawler.pageData.url + r.attr;
    }
  }).join(' \n ')}`;
  console.log(`Running [Slack → Send a Message from your Bot to a Channel]...`);
  result.slack.response = await lib.slack.channels['@0.6.6'].messages.create({
    channel: `#${event.channel_id}`,
    text: text
  })
  return result;
};
```
The first line of code imports an NPM package called “lib” to allow us to communicate with other APIs on top of Standard Library:

`const lib = require('lib')({token: process.env.STDLIB_SECRET_TOKEN});`

**Lines 2–6** are a comment that serves as documentation and lets Standard Library type-check calls to our function. If a call does not supply a parameter of the expected type, it returns an error.

**Line 7** exports an async function (`module.exports`) whose body spans lines 8–53. Once we deploy our code, this function is wrapped in an HTTP (API) endpoint that automatically registers with Slack, so every time a Slack command event happens, Slack sends the event payload to our endpoint to consume.

**Lines 11–16** are an `if` statement that handles improper input: if the command text does not split into exactly two whitespace-separated parts (a URL and a selector), it posts an error message to the channel using `lib.slack.channels['@0.6.6'].messages.create` and returns early.

**Lines 18–21** call the `info` method of the `lib.slack.conversations['@0.2.5']` API to retrieve the channel object, which includes the channel's name, topic, purpose, etc., and store it in `result.slack.channel`.

**Lines 22–25** similarly call the `retrieve` method of `lib.slack.users['@0.3.32']` to get the user object, which has information about the user, and store it in `result.slack.user`.

**Lines 27–39** call the `selectors` method of `lib.crawler.query['@0.0.1']`, passing in inputs from the Slack command event.
For the `url`, we pass in the first part of the command text: `event.text.split(/\s+/)[0]`.

`userAgent` is set to the default: `stdlib/crawler/query`.

`includeMetadata` is `false` (if `true`, additional metadata is returned in a `meta` field of the response).

`selectorQueries` is an array with one object: `{selector: event.text.split(/\s+/)[1], resolver: 'attr', attr: 'href'}`.

For `selector`, we retrieve the second part of the command text using `event.text.split(/\s+/)[1]`. The `attr` resolver with `attr: 'href'` asks for each matched element's `href` attribute, which the code later reads from each result's `attr` field.

**Lines 40–53** build and post the message. Lines 40–46 turn each scraped `href` into a link: values starting with `http://`, `https://`, or `//` are used as-is, and any other value is appended to `result.crawler.pageData.url`. Lines 47–51 post the resulting text to the channel with `lib.slack.channels['@0.6.6'].messages.create`, passing the `channel` and `text` parameters, and line 52 returns the collected API responses.
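
The link handling in lines 40–46 relies on plain string concatenation, so depending on what `pageData.url` contains it can produce double slashes (`https://techcrunch.com/` + `/story`) or mis-join path-relative links, and protocol-relative links (`//host/path`) are posted without a scheme. The check in lines 11–16 would also reject text with stray leading or trailing whitespace. Below is a minimal sketch of a sturdier approach, assuming Node.js's built-in WHATWG `URL` class and assuming, as the code above does, that each crawler result carries its `href` in a string `attr` field. `parseScrapeCommand`, `resolveLink`, and `formatLinks` are hypothetical helpers, not part of this project or of crawler.api; you could paste them above `module.exports` in `scrape.js` and call them from the handler.

```javascript
// Hypothetical helpers for scrape.js (not part of the original project).
// Assumes Node.js 10+, where the WHATWG URL class is a global.

/**
 * Split the slash-command text into a URL and a CSS selector.
 * Returns null unless the text holds exactly two tokens.
 */
function parseScrapeCommand(text) {
  // Trim first so stray leading/trailing spaces don't create empty tokens
  const parts = (text || '').trim().split(/\s+/);
  if (parts.length !== 2) {
    return null;
  }
  return {url: parts[0], selector: parts[1]};
}

/**
 * Resolve a scraped href against the page URL. Handles absolute,
 * protocol-relative ("//host/path"), root-relative ("/path") and
 * path-relative ("path") links. Returns null for unusable values.
 */
function resolveLink(href, pageUrl) {
  if (typeof href !== 'string') {
    return null;
  }
  try {
    return new URL(href, pageUrl).href;
  } catch (err) {
    // new URL() throws a TypeError when the input or base is not a valid URL
    return null;
  }
}

/**
 * Build the Slack message text from one array of crawler results,
 * assuming each result stores the href in its `attr` field.
 */
function formatLinks(pageUrl, results) {
  const links = results
    .map((r) => resolveLink(r.attr, pageUrl))
    .filter((link) => link !== null);
  return `Here are the links that we found for ${pageUrl}\n\n${links.join('\n')}`;
}
```

Inside the handler you could call `parseScrapeCommand(event.text)`, post a usage hint when it returns `null`, and pass `result.crawler.pageData.queryResults[0]` to `formatLinks`. Resolving with `URL` instead of `startsWith` checks follows the same parsing rules browsers apply to an `href`.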

You can read more about API specifications and parameters in the [Standard Library documentation](https://docs.stdlib.com/connector-apis/building-an-api/api-specification/).

# Making Changes

Now that your app is live, you can return at any time to add more logic and scrape websites for data with [crawler.api](https://stdlib.com/@crawler/lib/query).

There are two ways to modify your application. The first is via our in-browser
editor, [Autocode](https://autocode.com/). The second is
via the [Standard Library CLI](https://github.com/stdlib/lib).

## via Web Browser

Visit [`Autocode.com`](https://autocode.com) and select your project.
You can make updates, save your changes, and deploy directly from your browser.

## via Command Line

You can install the CLI tools from [stdlib/lib](https://github.com/stdlib/lib) to test,
make changes, and deploy.

To retrieve your package, use `lib get`:

```shell
lib get /@dev
```

To deploy your changes to the dev environment:

```shell
# Deploy to dev environment
lib up dev
```

# Shipping to Production

Standard Library has easy dev/prod environment management. If you'd like to ship to production,
visit [`build.stdlib.com`](https://build.stdlib.com),
find your project, and select `manage`.

From the environment management screen, click **Ship Release**.

Link any necessary resources, specify the version of the release, and click **Create Release** to proceed.

That's all you need to do!

# Support

Via Slack: [`libdev.slack.com`](https://libdev.slack.com/)

You can request an invitation by clicking `Community > Slack` in the top bar
on [`https://stdlib.com`](https://stdlib.com).

Via Twitter: [@StandardLibrary](https://twitter.com/StandardLibrary)

Via E-mail: [support@stdlib.com](mailto:support@stdlib.com)

# Acknowledgements

Thanks to the Standard Library team and community for all the support!

Keep up to date with platform changes on our [Blog](https://stdlib.com/blog).

Happy hacking!

--------------------------------------------------------------------------------
/functions/events/slack/command/list.js:
--------------------------------------------------------------------------------
const lib = require('lib')({token: process.env.STDLIB_SECRET_TOKEN});

/**
 * An HTTP endpoint that acts as a webhook for Slack command event
 * @param {object} event
 * @returns {object} result Your return value
 */
module.exports = async (event) => {

  // Store API Responses
  const result = {slack: {}};

  console.log(`Running [Slack → Retrieve Channel, DM, or Group DM by id]...`);
  result.slack.channel = await lib.slack.conversations['@0.2.5'].info({
    id: `${event.channel_id}`
  });

  console.log(`Running [Slack → Retrieve a User]...`);
  result.slack.user = await lib.slack.users['@0.3.32'].retrieve({
    user: `${event.user_id}`
  });


  await lib.slack.channels['@0.6.6'].messages.create({
    channel: `#${event.channel_id}`,
    text:
    ` Here is a list of websites with their respective selectors: \n\n \t/cmd scrape https://techcrunch.com a.post-block__title__link \n\n \t/cmd scrape https://www.economist.com/ a.headline-link \n\n \t/cmd scrape https://markets.businessinsider.com a.teaser-headline \n\n \t/cmd scrape https://news.ycombinator.com a.storylink \n\n \t/cmd scrape https://www.nytimes.com a \n\n \t/cmd scrape https://www.cnn.com a \n\n \t/cmd scrape https://www.bbc.com a.media__link`
  });


  return result;

};

--------------------------------------------------------------------------------
/functions/events/slack/command/scrape.js:
--------------------------------------------------------------------------------
const lib = require('lib')({token: process.env.STDLIB_SECRET_TOKEN});
/**
 * An HTTP endpoint that acts as a webhook for Slack command event
 * @param {object} event
 * @returns {object} result Your return value
 */
module.exports = async (event) => {
  // Store API Responses
  const result = {slack: {}, crawler: {}};

  if ((event.text || '').split(/\s+/).length != 2) {
    return lib.slack.channels['@0.6.6'].messages.create({
      channel: `#${event.channel_id}`,
      text: `${event.text} has wrong format. `
    });
  }

  console.log(`Running [Slack → Retrieve Channel, DM, or Group DM by id]...`);
  result.slack.channel = await lib.slack.conversations['@0.2.5'].info({
    id: `${event.channel_id}`
  });
  console.log(`Running [Slack → Retrieve a User]...`);
  result.slack.user = await lib.slack.users['@0.3.32'].retrieve({
    user: `${event.user_id}`
  });

  console.log(`Running [Crawler → Query (scrape) a provided URL based on CSS selectors]...`);
  result.crawler.pageData = await lib.crawler.query['@0.0.1'].selectors({
    url: event.text.split(/\s+/)[0],
    userAgent: `stdlib/crawler/query`,
    includeMetadata: false,
    selectorQueries: [
      {
        'selector': event.text.split(/\s+/)[1],
        'resolver': `attr`,
        'attr': 'href'
      }
    ]
  });
  let text = `Here are the links that we found for ${event.text.split(/\s+/)[0]}\n \n ${result.crawler.pageData.queryResults[0].map((r) => {
    if (r.attr.startsWith('http://') || r.attr.startsWith('https://') || r.attr.startsWith('//')) {
      return r.attr;
    } else {
      return result.crawler.pageData.url + r.attr;
    }
  }).join(' \n ')}`;
  console.log(`Running [Slack → Send a Message from your Bot to a Channel]...`);
  result.slack.response = await lib.slack.channels['@0.6.6'].messages.create({
    channel: `#${event.channel_id}`,
    text: text
  })
  return result;
};

--------------------------------------------------------------------------------
/images/DeveloperMode.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/DeveloperMode.png

--------------------------------------------------------------------------------
/images/Example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/Example.png

--------------------------------------------------------------------------------
/images/LinkNew.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/LinkNew.png

--------------------------------------------------------------------------------
/images/Name.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/Name.png

--------------------------------------------------------------------------------
/images/Navigate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/Navigate.png

--------------------------------------------------------------------------------
/images/Oauth.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/Oauth.png

--------------------------------------------------------------------------------
/images/SlackAppCrawlerSample.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/SlackAppCrawlerSample.png

--------------------------------------------------------------------------------
/images/StandardApp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/StandardApp.png

--------------------------------------------------------------------------------
/images/custom.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/custom.png

--------------------------------------------------------------------------------
/images/deploy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/deploy.png

--------------------------------------------------------------------------------
/images/edit.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/edit.png

--------------------------------------------------------------------------------
/images/link.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/link.png

--------------------------------------------------------------------------------
/images/linked.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/linked.png

--------------------------------------------------------------------------------
/images/list.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/list.png

--------------------------------------------------------------------------------
/images/list2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/list2.png

--------------------------------------------------------------------------------
/images/listexample.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/listexample.png

--------------------------------------------------------------------------------
/images/release.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JanethL/SlackAppWebscraper/f15821aa0b9f7a00e605d6c9436359bd9746d6ba/images/release.png

--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
{
  "name": "Slackappcrawler",
  "author": "Janeth Ledezma ",
  "publish": false,
  "dependencies": {
    "lib": "latest"
  }
}

--------------------------------------------------------------------------------
/stdlib.json:
--------------------------------------------------------------------------------
{
  "name": "JanethL/Slackappcrawler",
  "version": "0.0.0",
  "timeout": 10000,
  "connector": false,
  "events": {
    "functions/events/slack/command/scrape.js": {
      "name": "slack.command",
      "subtype": {
        "command": "scrape"
      }
    },
    "functions/events/slack/command/list.js": {
      "name": "slack.command",
      "subtype": {
        "command": "list"
      }
    }
  }
}
--------------------------------------------------------------------------------