122 |
123 |
124 | ???
125 | Researchers needed patient or biospecimen data for their IRB-approved protocols
126 |
127 | ---
128 | class: hide-count no-logo no-border bg-black carina-nebula-soft
129 |
130 |
Help you make a dashboard!
131 |
Fulfill your research data request!
132 |
Remember how the data got here!
133 |
Help you find that database!
134 |
135 |
136 | ???
137 | Somebody needed to know about data lineage as well as coding or metadata standards; basically, how the data got here and why it looks the way it does
138 |
139 | ---
140 | class: hide-count no-logo no-border bg-black carina-nebula-soft
141 |
142 |
Help you make a dashboard!
143 |
Fulfill your research data request!
144 |
Help you find that database!
145 |
Remember how the data got here!
146 |
147 |
148 | ???
149 | Organizing databases into a warehouse and granting access was important. We had someone for that!
150 |
151 | But eventually, some of these teams were operating at a scale better suited to working independently of the IT gravity field
152 |
153 | ---
154 | name: world-bi
155 | class: hide-count no-logo no-border bg-black carina-nebula-soft fullscreen
156 |
157 |
158 |
Help you make a dashboard!
159 |
Fulfill your research data request!
160 |
Remember how the data got here!
161 |
Help you find that database!
162 |
163 |
164 |
165 |
166 |
Business Intelligence
167 |
168 | ???
169 | One of these was business intelligence. The person who was making dashboards for operational (i.e., non-research) stakeholders is now part of a larger team that creates such products at scale
170 |
171 | ---
172 | name: world-cdsc
173 | class: hide-count no-logo no-border bg-black carina-nebula-soft fullscreen
174 |
175 |
176 |
Fulfill your research data request!
177 |
Help you find that database!
178 |
Remember how the data got here!
179 |
180 |
181 |
182 |
183 |
Business Intelligence
184 |
185 |
186 |
Collaborative Data Services
187 |
188 | ???
189 | Next, our research-focused stakeholders had many of the same needs as the operational end-users, such as reporting/dashboarding and, importantly, data provisioning. The twist in the research space is that such activities must be conducted in accordance with IRB and ethical approval, and assessing study design feasibility against data availability and structure requires specialized training. Hence, the CDS team was formed. This is one of the groups that Garrick and I are representing today.
190 |
191 | ---
192 | name: world-dqs
193 | class: hide-count no-logo no-border bg-black carina-nebula-soft fullscreen
194 |
195 |
196 |
Remember how the data got here!
197 |
Help you find that database!
198 |
199 |
200 |
201 |
202 |
Business Intelligence
203 |
204 |
205 |
Collaborative Data Services
206 |
207 |
208 |
Data Quality
209 |
210 | ???
211 | But CDS can't operate at scale in a vacuum either. A critical and complementary team, Data Quality and Standards, formed from IT's "data historian." They ensure that data dictionaries are robust and data lineage is understood by the BI and CDS teams for appropriate downstream data usage.
212 |
213 | ---
214 | name: data-engineering
215 | class: hide-count no-logo no-border bg-black carina-nebula-soft fullscreen
216 |
217 |
218 |
Help you find that database!
219 |
220 |
221 |
222 |
223 |
224 |
Business Intelligence
225 |
226 |
227 |
Collaborative Data Services
228 |
229 |
230 |
Data Quality
231 |
232 | ???
233 | As institutional data assets grew, warehousing and access rules became necessarily complex. Data engineering formed a new continent within IT to meet the challenge.
234 |
235 | ---
236 | name: rocket
237 | class: hide-count no-logo no-border bg-black carina-nebula-soft fullscreen
238 |
239 |
240 |
241 |
242 |
243 |
244 |
Business Intelligence
245 |
246 |
247 |
Collaborative Data Services
248 |
249 |
250 |
Data Quality
251 |
252 |
253 |
254 | ???
255 | Now, with so many teams completing data-related operations at a rapid pace, we needed a shuttlecraft to coordinate technology strategy, inform general data governance, and mine valuable software ore from the astRoid belt
256 |
257 | ---
258 | name: app-dev
259 | class: hide-count no-logo no-border bg-black carina-nebula-soft fullscreen animated fadeIn
260 |
261 |
262 |
263 |
264 |
265 |
266 |
267 |
Business Intelligence
268 |
269 |
270 |
Collaborative Data Services
271 |
272 |
273 |
Data Quality
274 |
275 |
276 |
277 |
IT
278 |
279 | ???
280 | When those tools are ready for placement and maintenance in the institutionally supported production environment, the new Applications Development land mass within IT can help out. For example, they would maintain software such as RStudio Server or GitHub Enterprise.
281 |
282 | ---
283 | name: cdo
284 | class: hide-count no-logo no-border bg-black carina-nebula-soft fullscreen animated fadeIn
285 |
286 |
287 |
288 |
289 |
290 |
291 |
Business Intelligence
292 |
293 |
294 |
Collaborative Data Services
295 |
296 |
297 |
Data Quality
298 |
299 |
300 |
301 |
IT
302 |
303 |
304 |
305 | ???
306 | This whole story, admittedly with some shortcuts for clarity, mirrors the rise of the Chief Data Officer role across the healthcare industry. Indeed, all of these groups tend to roll up or be horizontally aligned in some way with the CDO's vertical.
307 |
308 | ---
309 | class: left middle blueprint2
310 |
311 | .f4[
312 | Scaling .b.blue[provisioning]
313 | by scaling .b.em.red[people]
314 | ]
315 |
316 | .absolute.bottom-1.right-1[
317 |
318 | ]
319 |
320 | ???
321 | Taken together, this is our first hint that scaling data provisioning isn't just about scaling data: it's about scaling the people who are doing the provisioning. In part 2, Garrick is going to tell you more about the "how"
322 |
323 | ---
324 | class: left middle blueprint2
325 |
326 | .f4[
327 | [...continued in part 2 →](part-two.html)
328 | ]
329 |
330 |
--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
4 | Build your own universe
--------------------------------------------------------------------------------
/docs/part-two.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Build your own universe"
3 | subtitle: "Scale high-quality research data provisioning with R packages"
4 | author:
5 | - Travis Gerke
6 | - Garrick Aden-Buie
7 | institute: ".small[.blue-medium[Moffitt Cancer Center]]"
8 | date: 'August 28, 2020'
9 | output:
10 | xaringan::moon_reader:
11 | lib_dir: libs
12 | css:
13 | - css/moffitt-xaringan.css
14 | - css/moffitt-xaringan-extra.css
15 | - css/talk-base.css
16 | - css/part-two.min.css
17 | includes:
18 | in_header: head.html
19 | seal: false
20 | nature:
21 | titleSlideClass: ["bottom", "left"]
22 | slideNumberFormat: "%current%"
23 | highlightStyle: atom-one-light
24 | highlightLines: true
25 | ratio: 16:9
26 | countIncrementalSlides: true
27 | ---
28 |
29 | class: left middle blueprint2
30 |
31 | ```{r setup, echo=FALSE}
32 | source("library.R")
33 | source("setup.R")$value
34 | ```
35 |
36 | .f4[
37 | Scaling .b.blue[provisioning]
38 | by scaling .b.red[people]
39 | ]
40 |
41 | .absolute.bottom-1.right-1[
42 | [...continued from part 1](index.html)
43 | ]
44 |
45 | ???
46 |
47 | Travis ends here...
48 |
49 |
50 | ---
51 | class: left middle blueprint2
52 |
53 | .f4[
54 | Scaling .blue[provisioning]
55 | by scaling .red[people]
56 | ... by scaling .b.green[access]
57 | ]
58 |
59 |
60 | ???
61 |
62 | So as Travis talked about scaling provisioning
63 | by scaling systems of people,
64 | I'm going to talk about how we scale
65 | access to data systems
66 | through R packages
67 |
68 | I'm going to start with an entirely hypothetical but familiar story...
69 |
70 | ---
71 |
72 | class: center middle
73 |
74 | .center.w-50[
75 | .bubble.thought.vanishIn[
76 | How do I connect our .blue.b[tissue sample] inventory to a .orange.b[patient's clinical] data?
77 | ]
78 |
79 | .tr.f6[
80 | 🤔
81 | ]
82 | ]
83 |
84 | ???
85 |
86 | It starts with a question
87 |
88 | I want to connect **our tissue sample** inventory
89 | to **patient clinical** data.
90 |
91 | It's not something I've done before,
92 | so I'm not quite sure how to access the samples table
93 | or how to link a sample to a patient
94 | but obviously it has to be possible.
95 |
96 | So how do I get started?
97 |
98 | ---
99 | class: no-border no-logo hide-count
100 | background-image: url(figures/big-data-sensor-display.jpg)
101 | background-size: cover
102 | background-position: center right
103 |
104 | ???
105 |
106 | If you believe the big data stock photos,
107 | I go to the self-service data wall
108 | and point at the numbers I want.
109 |
110 | ---
111 |
112 | .ba.b--gray-4.br2.pv2.ph3.mv5.shadow-4.animated.fadeIn[
113 | .b[Aden-Buie, Garrick]
114 | Question about Sample Availability Table
115 |
116 | To: .gray-3[data-ops@acmemed.org]
117 |
118 | ***
119 |
120 | .o-0[
121 | How can I connect our sample inventory to patient-level clinical data?
122 |
123 | I've heard you know the secret.
124 |
125 | Thanks!
126 |
127 | .gray-3[Garrick]
128 | ]
129 | ]
130 |
131 | ???
132 |
133 | In reality, it probably starts with an email.
134 | Or many emails.
135 |
136 | I start by reaching out to someone I know
137 | in data engineering
138 | who manages that particular data resource
139 | to see what they can tell me...
140 |
141 | ---
142 |
143 | .ba.b--gray-4.br2.pv2.ph3.mv5.shadow-4[
144 | .b[Aden-Buie, Garrick]
145 | Question about Sample Availability Table
146 |
147 | To: .gray-3[data-ops@acmemed.org]
148 |
149 | ***
150 |
151 | .css-typing[
152 | How can I connect our sample inventory to patient-level clinical data?
153 |
154 | I've heard you know the secret.
155 |
156 | Thanks!
157 |
158 | .gray-3[Garrick]
159 | ]
160 | ]
161 |
162 |
163 | ---
164 |
165 | .w-90.ba.b--gray-4.br2.pv2.ph3.mv5.shadow-4.absolute.animated.lightSpeedOut[
166 | .b[Aden-Buie, Garrick]
167 | Question about Sample Availability Table
168 |
169 | To: .gray-3[data-ops@acmemed.org]
170 |
171 | ***
172 |
173 | How can I connect our sample inventory to patient-level clinical data?
174 |
175 | I've heard you know the secret.
176 |
177 | Thanks!
178 |
179 | .gray-3[Garrick]
180 | ]
181 |
182 | .ba.b--gray-4.br2.pv2.ph3.mv5.shadow-4.relative.animated.bounceInDown.delay-2s[
183 | .b[Friendly Data Ops Person]
184 | RE: Question about Sample Availability Table
185 |
186 |
187 |
188 |
189 | To: .gray-3[Aden-Buie, Garrick]
190 |
191 |
192 | ***
193 | Hey Garrick,
194 |
195 | Here's the query we use to populate the table.
196 |
197 | Good Luck!
198 |
199 | .gray-3[Data Ops]
200 | ]
201 |
202 | ???
203 |
204 | I fire off the email and a little while later I get a reply.
205 |
206 | _read email_
207 |
208 | And look, the email came with an attachment that I can open up.
209 |
210 | ---
211 | name: sql-example
212 |
213 |
222 |
223 | ```{sql example-query, eval=FALSE}
224 | SELECT SA.PERSON_ID,SA.SAMPLE_ID,SA.SAMPLE_FAMILY_ID
225 | ,CH1.CHOICE_NAME AS COLLECTION_CONSORTIUM
226 | ,CH2.CHOICE_NAME AS CELFILE_RELATED
227 | ,CH3.CHOICE_NAME AS FROZEN_TUMOR_TISSUE
228 | ,CH4.CHOICE_NAME AS FROZEN_NORMAL_TISSUE
229 | ,CH5.CHOICE_NAME AS WHOLE_BLOOD
230 | ,CH6.CHOICE_NAME AS BUFFY_COAT
231 | ,CH7.CHOICE_NAME AS DNA_FROM_BLOOD
232 | ,CH8.CHOICE_NAME AS PLASMA
233 | ,CH9.CHOICE_NAME AS SERUM
234 | ,CH10.CHOICE_NAME AS PARAFFIN
235 | ,CH11.CHOICE_NAME AS URINE
236 | ,CH12.CHOICE_NAME AS RNA
237 | ,CH13.CHOICE_NAME AS MNC
238 | ,CH14.CHOICE_NAME AS DNA_FROM_SOLID
239 | ,CH15.CHOICE_NAME AS ADJACENT_NORMAL
240 | ,CH16.CHOICE_NAME AS LCS_PROTOCOL_BLOOD_AVA
241 | ,CH17.CHOICE_NAME AS SOO_CATEGORY
242 | ,CH36.CHOICE_NAME AS SOO_TISSUETYPE
243 | ,CH19.CHOICE_NAME AS HISTOLOGYCAP
244 | ,CH20.CHOICE_NAME AS PRIMARY_METASTATIC
245 | ,PI.MEDICAL_ID
246 | ,PI.PARTNER_ID
247 | ,CH21.Anatomic_Site AS SOO_ANATOMICSITE
248 | ,CH22.CHOICE_NAME AS GROSS_DIAGNOSIS
249 | ,CH23.CHOICE_NAME AS PRIMARY_SITE_PATIENT
250 | ,CH24.CHOICE_NAME AS HISTOLOGY_PATIENT
251 | ,CH25.CHOICE_NAME AS DERIVED_CATEGORY
252 | ,CH27.CHOICE_NAME AS DERIVED_TISSUETYPE
253 | ,SA.DERIVED_ANATOMIC_SITE AS DERIVED_ANATOMICSITE
254 | ,CH29.TISSUE_TYPE AS COL_SITE_TISSUE_TYPE
255 | ,SA.SPECIMEN_COLLECTION_DATE
256 | ,CH29.ANATOMIC_SITE AS COL_SITE_ANATOMIC
257 | ,CH30.CHOICE_NAME AS COL_SITE_CATEGORY
258 | ,SA.BIOBANKING_SUBJECT_ID AS SUBJECT_ID
259 | ,CH31.CHOICE_NAME AS WHOLE_EXOME
260 | ,CH32.CHOICE_NAME AS TARGET_EXOME
261 | ,CH33.CHOICE_NAME AS SAMPLE_TYPE
262 | ,SA.CURRENT_QUANTITY
263 | ,CH34.CHOICE_NAME AS PATH_DIAGNOSIS_PQC
264 | ,SA.SAMPLE_TO_CANR_CHAR_LINK
265 | ,CH35.CHOICE_NAME AS PROTOCOL
266 | FROM ABC.SAMPLE SA
267 | LEFT JOIN ABC.CHOICE CH1 ON SA.COLLECTION_FACILITY = CH1.CHOICE_ID
268 | LEFT JOIN ABC.CHOICE CH2 ON SA.CELFILE_RELATED = CH2.CHOICE_ID
269 | LEFT JOIN ABC.PATIENT PI ON SA.PERSON_ID = PI.PERSON_ID
270 | LEFT JOIN ABC.SAMPLE_INDICATOR SI ON SA.SAMPLE_KEY = SI.SAMPLE_KEY
271 | LEFT JOIN ABC.CHOICE CH3 ON SI.FROZ_TTISE_AVL_FOR_RIND = CH3.CHOICE_ID
272 | LEFT JOIN ABC.CHOICE CH4 ON SI.FROZ_NORM_TISS_AVL_FOR_RIND = CH4.CHOICE_ID
273 | LEFT JOIN ABC.CHOICE CH5 ON SI.WHLBAVLFDNAEXTNDR_INCL_PAXDNAS = CH5.CHOICE_ID
274 | LEFT JOIN ABC.CHOICE CH6 ON SI.BUFY_COAT_AVL_FOR_RIND = CH6.CHOICE_ID
275 | LEFT JOIN ABC.CHOICE CH7 ON SI.DNAEXTRF_BLOD_AVL_FOR_RIND = CH7.CHOICE_ID
276 | LEFT JOIN ABC.CHOICE CH8 ON SI.PLA_AVL_FOR_RIND = CH8.CHOICE_ID
277 | LEFT JOIN ABC.CHOICE CH9 ON SI.SER_AVL_FOR_RIND = CH9.CHOICE_ID
278 | LEFT JOIN ABC.CHOICE CH10 ON SI.PARAF_BLK_AVL_FOR_RIND = CH10.CHOICE_ID
279 | LEFT JOIN ABC.CHOICE CH11 ON SI.UNINE_AVL_FOR_RIND = CH11.CHOICE_ID
280 | LEFT JOIN ABC.CHOICE CH12 ON SI.RNAEXTRF_SOLID_TTISE_FOR_RIND = CH12.CHOICE_ID
281 | LEFT JOIN ABC.CHOICE CH13 ON SI.MONO_CELS_EXTR_FOR_RIND = CH13.CHOICE_ID
282 | LEFT JOIN ABC.CHOICE CH14 ON SI.DNAEXTRF_SOLID_TTISE_FOR_RIND = CH14.CHOICE_ID
283 | LEFT JOIN ABC.CHOICE CH15 ON SI.ADJA_NSLD_SAMPLE_AVL_INDR = CH15.CHOICE_ID
284 | LEFT JOIN ABC.CHOICE CH16 ON SI.LCS_PROTC_BLOD_AVL_FOR_RIND = CH16.CHOICE_ID
285 | LEFT JOIN ABC.CHOICE CH17 ON SA.DERIVED_SOO_CATEGORY = CH17.CHOICE_ID
286 | LEFT JOIN ABC.CHOICE CH18 ON SA.DERIVED_SOO_TISSUE_TYPE = CH18.CHOICE_ID
287 | LEFT JOIN ABC.CHOICE CH19 ON SA.HISTOLOGY_TYPE_CONFORMED_CAP = CH19.CHOICE_ID
288 | LEFT JOIN ABC.CHOICE CH20 ON SA.DERIVED_PRIMARY_VS_METASTATIC = CH20.CHOICE_ID
289 | LEFT JOIN ABC.TISSUE_TYPE CH21
290 | ON SUBSTR(SA.DERIVED_SOO_ANATOMIC_SITE,0,8) = CH21.TISSUE_TYPE_KEY
291 | LEFT JOIN ABC.CHOICE CH22 ON SA.DERIVED_GROSS_DIAGNOSIS = CH22.CHOICE_ID
292 | LEFT JOIN ABC.CHOICE CH23 ON SA.PRIMARY_SITE_PATIENT = CH23.CHOICE_ID
293 | LEFT JOIN ABC.CHOICE CH24 ON SA.HISTOLOGY_PATIENT = CH24.CHOICE_ID
294 | LEFT JOIN ABC.CHOICE CH25 ON SA.DERIVED_CATEGORY = CH25.CHOICE_ID
295 | LEFT JOIN ABC.CHOICE CH26 ON SA.DERIVED_TISSUE_TYPE = CH26.CHOICE_ID
296 | LEFT JOIN ABC.CHOICE CH27 ON SA.DERIVED_TISSUE_TYPE = CH27.CHOICE_ID
297 | LEFT JOIN ABC.CHOICE CH28 ON SA.COLLECTION_SITE_TISSUE_TYPE = CH28.CHOICE_ID
298 | LEFT JOIN ABC.TISSUE_TYPE CH29 ON SA.COLLECTION_SITE_ANATOMIC = CH29.TISSUE_TYPE_KEY
299 | LEFT JOIN ABC.CHOICE CH30 ON SA.COLLECTION_SITE_CATEGORY = CH30.CHOICE_ID
300 | LEFT JOIN ABC.CHOICE CH31 ON SI.WHL_EXOM_SEQ_AVL_INDR = CH31.CHOICE_ID
301 | LEFT JOIN ABC.CHOICE CH32 ON SI.TAR_EXOM_SEQ_AVL_INDR = CH32.CHOICE_ID
302 | LEFT JOIN ABC.CHOICE CH33 ON SA.SAMPLE_TYPE = CH33.CHOICE_ID
303 | LEFT JOIN ABC.CHOICE CH34 ON SA.PATH_DIAGNOSIS_PQC = CH34.CHOICE_ID
304 | LEFT JOIN ABC.CHOICE CH35 ON SA.PROTOCOL = CH35.CHOICE_ID
305 | LEFT JOIN ABC.CHOICE CH36 ON SA.DERIVED_SOO_TISSUE_TYPE = CH36.CHOICE_ID
306 | WHERE SA.ISDELETED = 0
307 | ```
308 | ]
309 |
310 | ???
311 |
312 | And I'm immediately hit with a wall of SQL.
313 |
314 | This query doesn't look pretty
315 | but in a couple hours I'll probably get the gist of it...
316 |
317 | Somewhere in here, I'll find that this query uses...
318 | `sample`,
319 | `patient` and
320 | `sample_indicator` tables
321 |
322 | and that all these lines
323 | are for translating coded columns into text labels
324 |
325 | And hey, it's at least code, right?
326 |
327 | Well... since we're emailing files around, sometimes you'll get a query like this
328 | in a slightly different format...
329 |
330 | ---
331 | class: no-logo no-border hide-count word-doc animated-slide slideInRight boingOutDown
332 |
333 |
334 |
335 |
336 |
337 | ```{sql ref.label="example-query", eval=FALSE}
338 | ```
339 |
340 | ???
341 |
342 | Like a word document!
343 |
344 | Where the query doesn't really fit on screen or a page...
345 |
346 | and formatting choices are... fluid
347 |
348 | ---
349 | class: middle animated fadeIn delay-1s
350 |
351 | .f6.b[SQL is .red[robot logic] 🤖]
352 |
353 | 
354 |
355 | .footnote.tr[
356 | [_Definitive Guide to SQLite_](https://www.apress.com/gp/book/9781430232254)
357 | ]
358 |
359 | ???
360 |
361 | Putting aside the emailing and the word document format,
362 | SQL queries aren't a great vehicle for knowledge transfer
363 |
364 | They're good for precisely communicating data specifications
365 | in the robot logic databases understand
366 |
367 | but we have other ways of working with data
368 | that have been specifically designed with humans in mind
369 |
370 | ---
371 | class: middle
372 |
373 | .f6.b[.code[dplyr] is .green[human logic] 🤗]
374 |
375 |
376 | Programs must be written for people to read, and only incidentally for machines to execute.
377 |
378 |
Harold Abelson, Structure and Interpretation of Computer Programs
379 |
380 |
381 | ???
382 |
383 | for example, dplyr
384 |
385 | whose API is very intentionally designed
386 | in line with the philosophy that code is
387 | written for people to read
388 | and only incidentally for machines to execute
389 |
390 | Which reminds me of a great quote from Jenny Bryan...
391 |
392 | ---
393 | class: middle
394 |
395 |
396 | Of course someone
397 | has to write for loops.
398 | It doesn't have to be you.
399 |
518 |
519 | ???
520 |
521 | Most of the packages are from the tidyverse,
522 |
523 | but we also include our own supporting package:
524 | ▶️ `moffittCDS`
525 | specifically tailored to my team's workflow.
526 |
527 | This creates a common starting point
528 | for everyone on the team.
529 |
530 | ---
531 | class: animated-slide fadeIn
532 |
533 | .remark-code.code-example.pa2.gray-4[
534 | library(moffittverse)
547 |
548 | ???
549 |
550 | It also gives us a formal "on ramp"
551 | to install and set up database dependencies
552 | that we can leverage in specific packages
553 | to interface with our many back-end systems
554 |
555 | This makes connecting to a specific database straightforward...
556 |
557 | you call _use backend_
558 | with the name of the database or server
559 | that you need to connect to
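
A minimal sketch of what a `use_backend()`-style helper might do, assuming a simple lookup from a backend name to the packages it needs. The `backend_registry` list and the `use_backend_sketch()` name are hypothetical and are not the actual moffittverse implementation; only base R functions are used.

```r
# Hypothetical registry: backend name -> packages that backend needs.
# The package names for "abc" come from the talk; the registry itself is assumed.
backend_registry <- list(
  abc = c("dbplyr", "moffittABC")
)

use_backend_sketch <- function(name, registry = backend_registry) {
  pkgs <- registry[[name]]
  if (is.null(pkgs)) {
    stop("Unknown backend: ", name, call. = FALSE)
  }

  # Check that every required package is installed before attaching any.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    stop("Please install: ", paste(pkgs[!installed], collapse = ", "), call. = FALSE)
  }

  # Attach the packages so their functions are available to the user.
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}
```

Failing early with a readable message is one way to turn "install and set up database dependencies" into a single, predictable step for new team members.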
560 |
561 | ---
562 | class: animated-slide fadeIn
563 |
564 | .remark-code.code-example.pa2.gray-4[
565 | library(moffittverse)
584 |
585 | ???
586 |
587 | and behind the scenes `use_backend()`
588 | loads additional packages like `dbplyr`
589 | as well as a package specifically for this back-end,
590 | ▶️ `moffittABC`
591 |
592 | Each backend package has two primary goals,
593 | the first is to simplify access.
594 |
595 | So by default, moffittABC will not only
596 | ▶️ remember the incantations required to connect
597 | to the ABC database
598 |
599 | but it will actually manage the connection for users internally
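
One common way a package can manage a connection internally is to cache it in a private environment and open it only on first use. The sketch below assumes that pattern; `.abc_state`, `abc_connection()`, `abc_disconnect()` and `fake_connect()` are hypothetical names, and the stand-in connector replaces a real driver call so the example runs without a database.

```r
# Hypothetical sketch of package-internal connection management.
# In a real backend package `connect` would wrap the database driver call.
.abc_state <- new.env(parent = emptyenv())
.abc_state$n_opened <- 0L

# Stand-in connector so the example runs without a database.
fake_connect <- function() {
  structure(list(opened_at = Sys.time()), class = "fake_connection")
}

abc_connection <- function(connect = fake_connect) {
  # Open a connection only on first use, then reuse the cached one.
  if (is.null(.abc_state$con)) {
    .abc_state$con <- connect()
    .abc_state$n_opened <- .abc_state$n_opened + 1L
  }
  .abc_state$con
}

abc_disconnect <- function() {
  # Drop the cached connection; the next abc_connection() call reconnects.
  .abc_state$con <- NULL
  invisible(TRUE)
}
```

Table helpers can then call `abc_connection()` themselves, so users never pass a connection object around.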
600 |
601 | ---
602 | class: animated-slide fadeIn
603 |
604 |
605 | .remark-code.code-example.pa2.gray-4[
606 | library(moffittverse)
619 |
620 | ???
621 |
622 | and in addition it also provides easy access to tables
623 | with functions like `abc_tbl()`.
624 |
625 | Here we use the `abc_tbl()` function to
626 | ▶️ connect to a table in our database
627 | that not at all coincidentally is a table in the ABC schema.
628 | And notice that the connection is handled for the user.
629 |
630 | So in this step we connect to the three tables we need...
631 |
632 | ---
633 |
634 | .remark-code.code-example.pa2.gray-4[
635 | library(moffittverse)
648 |
649 | ???
650 |
651 | At this point we've set up our workspace and our environment,
652 | and connected to the database and tables we need
653 |
654 | ... so we can now focus on
655 | how these tables relate to each other
656 | so we can link samples to patients
657 | through a series of left joins
658 |
659 | ---
660 |
661 | .remark-code.code-example.pa2.gray-4[
662 | library(moffittverse)
675 |
676 | ???
677 |
678 | The final lines speak to the second goal
679 | of the back-end specific packages
680 | which is to wrap **common, tedious or error-prone**
681 | database moves into standard functions.
682 |
683 | Here, because we're working in R,
684 | we have a lot more flexibility to write functions,
685 | and use tidyselect, tidyeval, and more
686 | to do things that would otherwise be very hard to do in SQL, like:
687 |
688 | - ▶️ applying a "not deleted" filter to all tables used in the query
689 | - ▶️ automatically looking up text labels of coded values
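
The "not deleted" rule can be sketched with base R on small in-memory tables. Everything here is illustrative: the toy data, the `filter_not_deleted()` name, and the assumption that every table carries an `ISDELETED` flag (the emailed query only shows it on the sample table). Against a remote table the same idea would be expressed with dplyr verbs rather than base subsetting.

```r
# Toy stand-ins for ABC.SAMPLE and ABC.PATIENT; values are made up.
sample_tbl <- data.frame(
  SAMPLE_ID = c(1L, 2L, 3L),
  PERSON_ID = c(10L, 10L, 11L),
  ISDELETED = c(0L, 1L, 0L)
)
patient_tbl <- data.frame(
  PERSON_ID  = c(10L, 11L),
  MEDICAL_ID = c("M-10", "M-11"),
  ISDELETED  = c(0L, 0L)
)

# Keep rows that are not flagged as deleted; tables without the flag pass through.
filter_not_deleted <- function(tbl, flag = "ISDELETED") {
  if (!flag %in% names(tbl)) {
    return(tbl)
  }
  keep <- which(tbl[[flag]] == 0)
  tbl[keep, setdiff(names(tbl), flag), drop = FALSE]
}

# Apply the same rule to every table before joining.
tables <- lapply(
  list(sample = sample_tbl, patient = patient_tbl),
  filter_not_deleted
)

# Left join samples to patients (all.x = TRUE keeps every remaining sample).
linked <- merge(tables$sample, tables$patient, by = "PERSON_ID", all.x = TRUE)
```

Writing the rule once as a function means it is applied the same way to every table, instead of being repeated (or forgotten) in each `WHERE` clause.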
690 |
691 | ---
692 | template: moffittverse-code-example
693 |
694 | ???
695 |
696 | 👋👋👋
697 |
698 | Let's take a step back and reflect on this code as a whole,
699 |
700 | it's not that it's fewer lines of code,
701 | or less repetition
702 | or a question of R vs SQL ...
703 | it's that this code does a much better job
704 | explaining to humans
705 | how the data is being collected and transformed
706 |
707 | There are still plenty of assumptions in this code,
708 | but as we'll see,
709 | because these functions live in R packages
710 | they bring a lot of context with them
711 |
712 | ---
713 | layout: true
714 | class: middle
715 |
716 | ---
717 |
718 | ```{r abc-choice-replace, eval=FALSE}
719 | #' Replace Coded Choice Values
720 | #'
721 | #' Uses `ABC.CHOICE` to replace integers with text labels
722 | #'
723 | #' @examples ...
724 | #' @param x A table in ABC
725 | #' @export
726 | abc_choice_replace <- function(x, ...) {
727 |
728 | choice_cols <- choice_column_info(x)
729 |
730 | for (column in choice_cols) {
731 | choice_lookup <- get_choices(column)
732 | x <- replace_choice(x, column, choice_lookup)
733 | }
734 |
735 | x
736 | }
737 | ```
738 |
739 | ???
740 |
741 | Let's take a look at the source of the `abc_choice_replace()` function
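
The helpers called inside `abc_choice_replace()` are not shown on the slide. The sketch below fills them in with hypothetical base R versions that work on an in-memory data frame, purely to make the loop concrete. The `choice_table` data, the list of coded columns, and the helper bodies are all assumptions; the real package works against `ABC.CHOICE` in the database.

```r
# Made-up stand-in for the ABC.CHOICE lookup table.
choice_table <- data.frame(
  CHOICE_ID   = c(1L, 2L, 3L),
  CHOICE_NAME = c("Yes", "No", "Unknown"),
  stringsAsFactors = FALSE
)

# Hypothetical: report which columns of `x` hold coded choice values.
choice_column_info <- function(x, coded = c("PLASMA", "SERUM")) {
  intersect(names(x), coded)
}

# Hypothetical: build a named vector mapping choice id -> text label.
get_choices <- function(column, choices = choice_table) {
  stats::setNames(choices$CHOICE_NAME, choices$CHOICE_ID)
}

# Hypothetical: swap the integer codes in one column for their labels.
replace_choice <- function(x, column, choice_lookup) {
  x[[column]] <- unname(choice_lookup[as.character(x[[column]])])
  x
}

# Same shape as the function on the slide.
abc_choice_replace <- function(x, ...) {
  choice_cols <- choice_column_info(x)

  for (column in choice_cols) {
    choice_lookup <- get_choices(column)
    x <- replace_choice(x, column, choice_lookup)
  }

  x
}

samples <- data.frame(SAMPLE_ID = c(10L, 11L), PLASMA = c(1L, 2L), SERUM = c(3L, NA))
labelled <- abc_choice_replace(samples)
```

Codes without a matching label, including missing values, come back as `NA` in this sketch.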
742 |
743 | ---
744 |
745 | ```{r abc-choice-replace-2, eval=FALSE}
746 | #' Replace Coded Choice Values
747 | #'
748 | #' Uses `ABC.CHOICE` to replace integers with text labels
749 | #'
750 | #' @examples ...
751 | #' @param x A table in ABC
752 | #' @export
753 | abc_choice_replace <- function(x, ...) { #<<
754 |
755 | choice_cols <- choice_column_info(x)
756 |
757 | for (column in choice_cols) {
758 | choice_lookup <- get_choices(column)
759 | x <- replace_choice(x, column, choice_lookup)
760 | }
761 |
762 | x
763 | }
764 | ```
765 |
766 | ???
767 |
768 | We've already seen how the naming conventions
769 | and the functions' API communicate intent:
770 |
771 | "...and then we replace choices..."
772 |
773 | but on top of that,
774 | the function name is chosen to aid discoverability.
775 |
776 | in other words,
777 | a user can easily find other functions
778 | that operate on _choice columns_
779 | by exploring autocomplete in RStudio
780 |
781 | ---
782 | class: middle
783 |
784 | ```{r abc-choice-replace-4, eval=FALSE}
785 | #' Replace Coded Choice Values #<<
786 | #' #<<
787 | #' Uses `ABC.CHOICE` to replace integers with text labels #<<
788 | #' #<<
789 | #' @examples ... #<<
790 | #' @param x A table in ABC #<<
791 | #' @export #<<
792 | abc_choice_replace <- function(x, ...) {
793 |
794 | choice_cols <- choice_column_info(x)
795 |
796 | for (column in choice_cols) {
797 | choice_lookup <- get_choices(column)
798 | x <- replace_choice(x, column, choice_lookup)
799 | }
800 |
801 | x
802 | }
803 | ```
804 |
805 | ???
806 |
807 | Because this is a function in an R package,
808 | we can document **what** the function does and **why**
809 | right next to the code...
810 |
811 | And the documentation is comfortably available
812 | right inside the data analysis environment
813 |
814 | --
815 |
816 |
817 |
818 | ---
819 |
820 |
821 |
822 | ```{r abc-choice-replace-3, eval=FALSE}
823 | #' Replace Coded Choice Values
824 | #'
825 | #' Uses `ABC.CHOICE` to replace integers with text labels
826 | #'
827 | #' @examples ...
828 | #' @param x A table in ABC
829 | #' @export
830 | abc_choice_replace <- function(x, ...) {
831 |
832 | choice_cols <- choice_column_info(x) #<<
833 | #<<
834 | for (column in choice_cols) { #<<
835 | choice_lookup <- get_choices(column) #<<
836 | x <- replace_choice(x, column, choice_lookup) #<<
837 | } #<<
838 | #<<
839 | x #<<
840 | }
841 | ```
842 |
843 | ???
844 |
845 | The body of the function
846 | can be considered **technical documentation**
847 | recording **how** the function works.
848 |
849 | It's more precise than **just a description** of the best practice,
850 | and we've learned that
851 | when interfacing with more technical teams,
852 | the function becomes the **specification**
853 | for how we accomplish tasks,
854 | which makes it easy to say to engineering:
855 | _this is what we do_ or
856 | _this is what the new platform needs to support_.
857 |
858 | ---
859 |
860 | ```{r abc-choice-replace-5, eval=FALSE}
861 | #' Replace Coded Choice Values
862 | #'
863 | #' Uses `ABC.CHOICE` to replace integers with text labels
864 | #'
865 | #' @examples ...
866 | #' @param x A table in ABC
867 | #' @export
868 | abc_choice_replace <- function(x, ...) {
869 |
870 | choice_cols <- choice_column_info(x)
871 |
872 | for (column in choice_cols) {
873 | choice_lookup <- get_choices(column)
874 | x <- replace_choice(x, column, choice_lookup)
875 | }
876 |
877 | x
878 | }
879 | ```
880 |
881 | ???
882 |
883 | So taking a step back,
884 | this function isn't **just** about making life easier
885 | for someone working with this data...
886 |
887 | it's also a self-contained unit of knowledge
888 |
889 | In this view,
890 | an R package isn't just a place to keep code,
891 | it's where you store best practices
892 | or lessons learned
893 | and it's how you share that knowledge
894 | with others on your team
895 |
896 | ---
897 | layout: false
898 | class: center
899 |
900 |
901 |
902 | ???
903 |
904 | In fancy websites! Seriously!
905 |
906 | R's tooling for package development
907 | is really quite amazing.
908 |
909 | Tools like `pkgdown`
910 | don't just make your code documentation pretty
911 | and browseable
912 | and shareable
913 |
914 | they make your package documentation
915 | a viable knowledge repository
916 | and a place to turn
917 | when you need to learn something new
918 |
919 |
920 | ---
921 | layout: false
922 | class: animated slideInRight
923 |
924 |
925 |
926 | ???
927 |
928 | On top of this,
929 | if you're using version control front-ends,
930 | like GitHub or GitLab,
931 | you also have a public place for sharing knowledge.
932 |
933 | Rather than sending emails
934 | that are only seen by the people copied in the email,
935 | I can open an issue
936 |
937 | ---
938 | class: animated fadeIn
939 |
940 |
941 |
942 | ???
943 |
944 | where my question is seen by others,
945 | answered publicly,
946 | and available for reference in the future.
947 |
948 | So I'd like to close with a few practical tips
949 | about how to make this happen for your organization and teams
950 |
951 | ---
952 | layout: true
953 | class: bigger highlight-last-item
954 |
955 | # Tips
956 |
957 |
958 |
959 | ---
960 |
961 | - .green[✔] **Do** start small
962 |
963 | ???
964 |
965 | Start small
966 |
967 | Start with one team and make their lives better!
968 |
969 | I can guarantee that if you look for it
970 | you will find a painfully manual process
971 | just waiting for a hero like you
972 |
973 | ---
974 |
975 | - .green[✔] **Do** start small
976 |
977 | - .green[✔] **Do** stay small
978 |
979 | ???
980 |
981 | Stay small
982 |
983 | Rather than throwing everything into
984 | a big, monolithic package
985 | I've had success creating smaller, focused packages
986 |
987 | this makes it easy to experiment
988 | and to provide targeted solutions
989 |
990 | ---
991 |
992 | - .green[✔] **Do** start small
993 |
994 | - .green[✔] **Do** stay small
995 |
996 | - .green[✔] **Do** use vignettes
997 |
998 | ???
999 |
1000 | My next tip is to use vignettes.
1001 |
1002 | Vignettes are a great way to document and share
1003 | processes that aren't easily captured
1004 | in a single function
1005 | or even in R code.
1006 |
1007 | For example,
1008 | I've used vignettes to document
1009 | database driver setup and configuration
1010 | or whole-game analysis examples
1011 |
1012 | ---
1013 |
1014 | - .green[✔] **Do** start small
1015 |
1016 | - .green[✔] **Do** stay small
1017 |
1018 | - .green[✔] **Do** use vignettes
1019 |
1020 | - .green[✔] **Do** be opinionated
1021 |
1022 | ???
1023 |
1024 | and finally, do be opinionated ...
1025 |
1026 | ... okay that came out wrong...
1027 |
1028 | ---
1029 |
1030 | - .green[✔] **Do** start small
1031 |
1032 | - .green[✔] **Do** stay small
1033 |
1034 | - .green[✔] **Do** use vignettes
1035 |
1036 | - .green[✔] **Do** <s>be opinionated</s> provide a happy path
1037 |
1038 | ???
1039 |
1040 | provide a happy path
1041 |
1042 | consider that your users
1043 | are likely used to a range of workflows
1044 | so help them fall into a pit of success
1045 | by making sure the happy path
1046 | is smooth and as bump-free as possible
1047 |
1048 | ---
1049 | layout: true
1050 | class: big highlight-last-item
1051 |
1052 | # Use this
1053 |
1054 | ---
1055 |
1056 | - `usethis`, `devtools`, `roxygen2`, `pkgdown` for happy package building
1057 |
1058 | .flex.h-25[
1059 | .w-25[
1060 | 
1061 | ]
1062 | .w-25[
1063 | 
1064 | ]
1065 | .w-25[
1066 | 
1067 | ]
1068 | .w-25[
1069 | 
1070 | ]
1071 | ]
1072 |
1073 | ???
1074 |
1075 | None of this would be possible without
1076 | a slew of packages and resources
1077 |
1078 | key among these are
1079 | usethis and devtools for package building
1080 | and roxygen2 and pkgdown for documentation
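
As a rough sketch of how these tools fit together, the documentation steps can be wrapped in one helper that is run from a package's root directory. The `release_docs()` name is hypothetical; the three calls inside it are the standard entry points of devtools and pkgdown.

```r
# Hypothetical convenience wrapper; run it from the root of a package.
# One-time setup (run interactively): usethis::use_pkgdown()
release_docs <- function(pkg = ".") {
  devtools::document(pkg)   # roxygen2 comments -> man/*.Rd and NAMESPACE
  devtools::check(pkg)      # R CMD check, including examples and tests
  pkgdown::build_site(pkg)  # static documentation website for the package
}
```

Keeping the sequence in a function (or a CI job) makes it more likely that the website and the code stay in sync.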
1081 |
1082 | ---
1083 |
1084 | - `usethis`, `devtools`, `roxygen2`, `pkgdown` for happy package building
1085 |
1086 | - .b[R Packages] by Hadley Wickham & Jenny Bryan [r-pkgs.org](https://r-pkgs.org)
1087 |
1088 | .absolute.right-2.bottom-1.w-25[
1089 | 
1090 | ]
1091 |
1092 | ???
1093 |
1094 | Hadley Wickham and Jenny Bryan's
1095 | R Packages
1096 | is a great place to start learning
1097 | about building R packages
1098 |
1099 | or to turn back to when you get stuck
1100 |
1101 | ---
1102 |
1103 | - `usethis`, `devtools`, `roxygen2`, `pkgdown` for happy package building
1104 |
1105 | - .b[R Packages] by Hadley Wickham & Jenny Bryan [r-pkgs.org](https://r-pkgs.org)
1106 |
1107 |
1108 | - `drat` by Dirk Eddelbuettel for internal CRAN-like repos
1109 | .gray-4[or [.gray-3[RStudio Package Manager]](https://rstudio.com/products/package-manager/)]
1110 |
1111 | ???
1112 |
1113 | We use `drat` by Dirk Eddelbuettel
1114 | to create an internal CRAN-like package repository
1115 | and it has made package installation much much better
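
A minimal sketch of the user-facing side of an internal repository, assuming it is served at a placeholder URL. The URL, tarball name and repository path are made up; `drat::insertPackage()` is the drat function for adding a built package to a repository directory, shown commented out because it needs a real tarball.

```r
# Publisher side (needs a built package and an existing repository directory):
# drat::insertPackage("moffittABC_0.1.0.tar.gz", repodir = "/srv/drat")

# User side: list the internal repository first, with CRAN as a fallback.
internal_repo <- "https://drat.example.org"  # placeholder URL
options(repos = c(
  internal = internal_repo,
  CRAN     = "https://cloud.r-project.org"
))

# After this, internal packages install like any other:
# install.packages("moffittABC")
```

Setting `repos` in a shared startup file is one way to make this the default for the whole team.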
1116 |
1117 | Another solution is RStudio's package manager
1118 |
1119 | ---
1120 |
1121 | - `usethis`, `devtools`, `roxygen2`, `pkgdown` for happy package building
1122 |
1123 | - .b[R Packages] by Hadley Wickham & Jenny Bryan [r-pkgs.org](https://r-pkgs.org)
1124 |
1125 |
1126 | - `drat` by Dirk Eddelbuettel for internal CRAN-like repos
1127 | .gray-4[or [.gray-3[RStudio Package Manager]](https://rstudio.com/products/package-manager/)]
1128 |
1129 | - `pkgverse` by Mike Kearney to kickstart your universe
1130 | [pkgverse.mikewk.com](https://pkgverse.mikewk.com)
1131 |
1132 | ???
1133 |
1134 | And finally Mike Kearney's pkgverse template
1135 | made it easy to pull together
1136 | our universe of packages into a cohesive unit
1137 |
1138 | ---
1139 | layout: false
1140 | class: center middle hide-count no-border city-360-bg
1141 | background-color: #FBFCFF;
1142 |
1143 |
1144 |
1145 |
1146 | .pt5.z-2.animated-slide.lightSpeedIn[
1147 | .blue.f6[Thank you!]
1148 | [.b[build-your-own-universe].netlify.app](https://build-your-own-universe.netlify.app)
1149 | ]
1150 |
1151 | .flex.z-9999.relative.animated.fadeInUp.delay-2s.slow[
1152 | .w-40[
1153 | [@travisgerke]() Travis Gerke
1154 | ]
1155 | .w-20[
1156 |
1157 | ]
1158 | .w-40[
1159 | [@grrrck]() Garrick Aden-Buie
1160 | ]
1161 | ]
--------------------------------------------------------------------------------