├── README.md ├── .gitignore ├── docs ├── css │ └── docs.css ├── systems │ ├── 3-systems.md │ ├── 2-data.md │ ├── 0-intro.md │ └── 1-vcs.md ├── index.md └── about.md └── mkdocs.yml /README.md: -------------------------------------------------------------------------------- 1 | # ds-guide 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | site 3 | site.zip 4 | Style Guide 5 | TODO 6 | -------------------------------------------------------------------------------- /docs/css/docs.css: -------------------------------------------------------------------------------- 1 | .footnote { 2 | font-size: 85%; 3 | } 4 | 5 | /*.footnote > hr::after { 6 | content: "

Footnotes

"; 7 | }*/ 8 | -------------------------------------------------------------------------------- /docs/systems/3-systems.md: -------------------------------------------------------------------------------- 1 | thin client, thick client, rich client 2 | 3 | https://techbeacon.com/app-dev-testing/top-5-software-architecture-patterns-how-make-right-choice 4 | https://www.oreilly.com/library/view/software-architecture-patterns/9781491971437/ 5 | 6 | https://stackify.com/web-application-architecture/ 7 | 8 | 9 | https://www.infoworld.com/article/3080611/learning-from-soa-5-lessons-for-the-microservices-era.html 10 | https://martinfowler.com/articles/microservices.html#MicroservicesAndSoa 11 | http://service-architecture.blogspot.com/2014/03/microservices-is-soa-for-those-who-know.html 12 | https://martinfowler.com/microservices/ 13 | -------------------------------------------------------------------------------- /mkdocs.yml: -------------------------------------------------------------------------------- 1 | site_name: Introduction to Data & Systems 2 | nav: 3 | - Home: index.md 4 | - Chapters: 5 | - 'Introduction' : 'systems/0-intro.md' 6 | - 'Chapter 1: Version Control Systems' : 'systems/1-vcs.md' 7 | - About: about.md 8 | copyright: "© 2019, Tom Gregory" 9 | theme: cinder 10 | extra_css: 11 | - "css/docs.css" 12 | markdown_extensions: 13 | - abbr 14 | - toc: 15 | permalink: false 16 | - admonition 17 | - footnotes 18 | - smarty 19 | - pymdownx.mark 20 | shortcuts: 21 | help: 191 # ? 22 | next: 39 # right arrow 23 | previous: 37 # left arrow 24 | search: 83 # s 25 | -------------------------------------------------------------------------------- /docs/index.md: -------------------------------------------------------------------------------- 1 | # Welcome 2 | 3 | Welcome to “Introduction to Data & Systems”. This course is intended to be a 4 | practical, hands-on approach to learning about technology systems in modern 5 | business. 
One of the great things about technology---and one of the most 6 | frustrating things about technology---is that it's constantly changing. Therefore, 7 | delivering this content via a digital medium seems ideal. Updates and changes 8 | are immediately visible, which helps deliver the most up-to-date and relevant 9 | curriculum. 10 | 11 | 12 | 13 | !!! note "Still being built!" 14 | This document is---and probably always will be---a work in progress. 15 | 16 | ## Chapters 17 | 18 | * [Introduction](systems/0-intro/) 19 | * [Chapter 1: Version Control Systems](systems/1-vcs/) 20 | -------------------------------------------------------------------------------- /docs/about.md: -------------------------------------------------------------------------------- 1 | # About 2 | 3 | “Introduction to Data & Systems” is written to support a module of the same name 4 | in the Masters of Information Systems core class at Indiana University. 5 | 6 | ## Errors and suggestions 7 | If you find errors or have suggestions, please let me know. 8 | 9 | ## License 10 | 11 | All content ©2019 by Tom Gregory unless otherwise noted. You may share this 12 | content in limited circumstances as detailed by the listed license. 13 | 14 | 15 | Creative Commons License
16 | This work is licensed under a 17 | Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. 18 | 19 | 20 | -------------------------------------------------------------------------------- /docs/systems/2-data.md: -------------------------------------------------------------------------------- 1 | What is data? 2 | Is it different from information? 3 | What’s the difference between data and metadata? 4 | What does data look like? 5 | 6 | Metadata: "data [information] that provide information about other data" (Wikipedia, “Metadata”) 7 | 8 | Shareable: Readily accessed by more than one person at a time 9 | Transportable: Easily moved to a decision maker 10 | Secure: Protected from destruction and unauthorized use 11 | Accurate: Reliable, precise records 12 | Timely: Current and up-to-date; Relevant: Appropriate to the decision 13 | 14 | Watson, Richard (2013-12-27). Data Management. eGreen Press. 15 | 16 | Usability 17 | Relevant 18 | Simple 19 | Flexible 20 | Economical 21 | 22 | Quality 23 | Accurate 24 | Verifiable 25 | Complete 26 | Reliable 27 | 28 | Delivery 29 | Timely 30 | Accessible 31 | Secure 32 | 33 | [img, information systems data cycle] 34 | 35 | [Vijay, data triad] 36 | 37 | CAP Theorem (also called Brewer’s Theorem) 38 | At best, a distributed system can provide only two of the three guarantees: 39 | Consistency: All nodes see the same data at the same time 40 | Availability: Every request receives a response about whether it succeeded or failed 41 | Partition tolerance: System continues to operate despite arbitrary partitioning due to network failures 42 | 43 | Problems with data management systems 44 | 45 | Lack of redundancy 46 | Lack of data control 47 | Poor interface 48 | Lack of access/lack of security 49 | Delays 50 | Lack of reality 51 | Diversity of systems/silos 52 | Lack of data integration 53 | Volume of data 54 | 55 | 56 | Data silos are revealed as an organization grows. 
57 | 58 | Data should have a documented owner. Other documentation should include meaning, associated risk/security, strategic importance, etc. 59 | 60 | 61 | “Client–server application” 62 | “Thin client/thick client” 63 | 64 | 65 | -------------------------------------------------------------------------------- /docs/systems/0-intro.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | This course---and consequently, this text---is designed to be as hands-on as 3 | possible. We will cover some theory, but the course's primary purpose is to introduce 4 | technologies that reinforce theoretical learning in other courses. 5 | 6 | By the end of the course you will have a better understanding of different types 7 | of data and how data are used for different purposes. You will understand some 8 | of the tradeoffs made by designers in data and systems, and be able to describe and use 9 | multiple data formats. You will be able to describe the "devops" process (and 10 | even practice some of the steps), and will better understand cloud computing and 11 | virtualization, version control, APIs, web services, and software design 12 | patterns. You will build a working multi-tier web application, and use a 13 | Unix-style command line. 14 | 15 | ## Projects and software 16 | Along the way we will use: 17 | 18 | * HTML 19 | * JavaScript, including the [Vue.js] library and the modern Fetch API 20 | * [Apache], the world's most-used web server[^apache] 21 | * [PHP] 22 | * [MySQL] and SQL 23 | * [Docker] 24 | * [Amazon Web Services] 25 | * ... 
and many other cool things 26 | 27 | [Apache]: https://www.apache.org 28 | [MySQL]: https://dev.mysql.com/downloads/ 29 | [PHP]: https://www.php.net 30 | [Vue.js]: https://vuejs.org/v2/guide/ 31 | [Docker]: https://www.docker.com 32 | [Amazon Web Services]: https://aws.amazon.com 33 | 34 | This text can't possibly provide good tutorials for all of those things, but I 35 | will try to link to some of the best resources I've found on the Internet. 36 | 37 | [^apache]: According to the [July 2019 Netcraft survey][netcraft-2019-07], 38 | Apache has top market share in the most active sites, busiest sites, and 39 | domains. It's been slowly losing ground to [nginx] and [Microsoft's IIS][IIS]. 40 | 41 | [netcraft-2019-07]: https://news.netcraft.com/archives/2019/07/26/july-2019-web-server-survey.html 42 | [nginx]: https://www.nginx.com 43 | [IIS]: https://www.iis.net 44 | 45 | ## What you should already know 46 | 47 | Hopefully, you're at least somewhat familiar with your computer. You should also be 48 | able to install applications and follow instructions. (It might surprise you how 49 | frequently that last condition isn't met.) If you aren't already familiar with 50 | HTML, CSS, and SQL, then it will be helpful to seek out some tutorials when we 51 | encounter those languages. 52 | 53 | I expect you've used at least one object-oriented programming language, although 54 | it really doesn't matter which one. C# and Java are good choices. Scripting 55 | languages like Python, PHP, or Ruby fit the bill if used in an object-oriented 56 | way. Again, it doesn't really matter which one, so long as you have some 57 | familiarity with basic control structures like `if` statements and loops, and 58 | a basic understanding of polymorphism and inheritance. 59 | 60 | *[HTML]: Hyper Text Markup Language 61 | *[CSS]: Cascading Style Sheets 62 | *[SQL]: Structured Query Language 63 | 64 | Along the way you'll get some practice with the Unix command line. 
Many 65 | technical systems rely on the command line or text files for configuration, 66 | and it's good to know how to navigate and run commands. I will explain commands 67 | the first time we use them, but a good quick reference page or tutorial on the 68 | Unix command line may be useful to you. 69 | 70 | To begin, you should know `ls`, `mkdir`, and `pwd`. 71 | 72 | !!! note "Command Line Tutorials" 73 | If you are not familiar with the command line, any of these references 74 | provides an excellent tutorial: 75 | 76 | * [_The Linux Command Line_][TLCL] by William Shotts has book and web 77 | versions. 78 | * [_Learn Enough Command Line to Be Dangerous_][LECLtbD] by Michael Hartl 79 | has the first few chapters of the book online. I recommend Chapter 1 and 80 | Section 2.2 to start. 81 | * Codecademy.com has a course ["Learn the Command Line"][CA-CL], which 82 | allows you to practice as you learn. 83 | 84 | [LECLtbD]: https://www.learnenough.com/command-line-tutorial/basics 85 | [TLCL]: http://linuxcommand.org/lc3_learning_the_shell.php 86 | [CA-CL]: https://www.codecademy.com/learn/learn-the-command-line 87 | -------------------------------------------------------------------------------- /docs/systems/1-vcs.md: -------------------------------------------------------------------------------- 1 | # Version Control Systems 2 | 3 | Have you ever needed to keep multiple versions of a file? Maybe you're president 4 | of the knitting club, and need to keep a list of active members, but you don't 5 | want to lose the historical list of who was active last year. Do you recreate 6 | the list every year, and keep old copies? 7 | 8 | Have you ever worked on a document, either by yourself or with a partner, and 9 | had different file names after each revision? Maybe you used names such as 10 | "Report", "Report-updated", "Report-updated-final", and maybe even 11 | "Report-updated-final-USE-ME". 
This method of versioning gets confusing quickly, 12 | especially when "final" doesn't really mean "final". 13 | 14 | Or consider a legislative body that proposes, debates, and amends laws. The law 15 | that's eventually passed (or defeated) may bear little resemblance to the law 16 | that was originally proposed. Other legislators may propose amendments in 17 | committee, or on the floor of the legislature. Every change needs to be 18 | documented. Actually, every _proposed_ change, whether it passes or not, needs 19 | to be documented---and not just the text of the change, but also metadata such as 20 | who proposed the change and when. 21 | 22 | This is even more important with computer code, where adding a feature to an 23 | application may affect multiple files. Everyone needs to work from the most 24 | current files to help prevent errors. In addition, there needs to be an easy way 25 | of pushing the most current code to production, and _not_ pushing features that 26 | are incomplete or broken. 27 | 28 | Perhaps you can see why some automated system for handling versioning is 29 | critical. Such a system is called a _**version control system (VCS)**_. 30 | 31 | !!! note "Change Control Systems are Important Too" 32 | Organizations should also have an administrative process (with a supporting 33 | information system) called "change control", which helps keep track of what 34 | changes have been made in production systems. This is an important---but 35 | separate---topic, which we won't go into here. 36 | 37 | Other names for version control include "revision control" and "source control". 38 | Although VCSs can be used for any type of versioned file, they are most often 39 | used for _source code management (SCM)_. 
40 | 41 | *[VCS]: Version Control System 42 | *[VCSs]: Version Control Systems 43 | *[DVCS]: Distributed Version Control System 44 | *[DVCSs]: Distributed Version Control Systems 45 | *[SCM]: Source Code Management 46 | 47 | ## Early version control systems 48 | 49 | There were a handful of early attempts at building a VCS. For many years, one of 50 | the most common was a program called CVS, which stood for "Concurrent Versions 51 | System". CVS was free and open source, which was one of the reasons it was 52 | adopted so widely. 53 | 54 | CVS used a client–server model. The server held the "single source of truth" in 55 | a database called a _**repository**_. Developers could _**check out**_ copies from the 56 | repository to their local client machine. The local copy was called the _**working 57 | copy**_, or sometimes sandbox or workspace. After making changes, developers would 58 | _**commit**_ their changes to the remote repository. 59 | 60 | One major advantage of version control systems is _**branches**_. For example, the 61 | code for Windows 7 and the code for Windows 10 are necessarily different, even though they 62 | have many of the same features. The same is true of Windows 10 Home and Windows 10 63 | Pro. The code is similar, but different in specific ways. When Microsoft 64 | releases a patch for a security flaw in Windows, it sometimes has to fix the 65 | security hole in each of those versions, or branches. It's common to create a 66 | branch for each released version of a product. Today, each feature might have 67 | its own branch as it's being developed, with the feature being merged into the 68 | next version when it's ready. Additionally, each developer might have their own 69 | branch, to help limit unintended interactions from the work of others. 
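
To make the idea of branches concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not how CVS or any real VCS stores data: the `ToyRepository` class, its method names, and the branch names are all hypothetical. Each branch is just a named list of commit messages. Creating a branch copies the history so far, and later commits on one branch do not touch the other until they are merged.

```python
class ToyRepository:
    """Toy model of branches: each branch name maps to its list of commits."""

    def __init__(self):
        # "main" is a hypothetical default branch name chosen for this sketch.
        self.branches = {"main": []}

    def commit(self, branch, message):
        """Record a change on one branch only."""
        self.branches[branch].append(message)

    def create_branch(self, new_branch, source):
        """Start a new branch that shares the source branch's history so far."""
        self.branches[new_branch] = list(self.branches[source])

    def merge(self, source, target):
        """Copy commits that exist in source but not yet in target onto target."""
        for message in self.branches[source]:
            if message not in self.branches[target]:
                self.branches[target].append(message)


repo = ToyRepository()
repo.commit("main", "Initial release")

# A feature gets its own branch while it is being developed.
repo.create_branch("feature-login", "main")
repo.commit("feature-login", "Add login form")

# Meanwhile, a security fix lands on main without touching the feature branch.
repo.commit("main", "Fix security hole")
print(repo.branches["main"])
print(repo.branches["feature-login"])

# When the feature is ready, it is merged into main.
repo.merge("feature-login", "main")
print(repo.branches["main"])
```

Real systems track file contents and parent relationships rather than bare messages, so treat this only as a picture of how branches isolate work until a merge.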
70 | 71 | If you attempted to commit your changes to a branch, and other developers had 72 | committed changes to the same branch since your checkout, it was necessary to 73 | _**update**_ (sometimes called _**merge**_) the new commits back from the 74 | repository to your local copy before your new commit would be permitted. As 75 | you might suspect, as a development team grew in size, the amount of time it 76 | spent merging each other's commits also grew. Today, the term merge is used 77 | to describe copying updates in one branch into another branch. 78 | 79 | Over time, some of the warts in CVS became clear. Commits to CVS were not 80 | _atomic_[^atomic], meaning if a commit was interrupted the repository could 81 | become corrupted. In addition, there was some odd behavior around how it handled 82 | merging files renamed in separate branches. Several commercial VCSs sprang up, 83 | including Visual SourceSafe, which was bought by Microsoft in 1995. 84 | To add necessary features, and take advantage of the many things developers had 85 | learned about version control systems, in 2000, a new project called Subversion 86 | (often abbreviated as SVN) was started by some of the developers of CVS. 87 | Subversion 1.0 was officially released in 2004. 88 | 89 | [^atomic]: You might remember _atomicity_ as one of the ACID 90 | properties of databases. The term ACID---which stands for Atomicity, 91 | Consistency, Isolation, and Durability---was coined by Andreas Reuter and Theo 92 | Härder in 1983. 93 | 94 | ## Distributed version control systems 95 | 96 | Until about 2005, the dominant model for VCS was client–server. Most systems, 97 | including CVS and Subversion, used this model. One of the downsides of the 98 | client--server model should be obvious: if the server failed, the repository 99 | history was lost, and developers were left with only their working copies. This 100 | was partly a conscious design decision in early systems. 
Hard drive space was 101 | expensive and network speeds were slow, so transferring large files was not 102 | optimal. 103 | 104 | Our systems have changed. Network speed and hard drive size are no longer 105 | constraints for most applications. More people are connected, which also makes 106 | peer-to-peer architectures more viable. 107 | 108 | In 2005, following a brief controversy with the VCS used by Linux, two new 109 | systems, Git and Mercurial, were released to the public as open source software. 110 | Both are _distributed_ version control systems (DVCS), which allow developers to 111 | push commits to _any_ remote system. With modern DVCSs, there may be a single 112 | centralized repository, or many. Developers can even push code commits directly 113 | to each other, bypassing the central system. Each developer keeps a full history 114 | of their work, so it's significantly easier to recover a project's history if a 115 | central server fails. 116 | 117 | Git was written by Linus Torvalds, the creator of Linux. Mercurial was written 118 | by Matt Mackall, at about the same time, and for much the same reason. Somewhat 119 | unsurprisingly, the Linux project adopted Git. Git was perhaps made famous by 120 | the creation of GitHub, which promised free cloud hosting for open source 121 | projects. I was dabbling with an open source project at the time, and it seemed 122 | there was a mass exodus to GitHub, as it was so very much better than other open 123 | source hosting at the time. (Google Code was better than SourceForge, but still 124 | had its challenges.) As of 2019, Mercurial is used by several major 125 | organizations, including Facebook[^MM], W3C, and Mozilla. 126 | 127 | [^MM]: As of 2019, Matt Mackall, author of Mercurial, works at Facebook. 128 | 129 | One of the major differences between a DVCS and a centralized VCS is that each developer has a local copy of the 130 | repository. Thus, systems like Git have a _**two-phase commit**_. 
Changes are 131 | first committed to the local repository, then when they are ready, the developer 132 | will _**push**_ all of the commits in a branch to a remote repository. 133 | 134 | ## Using version control systems 135 | 136 | Modern version control systems have several common characteristics: 137 | 138 | 1. Annotation and blame. The ability to provide a full history of each line in a 139 | code file, including who wrote, deleted, or changed it, when each revision 140 | was committed, and notes regarding the purpose of each commit. It's 141 | considered good practice to briefly describe each commit. A report showing 142 | who wrote each line (and when) is often called a "blame" file. 143 | 144 | 2. Branching and merging. Although workflows differ between organizations, it's 145 | common to create a branch for each feature under development. As a feature is 146 | completed, those branches are typically merged into release branches. In more 147 | complicated setups, each developer might have their own branch for each 148 | feature. The ability to merge a change to multiple branches---for example, 149 | making a security fix to multiple versions---is much easier in modern systems. 150 | 151 | 3. Traceability. Annotations allow VCSs to connect to requirements systems and 152 | issue tracking systems. In some industries, such as defense and aerospace, 153 | it's critical to be able to identify why each change was made, and who 154 | requested it. Traceability and requirements systems are often separate from 155 | VCSs, but can integrate with them. 156 | 157 | ## Git hosting providers 158 | 159 | It's important not to confuse Git with the hosting provider. Atlassian has a Git 160 | hosting service called [BitBucket]. GitHub---which was acquired by Microsoft in 161 | 2018 for $7.5 billion---has their eponymous hosting service. Like Bitbucket, 162 | GitHub is merely a web interface over the top of a cloud-hosted Git server. 
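
The split between committing locally and pushing to a remote, and the point that a hosting provider is just one possible remote, can be sketched with a toy model in Python. Nothing here calls Git itself: `ToyRemote`, `ToyLocalRepository`, and the remote names are hypothetical, and the sketch assumes each remote's history is always a prefix of the local history, which real Git does not require.

```python
class ToyRemote:
    """Stands in for a hosted repository, for example one at a hosting provider."""

    def __init__(self, name):
        self.name = name
        self.commits = []


class ToyLocalRepository:
    """Stands in for a developer's local repository in a DVCS."""

    def __init__(self):
        self.commits = []

    def commit(self, message):
        """Phase one: record the change locally; no remote is involved."""
        self.commits.append(message)

    def push(self, remote):
        """Phase two: send the commits the remote does not have yet.

        Assumes the remote's history is a prefix of the local history.
        Returns the number of commits sent.
        """
        missing = self.commits[len(remote.commits):]
        remote.commits.extend(missing)
        return len(missing)


local = ToyLocalRepository()
host_a = ToyRemote("host-a")  # hypothetical hosting provider
host_b = ToyRemote("host-b")  # a second, independent provider

local.commit("Add README")
local.commit("Fix typo")
committed_but_not_pushed = list(host_a.commits)  # the remote has seen nothing so far

sent_first = local.push(host_a)   # both local commits go to the first host
local.commit("Add license")
sent_to_b = local.push(host_b)    # the second host receives the full history
sent_second = local.push(host_a)  # the first host only needs the newest commit

print(committed_but_not_pushed, sent_first, sent_to_b, sent_second)
```

The same local history ends up on both hosts, which is the sense in which the repository is independent of whichever service happens to host a copy of it.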
163 | 164 | Today, there are many quality Git hosting providers, including [GitLab], 165 | which also provides an open source interface if you wish to run your own hosting 166 | server and repository manager. 167 | 168 | [BitBucket]: https://bitbucket.org/product/ 169 | [GitHub]: https://github.com 170 | [GitLab]: https://about.gitlab.com 171 | 172 | ## Additional reading 173 | 174 | * _[Pro Git]_, by Scott Chacon and Ben Straub, Apress. This book is available 175 | online in HTML, PDF, and e-book formats for free under a Creative Commons 176 | Attribution Non-Commercial Share-Alike 3.0 license. It's worth reading the 177 | first few chapters. 178 | * _[Version Control with Subversion]_, by Ben Collins-Sussman, Brian W. 179 | Fitzpatrick, & C. Michael Pilato, O'Reilly. This book is available online in 180 | HTML, released under a Creative Commons Attribution 2.0 license. The section on 181 | [Subversion's history](http://svnbook.red-bean.com/en/1.7/svn.intro.whatis.html#svn.intro.history) 182 | is brief and interesting. 183 | * Perhaps the best introductions to Git come from [Atlassian](https://www.atlassian.com/git/tutorials/what-is-version-control) 184 | and from [GitHub](https://try.github.io). 185 | * GitHub has a [great tutorial](https://guides.github.com/activities/hello-world/) 186 | for their repository hosting product. GitHub is free for public/open source 187 | projects. 188 | * [Git-it] is a desktop (Mac, Windows and Linux) app that teaches you how to use 189 | Git and GitHub on the command line. 190 | * _Wikipedia_ has a good article on ["Version Control"](https://en.wikipedia.org/wiki/Version_control). 191 | 192 | [Pro Git]: https://git-scm.com/book/en/v2 193 | [Version Control with Subversion]: http://svnbook.red-bean.com 194 | [Git-it]: https://github.com/jlord/git-it-electron#what-to-install 195 | 196 | ## Footnotes 197 | ///Footnotes Go Here/// 198 | 199 | ## Review 200 | 201 | 1. What is a "Version Control System" used for? 202 | 2. 
Describe the difference between distributed version control systems and 203 | centralized version control systems. 204 | 3. Explain the phrase "two-phase commit". 205 | 4. Explain the terms "commit", "push", and "merge". 206 | 5. Explain the difference between "merge" and "update" in modern Git. 207 | 208 | ## Exercises 209 | 210 | 1. Create a repository in Git, using the command line. Add a file, and commit 211 | the change. Make a change to the file, and commit with a useful commit message. 212 | 213 | 2. Create a repository with a hosting service like GitHub. Perform a checkout (a clone, in Git terms), 214 | and make changes. Push your commits back to the hosted repository. --------------------------------------------------------------------------------