├── README.md
├── .gitignore
├── docs
│   ├── css
│   │   └── docs.css
│   ├── systems
│   │   ├── 3-systems.md
│   │   ├── 2-data.md
│   │   ├── 0-intro.md
│   │   └── 1-vcs.md
│   ├── index.md
│   └── about.md
└── mkdocs.yml
/README.md:
--------------------------------------------------------------------------------
1 | # ds-guide
2 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | site
3 | site.zip
4 | Style Guide
5 | TODO
6 |
--------------------------------------------------------------------------------
/docs/css/docs.css:
--------------------------------------------------------------------------------
1 | .footnote {
2 | font-size: 85%;
3 | }
4 |
5 | /*.footnote > hr::after {
6 | content: "\nFootnotes\n";
7 | }*/
8 |
--------------------------------------------------------------------------------
/docs/systems/3-systems.md:
--------------------------------------------------------------------------------
1 | thin client, thick client, rich client
2 |
3 | https://techbeacon.com/app-dev-testing/top-5-software-architecture-patterns-how-make-right-choice
4 | https://www.oreilly.com/library/view/software-architecture-patterns/9781491971437/
5 |
6 | https://stackify.com/web-application-architecture/
7 |
8 |
9 | https://www.infoworld.com/article/3080611/learning-from-soa-5-lessons-for-the-microservices-era.html
10 | https://martinfowler.com/articles/microservices.html#MicroservicesAndSoa
11 | http://service-architecture.blogspot.com/2014/03/microservices-is-soa-for-those-who-know.html
12 | https://martinfowler.com/microservices/
13 |
--------------------------------------------------------------------------------
/mkdocs.yml:
--------------------------------------------------------------------------------
1 | site_name: Introduction to Data & Systems
2 | nav:
3 | - Home: index.md
4 | - Chapters:
5 | - 'Introduction' : 'systems/0-intro.md'
6 | - 'Chapter 1: Version Control Systems' : 'systems/1-vcs.md'
7 | - About: about.md
8 | copyright: "© 2019, Tom Gregory"
9 | theme: cinder
10 | extra_css:
11 | - "css/docs.css"
12 | markdown_extensions:
13 | - abbr
14 | - toc:
15 | permalink: false
16 | - admonition
17 | - footnotes
18 | - smarty
19 | - pymdownx.mark
20 | shortcuts:
21 | help: 191 # ?
22 | next: 39 # right arrow
23 | previous: 37 # left arrow
24 | search: 83 # s
25 |
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
1 | # Welcome
2 |
3 | Welcome to “Introduction to Data & Systems”. This course is intended to be a
4 | practical, hands-on approach to learning about technology systems in modern
5 | business. One of the great things about technology (and one of the most
6 | frustrating things about technology) is that it's constantly changing. Therefore,
7 | delivering this content via a digital medium seems ideal. Updates and changes
8 | are immediately visible, which helps deliver the most up-to-date and relevant
9 | curriculum.
10 |
11 |
12 |
13 | !!! note "Still being built!"
14 | This document is—and probably always will be—a work in progress.
15 |
16 | ## Chapters
17 |
18 | * [Introduction](systems/0-intro/)
19 | * [Chapter 1: Version Control Systems](systems/1-vcs/)
20 |
--------------------------------------------------------------------------------
/docs/about.md:
--------------------------------------------------------------------------------
1 | # About
2 |
3 | “Introduction to Data & Systems” is written to support a module of the same name
4 | in the Masters of Information Systems core class at Indiana University.
5 |
6 | ## Errors and suggestions
7 | If you find errors or have suggestions, please let me know.
8 |
9 | ## License
10 |
11 | All content ©2019 by Tom Gregory unless otherwise noted. You may share this
12 | content in limited circumstances as detailed by the listed license.
13 |
14 |
15 | 
16 | This work is licensed under a
17 | Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
18 |
19 |
20 |
--------------------------------------------------------------------------------
/docs/systems/2-data.md:
--------------------------------------------------------------------------------
1 | What is data?
2 | Is it different from information?
3 | What’s the difference between data and metadata?
4 | What does data look like?
5 |
6 | Metadata: “data [information] that provide information about other data” (Wikipedia, “Metadata”)
7 |
8 | Shareable: Readily accessed by more than one person at a time
9 | Transportable: Easily moved to a decision maker
10 | Secure: Protected from destruction and unauthorized use
11 | Accurate: Reliable, precise records
12 | Timely: Current and up-to-date
13 | Relevant: Appropriate to the decision
14 | Watson, Richard (2013-12-27). Data Management. eGreen Press.
15 |
16 | Usability
17 | Relevant
18 | Simple
19 | Flexible
20 | Economical
21 |
22 | Quality
23 | Accurate
24 | Verifiable
25 | Complete
26 | Reliable
27 |
28 | Delivery
29 | Timely
30 | Accessible
31 | Secure
32 |
33 | [img, information systems data cycle]
34 |
35 | [Vijay, data triad]
36 |
37 | CAP Theorem (also called Brewer’s Theorem)
38 | At best, a distributed system can provide only two of the three guarantees:
39 | Consistency: All nodes see the same data at the same time
40 | Availability: Every request receives a response about whether it succeeded or failed
41 | Partition tolerance: System continues to operate despite arbitrary partitioning due to network failures
42 |
43 | Problems with data management systems
44 |
45 | Lack of redundancy
46 | Lack of data control
47 | Poor interface
48 | Lack of access/lack of security
49 | Delays
50 | Lack of reality
51 | Diversity of systems/silos
52 | Lack of data integration
53 | Volume of data
54 |
55 |
56 | Data silos are revealed as an organization grows.
57 |
58 | Data should have a documented owner. Other documentation should include meaning, associated risk/security, strategic importance, etc.
59 |
60 |
61 | “Client–server application”
62 | “Thin client/thick client”
63 |
64 |
65 |
--------------------------------------------------------------------------------
/docs/systems/0-intro.md:
--------------------------------------------------------------------------------
1 | # Introduction
2 | This course—and consequently, this text—is designed to be as hands-on as
3 | possible. We will cover some theory, but its primary purpose is to introduce
4 | technologies that reinforce theoretical learning in other courses.
5 |
6 | By the end of the course you will have a better understanding of different types
7 | of data and how data are used for different purposes. You will understand some
8 | of the tradeoffs made by designers in data and systems, and be able to describe and use
9 | multiple data formats. You will be able to describe the "devops" process (and
10 | even practice some of the steps), will better understand cloud computing and
11 | virtualization, version control, APIs, web services, and software design
12 | patterns. You will build a working multi-tier web application, and use a
13 | Unix-style command line.
14 |
15 | ## Projects and software
16 | Along the way we will use:
17 |
18 | * HTML
19 | * JavaScript, including the [Vue.js] library and the modern Fetch API
20 | * [Apache], the world's most-used web server[^apache]
21 | * [PHP]
22 | * [MySQL] and SQL
23 | * [Docker]
24 | * [Amazon Web Services]
25 | * ... and many other cool things
26 |
27 | [Apache]: https://www.apache.org
28 | [MySQL]: https://dev.mysql.com/downloads/
29 | [PHP]: https://www.php.net
30 | [Vue.js]: https://vuejs.org/v2/guide/
31 | [Docker]: https://www.docker.com
32 | [Amazon Web Services]: https://aws.amazon.com
33 |
34 | This text can't possibly provide good tutorials for all of those things, but I
35 | will try to link to some of the best resources I've found on the Internet.
36 |
37 | [^apache]: According to the [July 2019 Netcraft survey][netcraft-2019-07],
38 | Apache has top market share in the most active sites, busiest sites, and
39 | domains. It's been slowly losing ground to [nginx] and [Microsoft's IIS][IIS].
40 |
41 | [netcraft-2019-07]: https://news.netcraft.com/archives/2019/07/26/july-2019-web-server-survey.html
42 | [nginx]: https://www.nginx.com
43 | [IIS]: https://www.iis.net
44 |
45 | ## What you should already know
46 |
47 | Hopefully, you're at least somewhat familiar with your computer. You should also be
48 | able to install applications and follow instructions. (It might surprise you how
49 | frequently that last condition isn't met.) If you aren't already familiar with
50 | HTML, CSS, and SQL, then it will be helpful to seek out some tutorials when we
51 | encounter those languages.
52 |
53 | I expect you've used at least one object-oriented programming language, although
54 | it really doesn't matter which one. C# and Java are good choices. Scripting
55 | languages like Python, PHP, or Ruby fit the bill if used in an object-oriented
56 | way. Again, it doesn't really matter which one, so long as you have some
57 | familiarity with basic control structures like `if` statements and loops, and
58 | a basic understanding of polymorphism and inheritance.
59 |
60 | *[HTML]: Hyper Text Markup Language
61 | *[CSS]: Cascading Style Sheets
62 | *[SQL]: Structured Query Language
63 |
64 | Along the way you'll get some practice with the Unix command line. Many
65 | technical systems rely on the command line or text files for configuration,
66 | and it's good to know how to navigate and run commands. I will explain commands
67 | the first time we use them, but a good quick reference page or tutorial of the
68 | Unix command line may be useful to you.
69 |
70 | To begin, you should know `ls`, `mkdir`, and `pwd`.
71 |
72 | !!! note "Command Line Tutorials"
73 | If you are not familiar with the command line, any of these references
74 | provides an excellent tutorial:
75 |
76 | * [_The Linux Command Line_][TLCL] by William Shotts has book and web
77 | versions.
78 | * [_Learn Enough Command Line to Be Dangerous_][LECLtbD] by Michael Hartl
79 | has the first few chapters of the book online. I recommend Chapter 1 and
80 | Section 2.2 to start.
81 | * Codecademy has a course ["Learn the Command Line"][CA-CL], which
82 | allows you to practice as you learn.
83 |
84 | [LECLtbD]: https://www.learnenough.com/command-line-tutorial/basics
85 | [TLCL]: http://linuxcommand.org/lc3_learning_the_shell.php
86 | [CA-CL]: https://www.codecademy.com/learn/learn-the-command-line
87 |
--------------------------------------------------------------------------------
/docs/systems/1-vcs.md:
--------------------------------------------------------------------------------
1 | # Version Control Systems
2 |
3 | Have you ever needed to keep multiple versions of a file? Maybe you're president
4 | of the knitting club, and need to keep a list of active members, but you don't
5 | want to lose the historical list of who was active last year. Do you recreate
6 | the list every year, and keep old copies?
7 |
8 | Have you ever worked on a document, either by yourself or with a partner, and
9 | had different file names after each revision? Maybe you used names such as
10 | "Report", "Report-updated", "Report-updated-final", and maybe even
11 | "Report-updated-final-USE-ME". This method of versioning gets confusing quickly,
12 | especially when "final" doesn't really mean "final".
13 |
14 | Or consider a legislative body that proposes, debates, and amends laws. The law
15 | that's eventually passed (or defeated) may have little resemblance to the law
16 | that was originally proposed. Other legislators may propose amendments in
17 | committee, or on the floor of the legislature. Every change needs to be
18 | documented. Actually, every _proposed_ change, whether it passes or not, needs
19 | to be documented—and not just the text of the change, but also metadata such as
20 | who proposed the change and when.
21 |
22 | This is even more important with computer code, when adding features to an
23 | application may affect multiple files. Everyone needs to work from the most
24 | current files to help prevent errors. In addition, there needs to be an easy way
25 | of pushing the most current code to production, and _not_ pushing features that
26 | are incomplete or broken.
27 |
28 | Perhaps you can see why some automated system for handling versioning is
29 | critical. Such a system is called a _**version control system (VCS)**_.
30 |
31 | !!! note "Change Control Systems are Important Too"
32 | Organizations should also have an administrative process (with a supporting
33 | information system) called "change control", which helps keep track of what
34 | changes have been made in production systems. This is an important---but
35 | separate---topic, which we won't go into here.
36 |
37 | Other names for version control include "revision control" and "source control".
38 | Although VCSs can be used for any type of versioned file, they are most often
39 | used for _source code management (SCM)_.
40 |
41 | *[VCS]: Version Control System
42 | *[VCSs]: Version Control Systems
43 | *[DVCS]: Distributed Version Control System
44 | *[DVCSs]: Distributed Version Control Systems
45 | *[SCM]: Source Code Management
46 |
47 | ## Early version control systems
48 |
49 | There were a handful of early attempts at building a VCS. For many years, one of
50 | the most common was a program called CVS, which stood for "Concurrent Versions
51 | System". CVS was free and open source, which was one of the reasons it was
52 | adopted so widely.
53 |
54 | CVS used a client–server model. The server held the "single source of truth" in
55 | a database called a _**repository**_. Developers could _**check out**_ copies from the
56 | repository to their local client machine. The local copy was called the _**working
57 | copy**_, or sometimes sandbox or workspace. After making changes, developers would
58 | _**commit**_ their changes to the remote repository.
59 |
60 | One major advantage of version control systems is _**branches**_. For example, the
61 | code for Windows 7 and Windows 10 are necessarily different, even though they
62 | have many of the same features. The same with Windows 10 Home and Windows 10
63 | Pro. The code is similar, but different in specific ways. When Microsoft
64 | releases a patch for a security flaw in Windows, it sometimes has to fix the
65 | security hole in each of those versions, or branches. It's common to create a
66 | branch for each released version of a product. Today, each feature might have
67 | its own branch while it's being developed, with the feature being merged into the
68 | next version when it's ready. Additionally, each developer might have their own
69 | branch, to help limit unintended interactions from the work of others.
70 |
71 | If you attempted to commit your changes to a branch, and other developers had
72 | committed changes to the same branch since your checkout, it was necessary to
73 | _**update**_ (sometimes called _**merge**_) the new commits back from the
74 | repository to your local copy before your new commit would be permitted. As
75 | you might suspect, as a development team grew in size, the amount of time it
76 | spent merging each other's commits also grew. Today, the term merge is used
77 | to describe copying updates in one branch into another branch.
78 |
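The update-before-commit rule can be sketched as a toy model in Python. This is only an illustration of the idea, not how CVS is implemented: the `CentralRepository` class, its methods, and the whole-file snapshots are hypothetical simplifications.

```python
class StaleWorkingCopyError(Exception):
    """Raised when a commit is based on an out-of-date revision."""


class CentralRepository:
    """Toy centralized repository: one branch, whole-file snapshots."""

    def __init__(self, initial_text=""):
        self.revisions = [initial_text]  # revision 0 is the starting state

    @property
    def head(self):
        """Number of the most recent revision."""
        return len(self.revisions) - 1

    def checkout(self):
        """Return (revision number, text) for a fresh working copy."""
        return self.head, self.revisions[self.head]

    def commit(self, base_revision, new_text):
        """Accept the commit only if it was based on the latest revision."""
        if base_revision != self.head:
            raise StaleWorkingCopyError(
                f"working copy is at r{base_revision}, repository is at r{self.head}"
            )
        self.revisions.append(new_text)
        return self.head


repo = CentralRepository("line one\n")
alice_base, alice_text = repo.checkout()
bob_base, bob_text = repo.checkout()

repo.commit(alice_base, alice_text + "alice's line\n")  # accepted as r1

try:
    repo.commit(bob_base, bob_text + "bob's line\n")  # rejected: based on r0
except StaleWorkingCopyError:
    # Bob updates to the latest revision, reapplies his change, and retries.
    bob_base, bob_text = repo.checkout()
    repo.commit(bob_base, bob_text + "bob's line\n")
```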
79 | Over time, some of the warts in CVS became clear. Commits to CVS were not
80 | _atomic_[^atomic], meaning if a commit was interrupted the repository could
81 | become corrupted. In addition, there was some odd behavior around how it handled
82 | merging files renamed in separate branches. Several commercial VCSs sprang up,
83 | including Visual SourceSafe, which was bought by Microsoft in 1995.
84 | To add necessary features, and take advantage of the many things developers had
85 | learned about version control systems, in 2000, a new project called Subversion
86 | (often abbreviated as SVN) was started by some of the developers of CVS.
87 | Subversion 1.0 was officially released in 2004.
88 |
89 | [^atomic]: You might remember _atomic_ as one of the necessary parts of ACID
90 | properties of databases. The term ACID---which stands for Atomicity,
91 | Consistency, Isolation, and Durability---was coined by Andreas Reuter and Theo
92 | Härder in 1983.
93 |
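To make "atomic" a little more concrete, here is a minimal Python sketch of one common way to make a file update all-or-nothing: write the new contents to a temporary file, then rename it over the original. It illustrates the general idea only, and is not how CVS or Subversion store their data. The helper name `atomic_write` and the file name `members.txt` are made up for this example.

```python
import os
import tempfile


def atomic_write(path, text):
    """Replace the file at `path` with `text` in a single step."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory, so the final
    # rename never has to cross file systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to write the data to disk
        # Swap the new file into place. Readers see either the old file
        # or the new one, never a half-written mixture.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: clean up, leave the original alone.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "members.txt")
    atomic_write(target, "Ada\n")
    atomic_write(target, "Ada\nGrace\n")
    with open(target) as f:
        contents = f.read()
```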
94 | ## Distributed version control systems
95 |
96 | Until about 2005, the dominant model for VCS was client–server. Most systems,
97 | including CVS and Subversion, used this model. One of the downsides of the
98 | client--server model should be obvious: if the server failed, the repository
99 | history was lost, and developers were left with only their working copies. This
100 | was partly a conscious design decision in early systems. Hard drive space was
101 | expensive and network speeds were slow, so transferring large files was not
102 | optimal.
103 |
104 | Our systems have changed. Networks are faster, and network speed and hard drive
105 | size are no longer constraints for most applications. More people are connected,
106 | which also makes peer-to-peer architectures more viable.
107 |
108 | In 2005, following a brief controversy with the VCS used by Linux, two new
109 | systems, Git and Mercurial, were released to the public as open source software.
110 | Both are _distributed_ version control systems (DVCS), which allow developers to
111 | push commits to _any_ remote system. With modern DVCSs, there may be a single
112 | centralized repository, or many. Developers can even push code commits directly
113 | to each other, bypassing the central system. Each developer keeps a full history
114 | of their work, so it's significantly easier to recover a project's history if a
115 | central server fails.
116 |
117 | Git was written by Linus Torvalds, the creator of Linux. Mercurial was written
118 | by Matt Mackall, at about the same time, and for much the same reason. Somewhat
119 | unsurprisingly, the Linux project adopted Git. Git was perhaps made famous by
120 | the creation of GitHub, which promised free cloud hosting for open source
121 | projects. I was dabbling with an open source project at the time, and it seemed
122 | there was a mass exodus to GitHub, as it was so very much better than other open
123 | source hosting at the time. (Google Code was better than SourceForge, but still
124 | had its challenges.) As of 2019, Mercurial is used by several major
125 | organizations, including Facebook[^MM], W3C, and Mozilla.
126 |
127 | [^MM]: As of 2019, Matt Mackall, author of Mercurial, works at Facebook.
128 |
129 | One of the major differences between a DVCS and a centralized VCS is that each
130 | developer has a full local copy of the repository. Thus, systems like Git have a
131 | _**two-phase commit**_: changes are first committed to the local repository, then,
132 | when they are ready, the developer will _**push**_ all of the commits in a branch to a remote repository.
133 |
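The commit-then-push flow can be sketched with a toy model in Python. This is not how Git works internally: the `Repository` class is hypothetical, commits are just messages in a list, and only the simplest kind of push is modeled, where the remote's history is a prefix of the local one.

```python
class Repository:
    """Toy repository: an ordered list of commit messages."""

    def __init__(self, name):
        self.name = name
        self.commits = []

    def commit(self, message):
        """Phase one: record a change in this repository only."""
        self.commits.append(message)

    def push(self, remote):
        """Phase two: copy the commits the remote does not have yet."""
        # The remote's history must be a prefix of ours; otherwise someone
        # else pushed first and we would need to merge before pushing.
        if self.commits[: len(remote.commits)] != remote.commits:
            raise RuntimeError("remote has commits we lack; fetch and merge first")
        new_commits = self.commits[len(remote.commits):]
        remote.commits.extend(new_commits)
        return len(new_commits)


local = Repository("laptop")
remote = Repository("server")

# Both commits exist only on the laptop until they are pushed.
local.commit("Add member list")
local.commit("Fix typo in member list")
pending_before_push = len(local.commits) - len(remote.commits)

pushed = local.push(remote)
```

Until `push` runs, the server knows nothing about either commit, which is why a developer can keep working and committing without a network connection.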
134 | ## Using version control systems
135 |
136 | Modern version control systems have several common characteristics:
137 |
138 | 1. Annotation and blame. The ability to provide a full history of each line in a
139 | code file, including who wrote, deleted, or changed it, when each revision
140 | was committed, and notes regarding the purpose of each commit. It's
141 | considered good practice to briefly describe each commit. A report showing
142 | who wrote each line (and when) is often called a "blame" file.
143 |
144 | 2. Branching and merging. Although workflows differ between organizations, it's
145 | common to create a branch for each feature under development. As a feature is
146 | completed, those branches are typically merged into release branches. In more
147 | complicated setups, each developer might have their own branch for each
148 | feature. The ability to merge a change to multiple branches---for example,
149 | making a security fix to multiple versions---is much easier in modern systems.
150 |
151 | 3. Traceability. Annotations allow VCSs to connect to requirements systems and
152 | issue tracking systems. In some industries, such as defense and aerospace,
153 | it's critical to be able to identify why each change was made, and who
154 | requested it. Traceability and requirements systems are often separate from
155 | VCSs, but can integrate with them.
156 |
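To show what a "blame" report contains, here is a minimal Python sketch that works out who last touched each line of a file. It is a toy under stated assumptions, not the algorithm any real VCS uses: the `blame` function and the sample `history` are hypothetical, and line matching is delegated to the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Return [(author, line), ...] for the newest revision.

    `revisions` is a list of (author, lines) pairs, oldest first.
    """
    annotated = []  # (author, line) pairs for the revision processed so far
    for author, lines in revisions:
        old_lines = [line for _, line in annotated]
        matcher = SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever last touched them.
                updated.extend(annotated[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision's author.
                # Deleted lines contribute nothing, because j1 == j2.
                updated.extend((author, line) for line in lines[j1:j2])
        annotated = updated
    return annotated


history = [
    ("alice", ["def greet():", "    print('hi')"]),
    ("bob", ["def greet():", "    print('hello')", "greet()"]),
]
result = blame(history)
```

In this example the first line is still attributed to `alice`, while the changed second line and the new third line are attributed to `bob`.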
157 | ## Git hosting providers
158 |
159 | It's important not to confuse Git with the hosting provider. Atlassian has a Git
160 | hosting service called [Bitbucket]. GitHub, which was acquired by Microsoft in
161 | 2018 for $7.5 billion, has its eponymous hosting service. Like Bitbucket,
162 | GitHub is merely a web interface over the top of a cloud-hosted Git server.
163 |
164 | Today, there are many quality Git hosting providers, including [GitLab],
165 | which also provides an open source interface if you wish to run your own hosting
166 | server and repository manager.
167 |
168 | [BitBucket]: https://bitbucket.org/product/
169 | [GitHub]: https://github.com
170 | [GitLab]: https://about.gitlab.com
171 |
172 | ## Additional reading
173 |
174 | * _[Pro Git]_, by Scott Chacon and Ben Straub, APress. This book is available
175 | online in HTML, PDF, and e-book formats for free under a Creative Commons
176 | Attribution Non-Commercial Share-Alike 3.0 license. It's worth reading the
177 | first few chapters.
178 | * _[Version Control with Subversion]_, by Ben Collins-Sussman, Brian W.
179 | Fitzpatrick, & C. Michael Pilato, O'Reilly. This book is available online in
180 | HTML, released under a Creative Commons 2.0 Attribution license. The section on
181 | [Subversion's history](http://svnbook.red-bean.com/en/1.7/svn.intro.whatis.html#svn.intro.history)
182 | is brief and interesting.
183 | * Perhaps the best introductions to Git come from [Atlassian](https://www.atlassian.com/git/tutorials/what-is-version-control)
184 | and from [GitHub](https://try.github.io).
185 | * GitHub has a [great tutorial](https://guides.github.com/activities/hello-world/)
186 | for their repository hosting product. GitHub is free for public/open source
187 | projects.
188 | * [Git-it] is a desktop (Mac, Windows and Linux) app that teaches you how to use
189 | Git and GitHub on the command line.
190 | * _Wikipedia_ has a good article on ["Version Control"](https://en.wikipedia.org/wiki/Version_control).
191 |
192 | [Pro Git]: https://git-scm.com/book/en/v2
193 | [Version Control with Subversion]: http://svnbook.red-bean.com
194 | [Git-it]: https://github.com/jlord/git-it-electron#what-to-install
195 |
196 | ## Footnotes
197 | ///Footnotes Go Here///
198 |
199 | ## Review
200 |
201 | 1. What is a "Version Control System" used for?
202 | 2. Describe the difference between distributed version control systems and
203 | centralized version control systems.
204 | 3. Explain the phrase "two-phase commit".
205 | 4. Explain the terms "commit", "push", "merge".
206 | 5. Explain the difference between "merge" and "update" in modern Git.
207 |
208 | ## Exercises
209 |
210 | 1. Create a repository in Git, using the command line. Add a file, and commit
211 | the change. Make a change to the file, and commit with a useful commit message.
212 |
213 | 2. Create a repository with a hosting service like GitHub. Perform a checkout,
214 | and make changes. Push your commits back to the hosted repository.
--------------------------------------------------------------------------------