├── 2017
│   ├── DDN_2017_10_DsePrimer.pdf
│   ├── DDN_2017_11_Apriori.pdf
│   ├── DDN_2017_11_Apriori.tar
│   ├── DDN_2017_12_UDFs.pdf
│   └── README.md
├── 2018
│   ├── DDN_2018_13_AdvRep.pdf
│   ├── DDN_2018_13_AdvRep.tar
│   ├── DDN_2018_14_CQL-Search.pdf
│   ├── DDN_2018_15_SearchPrimer.pdf
│   ├── DDN_2018_16_Spatial.pdf
│   ├── DDN_2018_16_Spatial.tar.gz
│   ├── DDN_2018_17_Docker.pdf
│   ├── DDN_2018_18_Studio.pdf
│   ├── DDN_2018_19_QueryHarness.pdf
│   ├── DDN_2018_20_Security.pdf
│   ├── DDN_2018_21_Backup and Recovery.pdf
│   ├── DDN_2018_22_ClientSideDriver.pdf
│   ├── DDN_2018_23_IndexingCollections.pdf
│   ├── DDN_2018_24_Zeppelin.pdf
│   └── README.md
├── 2019
│   ├── 41 Simple Customer Graph.txt
│   ├── DDN_2019_25_GraphPrimer.pdf
│   ├── DDN_2019_26_Inventory.pdf
│   ├── DDN_2019_27_Security.pdf
│   ├── DDN_2019_28_Jupyter, R.pdf
│   ├── DDN_2019_29_Kafka.pdf
│   ├── DDN_2019_30_GraphPrimer 68.pdf
│   ├── DDN_2019_31a_DSE, Reco Engines.pdf
│   ├── DDN_2019_31b_DSE, Reco Engines.pptx
│   ├── DDN_2019_31c_JustGroceryData.tar
│   ├── DDN_2019_31d_AllCommands.txt
│   ├── DDN_2019_31e_KillrVideoDataAsPipe.tar
│   ├── DDN_2019_31f_KillrVideoDDL.cql
│   ├── DDN_2019_32_Python68Client.pdf
│   ├── DDN_2019_32_Python68Client.py
│   ├── DDN_2019_33_ShortestPoint.pdf
│   ├── DDN_2019_33_ShortestPoint.tar
│   ├── DDN_2019_34_GremlinPrimer.pdf
│   ├── DDN_2019_34_GremlinPrimer.txt
│   ├── DDN_2019_35 Desktop.pdf
│   ├── DDN_2019_36_cfstats.pdf
│   └── README.md
├── 2020
│   ├── DDN_2020_37_Parquet.pdf
│   ├── DDN_2020_38_FileMethods.pdf
│   ├── DDN_2020_39_DriverFutures.pdf
│   ├── DDN_2020_40_SSL.pdf
│   ├── DDN_2020_41_GraphQL.pdf
│   ├── DDN_2020_42_AstraGeohash.pdf
│   ├── DDN_2020_42_AstraGeohash_Data.pipe.gz
│   ├── DDN_2020_42_AstraGeohash_Programs.tar.gz
│   ├── DDN_2020_43_AstraApiProgramming.pdf
│   ├── DDN_2020_43_NoteBook.tar
│   ├── DDN_2020_44_NoSQLBench.pdf
│   ├── DDN_2020_44_NoSQLBench.yaml
│   ├── DDN_2020_44_NoSQLBench_Slides.pdf
│   ├── DDN_2020_45_KubernetesOperator.pdf
│   ├── DDN_2020_45_KubernetesOperator.tar
│   ├── DDN_2020_46_BetterVersOf42.pdf
│   ├── DDN_2020_47_VMs.pdf
│   ├── DDN_2020_48_NodeReplaceWoBootstrap.pdf
│   └── README.md
├── 2021
│   ├── 61_DemoProgram.tar.gz
│   ├── DDN_2021_49_KubernetesPrimer.pdf
│   ├── DDN_2021_50_KubernetesNodeRecovery.pdf
│   ├── DDN_2021_51_KubernetesClusterCloning.pdf
│   ├── DDN_2021_52_KubernetesSnapshots.pdf
│   ├── DDN_2021_53_MoreContainersHelm.pdf
│   ├── DDN_2021_53_ToolkitVersion2.tar
│   ├── DDN_2021_54_AstraSvcBroker.pdf
│   ├── DDN_2021_55_K8ssandra.pdf
│   ├── DDN_2021_56_K8ssandra, Document API.pdf
│   ├── DDN_2021_57_K8ssandra, GraphQL.pdf
│   ├── DDN_2021_58_KastenVeeam.pdf
│   ├── DDN_2021_59_DseStargate.pdf
│   ├── DDN_2021_60_SnowFlake.pdf
│   ├── DDN_2021_60_SnowFlake.tar
│   ├── DDN_2021_KubernetesPrimer_Toolkit.tar
│   └── README.md
├── 2022
│   └── DDN_2022_61_SchemaValidation.pdf
└── README.md

--------------------------------------------------------------------------------
/2017/DDN_2017_10_DsePrimer.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2017/DDN_2017_10_DsePrimer.pdf

--------------------------------------------------------------------------------
/2017/DDN_2017_11_Apriori.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2017/DDN_2017_11_Apriori.pdf

--------------------------------------------------------------------------------
/2017/DDN_2017_11_Apriori.tar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2017/DDN_2017_11_Apriori.tar
--------------------------------------------------------------------------------
/2017/DDN_2017_12_UDFs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2017/DDN_2017_12_UDFs.pdf

--------------------------------------------------------------------------------
/2017/README.md:
--------------------------------------------------------------------------------

DataStax Developer's Notebook - Monthly Articles 2017
===================

| **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** |
|-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|

This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum.

2017 December - -

>Customer: As my company is beginning significant development of new applications that target
>the DataStax Enterprise database server (DSE), we are starting to explore some of the
>programmability that DSE offers us. Specifically, we are very much interested in user defined
>functions (UDFs). Can you help ?
>
>Daniel:
>Excellent question ! The topic of server side database programming can devolve into one of
>those information technology religious arguments, something we generally seek to avoid. Still,
>DataStax Enterprise (DSE) does offer user defined functions (UDFs), user defined aggregates
>(UDAs), and user defined types (UDTs).
>
>Recall that a core value proposition which DSE delivers is time constant lookups, that is;
>a CQL (Cassandra query language, similar to structured query language, SQL) SELECT, UPDATE
>or DELETE that uses an equality on at least the partition key columns from the primary key.
>These are the linearly scalable operations that DSE is distinguished for. If you seek to do
>aggregates or other set operands on large data sets (online analytics style processing, OLAP),
>DSE will likely need to perform a scatter gather query (a query that reads from multiple
>nodes concurrently). Inherently, these types of queries do not perform in low single digit
>millisecond time; just be advised.
>
>In this edition of this document we will detail all that you ask; user defined functions,
>and also user defined aggregates and user defined types, and we’ll write application code
>for each.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/DDN_2017_12_UDFs.pdf).
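>
>As a hedged taste of the topic, here is a minimal CQL sketch of a user defined function
>pair and a user defined aggregate (a hand rolled average). The keyspace and object names
>are hypothetical; the PDF above contains the fully worked, tested examples.

```sql
-- Requires user defined functions to be enabled in the server configuration.

-- UDF: state function, accumulates a running count and sum.
CREATE OR REPLACE FUNCTION ks_demo.f_avg_state (state tuple<int, double>, val double)
   CALLED ON NULL INPUT
   RETURNS tuple<int, double>
   LANGUAGE java
   AS 'if (val != null) { state.setInt(0, state.getInt(0) + 1); state.setDouble(1, state.getDouble(1) + val); } return state;';

-- UDF: final function, reduces the accumulated state to the average.
CREATE OR REPLACE FUNCTION ks_demo.f_avg_final (state tuple<int, double>)
   CALLED ON NULL INPUT
   RETURNS double
   LANGUAGE java
   AS 'return (state.getInt(0) == 0) ? null : Double.valueOf(state.getDouble(1) / state.getInt(0));';

-- UDA: ties the two functions together; call as SELECT ks_demo.a_avg(col) FROM ...
CREATE OR REPLACE AGGREGATE ks_demo.a_avg (double)
   SFUNC f_avg_state
   STYPE tuple<int, double>
   FINALFUNC f_avg_final
   INITCOND (0, 0);
```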
>

2017 November - -

>Customer:
>My company wants to deliver a, "customers who bought (x) also bought (y)" functionality to our
>Web site. Can the DataStax database server help me do this ?
>
>Daniel:
>Excellent question ! In this document we supply all of the program code, sample data, and
>instructions to deliver a recommendation engine ("customers who bought (x) also bought (y) ..")
>using DataStax Enterprise (DSE) and its Analytics functionality powered by Apache Spark.
>
>First we code the solution by hand in Python, so you have the ability to fully dissect all
>of the processing logic (master the Apriori algorithm). Then we move to using the supported
>and parallel capable Spark FPGrowth library in both Python and Scala.
>
>Along the way we install Scala, Gradle, and use support functions like the DSE parallel
>filesystem. We supply a 1.7 MB Tar ball with all data and programming.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/DDN_2017_11_Apriori.pdf).
>
>[Resource Kit](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/DDN_2017_11_Apriori.tar),
>all of the programs and data used in this edition in Tar format.
>

2017 October - -

>Customer:
>My company is investigating using DataStax for our new Customer/360 system in our customer call
>center. I haven’t learned a new database in over 10 years, and should mention that I know none
>of the NoSQL (post relational) databases. Can you help ?
>
>Daniel:
>Excellent question ! We’ve expertly used a good number of the leading NoSQL databases and while
>DataStax may take longer to master than some, DataStax is easily more capable (functionally, and
>scalability wise) than any other systems we have experienced.
>
>DataStax supports operational AND analytics workloads on one integrated platform, offers no single
>point of failure, is proven to scale past 1000 nodes, and is enterprise ready with all of the requisite
>security and administrative (maintenance and self healing) features.
>
>In this document we will:
>
>• Walk through a reasonably complete primer on DataStax Enterprise (DSE) terms, its object hierarchy, history, use, operating conventions, configuration files, and more.
>
>• Build a 2 node DSE cluster from scratch with a NetworkTopologyStrategy.
>
>• Demonstrate network partition failure tolerance.
>
>• Demonstrate strong and eventual consistency.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/DDN_2017_10_DsePrimer.pdf).
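>
>To ground two of those bullets, a minimal CQL sketch (the keyspace, table, and data
>center names are hypothetical placeholders, not taken from the PDF):

```sql
-- A keyspace replicated twice within one data center via NetworkTopologyStrategy;
-- 'dc1' must match the data center name your snitch reports.
CREATE KEYSPACE ks_demo
   WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2};

-- In cqlsh, the consistency level chooses strong versus eventual behavior:
-- with a replication factor of 2, QUORUM requires both replicas (strong),
-- while ONE returns after a single replica answers (eventual).
CONSISTENCY QUORUM;
SELECT * FROM ks_demo.t_example WHERE id = 1;
```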
>

--------------------------------------------------------------------------------
/2018/DDN_2018_13_AdvRep.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_13_AdvRep.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_13_AdvRep.tar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_13_AdvRep.tar

--------------------------------------------------------------------------------
/2018/DDN_2018_14_CQL-Search.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_14_CQL-Search.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_15_SearchPrimer.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_15_SearchPrimer.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_16_Spatial.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_16_Spatial.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_16_Spatial.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_16_Spatial.tar.gz

--------------------------------------------------------------------------------
/2018/DDN_2018_17_Docker.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_17_Docker.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_18_Studio.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_18_Studio.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_19_QueryHarness.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_19_QueryHarness.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_20_Security.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_20_Security.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_21_Backup and Recovery.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_21_Backup and Recovery.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_22_ClientSideDriver.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_22_ClientSideDriver.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_23_IndexingCollections.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_23_IndexingCollections.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_24_Zeppelin.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_24_Zeppelin.pdf

--------------------------------------------------------------------------------
/2018/README.md:
--------------------------------------------------------------------------------
DataStax Developer's Notebook - Monthly Articles 2018
===================

| **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** |
|-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|

This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum.

2018 December - -

>Customer: My developers want a quick and easy way to prototype Spark Scala, Spark Python,
>and related. I know there are the Spark (Scala) and Python REPLs (read, evaluate, print and
>loop; command prompts) that ship with DSE, but we want something more. Can you help ?
>
>Daniel: Excellent question! There are a number of free/open-source options here. In this document
>we’ll install and use Apache Zeppelin to address this need.
>
>DSE Studio, based on Apache Zeppelin, ships with interpreters for; markdown, Spark/SQL, Gremlin,
>and CQL. Apache Zeppelin ships with 20 or more interpreters, including Spark (Scala), Python,
>Shell, and more. As such, many folks use both tools.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_24_Zeppelin.pdf)
>
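>As a hedged illustration (not from the PDF): a Zeppelin notebook paragraph is just an
>interpreter directive followed by code, so a first smoke test against the DSE supplied
>Spark context might look like this.

```scala
%spark

// A minimal Zeppelin paragraph: %spark routes the Scala below to the Spark
// interpreter. Word-count over an in-memory collection; no cluster data is touched.
val counts = sc.parallelize(Seq("dse", "zeppelin", "dse")).
                map(word => (word, 1)).
                reduceByKey(_ + _)
counts.collect().foreach(println)
```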

2018 November - -

>Customer: My application needs to store a dynamic number of latitude/longitude pairs per single
>database row, along with a tag for these values like; home, work, mobile, etcetera. We need to
>perform distance (proximity) queries for any of the latitude/longitude pair values, as well as
>queries on specific tags; just home, just work, other. Can you help ?
>
>Daniel: Excellent question ! All of these application requirements are easily served with DataStax
>Enterprise Server (DSE). While we’ve covered DSE Search queries including spatial/geo-spatial in
>past editions of this document, the specific requirement you have (a dynamic count of attributes)
>is something we have not covered previously in this series of documents.
>
>Using this same technique, DataStax Enterprise can also provide a polymorphic schema ability,
>similar to MongoDB.
>
>This edition of this document will address this application requirement with no prerequisites,
>although you might well be served to visit the past editions of this document detailing DSE
>Search and DSE (spatial/geo-spatial) Search, to gain a deep understanding.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_23_IndexingCollections.pdf)

2018 October - -

>Customer: My company is investigating using DataStax for our new Customer/360 system in our
>customer call center. I am tasked with getting a simple (Hello World style) Java program
>running against the DataStax server in a small Linux command line capable Docker container.
>Can you help ?
>
>Daniel: Excellent question ! In the May/2018 edition of this document, we detailed how and
>where to download a DataStax sponsored Docker container which includes the DataStax Enterprise
>(DSE) server; boot and operate DSE. In the October/2017 edition of this document, we detailed
>a DSE introduction, including table create, new data insert and query, and more.
>
>So, the only piece we are missing is the Java client program compile that targets DSE. To aid
>in our compiling, we will document the use of the Apache Maven build automation tool. Inherently,
>a given (any given) Java library will need other Java libraries, that themselves need more
>Java libraries, rinse and repeat. It is best to automate resolution of this condition.
>
>We will install and configure all of the above, access the DSE server, and go home.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_22_ClientSideDriver.pdf)
>

2018 September - -

>Customer: My company is investigating using DataStax for our new Customer/360 system in our
>customer call center. I’m a developer, and do not know how to administer DataStax Enterprise,
>but, I need to know how to backup and restore tables for my programming and unit tests. Can
>you help ?
>
>Daniel: Excellent question ! DataStax Enterprise (DSE) can be backed up and restored using
>DataStax Operations Center (DSE Ops Center), including activities to block stores like Amazon
>S3, other.
>You can also perform sstabledump(s), and table unloads and loads, including bulk
>unloads and loads.
>
>But, as you seek to perform these activities as part of your unit tests, we are going to detail
>table backup and restore using snapshots; faster, less code, easily automated.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_21_Backup%20and%20Recovery.pdf)
>

2018 August - -

>Customer: My company is investigating using DataStax for our new Customer/360 system in our
>customer call center. I’m a developer, and do not know how to administer DataStax Enterprise,
>but, I need to know how to set up user authentication and authorization for my programming
>and unit tests. Can you help ?
>
>Daniel: Excellent question ! Setting up authentication and authorization using DataStax
>Enterprise (DSE) is super easy. Below we detail all of the relevant topics and steps to
>achieve same, including source code for all. We detail table level access control, and in
>the event you need it, row level access control.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_20_Security.pdf)
>

2018 July - -

>Customer: I inherited an existing DataStax Enterprise (DSE) server system, and users are
>complaining about performance. I know nothing about DSE, and need to make the users happy.
>Can you help ?
>
>Daniel: Excellent question ! Based on your timeline (how quickly and safely does this problem
>need to be solved), you should probably contact DataStax for assistance. If you were already
>trained/capable on DSE and wanted to solve this problem, this document will cover introductory
>topics related to that goal.
>
>In short, this document discusses building a query harness; capturing and then executing a
>representative set of queries to measure your system performance against, how and why-
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_19_QueryHarness.pdf)
>

2018 June - -

>Customer: I'm confused by all of the options for loaders, developer's tools and similar.
>Can you offer me an overview, specifically detailing DSE Studio ?
>
>Daniel: Sure ! We overview all of the above, then detail install, configure and use of
>DSE Studio version 6.0, including configuration and use of CQL and Spark/SQL.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_18_Studio.pdf)
>

2018 May - -

>Customer: My company is investigating using the cloud, containers including micro-services,
>automated deployment tools (continuous integration / continuous deployment), and more. Can you
>help ?
>
>Daniel: Excellent question ! Huge and far ranging topics, obviously; we’ll offer a history
>and primer on many of these pieces, a Cloud-101 if you will. Then, to offer some amount of
>actionable content, we’ll delve a bit deeper into one container option, namely; Docker.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_17_Docker.pdf)
>
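>As a hedged, concrete taste of the May edition (the commands follow DataStax's published
>Docker instructions of that era; the image name and flags may have changed since, so verify
>against current documentation):

```shell
# Start a single DataStax Enterprise node in the background;
# DS_LICENSE=accept acknowledges the license terms.
docker run -e DS_LICENSE=accept --name my-dse -d datastax/dse-server

# Once the node reports up, open cqlsh inside the container.
docker exec -it my-dse cqlsh
```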

2018 April - -

>Customer: My company really enjoyed the last two documents in this series on the topic of
>DataStax Enterprise (DSE) Search, however; our application also needs to include geo-spatial
>queries, specifically-
>
>• Find all points within a distance from a given point, including a compound Text Search.
>
>• Find all points within a polygon.
>
>• And while we’re at it; what is the best means to do end user testing of geo-spatial queries-
>
>Can you help ?
>
>Daniel: Excellent question ! In this document we will:
>
>• Review DSE Search, overview the main points from last month’s edition of this document.
>
>• Deliver the two DSE Search geo-spatial queries you detail above.
>
>• Deliver a graphical (custom) application you could use for end user testing.
>
>** This document states that the accompanying download contains 330,000 sample
>geo-points from the USA states of Colorado and Utah. This amount of data was
>greater in size than GitHub cared for. So, you only get Colorado data at nearly
>220,000 geo-points.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_16_Spatial.pdf)
>
>[Resource kit](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_16_Spatial.tar.gz), all of the data and programs used in this edition in Tar/Gunzip format. About 24 MB compressed.
>
>[Demonstration video](https://youtu.be/kF3IjwVyBtI), of the programs created and used in this document.
>

2018 March - -

>Customer: My company wants to use the secondary indexes, part of DataStax Enterprise (DSE)
>Search, more specifically the (first name) synonym and Soundex style features to aid in
>customer call center record lookup. Can you help ?
>
>Daniel: Excellent question ! DataStax Enterprise (DSE) Search is one of the four primary
>functional areas inside DSE; the others being DSE Core, DSE Analytics, and DSE Graph.
>Built atop Apache Solr, DSE Search is a large topic. As such, we will detail the programming
>(use) of DSE Search, and let this document serve as a primer of sorts.
>
>We plan follow up editions of this document to cover not just programming, but capacity
>planning of DSE Search, and tuning of DSE Search.
>
>**Mar 21 - This document was heavily revised: 30 new pages of content, 2 errors corrected.**
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_15_SearchPrimer.pdf)
>
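>A hedged taste of the DSE Search surface this primer covers: in DSE 6.x a search index
>can be created and queried from plain CQL (the keyspace, table, and column names below
>are hypothetical):

```sql
-- Create a Solr-backed search index over two text columns.
CREATE SEARCH INDEX IF NOT EXISTS ON ks_demo.t_customer
   WITH COLUMNS first_name, last_name;

-- Query through the solr_query pseudo column (DSE Search CQL syntax); analyzers
-- configured on the index drive synonym and Soundex style matching.
SELECT * FROM ks_demo.t_customer WHERE solr_query = 'first_name:daniel~';
```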
2018 February - -

>Customer: My company was invited to participate in the DataStax Enterprise (DSE) 6.0 early
>release program. From discussions with DataStax, we learned there are a number of changes
>related to CQL native processing of DSE Search commands. Can you help us understand what
>this means ?
>
>Daniel: Excellent question ! On February 1, 2018, DataStax began accepting self nominations
>to the DSE release 6.0 Early Access Process (EAP) at the following Url,
>
>https://academy.datastax.com/eap?destination=eap
>
>When your nomination is accepted, you receive early access to the DSE version 6.0 software
>and documentation. In return, you are asked to formally test this release and participate
>in feedback relative to your experiences. The 6.0 release is huge, with many topics far
>larger than CQL native processing of DSE Search commands; this is a very cool, and strategic
>release.
>
>In this document, we detail the DSE Core and DSE Search areas of functionality, their intent,
>how they work pre release 6.0, and how they are planned to work in the 6.0 release. Further, we detail:
>
>• The four functional areas of DSE, including DSE Core with its network partition fault tolerance and time constant lookups.
>
>• We detail B-Tree+ and hash lookups, and which scale and why.
>
>• We define the DSE primary key, including its partitioning key and clustering key parts.
>
>• We detail what makes a query a DSE Core query versus a DSE Search query.
>
>• We highlight the new CQL native processing of DSE Search commands.
>
>• We overview DSE materialized views, and secondary indexes.
>
>• We detail how to add and drop table columns, and inform DSE Search indexes of same.
>
>• And we overview how to observe asynchronous/background index builds.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_14_CQL-Search.pdf)
>

2018 January - -

>Customer: My company is investigating using the DataStax advanced replication feature to
>move data between data centers. Can you help ?
>
>Daniel:
>Excellent question ! In this document we overview DataStax Enterprise (DSE) data replication,
>advanced replication, and even recovery and diagnosis from failure of each of these sub-systems.
>Also, since advanced replication falls into an area of DataStax Enterprise titled, ‘advanced
>functionality’, we overview this topic as well.
>
>Just to be excessively chatty, we also detail DataStax Enterprise triggers and create yet
>another user defined function (UDF). (UDFs were a topic we covered in last month’s edition of
>this document.)
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_13_AdvRep.pdf).
>
>[Resource kit](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_13_AdvRep.tar),
>all of the programs and data used in this edition in Tar format.
>

--------------------------------------------------------------------------------
/2019/41 Simple Customer Graph.txt:
--------------------------------------------------------------------------------


Simplest graph loading example using Customer


. Paragraph 01, Studio, Gremlin
--------------------
// Paragraph 01

system.graphs()
system.describe()
--------------------


. Paragraph 02, Studio, Gremlin
--------------------
// Paragraph 02

schema.drop()
schema.config().option('graph.allow_scan').set('true')
--------------------


.
Paragraph 03, Studio, Gremlin
--------------------
// Paragraph 03

// Property keys

schema.propertyKey('nodeID'   ).Int() .single().create()
//
schema.propertyKey('cust_num' ).Text().single().create()
schema.propertyKey('cust_name').Text().single().create()
schema.propertyKey('url'      ).Text().single().create()
schema.propertyKey('order_num').Text().single().create()
schema.propertyKey('part'     ).Text().single().create()
schema.propertyKey('geo_name' ).Text().single().create()
--------------------


. Paragraph 04, Studio, Gremlin
--------------------
// Paragraph 04

// Vertices

schema.vertexLabel("customers").
   partitionKey("nodeID").
   properties(
      'nodeID'   ,
      'cust_num' ,
      'cust_name',
      'url'
      ).ifNotExists().create()

schema.vertexLabel("orders").
   partitionKey("nodeID").
   properties(
      'nodeID'   ,
      'order_num',
      'cust_num' ,
      'part'
      ).ifNotExists().create()

schema.vertexLabel("geographies").
   partitionKey("nodeID").
   properties(
      'nodeID'  ,
      'geo_name'
      ).ifNotExists().create()

// Edges

schema.edgeLabel("ordered").
   single().
   connection("customers", "orders").
   ifNotExists().create()

schema.edgeLabel("is_in_geo").
   single().
   connection("customers", "geographies").
   ifNotExists().create()
--------------------




** Move from DSE Studio to Apache Zeppelin




. Paragraph 01, Zepp
--------------------
%spark

// Paragraph 01

import org.apache.spark.sql.functions.{monotonically_increasing_id, col, lit, concat, max}

val customers = sc.parallelize(Array(
   ("C-4001", "United Airlines"  , "united.com"  ),
   ("C-4002", "American Airlines", "aa.com"      ),
   ("C-4003", "Air France"       , "airfrance.us")
   ) )

val customers_df = customers.toDF ("cust_num", "cust_name", "url").coalesce(1)

val customers_df_nodeID = customers_df.withColumn("nodeID", monotonically_increasing_id() + 4001)

customers_df_nodeID.getClass()
customers_df_nodeID.printSchema()
customers_df_nodeID.count()
customers_df_nodeID.show()

customers_df_nodeID.registerTempTable("customers")
--------------------


. Paragraph 02, Zepp
--------------------
%spark

// Paragraph 02

val orders = sc.parallelize(Array(
   ("O-8001", "C-4001", "fuel"   ),
   ("O-8002", "C-4002", "fuel"   ),
   ("O-8003", "C-4002", "tires"  ),
   ("O-8004", "C-4003", "wine"   ),
   ("O-8005", "C-4003", "cheese" ),
   ("O-8006", "C-4003", "bread"  )
   ) )
val orders_df = orders.toDF ("order_num", "cust_num", "part").coalesce(1)

val orders_df_nodeID = orders_df.withColumn("nodeID", monotonically_increasing_id() + 8001)

orders_df_nodeID.getClass()
orders_df_nodeID.printSchema()
orders_df_nodeID.count()
orders_df_nodeID.show()

orders_df_nodeID.registerTempTable("orders")
--------------------


.
Paragraph 03, Zepp
--------------------
%spark

// Paragraph 03

val ordered = spark.sql(
   "select "                     +
   "   t1.nodeID as nodeID_C, "  +
   "   t2.nodeID as nodeID_O "   +
   "from "                       +
   "   customers t1, "           +
   "   orders    t2 "            +
   "where "                      +
   "   t1.cust_num = t2.cust_num"
   )

ordered.getClass()
ordered.printSchema()
ordered.count()
ordered.show()
--------------------


. Paragraph 04, Zepp
--------------------
%spark

// Fourth paragraph

import com.datastax.bdp.graph.spark.graphframe._
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.functions.{monotonically_increasing_id, col, lit, concat, max}
import java.net.URI
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql.CassandraConnectorConf
import com.datastax.spark.connector.rdd.ReadConf
import org.apache.spark.sql.expressions.Window

val graphName = "customer_graph04"

val g = spark.dseGraph(graphName)
--------------------


. Paragraph 05, Zepp
--------------------
%spark

// Fifth paragraph

val customers_V = customers_df_nodeID.withColumn("~label", lit("customers")).
   withColumn("nodeID"   , col("nodeID")   ).
   withColumn("cust_num" , col("cust_num") ).
   withColumn("cust_name", col("cust_name")).
   withColumn("url"      , col("url")      )
g.updateVertices(customers_V)

val orders_V = orders_df_nodeID.withColumn("~label", lit("orders")).
   withColumn("nodeID"   , col("nodeID")   ).
   withColumn("order_num", col("order_num")).
   withColumn("cust_num" , col("cust_num") ).
   withColumn("part"     , col("part")     )
g.updateVertices(orders_V)

g.V.hasLabel("customers").show()
g.V.hasLabel("orders").show()
--------------------


. Paragraph 06, Zepp
--------------------
%spark

// Sixth paragraph

val orders_L = ordered.
   withColumn("srcLabel", lit("customers")).
   withColumn("dstLabel", lit("orders")).
   withColumn("edgeLabel", lit("ordered")
   )
orders_L.show()

val ordered_E = orders_L.select(
   g.idColumn(col("srcLabel"), col("nodeID_C")) as "src",
   g.idColumn(col("dstLabel"), col("nodeID_O")) as "dst",
   col("edgeLabel") as "~label"
   )

ordered_E.show()

g.updateEdges(ordered_E)
--------------------


. Paragraph 07, Zepp
--------------------
%spark

// Seventh paragraph

g.E.hasLabel("ordered").show()

g.V().hasLabel("customers").has("url", "aa.com").out("ordered").valueMap("order_num", "cust_num", "part").show()
--------------------




// Now add geographies




.
Paragraph 08, Zepp
--------------------
%spark

// Paragraph 08

import org.apache.spark.sql.functions.{monotonically_increasing_id, col, lit, concat, max}

val geographies = sc.parallelize(Array(
   ("N.America"),
   ("S.America"),
   ("EMEA"     )
   ) )

val geographies_df = geographies.toDF ("geo_name").coalesce(1)

val geographies_df_nodeID = geographies_df.withColumn("nodeID", monotonically_increasing_id() + 9001)

geographies_df_nodeID.getClass()
geographies_df_nodeID.printSchema()
geographies_df_nodeID.count()
geographies_df_nodeID.show()

geographies_df_nodeID.registerTempTable("geographies")
--------------------


. Paragraph 09, Zepp
--------------------
%spark

// Paragraph 09

val geographies_V = geographies_df_nodeID.withColumn("~label", lit("geographies")).
   withColumn("nodeID"  , col("nodeID")  ).
   withColumn("geo_name", col("geo_name"))
g.updateVertices(geographies_V)

g.V.hasLabel("geographies").show()
--------------------


. Paragraph 10, Zepp
--------------------
%spark

// Paragraph 10

val is_in_geo = sc.parallelize(Array(
   (4001, 9001),
   (4001, 9002),
   (4002, 9001),
   (4002, 9002),
   (4003, 9002),
   (4003, 9003)
   ) )

val is_in_geo_df = is_in_geo.toDF ("nodeID_C", "nodeID_G").coalesce(1)

val geo_L = is_in_geo_df.
   withColumn("srcLabel", lit("customers")).
   withColumn("dstLabel", lit("geographies")).
   withColumn("edgeLabel", lit("is_in_geo")
   )
geo_L.show()

val geo_E = geo_L.select(
   g.idColumn(col("srcLabel"), col("nodeID_C")) as "src",
   g.idColumn(col("dstLabel"), col("nodeID_G")) as "dst",
   col("edgeLabel") as "~label"
   )
geo_E.show()

g.updateEdges(geo_E)

g.E.hasLabel("is_in_geo").show()

g.V().hasLabel("customers").has("url", "aa.com").out("is_in_geo").valueMap("geo_name").show()
--------------------




** Done with customers; stop here




More customer graph queries
--------------------

// sql2gremlin.com

// select * from t1
g.V().hasLabel("customers").valueMap().show()

// (subset of columns)
g.V().hasLabel("customers").valueMap("nodeID", "cust_name").show()

// Derived columns
g.V().hasLabel("customers").values("cust_name").
   map {it.get().toUpperCase()}
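Two more SQL to Gremlin mappings in the same spirit (editor's sketch; the labels
and properties are the ones defined above):
--------------------

// select count(*) from t1
g.V().hasLabel("customers").count()

// select * from t1 where cust_name = 'Air France'
g.V().hasLabel("customers").has("cust_name", "Air France").valueMap().show()
--------------------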

--------------------------------------------------------------------------------
/2019/DDN_2019_25_GraphPrimer.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_25_GraphPrimer.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_26_Inventory.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_26_Inventory.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_27_Security.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_27_Security.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_28_Jupyter, R.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_28_Jupyter, R.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_29_Kafka.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_29_Kafka.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_30_GraphPrimer 68.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_30_GraphPrimer 68.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_31a_DSE, Reco Engines.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_31a_DSE, Reco Engines.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_31b_DSE, Reco Engines.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_31b_DSE, Reco Engines.pptx

--------------------------------------------------------------------------------
/2019/DDN_2019_31d_AllCommands.txt:
--------------------------------------------------------------------------------


dse spark
   Version 2.4.0.6

Reference Urls,
   https://spark.apache.org/docs/2.4.3/ml-frequent-pattern-mining.html
   https://en.wikipedia.org/wiki/Association_rule_learning




-------------------------------------------------
%spark

// 1.1

val recs_0 = sc.textFile("file:///mnt/hgfs/My.20/MyShare_1/18 DSE Developers Notebook/31 Graph Reco Eng/02 Files/10_grocery_sm.csv")
val recs_1 = sc.textFile("file:///mnt/hgfs/My.20/MyShare_1/18 DSE Developers Notebook/31 Graph Reco Eng/02 Files/20_grocery_lg.csv")

recs_0.collect().take(5).foreach(println)
println("--")
recs_1.collect().take(5).foreach(println)

// soy milk,lettuce
// lettuce,diapers ,wine,chard
// soy milk,diapers,wine,OJ
// lettuce,soy milk,diapers,wine
// lettuce,soy milk,diapers,OJ
// --
// citrus fruit,semi-finished bread,margarine,ready soups
// tropical fruit,yogurt,coffee
// whole milk
// pip fruit,yogurt,cream cheese ,meat spreads
// other vegetables,whole milk,condensed milk,long life bakery product
-------------------------------------------------


-------------------------------------------------
%spark

// 1.2

val i = recs_0.count()   // number of rows, 5
val j = recs_1.count()   // number of rows, 9835

// i: Long = 5
// j: Long = 9835
-------------------------------------------------


-------------------------------------------------
%spark

// 1.3

val i = recs_0.distinct().count()   // number of distinct rows, 5
val j = recs_1.distinct().count()   // number of distinct rows, 7011

// i: Long = 5
// j: Long = 7011
-------------------------------------------------


-------------------------------------------------
%spark

// 1.4

val recs_0s = recs_0.flatMap(s => s.trim.split(','))
val recs_1s = recs_1.flatMap(s => s.trim.split(','))

recs_0s.collect().take(5).foreach(println)
println("--")
recs_1s.collect().take(5).foreach(println)

// soy milk
// lettuce
// lettuce
// diapers
// wine
// --
// citrus fruit
// semi-finished bread
// margarine
// ready soups
// tropical fruit
-------------------------------------------------


-------------------------------------------------
%spark

// 1.5

val i = recs_0s.           count()   // total item occurrences, 18
val j = recs_0s.distinct().count()   // distinct items, 7
val k = recs_1s.           count()   // total item occurrences, 43367
val l = recs_1s.distinct().count()   // distinct items, 169

// i: Long = 18
// j: Long = 7
// k: Long = 43367
// l: Long = 169
-------------------------------------------------


-------------------------------------------------
%spark

// 1.6

val recs_0_top = recs_0s.map(str => (str, 1)).
                 reduceByKey((k, v) => k + v).
                 sortBy(_._2, false).
                 take(5).
                 foreach(println)
println("--")
val recs_1_top = recs_1s.map(str => (str, 1)).
                 reduceByKey((k, v) => k + v).
                 sortBy(_._2, false).
                 take(5).
                 foreach(println)

// (lettuce,4)
// (soy milk,4)
// (diapers,3)
// (wine,3)
// (OJ,2)
// --
// (whole milk,2513)
// (other vegetables,1903)
// (rolls/buns,1809)
// (soda,1715)
// (yogurt,1372)
-------------------------------------------------




-------------------------------------------------
%md

### Now actual ML routine
-------------------------------------------------




https://spark.apache.org/docs/2.4.3/ml-frequent-pattern-mining.html

-------------------------------------------------
%spark

// 2.1

import org.apache.spark.ml.fpm.FPGrowth
-------------------------------------------------


// Data file,
//    soy milk,lettuce
//    lettuce,diapers ,wine,chard
//    soy milk,diapers,wine,OJ
//    lettuce,soy milk,diapers,wine
//    lettuce,soy milk,diapers,OJ


// Support/minSupport
//
//    If an item/item-set appears in 3 of 5 total
//    transactions, then its support is,
//       3 / 5 = 0.60 or 60%
//
//    "How frequently the item appears ..

// Confidence/minConfidence
//
//    If the item/item-set (the antecedent) appears 4 times,
//    and the antecedent and consequent appear together 2
//    times, then the confidence is,
//       2 / 4 = 0.50 or 50%
//
//    ** Used to generate association rules
//
//    "Indication of how often the rule is found to be true

// Lift
//
//    Support(antecedent union consequent) /
//       ( Support(antecedent) * Support(consequent) )
//
//    From above,
//       lettuce, soy milk -> wine
//
//       0.2 / (0.4 * 0.6) == 0.83
//
//    Equal 1, implies probability of occurrence of antecedent and
//       consequent are independent of one another, no rule should
//       be drawn involving these two events
//
//    > 1, degree to which antecedent/consequent are dependent on
//       one another, a useful prediction rule
//
//    < 1, one item has a negative effect on the other and vice
//       versa, items can substitute for one another

// Not supplied by the library currently,
//    Conviction
//    Rule Power Factor
//
//    See (Wikipedia article)


-------------------------------------------------
%spark

// 2.2

val recs_0  = sc.textFile("file:///mnt/hgfs/My.20/MyShare_1/18 DSE Developers Notebook/31 Graph Reco Eng/02 Files/10_grocery_sm.csv")
val recs_0s = recs_0.map(i_str => i_str.split(",")).toDF("items")


val my_fpgrowth0 = new FPGrowth().setItemsCol("items").setMinSupport(0.1).setMinConfidence(0.1)
val my_model0    = my_fpgrowth0.fit(recs_0s)

my_model0.freqItemsets.show()

// +--------------------+----+
// |               items|freq|
// +--------------------+----+
// |           [diapers]|   3|
// | [diapers, soy milk]|   3|
// | [diapers, soy mil..|   2|
// |  [diapers, lettuce]|   2|
// |                [OJ]|   2|
// |       [OJ, diapers]|   2|
// | [OJ, diapers, soy..|   2|
// | [OJ, diapers, soy..|   1|
// | [OJ, diapers, let..|   1|
// |      [OJ, soy milk]|   2|
// | [OJ, soy milk, le..|   1|
// |          [OJ, wine]|   1|
// ...
-------------------------------------------------


-------------------------------------------------
%spark

// 2.3

my_model0.associationRules.show()

// +--------------------+-----------+----------+------------------+
// |          antecedent| consequent|confidence|              lift|
// +--------------------+-----------+----------+------------------+
// | [OJ, diapers, soy..|  [lettuce]|       0.5|             0.625|
// | [OJ, diapers, soy..|     [wine]|       0.5|0.8333333333333334|
// |          [soy milk]|  [diapers]|      0.75|              1.25|
// |          [soy milk]|       [OJ]|       0.5|              1.25|
// |          [soy milk]|  [lettuce]|      0.75|            0.9375|
// |          [soy milk]|     [wine]|       0.5|0.8333333333333334|
// |          [diapers ]|    [chard]|       1.0|               5.0|
// |          [diapers ]|     [wine]|       1.0|1.6666666666666667|
// |          [diapers ]|  [lettuce]|       1.0|              1.25|
// | [wine, soy milk, ..|  [diapers]|       1.0|1.6666666666666667|
// | [OJ, wine, diapers]| [soy milk]|       1.0|              1.25|
// |   [diapers , chard]|     [wine]|       1.0|1.6666666666666667|
// |   [diapers , chard]|  [lettuce]|       1.0|              1.25|
// |           [lettuce]|  [diapers]|       0.5|0.8333333333333334|
// |           [lettuce]|       [OJ]|      0.25|             0.625|
// |           [lettuce]|  [diapers]|      0.25|              1.25|
// |           [lettuce]|    [chard]|      0.25|              1.25|
// |           [lettuce]| [soy milk]|      0.75|            0.9375|
// |           [lettuce]|     [wine]|       0.5|0.8333333333333334|
// ...
-------------------------------------------------


-------------------------------------------------
%spark

// 2.4

my_model0.transform(recs_0s).show()

// +--------------------+--------------------+
// |               items|          prediction|
// +--------------------+--------------------+
// | [soy milk, lettuce]|[diapers, OJ, win...|
// |[lettuce, diapers...|[diapers, OJ, soy...|
// |[soy milk, diaper...|[lettuce, diapers...|
// |[lettuce, soy mil...|[OJ, diapers , ch...|
// |[lettuce, soy mil...|[wine, diapers , ...|
// +--------------------+--------------------+
// ...
-------------------------------------------------


-------------------------------------------------
%spark

// 2.5

import scala.collection.mutable.WrappedArray;

val recs_0_ar = my_model0.associationRules.collect()

val recs_0_arl = recs_0_ar.map( row => (
   row.get(0).asInstanceOf[WrappedArray[String]].mkString("|"),
   row.get(1).asInstanceOf[WrappedArray[String]].mkString("|"),
   row.getDouble(2),
   row.getDouble(3))
   ).toList

val recs_0_ardf = recs_0_arl.toDF("antecedent", "consequent", "confidence", "lift")

recs_0_ardf.show(20)

// +---------------------+----------+----------+------------------+
// |           antecedent|consequent|confidence|              lift|
// +---------------------+----------+----------+------------------+
// |  OJ|diapers|soy milk|   lettuce|       0.5|             0.625|
// |  OJ|diapers|soy milk|      wine|       0.5|0.8333333333333334|
// |             soy milk|   diapers|      0.75|              1.25|
// |             soy milk|        OJ|       0.5|              1.25|
// |             soy milk|   lettuce|      0.75|            0.9375|
// |             soy milk|      wine|       0.5|0.8333333333333334|
// ...
-------------------------------------------------


Must use Studio for this 1 block
-------------------------------------------------
// 2.6

DROP KEYSPACE IF EXISTS ks_31;

CREATE KEYSPACE ks_31
   WITH replication = {'class': 'SimpleStrategy',
      'replication_factor': 1}
   AND graph_engine = 'Native';

USE ks_31;

CREATE TABLE t_association_rules
   (
   antecedent   TEXT,
   consequent   TEXT,
   version      DOUBLE,
   confidence   DOUBLE,
   lift         DOUBLE,
   PRIMARY KEY((antecedent), consequent, version)
   ) ;
-------------------------------------------------


-------------------------------------------------
%spark

// 2.7

// import org.apache.spark.sql.functions._

val recs_0_ardf2 = recs_0_ardf.withColumn("version", lit(1.0) )

recs_0_ardf2.write.
   format("org.apache.spark.sql.cassandra").
   options(Map( "keyspace" -> "ks_31", "table" -> "t_association_rules" )).
   mode("append").
   save

val recs_0_test = spark.read.
   format("org.apache.spark.sql.cassandra").
   options(Map("keyspace" -> "ks_31", "table" -> "t_association_rules")).
   load
recs_0_test.count()
recs_0_test.show(20)
-------------------------------------------------


-------------------------------------------------
%sql

-- 2.8

select * from ks_31.t_association_rules
where antecedent = 'lettuce'
order by consequent, confidence desc
-------------------------------------------------




-------------------------------------------------
%md

### Repeat above, but now for larger item set
-------------------------------------------------




-------------------------------------------------
%spark

// 3.1

import org.apache.spark.ml.fpm.FPGrowth
//
import scala.collection.mutable.WrappedArray;

val recs_1  = sc.textFile("file:///mnt/hgfs/My.20/MyShare_1/44 Topics_2019/41 Graph Reco Eng/02 Files/20_grocery_lg.csv")
val recs_1s = recs_1.map(i_str => i_str.split(",")).toDF("items")

val my_fpgrowth1 = new FPGrowth().setItemsCol("items").setMinSupport(0.01).setMinConfidence(0.01)
val my_model1    = my_fpgrowth1.fit(recs_1s)

val recs_1_ar = my_model1.associationRules.collect()

val recs_1_arl = recs_1_ar.map( row => (
   row.get(0).asInstanceOf[WrappedArray[String]].mkString("|"),
   row.get(1).asInstanceOf[WrappedArray[String]].mkString("|"),
   row.getDouble(2),
   row.getDouble(3))
   ).toList

val recs_1_ardf = recs_1_arl.toDF("antecedent", "consequent", "confidence", "lift")

val recs_1_ardf2 = recs_1_ardf.withColumn("version", lit(1.0) )

recs_1_ardf2.write.
   format("org.apache.spark.sql.cassandra").
   options(Map( "keyspace" -> "ks_31", "table" -> "t_association_rules" )).
   mode("append").
   save

recs_1_ardf2.show(20)
-------------------------------------------------




// +--------------------+--------------------+-------------------+------------------+---------+
// |          antecedent|          consequent|         confidence|              lift| version |
// +--------------------+--------------------+-------------------+------------------+---------+
// |    hygiene articles|          whole milk| 0.3888888888888889|1.5219746208604146|     1.0 |
// |domestic eggs|who...|    other vegetables| 0.4101694915254237|2.1198197315567744|     1.0 |
// |        cream cheese|              yogurt| 0.3251366120218579| 2.330698672911787|     1.0 |
// |        cream cheese|    other vegetables| 0.3524590163934426|1.8215630195635881|     1.0 |
// |        cream cheese|          whole milk| 0.4262295081967213|1.6681126992100093|     1.0 |
// |               sugar|    other vegetables| 0.3183183183183183|1.6451185815347664|     1.0 |
// |               sugar|          whole milk| 0.4444444444444444|1.7393995666976165|     1.0 |
// |      tropical fruit|       shopping bags|0.12887596899224807| 1.308044535643715|     1.0 |
// |      tropical fruit|fruit/vegetable j...| 0.1308139534883721|1.8095010303208716|     1.0 |
// |      tropical fruit|              pastry|0.12596899224806202| 1.415891472868217|     1.0 |
// |      tropical fruit|         brown bread|0.10174418604651163|  1.56842330684552|     1.0 |
// |      tropical fruit| whipped/sour cream |0.13178294573643412|1.8384188245642974|     1.0 |
// |      tropical fruit|        citrus fruit|0.18992248062015504|2.2947022074929055|     1.0 |
// |      tropical fruit|          newspapers| 0.1124031007751938|1.4082605046166001|     1.0 |
// |      tropical fruit|          rolls/buns|0.23449612403100775| 1.274886334906004|     1.0 |
// |      tropical fruit|       bottled water|0.17635658914728683|1.5956458640879172|     1.0 |
// |      tropical fruit|              yogurt|0.27906976744186046|2.0004746084480303|     1.0 |
// |      tropical fruit|    other vegetables|0.34205426356589147|1.7677896385551983|     1.0 |
// |      tropical fruit|                soda|0.19864341085271317| 1.139159152032906|     1.0 |
// |      tropical fruit|     root vegetables| 0.2005813953488372|1.8402220366192295|     1.0 |




-------------------------------------------------
%sql
-- 3.2

select count(*) from ks_31.t_association_rules   // 522
-------------------------------------------------


-------------------------------------------------
%sql
-- 3.3

select "All records", count(*)
   from ks_31.t_association_rules
union
select "Greater than 1", count(*)
   from ks_31.t_association_rules
   where lift > 1.0
union
select "Equal to 1", count(*)
   from ks_31.t_association_rules
   where lift = 1.0
union
select "Less than 1", count(*)
   from ks_31.t_association_rules
   where lift < 1.0

// All records      522
// Greater than 1   502
// Equal to 1         0
// Less than 1       20

// From above,
//
//    9835 total orders
//    7011 unique orders
//     169 unique items
-------------------------------------------------


-------------------------------------------------
%sql
-- 3.4

select * from ks_31.t_association_rules
order by lift desc
limit 20

// (Get screen shot)  High is 3.37
-------------------------------------------------




-------------------------------------------------
------------------------------------------------- 529 | %md 530 | 531 | ### Move to Graph Recommendation Engine (Gremlin/Studio) 532 | ------------------------------------------------- 533 | ------------------------------------------------- 534 | 535 | 536 | Doc, 537 | Types of traversals, 538 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/using/QueryTOC.html 539 | Traversal API, 540 | (Top level Gremlin API doc) 541 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/refTPTOC.html 542 | 543 | 544 | Gist of the graph reco engine- 545 | 546 | To answer 'What would you like ?': Who do you know, 547 | and what do they like ? 548 | 549 | 550 | ------------------------------------------------- 551 | // %md 552 | 553 | ### Just 'know'ing someone 554 | 555 | ### Prototype using smaller/known data set 556 | ------------------------------------------------- 557 | 558 | 559 | ------------------------------------------------- 560 | // 4.1 561 | 562 | USE ks_ngkv; 563 | 564 | DROP MATERIALIZED VIEW IF EXISTS e_knows_bi; 565 | // 566 | DROP TABLE IF EXISTS ordered; 567 | DROP TABLE IF EXISTS orders; 568 | // 569 | DROP TABLE IF EXISTS knows; 570 | DROP TABLE IF EXISTS user; 571 | 572 | CREATE TABLE user 573 | ( 574 | user_id TEXT, 575 | gender TEXT, 576 | age INT, 577 | PRIMARY KEY((user_id)) 578 | ) 579 | WITH VERTEX LABEL v_user 580 | ; 581 | CREATE TABLE knows 582 | ( 583 | user_id_s TEXT, 584 | user_id_d TEXT, 585 | PRIMARY KEY((user_id_s), user_id_d) 586 | ) 587 | WITH EDGE LABEL e_knows 588 | FROM v_user(user_id_s) 589 | TO v_user(user_id_d); 590 | CREATE MATERIALIZED VIEW e_knows_bi 591 | AS SELECT user_id_s, user_id_d 592 | FROM knows 593 | WHERE 594 | user_id_s IS NOT NULL 595 | AND 596 | user_id_d IS NOT NULL 597 | PRIMARY KEY ((user_id_d), user_id_s); 598 | ------------------------------------------------- 599 | 600 | 601 | ------------------------------------------------- 602 | // 4.2 603 | 604 | USE ks_ngkv; 605 | 606 | INSERT INTO user (user_id, gender, age) VALUES ('dave' , 'M', 30 ); 607 | INSERT INTO user (user_id, gender, age) VALUES ('denise' , 'F', 30 ); 608 | INSERT INTO user (user_id, gender, age) VALUES ('kiyu' , 'M', 30 ); 609 | INSERT INTO user (user_id, age) VALUES ('farrell', 40 ); 610 | INSERT INTO user (user_id, gender, age) VALUES ('hatcher', 'M', 30 ); 611 | INSERT INTO user (user_id, gender, age) VALUES ('morty' , 'M', 30 ); 612 | 613 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('farrell', 'kiyu' ); 614 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('farrell', 'denise' ); 615 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('farrell', 'morty' ); 616 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('farrell', 'hatcher'); 617 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('kiyu' , 'farrell'); 618 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('denise' , 'farrell'); 619 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('dave' , 'farrell'); 620 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('kiyu' , 'denise' ); 621 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('kiyu' , 'dave' ); 622 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('hatcher', 'denise' ); 623 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('kiyu' , 'hatcher'); 624 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('denise' , 'dave' ); 625 | ------------------------------------------------- 626 | 627 | 628 | ------------------------------------------------- 629 | // 4.3 630 | 631 | def l_user = "farrell" 632 | // 633 |
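// The next three traversals (4.3 - 4.5) differ only in edge direction:
//   out("e_knows")  - source -> destination; who farrell knows
//   in("e_knows")   - destination -> source; who knows farrell (served by the
//                     e_knows_bi materialized view from 4.1)
//   both("e_knows") - either direction; mutual acquaintances appear twice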
g.V().has("v_user", "user_id", l_user). 634 | out("e_knows"). 635 | valueMap(true, "user_id") 636 | 637 | // { "id": "v_user:denise#32", "label": "v_user", "user_id": [ "denise" ] }, 638 | // { "id": "v_user:hatcher#81", "label": "v_user", "user_id": [ "hatcher" ] }, 639 | // { "id": "v_user:kiyu#62", "label": "v_user", "user_id": [ "kiyu" ] }, 640 | // { "id": "v_user:morty#77", "label": "v_user", "user_id": [ "morty" ] } 641 | ------------------------------------------------- 642 | 643 | 644 | ------------------------------------------------- 645 | // 4.4 646 | 647 | def l_user = "farrell" 648 | // 649 | g.V().has("v_user", "user_id", l_user). 650 | both("e_knows"). 651 | valueMap(true, "user_id") 652 | 653 | // farrell knows 654 | // { "id": "v_user:denise#32", "label": "v_user", "user_id": [ "denise" ] }, 655 | // { "id": "v_user:hatcher#81", "label": "v_user", "user_id": [ "hatcher" ] }, 656 | // { "id": "v_user:kiyu#62", "label": "v_user", "user_id": [ "kiyu" ] }, 657 | // { "id": "v_user:morty#77", "label": "v_user", "user_id": [ "morty" ] }, 658 | 659 | // know farrell (and farrell knows them) 660 | // { "id": "v_user:denise#32", "label": "v_user", "user_id": [ "denise" ] }, 661 | // { "id": "v_user:kiyu#62", "label": "v_user", "user_id": [ "kiyu" ] } 662 | 663 | // knows farrell, farrell doesn't know him 664 | // { "id": "v_user:dave#38", "label": "v_user", "user_id": [ "dave" ] }, 665 | ------------------------------------------------- 666 | 667 | 668 | ------------------------------------------------- 669 | // 4.5 670 | 671 | def l_user = "farrell" 672 | // 673 | g.V().has("v_user", "user_id", l_user). 674 | in("e_knows"). 675 | valueMap(true, "user_id") 676 | 677 | // in 678 | // { "id": "v_user:dave#38", "label": "v_user", "user_id": [ "dave" ] }, 679 | // { "id": "v_user:denise#32", "label": "v_user", "user_id": [ "denise" ] }, 680 | // { "id": "v_user:kiyu#62", "label": "v_user", "user_id": [ "kiyu" ] } 681 | ------------------------------------------------- 682 | 683 | 684 | 685 | 686 | ------------------------------------------------- 687 | // %md 688 | 689 | ### The 'as' step modulator 690 | 691 | ### Use a simpler (non-recursive) relationship 692 | ------------------------------------------------- 693 | 694 | 695 | Step modulators, 696 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/tpStepModTOC.html 697 | 698 | 'as' "Label an object in a traversal to use later in the traversal. 699 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/refTravAs.html 700 | 701 | -- 702 | 703 | Similar to 'store()', which is a step 704 | "The store() step is a sideEffect step that stores information for later use in the traversal. 705 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/refTravStore.html 706 | 707 | And also, 'sack()', a step, and 'withSack(), a step modulator 708 | 709 | From, 710 | http://www.doanduyhai.com/blog/?p=13404 711 | 712 | "A sack is a local datastructure relative to each traverser, unlike global datastructure as 713 | when using aggregate()/store()/group(x)/subgraph(x).... It mean that each traverser is now 714 | equipped with his own sack. 
A sack can contain any type of data structure: 715 | 716 | 717 | ------------------------------------------------- 718 | // 5.1 719 | 720 | USE ks_ngkv; 721 | 722 | DROP TABLE IF EXISTS ordered; 723 | DROP TABLE IF EXISTS orders; 724 | 725 | CREATE TABLE orders 726 | ( 727 | order_id INT, 728 | item TEXT, 729 | PRIMARY KEY((order_id)) 730 | ) 731 | WITH VERTEX LABEL v_orders 732 | ; 733 | CREATE TABLE ordered 734 | ( 735 | user_id TEXT, 736 | order_id INT, 737 | order_date TEXT, 738 | PRIMARY KEY((user_id), order_id) 739 | ) 740 | WITH EDGE LABEL e_ordered 741 | FROM v_user(user_id) 742 | TO v_orders(order_id); 743 | 744 | INSERT INTO orders (order_id, item) VALUES (101, 'beer' ); 745 | INSERT INTO orders (order_id, item) VALUES (102, 'chips' ); 746 | INSERT INTO orders (order_id, item) VALUES (103, 'wine' ); 747 | INSERT INTO orders (order_id, item) VALUES (104, 'cheese'); 748 | 749 | INSERT INTO ordered (user_id, order_id, order_date) VALUES ('farrell', 101, 'Dec-31-2008'); 750 | INSERT INTO ordered (user_id, order_id, order_date) VALUES ('farrell', 102, 'Dec-31-2008'); 751 | INSERT INTO ordered (user_id, order_id, order_date) VALUES ('denise' , 103, 'Dec-31-2008'); 752 | INSERT INTO ordered (user_id, order_id, order_date) VALUES ('denise' , 104, 'Dec-31-2008'); 753 | ------------------------------------------------- 754 | 755 | 756 | ------------------------------------------------- 757 | // 5.2 758 | 759 | def l_user = "farrell" 760 | // 761 | g.V().has("v_user", "user_id", l_user). 762 | valueMap(true) 763 | 764 | // { "id": "v_user:farrell#82", "label": "v_user", "user_id": [ "farrell" ], "age": [ "40" ] } 765 | ------------------------------------------------- 766 | 767 | 768 | ------------------------------------------------- 769 | // 5.3 770 | 771 | def l_user = "denise" 772 | // 773 | g.V().has("v_user", "user_id", l_user). 774 | valueMap(true) 775 | 776 | // { "id": "v_user:denise#32", "label": "v_user", "gender": [ "F" ], "user_id": [ "denise" ], "age": [ "30" ] } 777 | ------------------------------------------------- 778 | 779 | 780 | ------------------------------------------------- 781 | // 5.4 782 | 783 | def l_user = "farrell" 784 | // 785 | g.V().has("v_user", "user_id", l_user). 786 | outE("e_ordered"). 787 | valueMap(true) 788 | 789 | // { "id": "v_user:farrell#82->e_ordered#03->v_orders:101#27", "label": "e_ordered", "order_date": "Dec-31-2008" }, 790 | // { "id": "v_user:farrell#82->e_ordered#03->v_orders:102#24", "label": "e_ordered", "order_date": "Dec-31-2008" } 791 | ------------------------------------------------- 792 | 793 | 794 | ------------------------------------------------- 795 | // 5.5 796 | 797 | def l_user = "farrell" 798 | // 799 | g.V().has("v_user", "user_id", l_user). 800 | out("e_ordered"). 801 | valueMap(true) 802 | 803 | // { "id": "v_orders:101#27", "label": "v_orders", "item": [ "beer" ], "order_id": [ "101" ] }, 804 | // { "id": "v_orders:102#24", "label": "v_orders", "item": [ "chips" ], "order_id": [ "102" ] } 805 | ------------------------------------------------- 806 | 807 | 808 | ------------------------------------------------- 809 | // 5.6 810 | 811 | def l_user = "farrell" 812 | // 813 | g.V().has("v_user", "user_id", l_user). 814 | as("user"). 815 | out("e_ordered"). 816 | as("order"). 817 | select("user", "order"). 818 | by(valueMap(true)). 
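// by() modulators apply round-robin to the select() keys: the by() above
// renders "user", the by() below renders "order".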
819 | by(valueMap(true)) 820 | 821 | { "user": { "id": "v_user:farrell#82", "label": "v_user", "user_id": [ "farrell" ], "age": [ "40" ] }, 822 | "order": { "id": "v_orders:101#27", "label": "v_orders", "item": [ "beer" ], "order_id": [ "101" ] } }, 823 | { "user": { "id": "v_user:farrell#82", "label": "v_user", "user_id": [ "farrell" ], "age": [ "40" ] }, 824 | "order": { "id": "v_orders:102#24", "label": "v_orders", "item": [ "chips" ], "order_id": [ "102" ] } } 825 | ------------------------------------------------- 826 | 827 | 828 | // Why isn't order an embedded array to user ? 829 | // 830 | // Think, 831 | // 832 | // SELECT o.* 833 | // FROM user AS 'u' 834 | // INNER JOIN order AS 'o' ON u.user_id=o.user_id 835 | 836 | 837 | ------------------------------------------------- 838 | // 5.7 839 | 840 | def l_user = "farrell" 841 | // 842 | g.V().has("v_user", "user_id", l_user). 843 | as('a'). 844 | out("e_knows"). 845 | as('b'). 846 | out("e_knows"). 847 | where(eq('a')). // this line 848 | select('a', 'b'). 849 | by(values('user_id')). 850 | by(values('user_id')) 851 | 852 | // { "a": "farrell", "b": "denise" }, 853 | // { "a": "farrell", "b": "kiyu" } 854 | ------------------------------------------------- 855 | 856 | 'this line' .. .. 857 | "each traverser compares the id value of 'b' to the id value of 'a' and if they 858 | aren't equal then it is filtered out 859 | 860 | 861 | 862 | 863 | ------------------------------------------------- 864 | // %md 865 | 866 | ### Back to KV 867 | ------------------------------------------------- 868 | 869 | 870 | 871 | 872 | ------------------------------------------------- 873 | // 6.1 874 | 875 | def l_user = "u1" 876 | // 877 | g.V().has("v_user2", "user_id", l_user). 878 | valueMap(true) 879 | 880 | // { "id": "v_user2:u1#70", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u1" ], "age": [ "17" ] } 881 | ------------------------------------------------- 882 | 883 | 884 | ------------------------------------------------- 885 | // 6.2 886 | 887 | def l_user = "u1" 888 | // 889 | g.V().has("v_user2", "user_id", l_user). 890 | as('a'). 891 | out("e_knows2"). 892 | as('b'). 893 | out("e_knows2"). 894 | where(eq('a')). 895 | select('a', 'b'). 896 | by(values('user_id')). 897 | by(values('user_id')) 898 | 899 | // No data 900 | ------------------------------------------------- 901 | 902 | 903 | ------------------------------------------------- 904 | // 6.3 905 | 906 | def l_user = "u1" 907 | // 908 | g.V().has("v_user2", "user_id", l_user). 909 | out("e_knows2"). 910 | valueMap(true) 911 | 912 | // { "id": "v_user2:u151#66", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u151" ], "age": [ "17" ] }, 913 | // { "id": "v_user2:u192#77", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u192" ], "age": [ "14" ] }, 914 | // { "id": "v_user2:u74#16", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u74" ], "age": [ "15" ] }, 915 | // { "id": "v_user2:u83#24", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u83" ], "age": [ "14" ] } 916 | ------------------------------------------------- 917 | 918 | 919 | ------------------------------------------------- 920 | // 6.4 921 | 922 | def l_user = "u1" 923 | // 924 | g.V().has("v_user2", "user_id", l_user). 925 | in("e_knows2"). 
926 | valueMap(true) 927 | 928 | // { "id": "v_user2:u199#70", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u199" ], "age": [ "15" ] } 929 | ------------------------------------------------- 930 | 931 | 932 | ------------------------------------------------- 933 | // 6.5 934 | 935 | def l_user = "u1" 936 | // 937 | g.V().has("v_user2", "user_id", l_user). 938 | both("e_knows2"). 939 | valueMap(true) 940 | 941 | // { "id": "v_user2:u151#66", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u151" ], "age": [ "17" ] }, 942 | // { "id": "v_user2:u192#77", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u192" ], "age": [ "14" ] }, 943 | // { "id": "v_user2:u74#16", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u74" ], "age": [ "15" ] }, 944 | // { "id": "v_user2:u83#24", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u83" ], "age": [ "14" ] }, 945 | // 946 | // { "id": "v_user2:u199#70", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u199" ], "age": [ "15" ] } 947 | ------------------------------------------------- 948 | 949 | 950 | 951 | 952 | ------------------------------------------------- 953 | // %md 954 | 955 | ### Have at least (n) connections in common 956 | 957 | ### Prototype using smaller/known data set 958 | ------------------------------------------------- 959 | 960 | 961 | 962 | 963 | ------------------------------------------------- 964 | // 7.1 965 | 966 | def l_user = "farrell" 967 | def l_conn = 2 968 | // 969 | g.V().has("v_user", "user_id", l_user). 970 | out("e_knows"). 971 | in("e_knows"). 972 | filter( 973 | has("user_id", neq(l_user)) 974 | ). 975 | dedup(). 976 | valueMap("user_id") 977 | 978 | // { "user_id": [ "hatcher" ] }, 979 | // { "user_id": [ "kiyu" ] } 980 | ------------------------------------------------- 981 | 982 | 983 | Above 984 | Total set 985 | VALUES ('farrell', 'kiyu' ); 986 | VALUES ('farrell', 'denise' ); 987 | VALUES ('farrell', 'morty' ); 988 | VALUES ('farrell', 'hatcher'); 989 | VALUES ('kiyu' , 'farrell'); 990 | VALUES ('denise' , 'farrell'); 991 | VALUES ('dave' , 'farrell'); 992 | VALUES ('kiyu' , 'denise' ); 993 | VALUES ('kiyu' , 'dave' ); 994 | VALUES ('hatcher', 'denise' ); 995 | VALUES ('kiyu' , 'hatcher'); 996 | VALUES ('denise' , 'dave' ); 997 | After 'has/out' 998 | VALUES ('farrell', 'kiyu' ); 999 | VALUES ('farrell', 'denise' ); 1000 | VALUES ('farrell', 'morty' ); 1001 | VALUES ('farrell', 'hatcher'); 1002 | After 'in' 1003 | VALUES ('farrell', 'kiyu' ); 1004 | VALUES ('farrell', 'denise' ); 1005 | VALUES ('farrell', 'morty' ); 1006 | VALUES ('farrell', 'hatcher'); 1007 | VALUES ('kiyu' , 'denise' ); 1008 | VALUES ('hatcher', 'denise' ); 1009 | VALUES ('kiyu' , 'hatcher'); 1010 | After 'filter' 1011 | VALUES ('kiyu' , 'denise' ); 1012 | VALUES ('hatcher', 'denise' ); 1013 | VALUES ('kiyu' , 'hatcher'); 1014 | After 'dedup' // Notice this is on keys .. 1015 | { "user_id": [ "hatcher" ] }, 1016 | { "user_id": [ "kiyu" ] } 1017 | 1018 | 1019 | ------------------------------------------------- 1020 | // 7.2 1021 | 1022 | def l_user = "farrell" 1023 | def l_conn = 2 1024 | // 1025 | g.withSack(1, sum). 1026 | V().has("v_user", "user_id", l_user). 1027 | out("e_knows"). 1028 | in("e_knows"). 1029 | filter( 1030 | sack().is(gte(l_conn)). 1031 | and(). 1032 | has("user_id", neq(l_user)) 1033 | ). 1034 | dedup().
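// withSack(1, sum): every traverser starts with sack = 1, and traversers that
// converge on the same vertex merge their sacks by sum. The
// sack().is(gte(l_conn)) above therefore kept only users reached from
// farrell along at least l_conn distinct out/in "e_knows" paths.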
1035 | valueMap("user_id") 1036 | 1037 | // { "user_id": [ "kiyu" ] } 1038 | ------------------------------------------------- 1039 | 1040 | 1041 | Above 1042 | Total set 1043 | VALUES ('farrell', 'kiyu' ); 1044 | VALUES ('farrell', 'denise' ); 1045 | VALUES ('farrell', 'morty' ); 1046 | VALUES ('farrell', 'hatcher'); 1047 | VALUES ('kiyu' , 'farrell'); 1048 | VALUES ('denise' , 'farrell'); 1049 | VALUES ('dave' , 'farrell'); 1050 | VALUES ('kiyu' , 'denise' ); 1051 | VALUES ('kiyu' , 'dave' ); 1052 | VALUES ('hatcher', 'denise' ); 1053 | VALUES ('kiyu' , 'hatcher'); 1054 | VALUES ('denise' , 'dave' ); 1055 | After 'has/out' 1056 | VALUES ('farrell', 'kiyu' ); 1057 | VALUES ('farrell', 'denise' ); 1058 | VALUES ('farrell', 'morty' ); 1059 | VALUES ('farrell', 'hatcher'); 1060 | After 'in' 1061 | VALUES ('farrell', 'kiyu' ); 1062 | VALUES ('farrell', 'denise' ); 1063 | VALUES ('kiyu' , 'denise' ); 1064 | VALUES ('hatcher', 'denise' ); 1065 | VALUES ('farrell', 'morty' ); 1066 | VALUES ('farrell', 'hatcher'); 1067 | VALUES ('kiyu' , 'hatcher'); 1068 | After 'filter', just not farrell 1069 | VALUES ('kiyu' , 'denise' ); 1070 | VALUES ('hatcher', 'denise' ); 1071 | VALUES ('kiyu' , 'hatcher'); 1072 | After 'filter', 2 or more in common with farrell 1073 | VALUES ('kiyu' , 'denise' ); 1074 | VALUES ('kiyu' , 'hatcher'); 1075 | After 'dedup' // Notice this is on keys .. 1076 | // { "user_id": [ "kiyu" ] } 1077 | 1078 | 1079 | 1080 | 1081 | ------------------------------------------------- 1082 | // %md 1083 | 1084 | ### Back to KV 1085 | ------------------------------------------------- 1086 | 1087 | 1088 | 1089 | 1090 | ------------------------------------------------- 1091 | // 8.1 1092 | 1093 | def l_user = "u1" 1094 | def l_conn = 2 1095 | // 1096 | g.withSack(1, sum). 1097 | V().has("v_user2", "user_id", l_user). 1098 | out("e_knows2"). 1099 | both("e_knows2"). 1100 | filter( 1101 | sack().is(gte(l_conn)). 1102 | and(). 1103 | has("user_id", neq(l_user)) 1104 | ). 1105 | dedup(). 1106 | valueMap("user_id") 1107 | 1108 | // { "user_id": [ "u49" ] } 1109 | ------------------------------------------------- 1110 | 1111 | 1112 | ------------------------------------------------- 1113 | // 8.2 1114 | 1115 | def l_user = "u1" 1116 | // 1117 | g.V().has("v_user2", "user_id", l_user). 1118 | out("e_knows2"). 1119 | valueMap("user_id") 1120 | 1121 | { "user_id": [ "u151" ] }, 1122 | { "user_id": [ "u192" ] }, 1123 | { "user_id": [ "u74" ] }, // <-- 1124 | { "user_id": [ "u83" ] } // <-- 1125 | ------------------------------------------------- 1126 | 1127 | 1128 | ------------------------------------------------- 1129 | // 8.3 1130 | 1131 | def l_user = "u49" 1132 | // 1133 | g.V().has("v_user2", "user_id", l_user). 1134 | both("e_knows2"). 
1135 | valueMap("user_id") 1136 | 1137 | { "user_id": [ "u132" ] }, 1138 | { "user_id": [ "u150" ] }, 1139 | { "user_id": [ "u165" ] }, 1140 | { "user_id": [ "u435" ] }, 1141 | { "user_id": [ "u74" ] }, // <-- 1142 | { "user_id": [ "u926" ] }, 1143 | { "user_id": [ "u116" ] }, 1144 | { "user_id": [ "u122" ] }, 1145 | { "user_id": [ "u135" ] }, 1146 | { "user_id": [ "u47" ] }, 1147 | { "user_id": [ "u54" ] }, 1148 | { "user_id": [ "u80" ] }, 1149 | { "user_id": [ "u83" ] } // <-- 1150 | ------------------------------------------------- 1151 | 1152 | 1153 | 1154 | 1155 | ------------------------------------------------- 1156 | // %md 1157 | 1158 | ### Staying in KV 1159 | 1160 | ### What likes do any two persons share (Do they like the same movies) 1161 | ------------------------------------------------- 1162 | 1163 | 1164 | 1165 | 1166 | ------------------------------------------------- 1167 | // 9.1 1168 | 1169 | def l_user = "u1" 1170 | def l_rate = 8 1171 | // 1172 | g.V().has("v_user2", "user_id", l_user). 1173 | outE("e_rated2"). 1174 | count() 1175 | 1176 | // 32 1177 | ------------------------------------------------- 1178 | 1179 | 1180 | // This traversal requires an index 1181 | ------------------------------------------------- 1182 | // 9.2 1183 | 1184 | def l_user = "u1" 1185 | def l_rate = 8 1186 | // 1187 | g.V().has("v_user2", "user_id", l_user). 1188 | outE("e_rated2"). 1189 | has("rating",gte(l_rate)). 1190 | count() 1191 | 1192 | // 15 1193 | ------------------------------------------------- 1194 | 1195 | 1196 | // Step 1: Users like the same movie 1197 | // User liked same movie as target user 1198 | ------------------------------------------------- 1199 | // 9.3 1200 | 1201 | def l_user = "u1" 1202 | def l_rate = 8 1203 | // 1204 | g.V().has("v_user2", "user_id", l_user). 1205 | outE("e_rated2"). 1206 | has("rating", gte(l_rate)). 1207 | inV(). 1208 | inE("e_rated2"). 1209 | has("rating", gte(l_rate)). 1210 | count() 1211 | 1212 | // 962 1213 | ------------------------------------------------- 1214 | 1215 | 1216 | // Step 2: Same as above, data view 1217 | ------------------------------------------------- 1218 | // 9.4 1219 | 1220 | def l_user = "u1" 1221 | def l_rate = 8 1222 | // 1223 | g.V().has("v_user2", "user_id", l_user). 1224 | outE("e_rated2"). 1225 | has("rating", gte(l_rate)). 1226 | as("a"). 1227 | inV(). 1228 | inE("e_rated2"). 1229 | has("rating", gte(l_rate)). 1230 | as("b"). 1231 | limit(5). 1232 | select("a", "b"). 1233 | by(valueMap(true)).
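// same round-robin by() pattern as 5.6: the by() above renders "a" (u1's own
// rating edge), the by() below renders "b" (a co-rater's edge on the same movie).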
1234 | by(valueMap(true)) 1235 | 1236 | // { 1237 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1238 | // "b": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" } 1239 | // }, 1240 | // { 1241 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1242 | // "b": { "id": "v_user2:u118#25->e_rated2#10->v_movie2:m567#00", "label": "e_rated2", "rating": "9" } 1243 | // }, 1244 | // { 1245 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1246 | // "b": { "id": "v_user2:u119#24->e_rated2#10->v_movie2:m567#00", "label": "e_rated2", "rating": "8" } 1247 | // }, 1248 | // { 1249 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1250 | // "b": { "id": "v_user2:u125#15->e_rated2#10->v_movie2:m567#00", "label": "e_rated2", "rating": "9" } 1251 | // }, 1252 | // { 1253 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1254 | // "b": { "id": "v_user2:u143#15->e_rated2#10->v_movie2:m567#00", "label": "e_rated2", "rating": "8" } 1255 | // } 1256 | // ... 1257 | ------------------------------------------------- 1258 | 1259 | 1260 | // Step 3: Same as 9.3, not the target user, remove 1261 | // duplicate user ids, count 1262 | ------------------------------------------------- 1263 | // 9.5 1264 | 1265 | def l_user = "u1" 1266 | def l_rate = 8 1267 | // 1268 | g.V().has("v_user2", "user_id", l_user). 1269 | outE("e_rated2"). 1270 | has("rating", gte(l_rate)). 1271 | inV(). 1272 | inE("e_rated2"). 1273 | has("rating", gte(l_rate)). 1274 | // 1275 | outV(). 1276 | has("user_id", neq(l_user)). 1277 | dedup(). 1278 | count() 1279 | 1280 | // 599 1281 | ------------------------------------------------- 1282 | 1283 | 1284 | // Step 4: Same as above, data view 1285 | ------------------------------------------------- 1286 | // 9.6 1287 | 1288 | def l_user = "u1" 1289 | def l_rate = 8 1290 | // 1291 | g.V().has("v_user2", "user_id", l_user). 1292 | outE("e_rated2"). 1293 | has("rating", gte(l_rate)). 1294 | inV(). 1295 | inE("e_rated2"). 1296 | has("rating", gte(l_rate)). 1297 | // 1298 | outV(). 1299 | has("user_id", neq(l_user)). 1300 | dedup(). 1301 | limit(5). 1302 | valueMap(true) 1303 | 1304 | // { "id": "v_user2:u1027#65", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u1027" ], "age": [ "22" ] }, 1305 | // { "id": "v_user2:u106#14" , "label": "v_user2", "gender": [ "F" ], "user_id": [ "u106" ], "age": [ "14" ] }, 1306 | // { "id": "v_user2:u168#22" , "label": "v_user2", "gender": [ "M" ], "user_id": [ "u168" ], "age": [ "15" ] }, 1307 | // { "id": "v_user2:u19#77" , "label": "v_user2", "gender": [ "F" ], "user_id": [ "u19" ], "age": [ "13" ] }, 1308 | // { "id": "v_user2:u2#19" , "label": "v_user2", "gender": [ "M" ], "user_id": [ "u2" ], "age": [ "12" ] } 1309 | // ... 1310 | ------------------------------------------------- 1311 | 1312 | 1313 | // Step 5: Other user likes the same movie as 1314 | // target user (n) or more times 1315 | ------------------------------------------------- 1316 | // 9.7 1317 | 1318 | def l_user = "u1" 1319 | def l_rate = 8 1320 | def l_count = 5 1321 | // 1322 | g.withSack(1, sum). 1323 | V().has("v_user2", "user_id", l_user). 1324 | outE("e_rated2"). 1325 | has("rating", gte(l_rate)). 1326 | inV(). 1327 | inE("e_rated2"). 1328 | has("rating", gte(l_rate)). 
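// each traverser here is one (movie u1 liked, someone else's high rating of
// it) pair; outV() below moves to that someone, and the per-path sacks sum
// per user, so gte(l_count) keeps users with l_count or more movies in common.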
1329 | // 1330 | outV(). 1331 | filter( 1332 | sack().is(gte(l_count)). 1333 | and(). 1334 | has("user_id", neq(l_user)) 1335 | ). 1336 | dedup(). 1337 | valueMap(true) 1338 | 1339 | // { "id": "v_user2:u156#69", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u156" ], "age": [ "15" ] }, 1340 | // { "id": "v_user2:u119#78", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u119" ], "age": [ "17" ] }, 1341 | // { "id": "v_user2:u13#17", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u13" ], "age": [ "14" ] }, 1342 | // { "id": "v_user2:u143#65", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u143" ], "age": [ "17" ] }, 1343 | // { "id": "v_user2:u4#67", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u4" ], "age": [ "16" ] }, 1344 | // { "id": "v_user2:u85#22", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u85" ], "age": [ "13" ] } 1345 | ------------------------------------------------- 1346 | 1347 | 1348 | 1349 | 1350 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/refTravSideEffect.html 1351 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/refTravAggregate.html 1352 | ------------------------------------------------- 1353 | // %md 1354 | 1355 | ### Staying in KV 1356 | 1357 | ### sideEffect(), and where(without()) 1358 | ------------------------------------------------- 1359 | 1360 | 1361 | ------------------------------------------------- 1362 | // 10.1 1363 | 1364 | def l_user = "u1" 1365 | def l_rate = 8 1366 | // 1367 | g.V().has("v_user2", "user_id", l_user). 1368 | outE("e_rated2"). 1369 | count() 1370 | 1371 | // 32 1372 | ------------------------------------------------- 1373 | 1374 | 1375 | ------------------------------------------------- 1376 | // 10.2 1377 | 1378 | def l_user = "u1" 1379 | def l_rate = 8 1380 | // 1381 | g.V().has("v_user2", "user_id", l_user). 1382 | outE("e_rated2"). 1383 | has("rating",lt(l_rate)). 1384 | count() 1385 | 1386 | // 17 1387 | ------------------------------------------------- 1388 | 1389 | 1390 | ------------------------------------------------- 1391 | // 10.3 1392 | 1393 | def l_user = "u1" 1394 | def l_rate = 8 1395 | // 1396 | g.V().has("v_user2", "user_id", l_user). 1397 | outE("e_rated2"). 1398 | has("rating",gte(l_rate)). 1399 | count() 1400 | 1401 | // 15 1402 | ------------------------------------------------- 1403 | 1404 | 1405 | ------------------------------------------------- 1406 | // 10.4 1407 | 1408 | def l_user = "u1" 1409 | def l_rate = 8 1410 | // 1411 | g.V().has("v_user2", "user_id", l_user). 1412 | sideEffect( 1413 | outE("e_rated2"). 1414 | has("rating",lt(l_rate)). 1415 | aggregate("a") 1416 | ). 1417 | outE("e_rated2"). 1418 | count() 1419 | 1420 | // 32 1421 | ------------------------------------------------- 1422 | 1423 | 1424 | ------------------------------------------------- 1425 | // 10.5 1426 | 1427 | def l_user = "u1" 1428 | def l_rate = 8 1429 | // 1430 | g.V().has("v_user2", "user_id", l_user). 1431 | sideEffect( 1432 | outE("e_rated2"). 1433 | has("rating",lt(l_rate)). 1434 | aggregate("a") 1435 | ). 1436 | outE("e_rated2"). 1437 | where(without("a")). 1438 | count() 1439 | 1440 | // 15 1441 | ------------------------------------------------- 1442 | 1443 | 1444 | ------------------------------------------------- 1445 | // 10.6 1446 | 1447 | def l_user = "u1" 1448 | def l_rate = 8 1449 | // 1450 | g.V().has("v_user2", "user_id", l_user). 1451 | sideEffect( 1452 | outE("e_rated2"). 
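// sideEffect(): run the inner traversal only for its side effect - the
// low-rated edges collect into "a" via aggregate("a") while the outer
// traverser stays parked on u1; where(without("a")) below then drops
// exactly those edges, leaving the 15 rated >= l_rate.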
1453 | has("rating",lt(l_rate)). 1454 | aggregate("a") 1455 | ). 1456 | outE("e_rated2"). 1457 | where(without("a")). 1458 | valueMap(true) 1459 | 1460 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m102#07", "label": "e_rated2", "rating": "8" }, 1461 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m111#05", "label": "e_rated2", "rating": "10" }, 1462 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m160#03", "label": "e_rated2", "rating": "8" }, 1463 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m223#07", "label": "e_rated2", "rating": "8" }, 1464 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m28#62", "label": "e_rated2", "rating": "9" }, 1465 | // ... 1466 | ------------------------------------------------- 1467 | 1468 | 1469 | 1470 | 1471 | ------------------------------------------------- 1472 | // %md 1473 | 1474 | ### Final assembly 1475 | ------------------------------------------------- 1476 | 1477 | 1478 | ------------------------------------------------- 1479 | // 11.1 1480 | 1481 | def l_user = "u1" 1482 | def l_conn = 2 1483 | def l_rate = 8 1484 | 1485 | g.withSack(1,sum). 1486 | V(). 1487 | has("v_user2", "user_id", l_user). 1488 | sideEffect( 1489 | out("e_rated2"). 1490 | aggregate("se_movieSeenByUser") 1491 | ). 1492 | both("e_knows2"). 1493 | filter( 1494 | outE("e_rated2"). 1495 | has("rating",gte(l_rate)). 1496 | inV(). 1497 | inE("e_rated2"). 1498 | has("rating",gte(l_rate)). 1499 | outV(). 1500 | filter( 1501 | has("user_id", l_user). 1502 | and(). 1503 | sack().is(gte(l_conn))) 1504 | ). 1505 | outE("e_rated2"). 1506 | has("rating", gte(l_rate)). 1507 | inV(). 1508 | dedup(). 1509 | where( 1510 | without("se_movieSeenByUser") 1511 | ). 1512 | order(). 1513 | by( 1514 | inE("e_rated2"). 1515 | values("rating"). 1516 | mean(), 1517 | decr). 1518 | limit(5). 1519 | valueMap("movie_id", "title", "year") 1520 | 1521 | // { "movie_id": [ "m278" ], "title": [ "The Simpsons (TV Series)" ], "year": [ "1989" ] }, 1522 | // { "movie_id": [ "m274" ], "title": [ "One Flew Over the Cuckoos Nest" ], "year": [ "1975" ] }, 1523 | // { "movie_id": [ "m13" ], "title": [ "The Pianist" ], "year": [ "2002" ] }, 1524 | // { "movie_id": [ "m163" ], "title": [ "The Good', ' the Bad and the Ugly" ], "year": [ "1966" ] }, 1525 | // { "movie_id": [ "m351" ], "title": [ "Forrest Gump" ], "year": [ "1994" ] } 1526 | ------------------------------------------------- 1527 | 1528 | 1529 | 1530 | 1531 | ------------------------------------------------- 1532 | ------------------------------------------------- 1533 | ------------------------------------------------- 1534 | ------------------------------------------------- 1535 | 1536 | 1537 | // On the subtopic of 'normalization' .. 1538 | 1539 | // Below requires an index 1540 | ------------------------------------------------- 1541 | // 12.1 1542 | 1543 | def l_movie = "m267" 1544 | def l_score = 7 1545 | 1546 | g.V().has("v_movie2", "movie_id", l_movie). 1547 | out("e_belongs_to2"). 1548 | in("e_belongs_to2"). 1549 | dedup(). 1550 | count() 1551 | 1552 | // 317 1553 | ------------------------------------------------- 1554 | 1555 | 1556 | // Below is not done, no data returned 1557 | ------------------------------------------------- 1558 | // 12.2 1559 | 1560 | def l_movie = "m267" 1561 | def l_score = 70 1562 | 1563 | g.V().has("v_movie2", "movie_id", l_movie). 1564 | as("y1", "r1"). 1565 | out("e_belongs_to2"). 1566 | in("e_belongs_to2"). 1567 | dedup(). 1568 | has("movie_id", neq(l_movie)). 1569 | as("y2", "r2"). 
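// (unfinished, per the note above) math() below computes a Euclidean
// distance sqrt((y1-y2)^2 + (r1-r2)^2); its by() modulators bind, in order
// of first appearance, y1/y2 to each movie's year and r1/r2 to each movie's
// mean rating scaled x10. Re-enabling is(lte(l_score)) would keep only
// the "near" (similar) movies.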
1570 | filter( 1571 | math( "sqrt ( (y1 - y2)^2 + (r1 - r2)^2 )" ). 1572 | by(values("year")). 1573 | by(values("year")). 1574 | by(inE("e_rated2").values("rating").mean().math( "ceil (_ * 10)" )). 1575 | by(inE("e_rated2").values("rating").mean().math( "ceil (_ * 10)" )) // . 1576 | // is(lte(l_score)) 1577 | ) 1578 | ------------------------------------------------- 1579 | 1580 | 1581 | 1582 | 1583 | ------------------------------------------------- 1584 | ------------------------------------------------- 1585 | ------------------------------------------------- 1586 | ------------------------------------------------- 1587 | 1588 | 1589 | From Artem 1590 | 1591 | 1592 | def similarityM3(user, minPositiveRating) { 1593 | g.V().has("user","userId",user). 1594 | outE("rated"). 1595 | has("rating",gte(minPositiveRating)). 1596 | inV(). 1597 | inE("rated"). 1598 | has("rating",gte(minPositiveRating)). 1599 | outV(). 1600 | dedup(). 1601 | has("userId",neq(user)). 1602 | toSet() 1603 | } 1604 | 1605 | def similarityM5(user, minPositiveRating, commonMovies) { 1606 | g.withSack(1,sum). 1607 | V(). 1608 | has("user","userId",user). 1609 | both("knows"). 1610 | filter( 1611 | outE("rated"). 1612 | has("rating",gte(minPositiveRating)). 1613 | inV(). 1614 | inE("rated"). 1615 | has("rating",gte(minPositiveRating)). 1616 | outV(). 1617 | filter( 1618 | has("userId",user). 1619 | and(). 1620 | sack().is(gte(commonMovies)) 1621 | ) 1622 | ). 1623 | toSet() 1624 | } 1625 | 1626 | def watchedMovies(user) { 1627 | g.V(). 1628 | has("user","userId",user). 1629 | out("rated"). 1630 | toSet() 1631 | } 1632 | 1633 | def recommendM3(user, minPositiveRating) { 1634 | g.V(similarityM3(user,minPositiveRating)). 1635 | outE("rated").has("rating",gte(minPositiveRating)). 1636 | inV(). 1637 | dedup(). 1638 | is(without(watchedMovies(user))). 1639 | toSet() 1640 | } 1641 | 1642 | def recommendM5(user, minPositiveRating, commonMovies) { 1643 | g.V(similarityM5(user,minPositiveRating,commonMovies)). 1644 | outE("rated").has("rating",gte(minPositiveRating)). 1645 | inV(). 1646 | dedup(). 1647 | is(without(watchedMovies(user))). 1648 | toSet() 1649 | } 1650 | 1651 | 1652 | 1653 | MMMM 1654 | 1655 | 1656 | def m3 = recommendM3("u1", 8) 1657 | 1658 | def m5 = recommendM5("u1", 8, 2) 1659 | 1660 | m3.intersect(m5) 1661 | 1662 | 1663 | MMM 1664 | 1665 | 1666 | def user = "u1" 1667 | def minPositiveRating = 8 1668 | def commonMovies = 2 1669 | 1670 | g.withSack(1,sum).V().has("user","userId",user). 1671 | sideEffect(out("rated").aggregate("watchedMovies")). 1672 | both("knows"). 1673 | filter( 1674 | outE("rated").has("rating",gte(minPositiveRating)).inV(). 1675 | inE("rated").has("rating",gte(minPositiveRating)).outV(). 1676 | filter(has("userId",user).and(). 1677 | sack().is(gte(commonMovies))) 1678 | ). 1679 | outE("rated").has("rating",gte(minPositiveRating)). 1680 | inV(). 1681 | dedup(). 1682 | where(without("watchedMovies")) 1683 | 1684 | 1685 | MMM 1686 | 1687 | 1688 | def user = "u1" 1689 | def minPositiveRating = 8 1690 | def commonMovies = 2 1691 | // Number of final recommendations 1692 | def numRecommendations = 10 1693 | 1694 | def rs1 = 1695 | g.withSack(1,sum).V().has("user","userId",user). 1696 | sideEffect(out("rated").aggregate("watchedMovies")). 1697 | both("knows"). 1698 | filter( 1699 | outE("rated").has("rating",gte(minPositiveRating)).inV(). 1700 | inE("rated").has("rating",gte(minPositiveRating)).outV(). 1701 | filter(has("userId",user).and(). 1702 | sack().is(gte(commonMovies))) 1703 | ). 
1704 | outE("rated").has("rating",gte(minPositiveRating)). 1705 | inV(). 1706 | dedup(). 1707 | where(without("watchedMovies")). 1708 | // Ordering by a movie average rating 1709 | order().by(inE("rated").values("rating").mean(),decr). 1710 | limit(numRecommendations). 1711 | toList() 1712 | 1713 | 1714 | MMM 1715 | 1716 | 1717 | -------------------------------------------------------------------------------- /2019/DDN_2019_31f_KillrVideoDDL.cql: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | // ------------------------------------------------- 6 | 7 | 8 | DROP KEYSPACE IF EXISTS ks_ngkv; 9 | 10 | CREATE KEYSPACE ks_ngkv 11 | WITH replication = {'class': 'SimpleStrategy', 12 | 'replication_factor': 1} 13 | AND graph_engine = 'Native'; 14 | 15 | USE ks_ngkv; 16 | 17 | 18 | // ------------------------------------------------- 19 | 20 | 21 | CREATE TABLE user2 22 | ( 23 | user_id TEXT, 24 | gender TEXT, 25 | age INT, 26 | PRIMARY KEY((user_id)) 27 | ) 28 | WITH VERTEX LABEL v_user2 29 | ; 30 | 31 | CREATE TABLE genre2 32 | ( 33 | genre_id TEXT, 34 | name TEXT, 35 | PRIMARY KEY((genre_id)) 36 | ) 37 | WITH VERTEX LABEL v_genre2 38 | ; 39 | 40 | CREATE TABLE person2 41 | ( 42 | person_id TEXT, 43 | name TEXT, 44 | PRIMARY KEY((person_id)) 45 | ) 46 | WITH VERTEX LABEL v_person2 47 | ; 48 | 49 | CREATE TABLE movie2 50 | ( 51 | movie_id TEXT, 52 | duration INT, 53 | country TEXT, 54 | year INT, 55 | production LIST<TEXT>, // element type assumed; a bare LIST is not valid CQL 56 | title TEXT, 57 | PRIMARY KEY((movie_id)) 58 | ) 59 | WITH VERTEX LABEL v_movie2 60 | ; 61 | 62 | 63 | // ------------------------------------------------- 64 | 65 | 66 | CREATE TABLE screenwriter2 67 | ( 68 | movie_id TEXT, 69 | person_id TEXT, 70 | PRIMARY KEY((movie_id), person_id) 71 | ) 72 | WITH EDGE LABEL e_screenwriter2 73 | FROM v_movie2(movie_id) 74 | TO v_person2(person_id); 75 | CREATE MATERIALIZED VIEW e_screenwriter_bi2 76 | AS SELECT movie_id, person_id 77 | FROM screenwriter2 78 | WHERE 79 | movie_id IS NOT NULL 80 | AND 81 | person_id IS NOT NULL 82 | PRIMARY KEY ((person_id), movie_id); 83 | 84 | CREATE TABLE cinematographer2 85 | ( 86 | movie_id TEXT, 87 | person_id TEXT, 88 | PRIMARY KEY((movie_id), person_id) 89 | ) 90 | WITH EDGE LABEL e_cinematographer2 91 | FROM v_movie2(movie_id) 92 | TO v_person2(person_id); 93 | CREATE MATERIALIZED VIEW e_cinematographer_bi2 94 | AS SELECT movie_id, person_id 95 | FROM cinematographer2 96 | WHERE 97 | movie_id IS NOT NULL 98 | AND 99 | person_id IS NOT NULL 100 | PRIMARY KEY ((person_id), movie_id); 101 | 102 | CREATE TABLE actor2 103 | ( 104 | movie_id TEXT, 105 | person_id TEXT, 106 | id TEXT, 107 | PRIMARY KEY((movie_id), person_id, id) 108 | ) 109 | WITH EDGE LABEL e_actor2 110 | FROM v_movie2(movie_id) 111 | TO v_person2(person_id); 112 | CREATE MATERIALIZED VIEW e_actor_bi2 113 | AS SELECT movie_id, person_id 114 | FROM actor2 115 | WHERE 116 | movie_id IS NOT NULL 117 | AND 118 | person_id IS NOT NULL 119 | AND 120 | id IS NOT NULL 121 | PRIMARY KEY ((person_id), movie_id, id); 122 | 123 | CREATE TABLE director2 124 | ( 125 | movie_id TEXT, 126 | person_id TEXT, 127 | PRIMARY KEY((movie_id), person_id) 128 | ) 129 | WITH EDGE LABEL e_director2 130 | FROM v_movie2(movie_id) 131 | TO v_person2(person_id); 132 | CREATE MATERIALIZED VIEW e_director_bi2 133 | AS SELECT movie_id, person_id 134 | FROM director2 135 | WHERE 136 | movie_id IS NOT NULL 137 | AND 138 | person_id IS NOT NULL 139 | PRIMARY KEY ((person_id), movie_id); 140 | 141 | CREATE TABLE composer2 142
| ( 143 | movie_id TEXT, 144 | person_id TEXT, 145 | PRIMARY KEY((movie_id), person_id) 146 | ) 147 | WITH EDGE LABEL e_composer2 148 | FROM v_movie2(movie_id) 149 | TO v_person2(person_id); 150 | CREATE MATERIALIZED VIEW e_composer_bi2 151 | AS SELECT movie_id, person_id 152 | FROM composer2 153 | WHERE 154 | movie_id IS NOT NULL 155 | AND 156 | person_id IS NOT NULL 157 | PRIMARY KEY ((person_id), movie_id); 158 | 159 | 160 | CREATE TABLE knows2 161 | ( 162 | user_id_s TEXT, 163 | user_id_d TEXT, 164 | PRIMARY KEY((user_id_s), user_id_d) 165 | ) 166 | WITH EDGE LABEL e_knows2 167 | FROM v_user2(user_id_s) 168 | TO v_user2(user_id_d); 169 | CREATE MATERIALIZED VIEW e_knows_bi2 170 | AS SELECT user_id_s, user_id_d 171 | FROM knows2 172 | WHERE 173 | user_id_s IS NOT NULL 174 | AND 175 | user_id_d IS NOT NULL 176 | PRIMARY KEY ((user_id_d), user_id_s); 177 | 178 | CREATE TABLE rated2 179 | ( 180 | user_id TEXT, 181 | movie_id TEXT, 182 | rating INT, 183 | PRIMARY KEY((user_id), movie_id) 184 | ) 185 | WITH EDGE LABEL e_rated2 186 | FROM v_user2(user_id) 187 | TO v_movie2(movie_id); 188 | CREATE MATERIALIZED VIEW e_rated_bi2 189 | AS SELECT user_id, movie_id 190 | FROM rated2 191 | WHERE 192 | user_id IS NOT NULL 193 | AND 194 | movie_id IS NOT NULL 195 | PRIMARY KEY ((user_id), movie_id); 196 | 197 | CREATE TABLE belongs_to2 198 | ( 199 | movie_id TEXT, 200 | genre_id TEXT, 201 | PRIMARY KEY((movie_id), genre_id) 202 | ) 203 | WITH EDGE LABEL e_belongs_to2 204 | FROM v_movie2(movie_id) 205 | TO v_genre2(genre_id); 206 | CREATE MATERIALIZED VIEW e_belongs_to_bi2 207 | AS SELECT movie_id, genre_id 208 | FROM belongs_to2 209 | WHERE 210 | movie_id IS NOT NULL 211 | AND 212 | genre_id IS NOT NULL 213 | PRIMARY KEY ((movie_id), genre_id); 214 | 215 | 216 | // ------------------------------------------------- 217 | 218 | 219 | CREATE SEARCH INDEX ON ks_ngkv.rated2 220 | WITH COLUMNS rating { docValues : true }; 221 | 222 | CREATE SEARCH INDEX ON ks_ngkv.belongs_to2 223 | WITH COLUMNS genre_id { docValues : true }; 224 | 225 | 226 | // I didn't do all of the secondary indexes yet- 227 | // 228 | // Below is the index DDL from Artem's KillrVideo 229 | 230 | 231 | // Define vertex indexes 232 | 233 | // schema.vertexLabel("genre").index("genresByName").materialized().by("name").add() 234 | // schema.vertexLabel("person").index("personsByName").materialized().by("name").add() 235 | // schema.vertexLabel("user").index("usersByAge").secondary().by("age").add() 236 | 237 | // schema.vertexLabel("movie").index("moviesByTitle").materialized().by("title").add() 238 | // schema.vertexLabel("movie").index("search").search().by("title").asText().add() 239 | // schema.vertexLabel("movie").index("moviesByYear").secondary().by("year").add() 240 | 241 | // Define edge indexes 242 | 243 | // schema.vertexLabel("user").index("toMoviesByRating").outE("rated").by("rating").add() 244 | // schema.vertexLabel("movie").index("toUsersByRating").inE("rated").by("rating").add() 245 | 246 | 247 | // ------------------------------------------------- 248 | 249 | 250 | COPY user2 ( 251 | user_id, 252 | gender, 253 | age 254 | ) 255 | FROM '20 KV data as pipe/10 user.pipe' 256 | WITH 257 | DELIMITER = '|' 258 | AND 259 | HEADER = TRUE 260 | ; 261 | 262 | COPY genre2 ( 263 | genre_id, 264 | name 265 | ) 266 | FROM '20 KV data as pipe/11 genre.pipe' 267 | WITH 268 | DELIMITER = '|' 269 | AND 270 | HEADER = TRUE 271 | ; 272 | 273 | COPY person2 ( 274 | person_id, 275 | name 276 | ) 277 | FROM '20 KV data as pipe/12 person.pipe' 278 
| WITH 279 | DELIMITER = '|' 280 | AND 281 | HEADER = TRUE 282 | ; 283 | 284 | COPY movie2 ( 285 | movie_id, 286 | duration, 287 | country, 288 | year, 289 | production, 290 | title 291 | ) 292 | FROM '20 KV data as pipe/13 movie.pipe' 293 | WITH 294 | DELIMITER = '|' 295 | AND 296 | HEADER = TRUE 297 | ; 298 | 299 | 300 | // ------------------------------------------------- 301 | 302 | 303 | COPY screenwriter2 ( 304 | movie_id, 305 | person_id 306 | ) 307 | FROM '20 KV data as pipe/20 screenwriter.pipe' 308 | WITH 309 | DELIMITER = '|' 310 | AND 311 | HEADER = TRUE 312 | ; 313 | 314 | COPY cinematographer2 ( 315 | movie_id, 316 | person_id 317 | ) 318 | FROM '20 KV data as pipe/21 cinematographer.pipe' 319 | WITH 320 | DELIMITER = '|' 321 | AND 322 | HEADER = TRUE 323 | ; 324 | 325 | COPY actor2 ( 326 | movie_id, 327 | person_id, 328 | id 329 | ) 330 | FROM '20 KV data as pipe/22 actor.pipe' 331 | WITH 332 | DELIMITER = '|' 333 | AND 334 | HEADER = TRUE 335 | ; 336 | 337 | COPY director2 ( 338 | movie_id, 339 | person_id 340 | ) 341 | FROM '20 KV data as pipe/23 director.pipe' 342 | WITH 343 | DELIMITER = '|' 344 | AND 345 | HEADER = TRUE 346 | ; 347 | 348 | COPY composer2 ( 349 | movie_id, 350 | person_id 351 | ) 352 | FROM '20 KV data as pipe/24 composer.pipe' 353 | WITH 354 | DELIMITER = '|' 355 | AND 356 | HEADER = TRUE 357 | ; 358 | 359 | 360 | // ------------------------------------------------- 361 | 362 | 363 | COPY knows2 ( 364 | user_id_s, 365 | user_id_d 366 | ) 367 | FROM '20 KV data as pipe/30 knows.pipe' 368 | WITH 369 | DELIMITER = '|' 370 | AND 371 | HEADER = TRUE 372 | ; 373 | 374 | COPY rated2 ( 375 | user_id, 376 | movie_id, 377 | rating 378 | ) 379 | FROM '20 KV data as pipe/31 rated.pipe' 380 | WITH 381 | DELIMITER = '|' 382 | AND 383 | HEADER = TRUE 384 | ; 385 | 386 | COPY belongs_to2 ( 387 | movie_id, 388 | genre_id 389 | ) 390 | FROM '20 KV data as pipe/32 belongs_to.pipe' 391 | WITH 392 | DELIMITER = '|' 393 | AND 394 | HEADER = TRUE 395 | ; 396 | 397 | 398 | 399 | 400 | 401 | -------------------------------------------------------------------------------- /2019/DDN_2019_32_Python68Client.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_32_Python68Client.pdf -------------------------------------------------------------------------------- /2019/DDN_2019_32_Python68Client.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | # 4 | # Imports 5 | # 6 | 7 | from dse.cluster import Cluster, GraphExecutionProfile 8 | from dse.cluster import EXEC_PROFILE_GRAPH_DEFAULT 9 | from dse.cluster import EXEC_PROFILE_GRAPH_ANALYTICS_DEFAULT 10 | # 11 | from dse.graph import GraphOptions, GraphProtocol, graph_graphson3_row_factory 12 | 13 | 14 | ############################################################ 15 | ############################################################ 16 | 17 | 18 | # 19 | # Assuming a DSE at localhost with DSE Search and Graph 20 | # are turned on 21 | # 22 | l_graphName = 'ks_32' 23 | 24 | 25 | l_execProfile1 = GraphExecutionProfile( # OLTP 26 | graph_options=GraphOptions(graph_name=l_graphName, 27 | graph_protocol=GraphProtocol.GRAPHSON_3_0), 28 | row_factory=graph_graphson3_row_factory 29 | ) 30 | l_execProfile2 = GraphExecutionProfile( # OLAP 31 | graph_options=GraphOptions(graph_name=l_graphName, 32 | graph_source='a', 33 | 
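                               # graph_source='a' routes traversals run under
                               # this profile to the analytics (OLAP / Spark)
                               # engine; the OLTP profile above keeps the
                               # default 'g' source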
graph_protocol=GraphProtocol.GRAPHSON_3_0), 34 | row_factory=graph_graphson3_row_factory 35 | ) 36 | 37 | l_cluster = Cluster(execution_profiles = { 38 | EXEC_PROFILE_GRAPH_DEFAULT: l_execProfile1, 39 | EXEC_PROFILE_GRAPH_ANALYTICS_DEFAULT: l_execProfile2}, 40 | contact_points=['127.0.0.1'] 41 | ) 42 | m_session = l_cluster.connect() 43 | 44 | 45 | ############################################################ 46 | ############################################################ 47 | 48 | 49 | # 50 | # Create the DSE server side objects; keyspace, table, 51 | # indexes, .. 52 | # 53 | 54 | def init_db1(): 55 | global m_session 56 | 57 | 58 | print "" 59 | print "" 60 | 61 | l_stmt = \ 62 | "DROP KEYSPACE IF EXISTS ks_32; " 63 | # 64 | l_rows = m_session.execute(l_stmt) 65 | # 66 | print "" 67 | l_stmt2 = ' '.join(l_stmt.split()) 68 | print "Drop Keyspace: " + l_stmt2 69 | 70 | 71 | l_stmt = \ 72 | "CREATE KEYSPACE ks_32 " + \ 73 | " WITH replication = {'class': 'SimpleStrategy', " + \ 74 | " 'replication_factor': 1} " + \ 75 | " AND graph_engine = 'Core'; " + \ 76 | " " 77 | # 78 | l_rows = m_session.execute(l_stmt) 79 | # 80 | print "" 81 | l_stmt2 = ' '.join(l_stmt.split()) 82 | print "Create Keyspace: " + l_stmt2 83 | 84 | 85 | l_stmt = \ 86 | "CREATE TABLE ks_32.grid_square " + \ 87 | " ( " + \ 88 | " square_id TEXT, " + \ 89 | " PRIMARY KEY((square_id)) " + \ 90 | " ) " + \ 91 | "WITH VERTEX LABEL grid_square " + \ 92 | "; " 93 | # 94 | l_rows = m_session.execute(l_stmt) 95 | # 96 | print "" 97 | l_stmt2 = ' '.join(l_stmt.split()) 98 | print "Create Vertex: " + l_stmt2 99 | 100 | 101 | l_stmt = \ 102 | "CREATE SEARCH INDEX ON ks_32.grid_square " + \ 103 | " WITH COLUMNS square_id { docValues : true }; " 104 | # 105 | l_rows = m_session.execute(l_stmt) 106 | # 107 | print "" 108 | l_stmt2 = ' '.join(l_stmt.split()) 109 | print "Create Search Index: " + l_stmt2 110 | 111 | 112 | l_stmt = \ 113 | "CREATE TABLE ks_32.connects_to " + \ 114 | " ( " + \ 115 | " square_id_src TEXT, " + \ 116 | " square_id_dst TEXT, " + \ 117 | " PRIMARY KEY((square_id_src), square_id_dst) " + \ 118 | " ) " + \ 119 | "WITH EDGE LABEL connects_to " + \ 120 | " FROM grid_square(square_id_src) " + \ 121 | " TO grid_square(square_id_dst); " 122 | # 123 | l_rows = m_session.execute(l_stmt) 124 | # 125 | print "" 126 | l_stmt2 = ' '.join(l_stmt.split()) 127 | print "Create Edge: " + l_stmt2 128 | 129 | 130 | l_stmt = \ 131 | "CREATE MATERIALIZED VIEW ks_32.connects_to_bi " + \ 132 | " AS SELECT square_id_src, square_id_dst " + \ 133 | " FROM connects_to " + \ 134 | " WHERE " + \ 135 | " square_id_src IS NOT NULL " + \ 136 | " AND " + \ 137 | " square_id_dst IS NOT NULL " + \ 138 | " PRIMARY KEY ((square_id_dst), square_id_src); " 139 | # 140 | l_rows = m_session.execute(l_stmt) 141 | # 142 | print "" 143 | l_stmt2 = ' '.join(l_stmt.split()) 144 | print "Create Bi-direction to Edge: " + l_stmt2 145 | 146 | 147 | ############################################################ 148 | ############################################################ 149 | 150 | 151 | # 152 | # Load the Vertex and Edge with starter data 153 | # 154 | 155 | def init_db2(): 156 | global m_session 157 | global m_ins2 158 | 159 | 160 | l_squares = [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ] 161 | 162 | 163 | # 164 | # Insert data into the vertex; ks_32.grid_squares 165 | # 166 | l_ins1 = m_session.prepare( 167 | "INSERT INTO ks_32.grid_square (square_id) VALUES ( ? 
)" 168 | ) 169 | for r in l_squares: 170 | for c in l_squares: 171 | l_data = "x" + str(r) + "-" + str(c) 172 | l_rows = m_session.execute(l_ins1, [ l_data ]) 173 | 174 | 175 | # 176 | # Insert data into the edge; ks_32.connects_to 177 | # 178 | for r in l_squares: 179 | for c in l_squares: 180 | # 181 | # Within 1 row; column current <---> column right 182 | # 183 | if (c < 21): 184 | l_left_ = "x" + str(r) + "-" + str(c ) 185 | l_right = "x" + str(r) + "-" + str(c + 2) 186 | # 187 | l_rows = m_session.execute(m_ins2, [ l_left_, l_right ] ) 188 | l_rows = m_session.execute(m_ins2, [ l_right, l_left_ ] ) 189 | 190 | # 191 | # Within 1 column; row current to row below 192 | # 193 | if (r < 21): 194 | l_above = "x" + str(r ) + "-" + str(c) 195 | l_below = "x" + str(r + 2) + "-" + str(c) 196 | # 197 | l_rows = m_session.execute(m_ins2, [ l_above, l_below ] ) 198 | l_rows = m_session.execute(m_ins2, [ l_below, l_above ] ) 199 | 200 | # 201 | # From the loops above, we end up missing the bottom, 202 | # right-most square. Do manually .. 203 | # 204 | l_rows = m_session.execute(m_ins2, [ "x19-21", "x21-21" ] ) 205 | l_rows = m_session.execute(m_ins2, [ "x21-21", "x19-21" ] ) 206 | l_rows = m_session.execute(m_ins2, [ "x21-19", "x21-21" ] ) 207 | l_rows = m_session.execute(m_ins2, [ "x21-21", "x21-19" ] ) 208 | 209 | 210 | ############################################################ 211 | ############################################################ 212 | 213 | 214 | def run_traversals(): 215 | 216 | l_stmt = "g.V().hasLabel('grid_square')" 217 | 218 | # OLTP 219 | for l_elem in m_session.execute_graph(l_stmt, execution_profile=EXEC_PROFILE_GRAPH_DEFAULT ): 220 | print l_elem 221 | 222 | print "" 223 | print "MMM" 224 | print "" 225 | 226 | # OLAP 227 | for l_elem in m_session.execute_graph(l_stmt, execution_profile=EXEC_PROFILE_GRAPH_ANALYTICS_DEFAULT ): 228 | print l_elem 229 | 230 | 231 | ############################################################ 232 | ############################################################ 233 | 234 | 235 | # 236 | # Our program proper 237 | # 238 | 239 | if __name__=='__main__': 240 | 241 | init_db1() 242 | 243 | # 244 | # Our prepared INSERT(s) and DELETE(s) 245 | # 246 | m_ins2 = m_session.prepare( 247 | "INSERT INTO ks_32.connects_to (square_id_src, " + \ 248 | " square_id_dst) VALUES ( ?, ? ) " 249 | ) 250 | m_del1 = m_session.prepare( 251 | "DELETE FROM ks_32.connects_to WHERE " + \ 252 | " square_id_src = ? AND square_id_dst = ? 
" 253 | ) 254 | 255 | init_db2() 256 | 257 | run_traversals() 258 | 259 | 260 | 261 | -------------------------------------------------------------------------------- /2019/DDN_2019_33_ShortestPoint.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_33_ShortestPoint.pdf -------------------------------------------------------------------------------- /2019/DDN_2019_33_ShortestPoint.tar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_33_ShortestPoint.tar -------------------------------------------------------------------------------- /2019/DDN_2019_34_GremlinPrimer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_34_GremlinPrimer.pdf -------------------------------------------------------------------------------- /2019/DDN_2019_35 Desktop.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_35 Desktop.pdf -------------------------------------------------------------------------------- /2019/DDN_2019_36_cfstats.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_36_cfstats.pdf -------------------------------------------------------------------------------- /2019/README.md: -------------------------------------------------------------------------------- 1 | DataStax Developer's Notebook - Monthly Articles 2019 2 | =================== 3 | 4 | | **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** | 5 | |-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------| 6 | 7 | 8 | This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum. 9 | 10 | 2019 December - - 11 | >Customer: My company is getting ready to go into production with our first (Cassandra) application. We’ve 12 | >noticed that one of our nodes contains way more data than the other nodes, and is way more utilized than 13 | >the other nodes. We’ve found “nodetool cfstats”, along with mention of tombstones, read/write latencies, 14 | >and more, and think we have a problem. Can you help ? 
15 | > 16 | >Daniel: Excellent question ! You’ve got a lot going on above in this problem statement. Net/net, in this 17 | >document we will; explain cfstats, overview a (production readiness) examination, and more. 18 | > 19 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_36_cfstats.pdf) 20 | 21 | 2019 November - - 22 | >Customer: I’m a developer and have little time to learn the complexities of setting up and maintaining 23 | >servers. I get that I can stand up a DataStax Enterprise server in 2 minutes or less, but I have at 24 | >least 10 of these types of challenges to overcome when getting applications out the door. Can you help ? 25 | > 26 | >Daniel: Excellent question ! Well obviously you want some automation. DataStax Desktop was introduced 27 | >this year as a means to simply containerize your DataStax Enterprise (DSE) install. A fat client, DataStax 28 | >Desktop runs on Linux, MacOS and Windows, and front ends Docker and Kubernetes; really then, standing up 29 | >a new, single or set of DSEs is like a 3 button operation. 30 | > 31 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_35%20Desktop.pdf) 32 | 33 | 2019 October - - 34 | >Customer: My company has a number of shortest-path problems, for example; airlines, get me from SFO to 35 | >JFK for passenger and freight routing. I understand graph analytics may be a means to solve this problem. 36 | >Can you help ? 37 | > 38 | >Daniel: Excellent question ! This is the third of three documents in a series answering this question. In 39 | >the first document (August/2019), we set up the DataStax Enterprise (DSE) release 6.8 Python client side 40 | >library, and worked with the driver for both OLTP and OLAP style queries. In the second document in this 41 | >series (September/2019), we delivered a thin client Web user interface that allowed us to interact with 42 | >a (grid maze), prompting and then rendering the results to a DSE Graph shortest path query (traversal). 43 | >In this third and final document in this series, we backfill all of the DSE Graph (Apache Gremlin) traversal 44 | >steps you would need to know to write the shortest path query on your own, without aid. 45 | > 46 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_34_GremlinPrimer.pdf) 47 | > 48 | >[Application program code](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_34_GremlinPrimer.txt) 49 | 50 | 51 | 2019 September - - 52 | >Customer: My company has a number of shortest-path problems, for example; airlines, get me from SFO to 53 | >JFK for passenger and freight routing. I understand graph analytics may be a means to solve this problem. 54 | >Can you help ? 55 | > 56 | >Daniel: Excellent question ! This is the second of three documents in a series answering this question. 57 | >In the first document (August/2019), we set up the DataStax Enterprise (DSE) release 6.8 Python client 58 | >side library, and worked with the driver for both OLTP and OLAP style queries. In this second document, 59 | >we deliver a thin client Web user interface that allows us to interact with a (grid maze), prompting 60 | >and then rendering the results to a DSE Graph shortest path query (traversal).
In the third and final
61 | >document in this series (October/2019), we will backfill all of the DSE Graph (Apache Gremlin) traversal
62 | >steps you would need to know to write the shortest path query on your own, without aid.
63 | >
64 | >All of the source code to the client program, written in Python, is available below as a Linux Tar ball.
65 | >
66 | >[Download Whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_33_ShortestPoint.pdf)
67 | >
68 | >[Application program code](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_33_ShortestPoint.tar)
69 |
70 |
71 | 2019 August - -
72 | >Customer: My company has a number of shortest-path problems, for example; airlines, get me from SFO to
73 | >JFK for passenger and freight routing. I understand graph analytics may be a means to solve this problem.
74 | >Can you help ?
75 | >
76 | >Daniel: Excellent question ! Previously in this document series we have overviewed the topic of graph
77 | >databases (June/2019, updated from January/2019). Also, we have deep-dived on the topic of product
78 | >recommendation engines using Apache Spark (DSE Analytics) machine learning, and also DSE Graph,
79 | >performing a compare/contrast of the analytics each environment offers (July/2019).
80 | >
81 | >In this edition of this document, we will address graph analytics, shortest path. While we previously
82 | >overviewed graph, we’ve never detailed the graph query language titled, Apache Gremlin. Gremlin is a
83 | >large topic, way larger and more capable than SQL SELECT. Thus, we will, in this document, begin a
84 | >series of at least 3 articles, they being;
85 | >
86 | > • Set up a DSE (Graph) version 6.8, Python client for both OLTP and OLAP. (This document)
87 | >
88 | > • Deliver the shortest path solution using DSE Graph with a Python Web client user interface.
89 | >
90 | > • Deliver a part-1 primer on Apache Gremlin, so that you may better understand the query (Gremlin
91 | >traversal) used to calculate shortest path.
92 | >
93 | >
94 | >[Download Whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_32_Python68Client.pdf)
95 | >
96 | >[Application program code](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_32_Python68Client.py)
97 |
98 |
99 | 2019 July - -
100 | >Customer: I'm confused. I saw a presentation at the 2019 DataStax world conference (Accelerate 2019),
101 | >detailing how to deliver a product recommendation engine using DSE Graph. I've also seen DSE articles
102 | >detailing how to deliver a product recommendation engine using DSE Analytics. Can you help ?
103 | >
104 | >Daniel: Excellent question ! As discussed in previous editions of this document, there are 4 primary
105 | >functional areas within DataStax Enterprise (DSE). DSE Analytics can deliver a ‘content-based’ product
106 | >recommendation (aka, product-product). DSE Graph can deliver a ‘collaborative-based’ product recommendation
107 | >engine (aka, user-user). Both DSE Analytics and DSE Graph use DSE Core as their storage engine, and DSE
108 | >Search as their advanced index engine; a full integration, not just a connector.
109 | >
110 | >In this edition of this document we’ll detail all of the code needed to deliver the above, and include
111 | >data. We’ll also use this edition of this document to provide a Graph query primer (Gremlin language
112 | >primer), and answer the nuanced question of; Why Graph ?
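>
>As a small taste of the ‘collaborative-based’ style before you download, below is a minimal sketch of a
>user-user recommendation expressed as a single Gremlin traversal, using the gremlin_python library. The
>endpoint, the person/product schema, the 'bought' edge label, and the property names are illustrative
>assumptions for this sketch, not the exact schema used in the whitepaper.
>
>```python
>from gremlin_python.process.anonymous_traversal import traversal
>from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
>from gremlin_python.process.traversal import P
>
># Connect to a Gremlin Server style endpoint; host, port, and the
># 'g' traversal source alias are assumptions for this sketch.
>conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
>g = traversal().withRemote(conn)
>
># "People who bought what Alice bought, also bought ..." - rank the
># products Alice does not yet own by how often her peers bought them.
>recs = (g.V().has('person', 'name', 'alice')
>         .out('bought').aggregate('mine')           # products Alice owns
>         .in_('bought')                             # other buyers of those products
>         .out('bought').where(P.without('mine'))    # their products Alice lacks
>         .groupCount().by('name')                   # rank by purchase frequency
>         .next())
>
>print(sorted(recs.items(), key=lambda kv: -kv[1]))
>conn.close()
>```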
113 | >
114 | >[Download Whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31a_DSE%2C%20Reco%20Engines.pdf)
115 | >
116 | >[PowerPoint (measurably more detailed than above)](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31b_DSE%2C%20Reco%20Engines.pptx)
117 | >
118 | >[Video recording of the PPT above](https://www.youtube.com/watch?v=15xUt1sZ48U&feature=youtu.be)
119 | >
120 | >[Just Grocery Data as Tar](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31c_JustGroceryData.tar)
121 | >
122 | >[All Program Code as Txt](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31d_AllCommands.txt)
123 | >
124 | >[DataStax KillrVideo demo DB data as Tar](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31e_KillrVideoDataAsPipe.tar)
125 | >
126 | >[DataStax KillrVideo demo DB DDL as Txt (vers 6.8, fyi)](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31f_KillrVideoDDL.cql)
127 |
128 |
129 | 2019 June - -
130 |
131 | >Customer: I saw the January/2019 article where you introduced graph computing using Apache TinkerPop,
132 | >aka, DataStax Enterprise Graph. Now I see that you’ve released an EAP (early access program, preview)
133 | >DataStax Enterprise version 6.8 with significant changes to graph, including a new storage model. I
134 | >figure there’s a bunch of new stuff I need to know. Can you help ?
135 | >
136 | >Daniel: Excellent question ! Yes. In this edition of DataStax Developer’s Notebook (DDN), we’ll update
137 | >the January/2019 document with new version 6.8 DSE capabilities, and do a bit of compare and contrast.
138 | >Keep in mind, as an EAP, version 6.8 is proposed, early preview. Version 6.8 may change drastically.
139 | >
140 | >In this document, we detail that you no longer need GraphFrames; inserting using standard DataFrames
141 | >saves a bunch of processing steps, and still performs just as well. We detail the new version 6.8
142 | >storage model, which is also much simpler than the version 6.7 model. (Everything is stored directly in DSE Core
143 | >tables, and directly supports DSE Core CQL queries.)
144 | >
145 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_30_GraphPrimer%2068.pdf)
146 |
147 |
148 | 2019 May - -
149 |
150 | >Customer: As a developer I’ve been using Redis for 6 years, and now my company tells me I have to move
151 | >all of my work to Apache Kafka. Can you help ?
152 | >
153 | >Daniel: Excellent question ! Management, huh ? We say that because Redis and Kafka are not the same.
154 | >In fact, Redis seems to have really re-energized in the past 4 years, with many strategic enhancements.
155 | >Redis has held the number four spot on the DB-Engines.com database ranking for some time. Kafka, while
156 | >used by nearly everyone, seems to place 60% of its workloads serving mainframe offloads; guaranteed
157 | >message delivery, possibly to multiple consumers. (A scale-out of subscribe in publish/subscribe.)
158 | >
159 | >In this document, we’ll install and configure a single-node (stand-alone) Kafka cluster, learn to write
160 | >and read messages, and install and configure the DataStax Kafka Connector (Kafka Connector). Using the
161 | >Kafka Connector, you can push Kafka messages into DataStax Enterprise and the DataStax Distribution of
162 | >Cassandra (DDAC) without writing any program code. Cool.
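>
>To make the write/read portion concrete before you download, here is a minimal sketch using the
>kafka-python client against a stand-alone broker. The broker address and topic name are assumptions
>for this sketch; the whitepaper's no-code path (the Kafka Connector) needs none of this code.
>
>```python
>from kafka import KafkaProducer, KafkaConsumer
>
># Write a few messages to a topic on a stand-alone broker.
>producer = KafkaProducer(bootstrap_servers='localhost:9092')
>for i in range(3):
>    producer.send('ddn_demo', value=f'message {i}'.encode('utf-8'))
>producer.flush()
>
># Read the messages back; consumer_timeout_ms lets the loop end,
># which keeps this demo from blocking forever.
>consumer = KafkaConsumer('ddn_demo',
>                         bootstrap_servers='localhost:9092',
>                         auto_offset_reset='earliest',
>                         consumer_timeout_ms=5000)
>for message in consumer:
>    print(message.offset, message.value.decode('utf-8'))
>```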
163 | >
164 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_29_Kafka.pdf)
165 |
166 |
167 | 2019 April - -
168 |
169 | >Customer: I work on my laptop using Python, R, and Jupyter Notebooks, doing analytics for my company.
170 | >I’ve been digging on the analytics topics built inside DataStax Enterprise (pre-installed, data co-located
171 | >with the analytics routines), but don’t see how I can make use of any of this. Can you help ?
172 | >
173 | >Daniel: Excellent question ! Aren’t Python and R the number one data science software pairing on the
174 | >planet ? (Regardless, it must be in the top two.) DataStax Enterprise ships pre-configured with a
175 | >Python interpreter, and R. For R, for reasons unknown to us, you must install one new external library.
176 | >
177 | >We do have a bias towards Spark/R, since Spark/R seems to lead in the area of open source parallel
178 | >capable routines. (Speed, performance, better documentation.)
179 | >
180 | >Jupyter is also an excellent choice, especially if you’re Python focused. We’ve not yet looked at
181 | >the Anaconda distribution of Jupyter, which seems very promising. We’ll show a Jupyter install,
182 | >but may ourselves stick with Apache Zeppelin, since Zeppelin seems to come pre-installed/pre-configured
183 | >for so many more languages and options out of the box. (Less work for us.)
184 | >
185 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_28_Jupyter%2C%20R.pdf)
186 |
187 |
188 | 2019 March - -
189 |
190 | >Customer: My company was using application server tiered security, and now needs to implement
191 | >database tier level security. Can you help ?
192 | >
193 | >Daniel: Excellent question ! Obviously security is a broad topic; OS level security (the OS
194 | >hosting DSE), database level security, data in flight, data at rest, and more.
195 | >
196 | >Minimally we’ll overview DSE security, and detail how to implement password protection of same.
197 | >
198 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_27_Security.pdf)
199 |
200 |
201 | 2019 February - -
202 |
203 | >Customer: I need to program an inventory management system, and wish to use the time stamp, time to
204 | >live, and other features found within DSE. Can you help ?
205 | >
206 | >Daniel: Excellent question ! The design pattern you implement differs when you are selling a distinct
207 | >inventory (specifically numbered seats to a concert), or you are selling a true-count, number on hand
208 | >inventory (all items are the same).
209 | >
210 | >Regardless, we will cover all of the relevant topics, and detail how to program same using DSE Core
211 | >and DSE Analytics (Apache Spark).
212 | >
213 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_26_Inventory.pdf)
214 |
215 |
216 | 2019 January - -
217 |
218 | >Customer: Graph, graph, graph; what the heck is up with graph ? I think (hope ?) there’s something graph
219 | >databases do that standard relational databases do not, but I can’t articulate what that function or
220 | >advantage actually is. Can you help ?
221 | >
222 | >Daniel: Excellent question ! Yes, but we’re going to take two editions of this document to do so. Sometimes
223 | >there are nuances when discussing databases; what really is the difference between a data warehouse,
224 | >data mart, data lake, other ?
Why couldn’t you recreate some or most non-relational database function 225 | >using a standard relational database ? 226 | > 227 | >In this edition of DataStax Developer’s Notebook (DDN), we provide a graph database primer; create a 228 | >graph, and load it. In a future edition of this same document, we will actually have the chance to 229 | >provide examples where you might determine that graph databases have an advantage over relational 230 | >databases for certain use cases. 231 | > 232 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_25_GraphPrimer.pdf) 233 | > 234 | >[Resource Kit](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/41%20Simple%20Customer%20Graph.txt), all of the data and programs used in this edition in ASCII text format. 235 | > 236 | -------------------------------------------------------------------------------- /2020/DDN_2020_37_Parquet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_37_Parquet.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_38_FileMethods.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_38_FileMethods.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_39_DriverFutures.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_39_DriverFutures.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_40_SSL.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_40_SSL.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_41_GraphQL.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_41_GraphQL.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_42_AstraGeohash.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_42_AstraGeohash.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_42_AstraGeohash_Data.pipe.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_42_AstraGeohash_Data.pipe.gz -------------------------------------------------------------------------------- /2020/DDN_2020_42_AstraGeohash_Programs.tar.gz: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_42_AstraGeohash_Programs.tar.gz
--------------------------------------------------------------------------------
/2020/DDN_2020_43_AstraApiProgramming.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_43_AstraApiProgramming.pdf
--------------------------------------------------------------------------------
/2020/DDN_2020_44_NoSQLBench.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_44_NoSQLBench.pdf
--------------------------------------------------------------------------------
/2020/DDN_2020_44_NoSQLBench.yaml:
--------------------------------------------------------------------------------
1 |
2 |
3 | # Run via,
4 | #
5 | #    nb (file_name)
6 | #    nb (file_name) driver=cql
7 | #
8 | #    nb run workload=(file_name) driver=stdout tags=phase:rampup cycles=10
9 | #
10 | #    nb run workload=(file_name) driver=stdout tags=name:query1 cycles=10
11 |
12 | # Because of the DROP/CREATE KEYSPACE, this file does not run against Astra
13 |
14 |
15 | scenarios:
16 |   default:
17 |     schema: run driver=stdout tags==phase:schema threads==1 cycles==UNDEF
18 |     rampup: run driver=stdout tags==phase:rampup threads==auto cycles=10000000
19 |     main: run driver=stdout tags==phase:main threads==auto cycles=100000
20 |
21 |
22 | bindings:
23 |   colx: Mod(<>L); ToHashedUUID() -> java.util.UUID; ToString() -> String
24 |   col8: FullNames() -> String
25 |
26 |
27 | blocks:
28 |
29 |   - tags:
30 |       phase: schema
31 |     params:
32 |       prepared: false
33 |     statements:
34 |
35 |       - drop_keyspace: |
36 |           DROP KEYSPACE IF EXISTS <>;
37 |         tags:
38 |           name: drop_keyspace
39 |
40 |       - create_keyspace: |
41 |           CREATE KEYSPACE <>
42 |             WITH replication = {'class': 'SimpleStrategy',
43 |             'replication_factor': '1'};
44 |         tags:
45 |           name: create_keyspace
46 |
47 |       - create_table: |
48 |           CREATE TABLE <>.<>
49 |             (
50 |             col1 TEXT PRIMARY KEY,
51 |             col2 TEXT,
52 |             col3 TEXT,
53 |             col4 TEXT,
54 |             col5 TEXT,
55 |             col6 TEXT,
56 |             col7 TEXT,
57 |             col8 TEXT,
58 |             col9 TEXT,
59 |             col0 TEXT
60 |             );
61 |         tags:
62 |           name: create_table
63 |
64 |       - create_index: |
65 |           CREATE CUSTOM INDEX col4_idx
66 |             ON <>.<> (col4) USING 'StorageAttachedIndex';
67 |         tags:
68 |           name: create_index
69 |
70 |
71 |   - tags:
72 |       phase: rampup
73 |     params:
74 |       prepared: true
75 |     statements:
76 |
77 |       - insert: |
78 |           INSERT INTO <>.<> (col1, col4, col8)
79 |             VALUES ( {colx}, {colx}, {col8} );
80 |         tags:
81 |           name: insert
82 |
83 |
84 |   - tags:
85 |       phase: main
86 |     params:
87 |       prepared: true
88 |     statements:
89 |
90 |       - query1: |
91 |           SELECT * FROM <>.<> WHERE col1 = {colx} ;
92 |         tags:
93 |           name: query1
94 |
95 |       - query2: |
96 |           SELECT * FROM <>.<> WHERE col4 = {colx} ;
97 |         tags:
98 |           name: query2
99 |
100 |
101 |
102 |
--------------------------------------------------------------------------------
/2020/DDN_2020_44_NoSQLBench_Slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_44_NoSQLBench_Slides.pdf
-------------------------------------------------------------------------------- /2020/DDN_2020_45_KubernetesOperator.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_45_KubernetesOperator.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_45_KubernetesOperator.tar: -------------------------------------------------------------------------------- 1 | 10-kind-config.yaml0000755000076500000240000000030013716020464012647 0ustar dialoutkind: Cluster 2 | apiVersion: kind.sigs.k8s.io/v1alpha3 3 | networking: 4 | apiServerPort: 45451 5 | nodes: 6 | - role: control-plane 7 | - role: worker 8 | - role: worker 9 | - role: worker 10 | - role: worker 11 | - role: worker 12 | 13 | 11-storageclass-kind.yaml0000755000076500000240000000037313716020464014107 0ustar dialoutapiVersion: storage.k8s.io/v1 14 | kind: StorageClass 15 | metadata: 16 | annotations: 17 | storageclass.kubernetes.io/is-default-class: "true" 18 | name: server-storage 19 | provisioner: rancher.io/local-path 20 | reclaimPolicy: Delete 21 | volumeBindingMode: WaitForFirstConsumer 22 | 23 | 12-install-cass-operator-v1.1yaml0000755000076500000240000005006213716020463015325 0ustar dialout--- 24 | apiVersion: v1 25 | kind: Namespace 26 | metadata: 27 | name: cass-operator 28 | --- 29 | apiVersion: v1 30 | kind: ServiceAccount 31 | metadata: 32 | name: cass-operator 33 | namespace: cass-operator 34 | --- 35 | apiVersion: v1 36 | data: 37 | tls.crt: "" 38 | tls.key: "" 39 | kind: Secret 40 | metadata: 41 | name: cass-operator-webhook-config 42 | namespace: cass-operator 43 | --- 44 | apiVersion: apiextensions.k8s.io/v1beta1 45 | kind: CustomResourceDefinition 46 | metadata: 47 | name: cassandradatacenters.cassandra.datastax.com 48 | spec: 49 | group: cassandra.datastax.com 50 | names: 51 | kind: CassandraDatacenter 52 | listKind: CassandraDatacenterList 53 | plural: cassandradatacenters 54 | shortNames: 55 | - cassdc 56 | - cassdcs 57 | singular: cassandradatacenter 58 | scope: Namespaced 59 | subresources: 60 | status: {} 61 | validation: 62 | openAPIV3Schema: 63 | description: CassandraDatacenter is the Schema for the cassandradatacenters 64 | API 65 | properties: 66 | apiVersion: 67 | description: 'APIVersion defines the versioned schema of this representation 68 | of an object. Servers should convert recognized schemas to the latest 69 | internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources' 70 | type: string 71 | kind: 72 | description: 'Kind is a string value representing the REST resource this 73 | object represents. Servers may infer this from the endpoint the client 74 | submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds' 75 | type: string 76 | metadata: 77 | type: object 78 | spec: 79 | description: CassandraDatacenterSpec defines the desired state of a CassandraDatacenter 80 | properties: 81 | allowMultipleNodesPerWorker: 82 | description: Turning this option on allows multiple server pods to be 83 | created on a k8s worker node. By default the operator creates just 84 | one server pod per k8s worker node using k8s podAntiAffinity and requiredDuringSchedulingIgnoredDuringExecution. 
85 | type: boolean 86 | canaryUpgrade: 87 | description: Indicates that configuration and container image changes 88 | should only be pushed to the first rack of the datacenter 89 | type: boolean 90 | clusterName: 91 | description: The name by which CQL clients and instances will know the 92 | cluster. If the same cluster name is shared by multiple Datacenters 93 | in the same Kubernetes namespace, they will join together in a multi-datacenter 94 | cluster. 95 | minLength: 2 96 | type: string 97 | configBuilderImage: 98 | description: Container image for the config builder init container. 99 | type: string 100 | managementApiAuth: 101 | description: Config for the Management API certificates 102 | properties: 103 | insecure: 104 | type: object 105 | manual: 106 | properties: 107 | clientSecretName: 108 | type: string 109 | serverSecretName: 110 | type: string 111 | skipSecretValidation: 112 | type: boolean 113 | required: 114 | - clientSecretName 115 | - serverSecretName 116 | type: object 117 | type: object 118 | nodeSelector: 119 | additionalProperties: 120 | type: string 121 | description: 'A map of label keys and values to restrict Cassandra node 122 | scheduling to k8s workers with matchiing labels. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector' 123 | type: object 124 | racks: 125 | description: A list of the named racks in the datacenter, representing 126 | independent failure domains. The number of racks should match the 127 | replication factor in the keyspaces you plan to create, and the number 128 | of racks cannot easily be changed once a datacenter is deployed. 129 | items: 130 | description: Rack ... 131 | properties: 132 | name: 133 | description: The rack name 134 | minLength: 2 135 | type: string 136 | zone: 137 | description: Zone name to pin the rack, using node affinity 138 | type: string 139 | required: 140 | - name 141 | type: object 142 | type: array 143 | replaceNodes: 144 | description: A list of pod names that need to be replaced. 145 | items: 146 | type: string 147 | type: array 148 | resources: 149 | description: Kubernetes resource requests and limits, per pod 150 | properties: 151 | limits: 152 | additionalProperties: 153 | type: string 154 | description: 'Limits describes the maximum amount of compute resources 155 | allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' 156 | type: object 157 | requests: 158 | additionalProperties: 159 | type: string 160 | description: 'Requests describes the minimum amount of compute resources 161 | required. If Requests is omitted for a container, it defaults 162 | to Limits if that is explicitly specified, otherwise to an implementation-defined 163 | value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' 164 | type: object 165 | type: object 166 | rollingRestartRequested: 167 | description: Whether to do a rolling restart at the next opportunity. 168 | The operator will set this back to false once the restart is in progress. 169 | type: boolean 170 | serverImage: 171 | description: 'Cassandra server image name. 
More info: https://kubernetes.io/docs/concepts/containers/images' 172 | type: string 173 | serverType: 174 | description: 'Server type: "cassandra" or "dse"' 175 | enum: 176 | - cassandra 177 | - dse 178 | type: string 179 | serverVersion: 180 | description: Version string for config builder, used to generate Cassandra 181 | server configuration 182 | enum: 183 | - 6.8.0 184 | - 3.11.6 185 | - 4.0.0 186 | type: string 187 | serviceAccount: 188 | description: The k8s service account to use for the server pods 189 | type: string 190 | size: 191 | description: Desired number of Cassandra server nodes 192 | format: int32 193 | minimum: 1 194 | type: integer 195 | stopped: 196 | description: A stopped CassandraDatacenter will have no running server 197 | pods, like using "stop" with traditional System V init scripts. Other 198 | Kubernetes resources will be left intact, and volumes will re-attach 199 | when the CassandraDatacenter workload is resumed. 200 | type: boolean 201 | storageConfig: 202 | description: Describes the persistent storage request of each server 203 | node 204 | properties: 205 | cassandraDataVolumeClaimSpec: 206 | description: PersistentVolumeClaimSpec describes the common attributes 207 | of storage devices and allows a Source for provider-specific attributes 208 | properties: 209 | accessModes: 210 | description: 'AccessModes contains the desired access modes 211 | the volume should have. More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#access-modes-1' 212 | items: 213 | type: string 214 | type: array 215 | dataSource: 216 | description: This field requires the VolumeSnapshotDataSource 217 | alpha feature gate to be enabled and currently VolumeSnapshot 218 | is the only supported data source. If the provisioner can 219 | support VolumeSnapshot data source, it will create a new volume 220 | and data will be restored to the volume at the same time. 221 | If the provisioner does not support VolumeSnapshot data source, 222 | volume will not be created and the failure will be reported 223 | as an event. In the future, we plan to support more data source 224 | types and the behavior of the provisioner may change. 225 | properties: 226 | apiGroup: 227 | description: APIGroup is the group for the resource being 228 | referenced. If APIGroup is not specified, the specified 229 | Kind must be in the core API group. For any other third-party 230 | types, APIGroup is required. 231 | type: string 232 | kind: 233 | description: Kind is the type of resource being referenced 234 | type: string 235 | name: 236 | description: Name is the name of resource being referenced 237 | type: string 238 | required: 239 | - kind 240 | - name 241 | type: object 242 | resources: 243 | description: 'Resources represents the minimum resources the 244 | volume should have. More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#resources' 245 | properties: 246 | limits: 247 | additionalProperties: 248 | type: string 249 | description: 'Limits describes the maximum amount of compute 250 | resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' 251 | type: object 252 | requests: 253 | additionalProperties: 254 | type: string 255 | description: 'Requests describes the minimum amount of compute 256 | resources required. If Requests is omitted for a container, 257 | it defaults to Limits if that is explicitly specified, 258 | otherwise to an implementation-defined value. 
More info: 259 | https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' 260 | type: object 261 | type: object 262 | selector: 263 | description: A label query over volumes to consider for binding. 264 | properties: 265 | matchExpressions: 266 | description: matchExpressions is a list of label selector 267 | requirements. The requirements are ANDed. 268 | items: 269 | description: A label selector requirement is a selector 270 | that contains values, a key, and an operator that relates 271 | the key and values. 272 | properties: 273 | key: 274 | description: key is the label key that the selector 275 | applies to. 276 | type: string 277 | operator: 278 | description: operator represents a key's relationship 279 | to a set of values. Valid operators are In, NotIn, 280 | Exists and DoesNotExist. 281 | type: string 282 | values: 283 | description: values is an array of string values. 284 | If the operator is In or NotIn, the values array 285 | must be non-empty. If the operator is Exists or 286 | DoesNotExist, the values array must be empty. This 287 | array is replaced during a strategic merge patch. 288 | items: 289 | type: string 290 | type: array 291 | required: 292 | - key 293 | - operator 294 | type: object 295 | type: array 296 | matchLabels: 297 | additionalProperties: 298 | type: string 299 | description: matchLabels is a map of {key,value} pairs. 300 | A single {key,value} in the matchLabels map is equivalent 301 | to an element of matchExpressions, whose key field is 302 | "key", the operator is "In", and the values array contains 303 | only "value". The requirements are ANDed. 304 | type: object 305 | type: object 306 | storageClassName: 307 | description: 'Name of the StorageClass required by the claim. 308 | More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#class-1' 309 | type: string 310 | volumeMode: 311 | description: volumeMode defines what type of volume is required 312 | by the claim. Value of Filesystem is implied when not included 313 | in claim spec. This is a beta feature. 314 | type: string 315 | volumeName: 316 | description: VolumeName is the binding reference to the PersistentVolume 317 | backing this claim. 318 | type: string 319 | type: object 320 | type: object 321 | superuserSecretName: 322 | description: This secret defines the username and password for the Cassandra 323 | server superuser. If it is omitted, we will generate a secret instead. 
324 | type: string 325 | required: 326 | - clusterName 327 | - serverType 328 | - serverVersion 329 | - size 330 | - storageConfig 331 | type: object 332 | status: 333 | description: CassandraDatacenterStatus defines the observed state of CassandraDatacenter 334 | properties: 335 | cassandraOperatorProgress: 336 | description: Last known progress state of the Cassandra Operator 337 | type: string 338 | lastRollingRestart: 339 | format: date-time 340 | type: string 341 | lastServerNodeStarted: 342 | description: The timestamp when the operator last started a Server node 343 | with the management API 344 | format: date-time 345 | type: string 346 | nodeReplacements: 347 | items: 348 | type: string 349 | type: array 350 | nodeStatuses: 351 | additionalProperties: 352 | properties: 353 | hostID: 354 | type: string 355 | nodeIP: 356 | type: string 357 | type: object 358 | type: object 359 | superUserUpserted: 360 | description: The timestamp at which CQL superuser credentials were last 361 | upserted to the management API 362 | format: date-time 363 | type: string 364 | type: object 365 | type: object 366 | x-kubernetes-preserve-unknown-fields: true 367 | version: v1beta1 368 | versions: 369 | - name: v1beta1 370 | served: true 371 | storage: true 372 | --- 373 | apiVersion: rbac.authorization.k8s.io/v1 374 | kind: ClusterRole 375 | metadata: 376 | creationTimestamp: null 377 | name: cass-operator-cluster-role 378 | rules: 379 | - apiGroups: 380 | - admissionregistration.k8s.io 381 | resourceNames: 382 | - cassandradatacenter-webhook-registration 383 | resources: 384 | - validatingwebhookconfigurations 385 | verbs: 386 | - create 387 | - get 388 | - update 389 | --- 390 | apiVersion: rbac.authorization.k8s.io/v1 391 | kind: ClusterRoleBinding 392 | metadata: 393 | name: cass-operator 394 | roleRef: 395 | apiGroup: rbac.authorization.k8s.io 396 | kind: ClusterRole 397 | name: cass-operator-cluster-role 398 | subjects: 399 | - kind: ServiceAccount 400 | name: cass-operator 401 | namespace: cass-operator 402 | --- 403 | apiVersion: rbac.authorization.k8s.io/v1 404 | kind: Role 405 | metadata: 406 | name: cass-operator 407 | namespace: cass-operator 408 | rules: 409 | - apiGroups: 410 | - "" 411 | resources: 412 | - pods 413 | - services 414 | - endpoints 415 | - persistentvolumeclaims 416 | - events 417 | - configmaps 418 | - secrets 419 | verbs: 420 | - '*' 421 | - apiGroups: 422 | - "" 423 | resources: 424 | - namespaces 425 | verbs: 426 | - get 427 | - apiGroups: 428 | - apps 429 | resources: 430 | - deployments 431 | - daemonsets 432 | - replicasets 433 | - statefulsets 434 | verbs: 435 | - '*' 436 | - apiGroups: 437 | - monitoring.coreos.com 438 | resources: 439 | - servicemonitors 440 | verbs: 441 | - get 442 | - create 443 | - apiGroups: 444 | - apps 445 | resourceNames: 446 | - cass-operator 447 | resources: 448 | - deployments/finalizers 449 | verbs: 450 | - update 451 | - apiGroups: 452 | - datastax.com 453 | resources: 454 | - '*' 455 | verbs: 456 | - '*' 457 | - apiGroups: 458 | - policy 459 | resources: 460 | - poddisruptionbudgets 461 | verbs: 462 | - '*' 463 | - apiGroups: 464 | - cassandra.datastax.com 465 | resources: 466 | - '*' 467 | verbs: 468 | - '*' 469 | --- 470 | apiVersion: rbac.authorization.k8s.io/v1 471 | kind: RoleBinding 472 | metadata: 473 | name: cass-operator 474 | namespace: cass-operator 475 | roleRef: 476 | apiGroup: rbac.authorization.k8s.io 477 | kind: Role 478 | name: cass-operator 479 | subjects: 480 | - kind: ServiceAccount 481 | name: cass-operator 482 | 
--- 483 | apiVersion: v1 484 | kind: Service 485 | metadata: 486 | labels: 487 | name: cass-operator-webhook 488 | name: cassandradatacenter-webhook-service 489 | namespace: cass-operator 490 | spec: 491 | ports: 492 | - port: 443 493 | targetPort: 443 494 | selector: 495 | name: cass-operator 496 | --- 497 | apiVersion: apps/v1 498 | kind: Deployment 499 | metadata: 500 | name: cass-operator 501 | namespace: cass-operator 502 | spec: 503 | replicas: 1 504 | selector: 505 | matchLabels: 506 | name: cass-operator 507 | template: 508 | metadata: 509 | labels: 510 | name: cass-operator 511 | spec: 512 | containers: 513 | - env: 514 | - name: WATCH_NAMESPACE 515 | valueFrom: 516 | fieldRef: 517 | fieldPath: metadata.namespace 518 | - name: POD_NAME 519 | valueFrom: 520 | fieldRef: 521 | fieldPath: metadata.name 522 | - name: OPERATOR_NAME 523 | value: cass-operator 524 | - name: SKIP_VALIDATING_WEBHOOK 525 | value: "FALSE" 526 | image: datastax/cass-operator:1.1.0 527 | imagePullPolicy: IfNotPresent 528 | livenessProbe: 529 | exec: 530 | command: 531 | - pgrep 532 | - .*operator 533 | failureThreshold: 3 534 | initialDelaySeconds: 5 535 | periodSeconds: 5 536 | timeoutSeconds: 5 537 | name: cass-operator 538 | readinessProbe: 539 | exec: 540 | command: 541 | - stat 542 | - /tmp/operator-sdk-ready 543 | failureThreshold: 1 544 | initialDelaySeconds: 5 545 | periodSeconds: 5 546 | timeoutSeconds: 5 547 | volumeMounts: 548 | - mountPath: /tmp/k8s-webhook-server/serving-certs 549 | name: cass-operator-certs-volume 550 | readOnly: false 551 | serviceAccountName: cass-operator 552 | volumes: 553 | - name: cass-operator-certs-volume 554 | secret: 555 | secretName: cass-operator-webhook-config 556 | --- 557 | apiVersion: admissionregistration.k8s.io/v1beta1 558 | kind: ValidatingWebhookConfiguration 559 | metadata: 560 | name: cassandradatacenter-webhook-registration 561 | webhooks: 562 | - admissionReviewVersions: 563 | - v1beta1 564 | clientConfig: 565 | service: 566 | name: cassandradatacenter-webhook-service 567 | namespace: cass-operator 568 | path: /validate-cassandra-datastax-com-v1beta1-cassandradatacenter 569 | failurePolicy: Ignore 570 | matchPolicy: Equivalent 571 | name: cassandradatacenter-webhook.cassandra.datastax.com 572 | rules: 573 | - apiGroups: 574 | - cassandra.datastax.com 575 | apiVersions: 576 | - v1beta1 577 | operations: 578 | - CREATE 579 | - UPDATE 580 | - DELETE 581 | resources: 582 | - cassandradatacenters 583 | scope: '*' 584 | sideEffects: None 585 | timeoutSeconds: 10 586 | 587 | 13-cassandra-cluster.yaml0000755000076500000240000000131513716023657014116 0ustar dialoutapiVersion: cassandra.datastax.com/v1beta1 588 | kind: CassandraDatacenter 589 | metadata: 590 | name: dc1 591 | spec: 592 | clusterName: cluster1 593 | serverType: cassandra 594 | serverVersion: "4.0.0" 595 | managementApiAuth: 596 | insecure: {} 597 | size: 2 598 | storageConfig: 599 | cassandraDataVolumeClaimSpec: 600 | storageClassName: server-storage 601 | accessModes: 602 | - ReadWriteOnce 603 | resources: 604 | requests: 605 | storage: 5Gi 606 | config: 607 | cassandra-yaml: 608 | authenticator: org.apache.cassandra.auth.PasswordAuthenticator 609 | authorizer: org.apache.cassandra.auth.CassandraAuthorizer 610 | role_manager: org.apache.cassandra.auth.CassandraRoleManager 611 | jvm-options: 612 | initial_heap_size: "800M" 613 | max_heap_size: "800M" 614 | 615 | -------------------------------------------------------------------------------- /2020/DDN_2020_46_BetterVersOf42.pdf: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_46_BetterVersOf42.pdf
--------------------------------------------------------------------------------
/2020/DDN_2020_47_VMs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_47_VMs.pdf
--------------------------------------------------------------------------------
/2020/DDN_2020_48_NodeReplaceWoBootstrap.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_48_NodeReplaceWoBootstrap.pdf
--------------------------------------------------------------------------------
/2020/README.md:
--------------------------------------------------------------------------------
1 | DataStax Developer's Notebook - Monthly Articles 2020
2 | ===================
3 |
4 | | **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** |
5 | |-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
6 |
7 |
8 | This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum.
9 |
10 | December 2020 - -
11 | >Customer: Enjoyed the past article on Apache Cassandra and virtualization (VMs). I didn’t see you detail how to recover from a failed
12 | >node (VM) though. Can you help ?
13 | >
14 | >Daniel: Excellent question ! Good catch; an oversight on our part. In this edition of DDN we detail how to implement and test node
15 | >recovery from failure, when using virtual machines.
16 | >
17 | >While this is generally an automatic function of Cassandra, you can, when using network attached storage, perform some manual steps
18 | >to recover nodes much faster. Also, we use these techniques to support development and quality-assurance; we use the same steps for
19 | >'cluster cloning' and 'differentiated data', both topics we overview in this paper.
20 | >
21 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_48_NodeReplaceWoBootstrap.pdf)
22 |
23 | November 2020 - -
24 | >Customer: My company is finally going cloud. We wish to run performance tests and more for both virtual machine hosting, and then also
25 | >containers, Kubernetes. We want to see performance implications and also make ready to update our run-book. Can you help ?
26 | >
27 | >Daniel: Excellent question !
We’ve done each of these expertly; virtualization, and containers. We’ll begin a series of articles in response to
28 | >this/your question. First, here, we’ll detail virtualization. We’ll share a number of techniques we use when automating tests and similar
29 | >when using virtual machines. All of this work will be done on GCP. After this article, on virtualization, we’ll move to containers; a
30 | >series of articles with first an overview, and then a number of recipes when on Kubernetes.
31 | >
32 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_47_VMs.pdf)
33 |
34 | October 2020 - -
35 | >Customer: I love the GraphQL, Python/Flask, OpenStreetView, geo-spatial discussion this series has had of late. I’m having trouble
36 | >putting it all together. Any chance you can put it all in one deliverable ? Can you help ?
37 | >
38 | >Daniel: Excellent question ! In this article, we assemble all of the pieces we’ve recently discussed, putting them all in one
39 | >coordinated deliverable. We’ll detail the data format, start-up scripts, the program proper, and even any HTML related to
40 | >OpenStreetView. (Eg., not Google Maps.)
41 | >
42 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_46_BetterVersOf42.pdf)
43 |
44 | September 2020 - -
45 | >Customer: My company is all in on micro-services, containers and cloud for application development, server hosting including databases,
46 | >you name it. We’ve never hosted Cassandra inside containers, and wonder how best to get started. Can you help ?
47 | >
48 | >Daniel: Excellent question ! DataStax recently produced and open sourced its Kubernetes Operator, which will get you all that you need.
49 | >This operator supports open source Cassandra, DataStax Enterprise, and more.
50 | >
51 | >In the real world, expectedly, you’d use this operator to stand up pods hosting Cassandra on GKE or similar. For better learning and
52 | >debugging, this article will actually do this work on our laptop; greater control, including the freedom to break things for testing, and more.
53 | >
54 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_45_KubernetesOperator.pdf)
55 | >
56 | >[Download YAML files for labs here (TarBall format)](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_45_KubernetesOperator.tar)
57 |
58 | August 2020 - -
59 | >Customer: My company has difficulty moving applications into production, as relates to data at scale. E.g., we program, then unit
60 | >and system test with 5-15 rows of data; then, when we get into production with millions of lines of data, things fail. There
61 | >has to be an easier way to overcome this challenge. Can you help ?
62 | >
63 | >Daniel: Excellent question ! With all of the pressures we face today just to get applications written, unit testing often suffers,
64 | >system testing suffers worse, and then testing applications at scale often never happens. Fortunately, we have an easy solution.
65 | >
66 | >For the past 10 years inside DataStax, we’ve perfected NoSQLBench, our now open source volume data generation
67 | >and testing tool for distributed data platforms. In this article we will overview NoSQLBench, enabling you to see if NoSQLBench can meet your needs too.
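>
>As a quick taste, the YAML file linked below documents its own invocations in its header comments
>(nb run workload=... driver=stdout ...). Below is a minimal sketch of driving one of those dry runs from
>Python; we assume the nb binary is on your PATH and the YAML sits in the current directory.
>
>```python
>import subprocess
>
># Dry-run the rampup phase against stdout - the same command the
># YAML's own header comments document, just driven from Python.
>cmd = ['nb', 'run', 'workload=DDN_2020_44_NoSQLBench.yaml',
>       'driver=stdout', 'tags=phase:rampup', 'cycles=10']
>result = subprocess.run(cmd, capture_output=True, text=True, check=True)
>print(result.stdout)
>```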
68 | >
69 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_44_NoSQLBench.pdf)
70 | >
71 | >[PowerPoint (added detail to the above)](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_44_NoSQLBench_Slides.pdf)
72 | >
73 | >[The final YAML file/solution used in this article](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_44_NoSQLBench.yaml)
74 |
75 | July 2020 - -
76 | >Customer: Okay, so my company is finally ready to "database as a service" (DBaaS). We also want to move to a micro-services
77 | >architecture, and possibly GraphQL. Can you help ?
78 | >
79 | >Daniel: Excellent question ! In this series we’ve previously covered GraphQL, and previously covered geo-spatial queries
80 | >using the DataStax Cassandra as a service titled, Astra, which also acted as our primer on Astra.
81 | >
82 | >In this article, we specifically cover Astra API programming.
83 | >
84 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_43_AstraApiProgramming.pdf)
85 | >
86 | >[Application program data](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_43_NoteBook.tar) in the form
87 | >of a DataStax Studio Notebook, in standard TAR file form.
88 |
89 | June 2020 - -
90 | >Customer: My company is investigating using the DataStax database as a service, titled DataStax Astra (Astra), to aid
91 | >in our application development. I know Astra is exactly equal to Apache Cassandra, which means that the DataStax
92 | >Enterprise DSE Search component is not present.
93 | >
94 | >As such, we lose Solr/Lucene, and any geo-spatial index and query processing support. But, our application needs
95 | >geospatial query support. Can you help ?
96 | >
97 | >Daniel: Excellent question ! You will be surprised how easy this is to address. In this article we detail how you
98 | >deliver geospatial queries using DataStax Astra, or just the DataStax Enterprise (DSE) Core functional component
99 | >(and not use DSE Search).
100 | >
101 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_42_AstraGeohash.pdf)
102 | >
103 | >[Application program code](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_42_AstraGeohash_Programs.tar.gz)
104 | >
105 | >[Application program data](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_42_AstraGeohash_Data.pipe.gz)
106 | >
107 | >(Because of GitHub file size limits, the above data file contains only 250,000 of the promised 334,000 lines of data. Sorry.)
108 | >
109 | >[Demonstration video](https://www.youtube.com/watch?v=RVso51X0A08)
110 |
111 | May 2020 - -
112 | >Customer: My company has got to improve its efficiency and time to delivery when creating business applications on
113 | >Apache Cassandra and DataStax Enterprise. Can you help ?
114 | >
115 | >Daniel: Excellent question ! Since you specifically mentioned application development, we will give focus to API
116 | >endpoint programming; a means to further decouple your application from the database, allowing for greater
117 | >flexibility in deployment, and even increasing performance of Web and mobile applications.
118 | >
119 | >While we might briefly mention REST and gRPC, the bulk of this document will center on GraphQL.
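>
>To preview the shape of the GraphQL work, here is a minimal sketch posting a query over HTTP with the
>Python requests library. The endpoint URL, keyspace, table, auth header, and field names are all
>illustrative assumptions for this sketch, not the whitepaper's exact configuration.
>
>```python
>import requests
>
># A hypothetical GraphQL endpoint fronting a Cassandra keyspace.
>url = 'http://localhost:8080/graphql/my_keyspace'
>
># Ask for two columns from a hypothetical 'users' table, filtered by key.
>query = '''
>{
>  users(value: { user_id: "42" }) {
>    values { user_id name }
>  }
>}
>'''
>
>resp = requests.post(url,
>                     json={'query': query},
>                     headers={'x-cassandra-token': 'REPLACE_ME'})  # placeholder auth
>resp.raise_for_status()
>print(resp.json())
>```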
120 | >
121 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_41_GraphQL.pdf)
122 |
123 | April 2020 - -
124 | >Customer: My company has started using more cloud instances for tasks like proof of concepts, and related.
125 | >We used to just leave these boxes wide open, since they generally contain zero sensitive data. But, things
126 | >being what they are, we feel like we should start securing these boxes. Can you help ?
127 | >
128 | >Daniel: Excellent question ! In the March/2019 edition of this document, we detailed how to implement
129 | >native authentication using DataStax Enterprise (DSE). In this edition, we detail how to implement SSL
130 | >between DSE server nodes (in the event you go multi-cloud), and then also SSL from client (node) to DSE
131 | >cluster.
132 | >
133 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_40_SSL.pdf)
134 |
135 | March 2020 - -
136 | >Customer: As a database application developer, I’ve never previously used a system with a natively asynchronous
137 | >client side driver. What do I need to know ? Can you help ?
138 | >
139 | >Daniel: Excellent question ! Yes, the DataStax Enterprise (DSE) client side drivers offer entirely native
140 | >asynchronous operation; fire and forget, or fire and listen. There are easy means to make the driver and
141 | >any calls you issue block, and behave synchronously, but there’s little fun in that.
142 | >
143 | >The online documentation covers the asynchronous query topic well, so we’ll review that and then extend
144 | >into asynchronous write programming.
145 | >
146 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_39_DriverFutures.pdf)
147 |
148 | February 2020 - -
149 |
150 | >Customer: I’ve read all of the articles and documentation related to DataStax Enterprise (DSE) Graph, but am
151 | >still not certain how these graph queries (traversals) actually execute. To me, this looks much like a SQL
152 | >query processing engine, and I don’t know how or if to index or model this. Can you help ?
153 | >
154 | >Daniel: Excellent question ! In this document we’ll give a brief treatment to graph query processing; how
155 | >graph traversals are actually (executed). For fun, we’ll also talk a little bit about a close (graph)
156 | >neighbor, Neo4J.
157 | >
158 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_38_FileMethods.pdf)
159 |
160 | January 2020 - -
161 |
162 | >Customer: My company maintains a lot of data on Hadoop, in Parquet and other formats, and needs to perform integrated
163 | >reporting with data resident inside DataStax. Can you help ?
164 | >
165 | >Daniel: Excellent question ! Yes. This is like a two-liner solution. We’ll detail all of the concepts and code inside
166 | >this document.
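>
>To ground the "two-liner" claim, below is a minimal PySpark sketch. The Parquet path, keyspace, table,
>and join column are placeholders, and we assume the spark-cassandra-connector is already on the
>classpath (as it is with DSE Analytics / dse spark).
>
>```python
>from pyspark.sql import SparkSession
>
>spark = SparkSession.builder.appName('ddn_parquet_join').getOrCreate()
>
># The promised two lines: one read per source ...
>parquet_df = spark.read.parquet('hdfs:///data/events.parquet')
>cassandra_df = (spark.read.format('org.apache.spark.sql.cassandra')
>                .options(keyspace='my_ks', table='my_table').load())
>
># ... then ordinary Spark SQL for the integrated report.
>parquet_df.join(cassandra_df, 'customer_id').show()
>```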
167 | > 168 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_37_Parquet.pdf) 169 | -------------------------------------------------------------------------------- /2021/61_DemoProgram.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/61_DemoProgram.tar.gz -------------------------------------------------------------------------------- /2021/DDN_2021_49_KubernetesPrimer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_49_KubernetesPrimer.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_50_KubernetesNodeRecovery.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_50_KubernetesNodeRecovery.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_51_KubernetesClusterCloning.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_51_KubernetesClusterCloning.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_52_KubernetesSnapshots.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_52_KubernetesSnapshots.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_53_MoreContainersHelm.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_53_MoreContainersHelm.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_53_ToolkitVersion2.tar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_53_ToolkitVersion2.tar -------------------------------------------------------------------------------- /2021/DDN_2021_54_AstraSvcBroker.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_54_AstraSvcBroker.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_55_K8ssandra.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_55_K8ssandra.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_56_K8ssandra, Document API.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_56_K8ssandra, Document API.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_57_K8ssandra, GraphQL.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_57_K8ssandra, GraphQL.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_58_KastenVeeam.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_58_KastenVeeam.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_59_DseStargate.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_59_DseStargate.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_60_SnowFlake.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_60_SnowFlake.pdf -------------------------------------------------------------------------------- /2021/README.md: -------------------------------------------------------------------------------- 1 | DataStax Developer's Notebook - Monthly Articles 2021 2 | =================== 3 | 4 | | **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** | 5 | |-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------| 6 | 7 | 8 | This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum. 9 | 10 | December 2021 - - 11 | >Customer: My company uses a ton of Apache Cassandra, and a ton of SnowFlake. We also want to move to writing applications using Node.js/REACT. 12 | >We’re having trouble understanding what each of Cassandra and SnowFlake should be used for together, and what a sample application might 13 | >look like. Can you help ? 14 | > 15 | >Daniel: Excellent question ! We’ll detail a sample application, written in Node.js and REACT, and then deliver an application that uses both 16 | >Apache Cassandra and SnowFlake. 
17 | > 18 | >[View a quick demo of what we're building here](https://youtu.be/uDfVjStGA9o) 19 | > 20 | >[Download December whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_60_SnowFlake.pdf) 21 | > 22 | >[Download Source Code in Tar Format here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_60_SnowFlake.tar) 23 | 24 | November 2021 - - 25 | >Customer: My company has been on DataStax Enterprise for some time. We are super excited by the new, open source StarGate (subsystem), 26 | >and what that provides. It appears as though StarGate only works with open source Apache Cassandra. Can you help ? 27 | > 28 | >Daniel: Excellent question ! We’ll explain what you are seeing, and how to get StarGate to work with DataStax Enterprise. We’ll also 29 | >detail a bit of the landscape: what’s moving around, how, and why. 30 | > 31 | >[Download November whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_59_DseStargate.pdf) 32 | 33 | October 2021 - - 34 | >Customer: My company is investigating a backup and recovery solution for all of our applications running atop Kubernetes. Can you help ? 35 | > 36 | >Daniel: Excellent question ! In the past we’ve referenced the open source Velero project from a VMware acquisition, and we’ve mentioned 37 | >NetApp Astra (a SaaS). This month we detail installation and use of Kasten/Veeam K10. With (Kasten) you can back up and restore databases 38 | >and applications, and also clone them to aid your development efforts. 39 | > 40 | >In this article we back up and recover the DataStax Kubernetes Operator for Apache Cassandra, and back up, restore, and clone Cassandra 41 | >Datacenter objects (entire Cassandra clusters). 42 | > 43 | >[Download October whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_58_KastenVeeam.pdf) 44 | 45 | September 2021 - - 46 | >Customer: My company has enjoyed the last two articles on DataStax K8ssandra, and specifically its StarGate component. 47 | >We’ve seen details on REST and the Document API, but little on GraphQL. Can you help ? 48 | > 49 | >Daniel: Excellent question ! We’ve done a number of articles in this series on GraphQL. Most recently, in October/2020, we 50 | >delivered a geo-spatial thin client Web program using GraphQL against the DataStax database as a service. When using Astra, 51 | >the database is hosted and managed. Also when using Astra, the service endpoints are automatically created and maintained, and 52 | >are, behind the scenes, using K8ssandra and StarGate. 53 | > 54 | >So, in this article, we supply the final and previously missing piece: how to access the GraphQL component of your own hosted 55 | >K8ssandra/StarGate. 56 | > 57 | >[Download September whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_57_K8ssandra%2C%20GraphQL.pdf) 58 | 59 | August 2021 - - 60 | >Customer: My company enjoyed the last article on K8ssandra and StarGate. We are highly interested in the Document API that this 61 | >document referred to: its use, and some of its design elements. Can you help ? 62 | > 63 | >Daniel: Excellent question ! Last month we installed DataStax K8ssandra, which includes the StarGate component. Further, we did 64 | >a (Hello World) using REST, to serve as an install/validation exercise for StarGate; a hedged sketch of that REST exchange appears just below.
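>(A minimal "is StarGate alive ?" check in Python rather than shell; the ports, paths, and cassandra/cassandra credentials were the StarGate defaults at the time, and may differ in your K8ssandra install.)

```python
# Hedged sketch, assuming the default StarGate service ports:
# 8081 for the auth service, 8082 for the REST/schemas API.
import requests

STARGATE = "http://localhost"   # assumed host; substitute your service address

# 1. Trade credentials for an auth token
auth = requests.post(
    f"{STARGATE}:8081/v1/auth",
    json={"username": "cassandra", "password": "cassandra"},
)
token = auth.json()["authToken"]

# 2. Use the token to list keyspaces; a 200 response validates the install
resp = requests.get(
    f"{STARGATE}:8082/v2/schemas/keyspaces",
    headers={"X-Cassandra-Token": token},
)
print(resp.status_code, resp.json())
```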
This month we dive deeper, moving beyond REST and into 65 | >the Document API area of StarGate: create a table, insert documents, run a query. 66 | > 67 | >[Download August whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_56_K8ssandra%2C%20Document%20API.pdf) 68 | 69 | July 2021 - - 70 | >Customer: My company is looking for ways to accelerate our application development, through whatever means. Can you help ? 71 | > 72 | >Daniel: Excellent question ! One of the areas we can look at is programming APIs/gateways. On some level, there are only 73 | >four things you can do with data: insert, update, delete, and select. As such, why shouldn’t all means to execute these statements 74 | >be automatically generated, automatically managed and scaled, and more ? 75 | > 76 | >On February 10, 2021, DataStax released the open source K8ssandra project, which includes these automated functions, and more. 77 | >In this article, we detail an introduction to K8ssandra: installation and use. In subsequent articles, we dive deeper into REST, 78 | >GraphQL, and Document API configuration and use. 79 | > 80 | >[Download July whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_55_K8ssandra.pdf) 81 | 82 | June 2021 - - 83 | >Customer: My company is investigating using the DataStax Astra Service Broker within our Kubernetes systems. Can you help ? 84 | > 85 | >Daniel: Excellent question ! In this document we will install and use most of the early pieces of the DataStax Astra Service Broker: install, 86 | >install verification, connection, yadda. Along the way, we introduce Kubernetes service brokers, broker instances, and (broker instance) 87 | >bindings. 88 | > 89 | >[Download June whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_54_AstraSvcBroker.pdf) 90 | 91 | May 2021 - - 92 | >Customer: My company enjoyed the series of four articles centered on Cassandra atop Kubernetes. But you left Cassandra and the Operator limited to 93 | >just one namespace. We seek to run Cassandra clusters in many concurrent namespaces. Can you help ? 94 | > 95 | >Daniel: Excellent question ! In this edition of this document, we take the work from the previous four articles and move it to a multiple-namespace 96 | >treatment. We’ll detail using the Cassandra Operator across Kubernetes namespaces, and we’ll detail Cassandra cluster cloning across namespaces. Be 97 | >advised, there are relevant limitations with Kubernetes version 1.18 and lower as it relates to (cloning) across namespaces. (Everything is doable, 98 | >it’s just more steps than you might expect.) 99 | > 100 | >[Download May whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_53_MoreContainersHelm.pdf) 101 | > 102 | >[Download version 2.0 of the Toolkit here, in Tar format](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_53_ToolkitVersion2.tar) 103 | > 104 | >[View a quick demo of Cassandra cluster cloning atop Kubernetes here](https://www.youtube.com/watch?v=paly5VVuAYM) 105 | 106 | January 2021 through April 2021 - - 107 | >Customer: My company is moving its operations to the cloud, including cloud native computing and Kubernetes. I believe we can run Apache Cassandra 108 | >on Kubernetes. Can you help ? 109 | > 110 | >Daniel: Excellent question ! Kubernetes, and running Apache Cassandra on Kubernetes, are huge topics; a tiny, illustrative "are my pods up ?" sketch appears just below.
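>(A hedged taste of day one, using the official Kubernetes Python client; the namespace and label selector are assumptions for illustration, not values from the articles.)

```python
# Hedged sketch: confirm that the Cassandra pods an operator created are
# Running. The namespace and label names below are illustrative only.
from kubernetes import client, config   # pip install kubernetes

config.load_kube_config()               # reads your local ~/.kube/config

v1 = client.CoreV1Api()
pods = v1.list_namespaced_pod(
    namespace="cass-operator",
    label_selector="app.kubernetes.io/managed-by=cass-operator",
)
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)
```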
As such, we’ll begin a four-part series of articles that 111 | >cover most of the day one through day seven topics. We won’t write a general Kubernetes primer of our own, since many other capable Kubernetes primers exist, but we will list our 112 | >favorite resources here and there. 113 | > 114 | > • In the January article, we will get Apache Cassandra up and running on Kubernetes. 115 | > 116 | >[Download January whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_49_KubernetesPrimer.pdf) 117 | > 118 | > • In February, we detail recovery from failed/down Cassandra nodes. 119 | > 120 | >[Download February whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_50_KubernetesNodeRecovery.pdf) 121 | > 122 | > • In March, we detail Cassandra cluster cloning, for QA and development. 123 | > 124 | >[Download March whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_51_KubernetesClusterCloning.pdf) 125 | > 126 | > • In April, we detail Kubernetes snapshotting in general. 127 | > 128 | >[Download April whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_52_KubernetesSnapshots.pdf) 129 | > 130 | >[Download the Toolkit for all 4 months/articles here, in Tar format](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_KubernetesPrimer_Toolkit.tar) 131 | 132 | 133 | 134 | -------------------------------------------------------------------------------- /2022/DDN_2022_61_SchemaValidation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2022/DDN_2022_61_SchemaValidation.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | DataStax Developer's Notebook - Monthly Articles 2022 2 | =================== 3 | 4 | | **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** | 5 | |-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------| 6 | 7 | This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum. 8 | 9 | January 2022 - - 10 | >Customer: My company wishes to activate SQL-style data integrity check constraints atop Apache Cassandra. Can you help ? 11 | > 12 | >Daniel: Excellent question ! You can do this, and we’ll detail all of the steps involved below; for context, a naive, purely illustrative application-side sketch also follows.
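>(To be clear, this is not the whitepaper’s mechanism; it is only an application-side stand-in showing what a check constraint must accomplish. The table and column names are hypothetical.)

```python
# Naive, illustrative stand-in for a CHECK constraint: CQL has no native
# CHECK, so the simplest (incomplete) guard validates before the write.
# The whitepaper details the steps of the actual approach.
from cassandra.cluster import Cluster   # pip install cassandra-driver

def insert_order(session, order_id: str, quantity: int) -> None:
    if quantity <= 0:                   # the "check constraint" being enforced
        raise ValueError("quantity must be greater than zero")
    session.execute(
        "INSERT INTO store.orders (order_id, quantity) VALUES (%s, %s)",
        (order_id, quantity),
    )

session = Cluster(["127.0.0.1"]).connect()
insert_order(session, "o-1001", 3)      # succeeds
insert_order(session, "o-1002", 0)      # raises ValueError before any write
```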
13 | > 14 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2022/DDN_2022_61_SchemaValidation.pdf) 15 | --------------------------------------------------------------------------------