├── 2017
│   ├── DDN_2017_10_DsePrimer.pdf
│   ├── DDN_2017_11_Apriori.pdf
│   ├── DDN_2017_11_Apriori.tar
│   ├── DDN_2017_12_UDFs.pdf
│   └── README.md
├── 2018
│   ├── DDN_2018_13_AdvRep.pdf
│   ├── DDN_2018_13_AdvRep.tar
│   ├── DDN_2018_14_CQL-Search.pdf
│   ├── DDN_2018_15_SearchPrimer.pdf
│   ├── DDN_2018_16_Spatial.pdf
│   ├── DDN_2018_16_Spatial.tar.gz
│   ├── DDN_2018_17_Docker.pdf
│   ├── DDN_2018_18_Studio.pdf
│   ├── DDN_2018_19_QueryHarness.pdf
│   ├── DDN_2018_20_Security.pdf
│   ├── DDN_2018_21_Backup and Recovery.pdf
│   ├── DDN_2018_22_ClientSideDriver.pdf
│   ├── DDN_2018_23_IndexingCollections.pdf
│   ├── DDN_2018_24_Zeppelin.pdf
│   └── README.md
├── 2019
│   ├── 41 Simple Customer Graph.txt
│   ├── DDN_2019_25_GraphPrimer.pdf
│   ├── DDN_2019_26_Inventory.pdf
│   ├── DDN_2019_27_Security.pdf
│   ├── DDN_2019_28_Jupyter, R.pdf
│   ├── DDN_2019_29_Kafka.pdf
│   ├── DDN_2019_30_GraphPrimer 68.pdf
│   ├── DDN_2019_31a_DSE, Reco Engines.pdf
│   ├── DDN_2019_31b_DSE, Reco Engines.pptx
│   ├── DDN_2019_31c_JustGroceryData.tar
│   ├── DDN_2019_31d_AllCommands.txt
│   ├── DDN_2019_31e_KillrVideoDataAsPipe.tar
│   ├── DDN_2019_31f_KillrVideoDDL.cql
│   ├── DDN_2019_32_Python68Client.pdf
│   ├── DDN_2019_32_Python68Client.py
│   ├── DDN_2019_33_ShortestPoint.pdf
│   ├── DDN_2019_33_ShortestPoint.tar
│   ├── DDN_2019_34_GremlinPrimer.pdf
│   ├── DDN_2019_34_GremlinPrimer.txt
│   ├── DDN_2019_35 Desktop.pdf
│   ├── DDN_2019_36_cfstats.pdf
│   └── README.md
├── 2020
│   ├── DDN_2020_37_Parquet.pdf
│   ├── DDN_2020_38_FileMethods.pdf
│   ├── DDN_2020_39_DriverFutures.pdf
│   ├── DDN_2020_40_SSL.pdf
│   ├── DDN_2020_41_GraphQL.pdf
│   ├── DDN_2020_42_AstraGeohash.pdf
│   ├── DDN_2020_42_AstraGeohash_Data.pipe.gz
│   ├── DDN_2020_42_AstraGeohash_Programs.tar.gz
│   ├── DDN_2020_43_AstraApiProgramming.pdf
│   ├── DDN_2020_43_NoteBook.tar
│   ├── DDN_2020_44_NoSQLBench.pdf
│   ├── DDN_2020_44_NoSQLBench.yaml
│   ├── DDN_2020_44_NoSQLBench_Slides.pdf
│   ├── DDN_2020_45_KubernetesOperator.pdf
│   ├── DDN_2020_45_KubernetesOperator.tar
│   ├── DDN_2020_46_BetterVersOf42.pdf
│   ├── DDN_2020_47_VMs.pdf
│   ├── DDN_2020_48_NodeReplaceWoBootstrap.pdf
│   └── README.md
├── 2021
│   ├── 61_DemoProgram.tar.gz
│   ├── DDN_2021_49_KubernetesPrimer.pdf
│   ├── DDN_2021_50_KubernetesNodeRecovery.pdf
│   ├── DDN_2021_51_KubernetesClusterCloning.pdf
│   ├── DDN_2021_52_KubernetesSnapshots.pdf
│   ├── DDN_2021_53_MoreContainersHelm.pdf
│   ├── DDN_2021_53_ToolkitVersion2.tar
│   ├── DDN_2021_54_AstraSvcBroker.pdf
│   ├── DDN_2021_55_K8ssandra.pdf
│   ├── DDN_2021_56_K8ssandra, Document API.pdf
│   ├── DDN_2021_57_K8ssandra, GraphQL.pdf
│   ├── DDN_2021_58_KastenVeeam.pdf
│   ├── DDN_2021_59_DseStargate.pdf
│   ├── DDN_2021_60_SnowFlake.pdf
│   ├── DDN_2021_60_SnowFlake.tar
│   ├── DDN_2021_KubernetesPrimer_Toolkit.tar
│   └── README.md
├── 2022
│   └── DDN_2022_61_SchemaValidation.pdf
└── README.md

--------------------------------------------------------------------------------
/2017/DDN_2017_10_DsePrimer.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2017/DDN_2017_10_DsePrimer.pdf

--------------------------------------------------------------------------------
/2017/DDN_2017_11_Apriori.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2017/DDN_2017_11_Apriori.pdf

--------------------------------------------------------------------------------
/2017/DDN_2017_11_Apriori.tar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2017/DDN_2017_11_Apriori.tar
--------------------------------------------------------------------------------
/2017/DDN_2017_12_UDFs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2017/DDN_2017_12_UDFs.pdf

--------------------------------------------------------------------------------
/2017/README.md:
--------------------------------------------------------------------------------

DataStax Developer's Notebook - Monthly Articles 2017
===================

| **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** |
|-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|

This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum.

2017 December - -

>Customer: As my company is beginning significant development of new applications that target
>the DataStax Enterprise database server (DSE), we are starting to explore some of the
>programmability that DSE offers us. Specifically, we are very much interested in user defined
>functions (UDFs). Can you help ?
>
>Daniel:
>Excellent question ! The topic of server side database programming can devolve into one of
>those information technology religious arguments, something we generally seek to avoid. Still,
>DataStax Enterprise (DSE) does offer user defined functions (UDFs), user defined aggregates
>(UDAs), and user defined types (UDTs).
>
>Recall that a core value proposition which DSE delivers is time constant lookups, that is;
>a CQL (Cassandra query language, similar to structured query language, SQL) SELECT, UPDATE
>or DELETE that uses an equality on at least the partition key columns from the primary key.
>These are the linearly scalable operations that DSE is distinguished for. If you seek to do
>aggregates or other set operands on large data sets (online analytics style processing, OLAP),
>DSE will likely need to perform a scatter gather query (a query that reads from multiple
>nodes concurrently). Inherently, these types of queries do not perform in low single digit
>millisecond time; just be advised.
>
>In this edition of this document we will detail all that you ask; user defined functions,
>and also user defined aggregates and user defined types, and we’ll write application code
>for each.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/DDN_2017_12_UDFs.pdf).
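>
>As a hedged taste of the topic, here is a minimal CQL sketch of a user defined function
>pair and a user defined aggregate (a hand rolled average). The keyspace and object names
>are hypothetical; the PDF above contains the fully worked, tested examples.

```sql
-- Requires user defined functions to be enabled in the server configuration.

-- UDF: state function, accumulates a running count and sum.
CREATE OR REPLACE FUNCTION ks_demo.f_avg_state (state tuple<int, double>, val double)
   CALLED ON NULL INPUT
   RETURNS tuple<int, double>
   LANGUAGE java
   AS 'if (val != null) { state.setInt(0, state.getInt(0) + 1); state.setDouble(1, state.getDouble(1) + val); } return state;';

-- UDF: final function, reduces the accumulated state to the average.
CREATE OR REPLACE FUNCTION ks_demo.f_avg_final (state tuple<int, double>)
   CALLED ON NULL INPUT
   RETURNS double
   LANGUAGE java
   AS 'return (state.getInt(0) == 0) ? null : Double.valueOf(state.getDouble(1) / state.getInt(0));';

-- UDA: ties the two functions together; call as SELECT ks_demo.a_avg(col) FROM ...
CREATE OR REPLACE AGGREGATE ks_demo.a_avg (double)
   SFUNC f_avg_state
   STYPE tuple<int, double>
   FINALFUNC f_avg_final
   INITCOND (0, 0);
```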
>

2017 November - -

>Customer:
>My company wants to deliver a, "customers who bought (x) also bought (y)" functionality to our
>Web site. Can the DataStax database server help me do this ?
>
>Daniel:
>Excellent question ! In this document we supply all of the program code, sample data, and
>instructions to deliver a recommendation engine ("customers who bought (x) also bought (y) ..")
>using DataStax Enterprise (DSE) and its Analytics functionality powered by Apache Spark.
>
>First we code the solution by hand in Python, so you have the ability to fully dissect all
>of the processing logic (master the Apriori algorithm). Then we move to using the supported
>and parallel capable Spark FPGrowth library in both Python and Scala.
>
>Along the way we install Scala, Gradle, and use support functions like the DSE parallel
>filesystem. We supply a 1.7 MB Tar ball with all data and programming.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/DDN_2017_11_Apriori.pdf).
>
>[Resource Kit](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/DDN_2017_11_Apriori.tar),
>all of the programs and data used in this edition in Tar format.
>

2017 October - -

>Customer:
>My company is investigating using DataStax for our new Customer/360 system in our customer call
>center. I haven’t learned a new database in over 10 years, and should mention that I know none
>of the NoSQL (post relational) databases. Can you help ?
>
>Daniel:
>Excellent question ! We’ve expertly used a good number of the leading NoSQL databases and while
>DataStax may take longer to master than some, DataStax is easily more capable (functionally, and
>scalability wise) than any other systems we have experienced.
>
>DataStax supports operational AND analytics workloads on one integrated platform, offers no single
>point of failure, is proven to scale past 1000 nodes, and is enterprise ready with all of the requisite
>security and administrative (maintenance and self healing) features.
>
>In this document we will:
>
>• Walk through a reasonably complete primer on DataStax Enterprise (DSE) terms, its object hierarchy, history, use, operating conventions, configuration files, and more.
>
>• Build a 2 node DSE cluster from scratch with a NetworkTopologyStrategy.
>
>• Demonstrate network partition failure tolerance.
>
>• Demonstrate strong and eventual consistency.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/DDN_2017_10_DsePrimer.pdf).
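>
>To ground two of those bullets, a minimal CQL sketch (the keyspace, table, and data
>center names are hypothetical placeholders, not taken from the PDF):

```sql
-- A keyspace replicated twice within one data center via NetworkTopologyStrategy;
-- 'dc1' must match the data center name your snitch reports.
CREATE KEYSPACE ks_demo
   WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 2};

-- In cqlsh, the consistency level chooses strong versus eventual behavior:
-- with a replication factor of 2, QUORUM requires both replicas (strong),
-- while ONE returns after a single replica answers (eventual).
CONSISTENCY QUORUM;
SELECT * FROM ks_demo.t_example WHERE id = 1;
```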
>

--------------------------------------------------------------------------------
/2018/DDN_2018_13_AdvRep.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_13_AdvRep.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_13_AdvRep.tar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_13_AdvRep.tar

--------------------------------------------------------------------------------
/2018/DDN_2018_14_CQL-Search.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_14_CQL-Search.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_15_SearchPrimer.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_15_SearchPrimer.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_16_Spatial.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_16_Spatial.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_16_Spatial.tar.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_16_Spatial.tar.gz

--------------------------------------------------------------------------------
/2018/DDN_2018_17_Docker.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_17_Docker.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_18_Studio.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_18_Studio.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_19_QueryHarness.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_19_QueryHarness.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_20_Security.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_20_Security.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_21_Backup and Recovery.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_21_Backup and Recovery.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_22_ClientSideDriver.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_22_ClientSideDriver.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_23_IndexingCollections.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_23_IndexingCollections.pdf

--------------------------------------------------------------------------------
/2018/DDN_2018_24_Zeppelin.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2018/DDN_2018_24_Zeppelin.pdf

--------------------------------------------------------------------------------
/2018/README.md:
--------------------------------------------------------------------------------
DataStax Developer's Notebook - Monthly Articles 2018
===================

| **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** |
|-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|

This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum.

2018 December - -

>Customer: My developers want a quick and easy way to prototype Spark Scala, Spark Python,
>and related. I know there are the Spark (Scala) and Python REPLs (read, evaluate, print and
>loop; command prompts) that ship with DSE, but we want something more. Can you help ?
>
>Daniel: Excellent question! There are a number of free/open-source options here. In this document
>we’ll install and use Apache Zeppelin to address this need.
>
>DSE Studio, based on Apache Zeppelin, ships with interpreters for; markdown, Spark/SQL, Gremlin,
>and CQL. Apache Zeppelin ships with 20 or more interpreters, including Spark (Scala), Python,
>Shell, and more. As such, many folks use both tools.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_24_Zeppelin.pdf)
>
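>As a hedged illustration (not from the PDF): a Zeppelin notebook paragraph is just an
>interpreter directive followed by code, so a first smoke test against the DSE supplied
>Spark context might look like this.

```scala
%spark

// A minimal Zeppelin paragraph: %spark routes the Scala below to the Spark
// interpreter. Word-count over an in-memory collection; no cluster data is touched.
val counts = sc.parallelize(Seq("dse", "zeppelin", "dse")).
                map(word => (word, 1)).
                reduceByKey(_ + _)
counts.collect().foreach(println)
```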

2018 November - -

>Customer: My application needs to store a dynamic number of latitude/longitude pairs per single
>database row, along with a tag for these values like; home, work, mobile, etcetera. We need to
>perform distance (proximity) queries for any of the latitude/longitude pair values, as well as
>queries on specific tags; just home, just work, other. Can you help ?
>
>Daniel: Excellent question ! All of these application requirements are easily served with DataStax
>Enterprise Server (DSE). While we’ve covered DSE Search queries including spatial/geo-spatial in
>past editions of this document, the specific requirement you have (a dynamic count of attributes)
>is something we have not covered previously in this series of documents.
>
>Using this same technique, DataStax Enterprise can also provide a polymorphic schema ability,
>similar to MongoDB.
>
>This edition of this document will address this application requirement with no prerequisites,
>although you might well be served to visit the past editions of this document detailing DSE
>Search and DSE (spatial/geo-spatial) Search, to gain a deep understanding.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_23_IndexingCollections.pdf)

2018 October - -

>Customer: My company is investigating using DataStax for our new Customer/360 system in our
>customer call center. I am tasked with getting a simple (Hello World style) Java program
>running against the DataStax server in a small Linux command line capable Docker container.
>Can you help ?
>
>Daniel: Excellent question ! In the May/2018 edition of this document, we detailed how and
>where to download a DataStax sponsored Docker container which includes the DataStax Enterprise
>(DSE) server; boot and operate DSE. In the October/2017 edition of this document, we detailed
>a DSE introduction, including table create, new data insert and query, and more.
>
>So, the only piece we are missing is the Java client program compile that targets DSE. To aid
>in our compiling, we will document the use of the Apache Maven build automation tool. Inherently,
>a given (any given) Java library will need other Java libraries, that themselves need more
>Java libraries, rinse and repeat. It is best to automate resolution of this condition.
>
>We will install and configure all of the above, access the DSE server, and go home.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_22_ClientSideDriver.pdf)
>

2018 September - -

>Customer: My company is investigating using DataStax for our new Customer/360 system in our
>customer call center. I’m a developer, and do not know how to administer DataStax Enterprise,
>but, I need to know how to backup and restore tables for my programming and unit tests. Can
>you help ?
>
>Daniel: Excellent question ! DataStax Enterprise (DSE) can be backed up and restored using
>DataStax Operations Center (DSE Ops Center), including activities to block stores like Amazon
>S3, other.
>You can also perform sstabledump(s), and table unloads and loads, including bulk
>unloads and loads.
>
>But, as you seek to perform these activities as part of your unit tests, we are going to detail
>table backup and restore using snapshots; faster, less code, easily automated.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_21_Backup%20and%20Recovery.pdf)
>

2018 August - -

>Customer: My company is investigating using DataStax for our new Customer/360 system in our
>customer call center. I’m a developer, and do not know how to administer DataStax Enterprise,
>but, I need to know how to set up user authentication and authorization for my programming
>and unit tests. Can you help ?
>
>Daniel: Excellent question ! Setting up authentication and authorization using DataStax
>Enterprise (DSE) is super easy. Below we detail all of the relevant topics and steps to
>achieve same, including source code for all. We detail table level access control, and in
>the event you need it, row level access control.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_20_Security.pdf)
>

2018 July - -

>Customer: I inherited an existing DataStax Enterprise (DSE) server system, and users are
>complaining about performance. I know nothing about DSE, and need to make the users happy.
>Can you help ?
>
>Daniel: Excellent question ! Based on your timeline (how quickly and safely does this problem
>need to be solved), you should probably contact DataStax for assistance. If you were already
>trained/capable on DSE and wanted to solve this problem, this document will cover introductory
>topics related to that goal.
>
>In short, this document discusses building a query harness; capturing and then executing a
>representative set of queries to measure your system performance against, how and why-
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_19_QueryHarness.pdf)
>

2018 June - -

>Customer: I'm confused by all of the options for loaders, developer's tools and similar.
>Can you offer me an overview, specifically detailing DSE Studio ?
>
>Daniel: Sure ! We overview all of the above, then detail install, configure and use of
>DSE Studio version 6.0, including configuration and use of CQL and Spark/SQL.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_18_Studio.pdf)
>

2018 May - -

>Customer: My company is investigating using the cloud, containers including micro-services,
>automated deployment tools (continuous integration / continuous deployment), and more. Can you
>help ?
>
>Daniel: Excellent question ! Huge and far ranging topics, obviously; we’ll offer a history
>and primer on many of these pieces, a Cloud-101 if you will. Then, to offer some amount of
>actionable content, we’ll delve a bit deeper into one container option, namely; Docker.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_17_Docker.pdf)
>
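>As a hedged, concrete taste of the May edition (the commands follow DataStax's published
>Docker instructions of that era; the image name and flags may have changed since, so verify
>against current documentation):

```shell
# Start a single DataStax Enterprise node in the background;
# DS_LICENSE=accept acknowledges the license terms.
docker run -e DS_LICENSE=accept --name my-dse -d datastax/dse-server

# Once the node reports up, open cqlsh inside the container.
docker exec -it my-dse cqlsh
```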

2018 April - -

>Customer: My company really enjoyed the last two documents in this series on the topic of
>DataStax Enterprise (DSE) Search, however; our application also needs to include geo-spatial
>queries, specifically-
>
>• Find all points within a distance from a given point, including a compound Text Search.
>
>• Find all points within a polygon.
>
>• And while we’re at it; what is the best means to do end user testing of geo-spatial queries-
>
>Can you help ?
>
>Daniel: Excellent question ! In this document we will:
>
>• Review DSE Search, overview the main points from last month’s edition of this document.
>
>• Deliver the two DSE Search geo-spatial queries you detail above.
>
>• Deliver a graphical (custom) application you could use for end user testing.
>
>** This document states that the accompanying download contains 330,000 sample
>geo-points from the USA states of Colorado and Utah. This amount of data was
>greater in size than GitHub cared for. So, you only get Colorado data at nearly
>220,000 geo-points.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_16_Spatial.pdf)
>
>[Resource kit](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_16_Spatial.tar.gz), all of the data and programs used in this edition in Tar/Gunzip format. About 24 MB compressed.
>
>[Demonstration video](https://youtu.be/kF3IjwVyBtI), of the programs created and used in this document.
>

2018 March - -

>Customer: My company wants to use the secondary indexes, part of DataStax Enterprise (DSE)
>Search, more specifically the (first name) synonym and Soundex style features to aid in
>customer call center record lookup. Can you help ?
>
>Daniel: Excellent question ! DataStax Enterprise (DSE) Search is one of the four primary
>functional areas inside DSE; the others being DSE Core, DSE Analytics, and DSE Graph.
>Built atop Apache Solr, DSE Search is a large topic. As such, we will detail the programming
>(use) of DSE Search, and let this document serve as a primer of sorts.
>
>We plan follow up editions of this document to cover not just programming, but capacity
>planning of DSE Search, and tuning of DSE Search.
>
>**Mar 21 - This document was heavily revised: 30 new pages of content, 2 errors corrected.**
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_15_SearchPrimer.pdf)
>
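>A hedged taste of the DSE Search surface this primer covers: in DSE 6.x a search index
>can be created and queried from plain CQL (the keyspace, table, and column names below
>are hypothetical):

```sql
-- Create a Solr-backed search index over two text columns.
CREATE SEARCH INDEX IF NOT EXISTS ON ks_demo.t_customer
   WITH COLUMNS first_name, last_name;

-- Query through the solr_query pseudo column (DSE Search CQL syntax); analyzers
-- configured on the index drive synonym and Soundex style matching.
SELECT * FROM ks_demo.t_customer WHERE solr_query = 'first_name:daniel~';
```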
2018 February - -

>Customer: My company was invited to participate in the DataStax Enterprise (DSE) 6.0 early
>release program. From discussions with DataStax, we learned there are a number of changes
>related to CQL native processing of DSE Search commands. Can you help us understand what
>this means ?
>
>Daniel: Excellent question ! On February 1, 2018, DataStax began accepting self nominations
>to the DSE release 6.0 Early Access Process (EAP) at the following Url,
>
>https://academy.datastax.com/eap?destination=eap
>
>When your nomination is accepted, you receive early access to the DSE version 6.0 software
>and documentation. In return, you are asked to formally test this release and participate
>in feedback relative to your experiences. The 6.0 release is huge, with many topics far
>larger than CQL native processing of DSE Search commands; this is a very cool, and strategic
>release.
>
>In this document, we detail the DSE Core and DSE Search areas of functionality, their intent,
>how they work pre release 6.0, and how they are planned to work in the 6.0 release. Further, we detail:
>
>• The four functional areas of DSE, including DSE Core with its network partition fault tolerance and time constant lookups.
>
>• We detail B-Tree+ and hash lookups, and which scale and why.
>
>• We define the DSE primary key, including its partitioning key and clustering key parts.
>
>• We detail what makes a query a DSE Core query versus a DSE Search query.
>
>• We highlight the new CQL native processing of DSE Search commands.
>
>• We overview DSE materialized views, and secondary indexes.
>
>• We detail how to add and drop table columns, and inform DSE Search indexes of same.
>
>• And we overview how to observe asynchronous/background index builds.
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_14_CQL-Search.pdf)
>

2018 January - -

>Customer: My company is investigating using the DataStax advanced replication feature to
>move data between data centers. Can you help ?
>
>Daniel:
>Excellent question ! In this document we overview DataStax Enterprise (DSE) data replication,
>advanced replication, and even recovery and diagnosis from failure of each of these sub-systems.
>Also, since advanced replication falls into an area of DataStax Enterprise titled, ‘advanced
>functionality’, we overview this topic as well.
>
>Just to be excessively chatty, we also detail DataStax Enterprise triggers and create yet
>another user defined function (UDF). (UDFs were a topic we covered in last month’s edition of
>this document.)
>
>[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_13_AdvRep.pdf).
>
>[Resource kit](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/DDN_2018_13_AdvRep.tar),
>all of the programs and data used in this edition in Tar format.
>

--------------------------------------------------------------------------------
/2019/41 Simple Customer Graph.txt:
--------------------------------------------------------------------------------


Simplest graph loading example using Customer


. Paragraph 01, Studio, Gremlin
--------------------
// Paragraph 01

system.graphs()
system.describe()
--------------------


. Paragraph 02, Studio, Gremlin
--------------------
// Paragraph 02

schema.drop()
schema.config().option('graph.allow_scan').set('true')
--------------------


.
Paragraph 03, Studio, Gremlin
--------------------
// Paragraph 03

// Property keys

schema.propertyKey('nodeID'   ).Int() .single().create()
//
schema.propertyKey('cust_num' ).Text().single().create()
schema.propertyKey('cust_name').Text().single().create()
schema.propertyKey('url'      ).Text().single().create()
schema.propertyKey('order_num').Text().single().create()
schema.propertyKey('part'     ).Text().single().create()
schema.propertyKey('geo_name' ).Text().single().create()
--------------------


. Paragraph 04, Studio, Gremlin
--------------------
// Paragraph 04

// Vertices

schema.vertexLabel("customers").
   partitionKey("nodeID").
   properties(
      'nodeID'   ,
      'cust_num' ,
      'cust_name',
      'url'
      ).ifNotExists().create()

schema.vertexLabel("orders").
   partitionKey("nodeID").
   properties(
      'nodeID'   ,
      'order_num',
      'cust_num' ,
      'part'
      ).ifNotExists().create()

schema.vertexLabel("geographies").
   partitionKey("nodeID").
   properties(
      'nodeID'  ,
      'geo_name'
      ).ifNotExists().create()

// Edges

schema.edgeLabel("ordered").
   single().
   connection("customers", "orders").
   ifNotExists().create()

schema.edgeLabel("is_in_geo").
   single().
   connection("customers", "geographies").
   ifNotExists().create()
--------------------




** Move from DSE Studio to Apache Zeppelin




. Paragraph 01, Zepp
--------------------
%spark

// Paragraph 01

import org.apache.spark.sql.functions.{monotonically_increasing_id, col, lit, concat, max}

val customers = sc.parallelize(Array(
   ("C-4001", "United Airlines"  , "united.com"  ),
   ("C-4002", "American Airlines", "aa.com"      ),
   ("C-4003", "Air France"       , "airfrance.us")
   ) )

val customers_df = customers.toDF ("cust_num", "cust_name", "url").coalesce(1)

val customers_df_nodeID = customers_df.withColumn("nodeID", monotonically_increasing_id() + 4001)

customers_df_nodeID.getClass()
customers_df_nodeID.printSchema()
customers_df_nodeID.count()
customers_df_nodeID.show()

customers_df_nodeID.registerTempTable("customers")
--------------------


. Paragraph 02, Zepp
--------------------
%spark

// Paragraph 02

val orders = sc.parallelize(Array(
   ("O-8001", "C-4001", "fuel"   ),
   ("O-8002", "C-4002", "fuel"   ),
   ("O-8003", "C-4002", "tires"  ),
   ("O-8004", "C-4003", "wine"   ),
   ("O-8005", "C-4003", "cheese" ),
   ("O-8006", "C-4003", "bread"  )
   ) )
val orders_df = orders.toDF ("order_num", "cust_num", "part").coalesce(1)

val orders_df_nodeID = orders_df.withColumn("nodeID", monotonically_increasing_id() + 8001)

orders_df_nodeID.getClass()
orders_df_nodeID.printSchema()
orders_df_nodeID.count()
orders_df_nodeID.show()

orders_df_nodeID.registerTempTable("orders")
--------------------


.
Paragraph 03, Zepp
--------------------
%spark

// Paragraph 03

val ordered = spark.sql(
   "select "                     +
   "   t1.nodeID as nodeID_C, "  +
   "   t2.nodeID as nodeID_O "   +
   "from "                       +
   "   customers t1, "           +
   "   orders    t2 "            +
   "where "                      +
   "   t1.cust_num = t2.cust_num"
   )

ordered.getClass()
ordered.printSchema()
ordered.count()
ordered.show()
--------------------


. Paragraph 04, Zepp
--------------------
%spark

// Fourth paragraph

import com.datastax.bdp.graph.spark.graphframe._
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.sql.functions.{monotonically_increasing_id, col, lit, concat, max}
import java.net.URI
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._
import com.datastax.spark.connector.cql.CassandraConnectorConf
import com.datastax.spark.connector.rdd.ReadConf
import org.apache.spark.sql.expressions.Window

val graphName = "customer_graph04"

val g = spark.dseGraph(graphName)
--------------------


. Paragraph 05, Zepp
--------------------
%spark

// Fifth paragraph

val customers_V = customers_df_nodeID.withColumn("~label", lit("customers")).
   withColumn("nodeID"   , col("nodeID")   ).
   withColumn("cust_num" , col("cust_num") ).
   withColumn("cust_name", col("cust_name")).
   withColumn("url"      , col("url")      )
g.updateVertices(customers_V)

val orders_V = orders_df_nodeID.withColumn("~label", lit("orders")).
   withColumn("nodeID"   , col("nodeID")   ).
   withColumn("order_num", col("order_num")).
   withColumn("cust_num" , col("cust_num") ).
   withColumn("part"     , col("part")     )
g.updateVertices(orders_V)

g.V.hasLabel("customers").show()
g.V.hasLabel("orders").show()
--------------------


. Paragraph 06, Zepp
--------------------
%spark

// Sixth paragraph

val orders_L = ordered.
   withColumn("srcLabel", lit("customers")).
   withColumn("dstLabel", lit("orders")).
   withColumn("edgeLabel", lit("ordered")
   )
orders_L.show()

val ordered_E = orders_L.select(
   g.idColumn(col("srcLabel"), col("nodeID_C")) as "src",
   g.idColumn(col("dstLabel"), col("nodeID_O")) as "dst",
   col("edgeLabel") as "~label"
   )

ordered_E.show()

g.updateEdges(ordered_E)
--------------------


. Paragraph 07, Zepp
--------------------
%spark

// Seventh paragraph

g.E.hasLabel("ordered").show()

g.V().hasLabel("customers").has("url", "aa.com").out("ordered").valueMap("order_num", "cust_num", "part").show()
--------------------




// Now add geographies




.
Paragraph 08, Zepp
--------------------
%spark

// Paragraph 08

import org.apache.spark.sql.functions.{monotonically_increasing_id, col, lit, concat, max}

val geographies = sc.parallelize(Array(
   ("N.America"),
   ("S.America"),
   ("EMEA"     )
   ) )

val geographies_df = geographies.toDF ("geo_name").coalesce(1)

val geographies_df_nodeID = geographies_df.withColumn("nodeID", monotonically_increasing_id() + 9001)

geographies_df_nodeID.getClass()
geographies_df_nodeID.printSchema()
geographies_df_nodeID.count()
geographies_df_nodeID.show()

geographies_df_nodeID.registerTempTable("geographies")
--------------------


. Paragraph 09, Zepp
--------------------
%spark

// Paragraph 09

val geographies_V = geographies_df_nodeID.withColumn("~label", lit("geographies")).
   withColumn("nodeID"  , col("nodeID")  ).
   withColumn("geo_name", col("geo_name"))
g.updateVertices(geographies_V)

g.V.hasLabel("geographies").show()
--------------------


. Paragraph 10, Zepp
--------------------
%spark

// Paragraph 10

val is_in_geo = sc.parallelize(Array(
   (4001, 9001),
   (4001, 9002),
   (4002, 9001),
   (4002, 9002),
   (4003, 9002),
   (4003, 9003)
   ) )

val is_in_geo_df = is_in_geo.toDF ("nodeID_C", "nodeID_G").coalesce(1)

val geo_L = is_in_geo_df.
   withColumn("srcLabel", lit("customers")).
   withColumn("dstLabel", lit("geographies")).
   withColumn("edgeLabel", lit("is_in_geo")
   )
geo_L.show()

val geo_E = geo_L.select(
   g.idColumn(col("srcLabel"), col("nodeID_C")) as "src",
   g.idColumn(col("dstLabel"), col("nodeID_G")) as "dst",
   col("edgeLabel") as "~label"
   )
geo_E.show()

g.updateEdges(geo_E)

g.E.hasLabel("is_in_geo").show()

g.V().hasLabel("customers").has("url", "aa.com").out("is_in_geo").valueMap("geo_name").show()
--------------------




** Done with customers; stop here




More customer graph queries
--------------------

// sql2gremlin.com

// select * from t1
g.V().hasLabel("customers").valueMap().show()

// (subset of columns)
g.V().hasLabel("customers").valueMap("nodeID", "cust_name").show()

// Derived columns
g.V().hasLabel("customers").values("cust_name").
   map {it.get().toUpperCase()}
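Two more SQL to Gremlin mappings in the same spirit (editor's sketch; the labels
and properties are the ones defined above):
--------------------

// select count(*) from t1
g.V().hasLabel("customers").count()

// select * from t1 where cust_name = 'Air France'
g.V().hasLabel("customers").has("cust_name", "Air France").valueMap().show()
--------------------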

--------------------------------------------------------------------------------
/2019/DDN_2019_25_GraphPrimer.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_25_GraphPrimer.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_26_Inventory.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_26_Inventory.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_27_Security.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_27_Security.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_28_Jupyter, R.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_28_Jupyter, R.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_29_Kafka.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_29_Kafka.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_30_GraphPrimer 68.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_30_GraphPrimer 68.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_31a_DSE, Reco Engines.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_31a_DSE, Reco Engines.pdf

--------------------------------------------------------------------------------
/2019/DDN_2019_31b_DSE, Reco Engines.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_31b_DSE, Reco Engines.pptx

--------------------------------------------------------------------------------
/2019/DDN_2019_31d_AllCommands.txt:
--------------------------------------------------------------------------------


dse spark
   Version 2.4.0.6

Reference Urls,
   https://spark.apache.org/docs/2.4.3/ml-frequent-pattern-mining.html
   https://en.wikipedia.org/wiki/Association_rule_learning




-------------------------------------------------
%spark

// 1.1

val recs_0 = sc.textFile("file:///mnt/hgfs/My.20/MyShare_1/18 DSE Developers Notebook/31 Graph Reco Eng/02 Files/10_grocery_sm.csv")
val recs_1 = sc.textFile("file:///mnt/hgfs/My.20/MyShare_1/18 DSE Developers Notebook/31 Graph Reco Eng/02 Files/20_grocery_lg.csv")

recs_0.collect().take(5).foreach(println)
println("--")
recs_1.collect().take(5).foreach(println)

// soy milk,lettuce
// lettuce,diapers ,wine,chard
// soy milk,diapers,wine,OJ
// lettuce,soy milk,diapers,wine
// lettuce,soy milk,diapers,OJ
// --
// citrus fruit,semi-finished bread,margarine,ready soups
// tropical fruit,yogurt,coffee
// whole milk
// pip fruit,yogurt,cream cheese ,meat spreads
// other vegetables,whole milk,condensed milk,long life bakery product
-------------------------------------------------


-------------------------------------------------
%spark

// 1.2

val i = recs_0.count()   // number of rows, 5
val j = recs_1.count()   // number of rows, 9835

// i: Long = 5
// j: Long = 9835
-------------------------------------------------


-------------------------------------------------
%spark

// 1.3

val i = recs_0.distinct().count()   // number of distinct rows, 5
val j = recs_1.distinct().count()   // number of distinct rows, 7011

// i: Long = 5
// j: Long = 7011
-------------------------------------------------


-------------------------------------------------
%spark

// 1.4

val recs_0s = recs_0.flatMap(s => s.trim.split(','))
val recs_1s = recs_1.flatMap(s => s.trim.split(','))

recs_0s.collect().take(5).foreach(println)
println("--")
recs_1s.collect().take(5).foreach(println)

// soy milk
// lettuce
// lettuce
// diapers
// wine
// --
// citrus fruit
// semi-finished bread
// margarine
// ready soups
// tropical fruit
-------------------------------------------------


-------------------------------------------------
%spark

// 1.5

val i = recs_0s.           count()   // total item occurrences, 18
val j = recs_0s.distinct().count()   // distinct items, 7
val k = recs_1s.           count()   // total item occurrences, 43367
val l = recs_1s.distinct().count()   // distinct items, 169

// i: Long = 18
// j: Long = 7
// k: Long = 43367
// l: Long = 169
-------------------------------------------------


-------------------------------------------------
%spark

// 1.6

val recs_0_top = recs_0s.map(str => (str, 1)).
                 reduceByKey((k, v) => k + v).
                 sortBy(_._2, false).
                 take(5).
                 foreach(println)
println("--")
val recs_1_top = recs_1s.map(str => (str, 1)).
                 reduceByKey((k, v) => k + v).
                 sortBy(_._2, false).
                 take(5).
                 foreach(println)

// (lettuce,4)
// (soy milk,4)
// (diapers,3)
// (wine,3)
// (OJ,2)
// --
// (whole milk,2513)
// (other vegetables,1903)
// (rolls/buns,1809)
// (soda,1715)
// (yogurt,1372)
-------------------------------------------------




-------------------------------------------------
%md

### Now actual ML routine
-------------------------------------------------




https://spark.apache.org/docs/2.4.3/ml-frequent-pattern-mining.html

-------------------------------------------------
%spark

// 2.1

import org.apache.spark.ml.fpm.FPGrowth
-------------------------------------------------


// Data file,
//    soy milk,lettuce
//    lettuce,diapers ,wine,chard
//    soy milk,diapers,wine,OJ
//    lettuce,soy milk,diapers,wine
//    lettuce,soy milk,diapers,OJ


// Support/minSupport
//
//    If an item/item-set appears in 3 of 5 total
//    transactions, then its support is,
//       3 / 5 = 0.60 or 60%
//
//    "How frequently the item appears ..

// Confidence/minConfidence
//
//    If the item/item-set (the antecedent) appears 4 times,
//    and the antecedent and consequent appear together 2
//    times, then the confidence is,
//       2 / 4 = 0.50 or 50%
//
//    ** Used to generate association rules
//
//    "Indication of how often the rule is found to be true

// Lift
//
//    Support(antecedent union consequent) /
//       ( Support(antecedent) * Support(consequent) )
//
//    From above,
//       lettuce, soy milk -> wine
//
//       0.2 / (0.4 * 0.6) == 0.83
//
//    Equal 1, implies probability of occurrence of antecedent and
//       consequent are independent of one another, no rule should
//       be drawn involving these two events
//
//    > 1, degree to which antecedent/consequent are dependent on
//       one another, a useful prediction rule
//
//    < 1, one item has a negative effect on the other and vice
//       versa, items can substitute for one another

// Not supplied by the library currently,
//    Conviction
//    Rule Power Factor
//
//    See (Wikipedia article)


-------------------------------------------------
%spark

// 2.2

val recs_0  = sc.textFile("file:///mnt/hgfs/My.20/MyShare_1/18 DSE Developers Notebook/31 Graph Reco Eng/02 Files/10_grocery_sm.csv")
val recs_0s = recs_0.map(i_str => i_str.split(",")).toDF("items")


val my_fpgrowth0 = new FPGrowth().setItemsCol("items").setMinSupport(0.1).setMinConfidence(0.1)
val my_model0    = my_fpgrowth0.fit(recs_0s)

my_model0.freqItemsets.show()

// +--------------------+----+
// |               items|freq|
// +--------------------+----+
// |           [diapers]|   3|
// | [diapers, soy milk]|   3|
// | [diapers, soy mil..|   2|
// |  [diapers, lettuce]|   2|
// |                [OJ]|   2|
// |       [OJ, diapers]|   2|
// | [OJ, diapers, soy..|   2|
// | [OJ, diapers, soy..|   1|
// | [OJ, diapers, let..|   1|
// |      [OJ, soy milk]|   2|
// | [OJ, soy milk, le..|   1|
// |          [OJ, wine]|   1|
// ...
-------------------------------------------------


-------------------------------------------------
%spark

// 2.3

my_model0.associationRules.show()

// +--------------------+-----------+----------+------------------+
// |          antecedent| consequent|confidence|              lift|
// +--------------------+-----------+----------+------------------+
// | [OJ, diapers, soy..|  [lettuce]|       0.5|             0.625|
// | [OJ, diapers, soy..|     [wine]|       0.5|0.8333333333333334|
// |          [soy milk]|  [diapers]|      0.75|              1.25|
// |          [soy milk]|       [OJ]|       0.5|              1.25|
// |          [soy milk]|  [lettuce]|      0.75|            0.9375|
// |          [soy milk]|     [wine]|       0.5|0.8333333333333334|
// |          [diapers ]|    [chard]|       1.0|               5.0|
// |          [diapers ]|     [wine]|       1.0|1.6666666666666667|
// |          [diapers ]|  [lettuce]|       1.0|              1.25|
// | [wine, soy milk, ..|  [diapers]|       1.0|1.6666666666666667|
// | [OJ, wine, diapers]| [soy milk]|       1.0|              1.25|
// |   [diapers , chard]|     [wine]|       1.0|1.6666666666666667|
// |   [diapers , chard]|  [lettuce]|       1.0|              1.25|
// |           [lettuce]|  [diapers]|       0.5|0.8333333333333334|
// |           [lettuce]|       [OJ]|      0.25|             0.625|
// |           [lettuce]|  [diapers]|      0.25|              1.25|
// |           [lettuce]|    [chard]|      0.25|              1.25|
// |           [lettuce]| [soy milk]|      0.75|            0.9375|
// |           [lettuce]|     [wine]|       0.5|0.8333333333333334|
// ...
-------------------------------------------------


-------------------------------------------------
%spark

// 2.4

my_model0.transform(recs_0s).show()

// +--------------------+--------------------+
// |               items|          prediction|
// +--------------------+--------------------+
// | [soy milk, lettuce]|[diapers, OJ, win...|
// |[lettuce, diapers...|[diapers, OJ, soy...|
// |[soy milk, diaper...|[lettuce, diapers...|
// |[lettuce, soy mil...|[OJ, diapers , ch...|
// |[lettuce, soy mil...|[wine, diapers , ...|
// +--------------------+--------------------+
// ...
-------------------------------------------------


-------------------------------------------------
%spark

// 2.5

import scala.collection.mutable.WrappedArray;

val recs_0_ar = my_model0.associationRules.collect()

val recs_0_arl = recs_0_ar.map( row => (
   row.get(0).asInstanceOf[WrappedArray[String]].mkString("|"),
   row.get(1).asInstanceOf[WrappedArray[String]].mkString("|"),
   row.getDouble(2),
   row.getDouble(3))
   ).toList

val recs_0_ardf = recs_0_arl.toDF("antecedent", "consequent", "confidence", "lift")

recs_0_ardf.show(20)

// +---------------------+----------+----------+------------------+
// |           antecedent|consequent|confidence|              lift|
// +---------------------+----------+----------+------------------+
// |  OJ|diapers|soy milk|   lettuce|       0.5|             0.625|
// |  OJ|diapers|soy milk|      wine|       0.5|0.8333333333333334|
// |             soy milk|   diapers|      0.75|              1.25|
// |             soy milk|        OJ|       0.5|              1.25|
// |             soy milk|   lettuce|      0.75|            0.9375|
// |             soy milk|      wine|       0.5|0.8333333333333334|
// ...
-------------------------------------------------


Must use Studio for this 1 block
-------------------------------------------------
// 2.6

DROP KEYSPACE IF EXISTS ks_31;

CREATE KEYSPACE ks_31
   WITH replication = {'class': 'SimpleStrategy',
      'replication_factor': 1}
   AND graph_engine = 'Native';

USE ks_31;

CREATE TABLE t_association_rules
   (
   antecedent   TEXT,
   consequent   TEXT,
   version      DOUBLE,
   confidence   DOUBLE,
   lift         DOUBLE,
   PRIMARY KEY((antecedent), consequent, version)
   ) ;
-------------------------------------------------


-------------------------------------------------
%spark

// 2.7

// import org.apache.spark.sql.functions._

val recs_0_ardf2 = recs_0_ardf.withColumn("version", lit(1.0) )

recs_0_ardf2.write.
   format("org.apache.spark.sql.cassandra").
   options(Map( "keyspace" -> "ks_31", "table" -> "t_association_rules" )).
   mode("append").
   save

val recs_0_test = spark.read.
   format("org.apache.spark.sql.cassandra").
   options(Map("keyspace" -> "ks_31", "table" -> "t_association_rules")).
   load
recs_0_test.count()
recs_0_test.show(20)
-------------------------------------------------


-------------------------------------------------
%sql

-- 2.8

select * from ks_31.t_association_rules
where antecedent = 'lettuce'
order by consequent, confidence desc
-------------------------------------------------




-------------------------------------------------
%md

### Repeat above, but now for larger item set
-------------------------------------------------




-------------------------------------------------
%spark

// 3.1

import org.apache.spark.ml.fpm.FPGrowth
//
import scala.collection.mutable.WrappedArray;

val recs_1  = sc.textFile("file:///mnt/hgfs/My.20/MyShare_1/44 Topics_2019/41 Graph Reco Eng/02 Files/20_grocery_lg.csv")
val recs_1s = recs_1.map(i_str => i_str.split(",")).toDF("items")

val my_fpgrowth1 = new FPGrowth().setItemsCol("items").setMinSupport(0.01).setMinConfidence(0.01)
val my_model1    = my_fpgrowth1.fit(recs_1s)

val recs_1_ar = my_model1.associationRules.collect()

val recs_1_arl = recs_1_ar.map( row => (
   row.get(0).asInstanceOf[WrappedArray[String]].mkString("|"),
   row.get(1).asInstanceOf[WrappedArray[String]].mkString("|"),
   row.getDouble(2),
   row.getDouble(3))
   ).toList

val recs_1_ardf = recs_1_arl.toDF("antecedent", "consequent", "confidence", "lift")

val recs_1_ardf2 = recs_1_ardf.withColumn("version", lit(1.0) )

recs_1_ardf2.write.
   format("org.apache.spark.sql.cassandra").
   options(Map( "keyspace" -> "ks_31", "table" -> "t_association_rules" )).
   mode("append").
   save

recs_1_ardf2.show(20)
-------------------------------------------------




// +--------------------+--------------------+-------------------+------------------+---------+
// |          antecedent|          consequent|         confidence|              lift| version |
// +--------------------+--------------------+-------------------+------------------+---------+
// |    hygiene articles|          whole milk| 0.3888888888888889|1.5219746208604146|     1.0 |
// |domestic eggs|who...|    other vegetables| 0.4101694915254237|2.1198197315567744|     1.0 |
// |        cream cheese|              yogurt| 0.3251366120218579| 2.330698672911787|     1.0 |
// |        cream cheese|    other vegetables| 0.3524590163934426|1.8215630195635881|     1.0 |
// |        cream cheese|          whole milk| 0.4262295081967213|1.6681126992100093|     1.0 |
// |               sugar|    other vegetables| 0.3183183183183183|1.6451185815347664|     1.0 |
// |               sugar|          whole milk| 0.4444444444444444|1.7393995666976165|     1.0 |
// |      tropical fruit|       shopping bags|0.12887596899224807| 1.308044535643715|     1.0 |
// |      tropical fruit|fruit/vegetable j...| 0.1308139534883721|1.8095010303208716|     1.0 |
// |      tropical fruit|              pastry|0.12596899224806202| 1.415891472868217|     1.0 |
// |      tropical fruit|         brown bread|0.10174418604651163|  1.56842330684552|     1.0 |
// |      tropical fruit| whipped/sour cream |0.13178294573643412|1.8384188245642974|     1.0 |
// |      tropical fruit|        citrus fruit|0.18992248062015504|2.2947022074929055|     1.0 |
// |      tropical fruit|          newspapers| 0.1124031007751938|1.4082605046166001|     1.0 |
// |      tropical fruit|          rolls/buns|0.23449612403100775| 1.274886334906004|     1.0 |
// |      tropical fruit|       bottled water|0.17635658914728683|1.5956458640879172|     1.0 |
// |      tropical fruit|              yogurt|0.27906976744186046|2.0004746084480303|     1.0 |
// |      tropical fruit|    other vegetables|0.34205426356589147|1.7677896385551983|     1.0 |
// |      tropical fruit|                soda|0.19864341085271317| 1.139159152032906|     1.0 |
// |      tropical fruit|     root vegetables| 0.2005813953488372|1.8402220366192295|     1.0 |




-------------------------------------------------
%sql
-- 3.2

select count(*) from ks_31.t_association_rules   // 522
-------------------------------------------------


-------------------------------------------------
%sql
-- 3.3

select "All records", count(*)
   from ks_31.t_association_rules
union
select "Greater than 1", count(*)
   from ks_31.t_association_rules
   where lift > 1.0
union
select "Equal to 1", count(*)
   from ks_31.t_association_rules
   where lift = 1.0
union
select "Less than 1", count(*)
   from ks_31.t_association_rules
   where lift < 1.0

// All records      522
// Greater than 1   502
// Equal to 1         0
// Less than 1       20

// From above,
//
//    9835 total orders
//    7011 unique orders
//     169 unique items
-------------------------------------------------


-------------------------------------------------
%sql
-- 3.4

select * from ks_31.t_association_rules
order by lift desc
limit 20

// (Get screen shot)  High is 3.37
-------------------------------------------------




-------------------------------------------------
------------------------------------------------- 529 | %md 530 | 531 | ### Move to Graph Recommendation Engine (Gremlin/Studio) 532 | ------------------------------------------------- 533 | ------------------------------------------------- 534 | 535 | 536 | Doc, 537 | Types of traversals, 538 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/using/QueryTOC.html 539 | Traversal API, 540 | (Top level Gremlin API doc) 541 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/refTPTOC.html 542 | 543 | 544 | Gist of the graph reco engine- 545 | 546 | To answer 'What would you like ?': Who do you know, 547 | and what do they like ? 548 | 549 | 550 | ------------------------------------------------- 551 | // %md 552 | 553 | ### Just 'know'ing someone 554 | 555 | ### Prototype using smaller/known data set 556 | ------------------------------------------------- 557 | 558 | 559 | ------------------------------------------------- 560 | // 4.1 561 | 562 | USE ks_ngkv; 563 | 564 | DROP MATERIALIZED VIEW IF EXISTS e_knows_bi; 565 | // 566 | DROP TABLE IF EXISTS ordered; 567 | DROP TABLE IF EXISTS orders; 568 | // 569 | DROP TABLE IF EXISTS knows; 570 | DROP TABLE IF EXISTS user; 571 | 572 | CREATE TABLE user 573 | ( 574 | user_id TEXT, 575 | gender TEXT, 576 | age INT, 577 | PRIMARY KEY((user_id)) 578 | ) 579 | WITH VERTEX LABEL v_user 580 | ; 581 | CREATE TABLE knows 582 | ( 583 | user_id_s TEXT, 584 | user_id_d TEXT, 585 | PRIMARY KEY((user_id_s), user_id_d) 586 | ) 587 | WITH EDGE LABEL e_knows 588 | FROM v_user(user_id_s) 589 | TO v_user(user_id_d); 590 | CREATE MATERIALIZED VIEW e_knows_bi 591 | AS SELECT user_id_s, user_id_d 592 | FROM knows 593 | WHERE 594 | user_id_s IS NOT NULL 595 | AND 596 | user_id_d IS NOT NULL 597 | PRIMARY KEY ((user_id_d), user_id_s); 598 | ------------------------------------------------- 599 | 600 | 601 | ------------------------------------------------- 602 | // 4.2 603 | 604 | USE ks_ngkv; 605 | 606 | INSERT INTO user (user_id, gender, age) VALUES ('dave' , 'M', 30 ); 607 | INSERT INTO user (user_id, gender, age) VALUES ('denise' , 'F', 30 ); 608 | INSERT INTO user (user_id, gender, age) VALUES ('kiyu' , 'M', 30 ); 609 | INSERT INTO user (user_id, age) VALUES ('farrell', 40 ); 610 | INSERT INTO user (user_id, gender, age) VALUES ('hatcher', 'M', 30 ); 611 | INSERT INTO user (user_id, gender, age) VALUES ('morty' , 'M', 30 ); 612 | 613 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('farrell', 'kiyu' ); 614 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('farrell', 'denise' ); 615 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('farrell', 'morty' ); 616 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('farrell', 'hatcher'); 617 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('kiyu' , 'farrell'); 618 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('denise' , 'farrell'); 619 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('dave' , 'farrell'); 620 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('kiyu' , 'denise' ); 621 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('kiyu' , 'dave' ); 622 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('hatcher', 'denise' ); 623 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('kiyu' , 'hatcher'); 624 | INSERT INTO knows (user_id_s, user_id_d) VALUES ('denise' , 'dave' ); 625 | ------------------------------------------------- 626 | 627 | 628 | ------------------------------------------------- 629 | // 4.3 630 | 631 | def l_user = "farrell" 632 | // 633 |
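// The next three traversals (4.3 - 4.5) differ only in edge direction:
//   out("e_knows")  - source -> destination; who farrell knows
//   in("e_knows")   - destination -> source; who knows farrell (served by the
//                     e_knows_bi materialized view from 4.1)
//   both("e_knows") - either direction; mutual acquaintances appear twice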
g.V().has("v_user", "user_id", l_user). 634 | out("e_knows"). 635 | valueMap(true, "user_id") 636 | 637 | // { "id": "v_user:denise#32", "label": "v_user", "user_id": [ "denise" ] }, 638 | // { "id": "v_user:hatcher#81", "label": "v_user", "user_id": [ "hatcher" ] }, 639 | // { "id": "v_user:kiyu#62", "label": "v_user", "user_id": [ "kiyu" ] }, 640 | // { "id": "v_user:morty#77", "label": "v_user", "user_id": [ "morty" ] } 641 | ------------------------------------------------- 642 | 643 | 644 | ------------------------------------------------- 645 | // 4.4 646 | 647 | def l_user = "farrell" 648 | // 649 | g.V().has("v_user", "user_id", l_user). 650 | both("e_knows"). 651 | valueMap(true, "user_id") 652 | 653 | // farrell knows 654 | // { "id": "v_user:denise#32", "label": "v_user", "user_id": [ "denise" ] }, 655 | // { "id": "v_user:hatcher#81", "label": "v_user", "user_id": [ "hatcher" ] }, 656 | // { "id": "v_user:kiyu#62", "label": "v_user", "user_id": [ "kiyu" ] }, 657 | // { "id": "v_user:morty#77", "label": "v_user", "user_id": [ "morty" ] }, 658 | 659 | // know farrell (and farrell knows them) 660 | // { "id": "v_user:denise#32", "label": "v_user", "user_id": [ "denise" ] }, 661 | // { "id": "v_user:kiyu#62", "label": "v_user", "user_id": [ "kiyu" ] } 662 | 663 | // knows farrell, farrell doesn't know him 664 | // { "id": "v_user:dave#38", "label": "v_user", "user_id": [ "dave" ] }, 665 | ------------------------------------------------- 666 | 667 | 668 | ------------------------------------------------- 669 | // 4.5 670 | 671 | def l_user = "farrell" 672 | // 673 | g.V().has("v_user", "user_id", l_user). 674 | in("e_knows"). 675 | valueMap(true, "user_id") 676 | 677 | // in 678 | // { "id": "v_user:dave#38", "label": "v_user", "user_id": [ "dave" ] }, 679 | // { "id": "v_user:denise#32", "label": "v_user", "user_id": [ "denise" ] }, 680 | // { "id": "v_user:kiyu#62", "label": "v_user", "user_id": [ "kiyu" ] } 681 | ------------------------------------------------- 682 | 683 | 684 | 685 | 686 | ------------------------------------------------- 687 | // %md 688 | 689 | ### The 'as' step modulator 690 | 691 | ### Use a simpler (non-recursive) relationship 692 | ------------------------------------------------- 693 | 694 | 695 | Step modulators, 696 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/tpStepModTOC.html 697 | 698 | 'as' "Label an object in a traversal to use later in the traversal. 699 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/refTravAs.html 700 | 701 | -- 702 | 703 | Similar to 'store()', which is a step 704 | "The store() step is a sideEffect step that stores information for later use in the traversal. 705 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/refTravStore.html 706 | 707 | And also, 'sack()', a step, and 'withSack(), a step modulator 708 | 709 | From, 710 | http://www.doanduyhai.com/blog/?p=13404 711 | 712 | "A sack is a local datastructure relative to each traverser, unlike global datastructure as 713 | when using aggregate()/store()/group(x)/subgraph(x).... It mean that each traverser is now 714 | equipped with his own sack. 
A sack can contain any type of data structure: 715 | 716 | 717 | ------------------------------------------------- 718 | // 5.1 719 | 720 | USE ks_ngkv; 721 | 722 | DROP TABLE IF EXISTS ordered; 723 | DROP TABLE IF EXISTS orders; 724 | 725 | CREATE TABLE orders 726 | ( 727 | order_id INT, 728 | item TEXT, 729 | PRIMARY KEY((order_id)) 730 | ) 731 | WITH VERTEX LABEL v_orders 732 | ; 733 | CREATE TABLE ordered 734 | ( 735 | user_id TEXT, 736 | order_id INT, 737 | order_date TEXT, 738 | PRIMARY KEY((user_id), order_id) 739 | ) 740 | WITH EDGE LABEL e_ordered 741 | FROM v_user(user_id) 742 | TO v_orders(order_id); 743 | 744 | INSERT INTO orders (order_id, item) VALUES (101, 'beer' ); 745 | INSERT INTO orders (order_id, item) VALUES (102, 'chips' ); 746 | INSERT INTO orders (order_id, item) VALUES (103, 'wine' ); 747 | INSERT INTO orders (order_id, item) VALUES (104, 'cheese'); 748 | 749 | INSERT INTO ordered (user_id, order_id, order_date) VALUES ('farrell', 101, 'Dec-31-2008'); 750 | INSERT INTO ordered (user_id, order_id, order_date) VALUES ('farrell', 102, 'Dec-31-2008'); 751 | INSERT INTO ordered (user_id, order_id, order_date) VALUES ('denise' , 103, 'Dec-31-2008'); 752 | INSERT INTO ordered (user_id, order_id, order_date) VALUES ('denise' , 104, 'Dec-31-2008'); 753 | ------------------------------------------------- 754 | 755 | 756 | ------------------------------------------------- 757 | // 5.2 758 | 759 | def l_user = "farrell" 760 | // 761 | g.V().has("v_user", "user_id", l_user). 762 | valueMap(true) 763 | 764 | // { "id": "v_user:farrell#82", "label": "v_user", "user_id": [ "farrell" ], "age": [ "40" ] } 765 | ------------------------------------------------- 766 | 767 | 768 | ------------------------------------------------- 769 | // 5.3 770 | 771 | def l_user = "denise" 772 | // 773 | g.V().has("v_user", "user_id", l_user). 774 | valueMap(true) 775 | 776 | // { "id": "v_user:denise#32", "label": "v_user", "gender": [ "F" ], "user_id": [ "denise" ], "age": [ "30" ] } 777 | ------------------------------------------------- 778 | 779 | 780 | ------------------------------------------------- 781 | // 5.4 782 | 783 | def l_user = "farrell" 784 | // 785 | g.V().has("v_user", "user_id", l_user). 786 | outE("e_ordered"). 787 | valueMap(true) 788 | 789 | // { "id": "v_user:farrell#82->e_ordered#03->v_orders:101#27", "label": "e_ordered", "order_date": "Dec-31-2008" }, 790 | // { "id": "v_user:farrell#82->e_ordered#03->v_orders:102#24", "label": "e_ordered", "order_date": "Dec-31-2008" } 791 | ------------------------------------------------- 792 | 793 | 794 | ------------------------------------------------- 795 | // 5.5 796 | 797 | def l_user = "farrell" 798 | // 799 | g.V().has("v_user", "user_id", l_user). 800 | out("e_ordered"). 801 | valueMap(true) 802 | 803 | // { "id": "v_orders:101#27", "label": "v_orders", "item": [ "beer" ], "order_id": [ "101" ] }, 804 | // { "id": "v_orders:102#24", "label": "v_orders", "item": [ "chips" ], "order_id": [ "102" ] } 805 | ------------------------------------------------- 806 | 807 | 808 | ------------------------------------------------- 809 | // 5.6 810 | 811 | def l_user = "farrell" 812 | // 813 | g.V().has("v_user", "user_id", l_user). 814 | as("user"). 815 | out("e_ordered"). 816 | as("order"). 817 | select("user", "order"). 818 | by(valueMap(true)). 
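// by() modulators apply round-robin to the select() keys: the by() above
// renders "user", the by() below renders "order".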
819 | by(valueMap(true)) 820 | 821 | { "user": { "id": "v_user:farrell#82", "label": "v_user", "user_id": [ "farrell" ], "age": [ "40" ] }, 822 | "order": { "id": "v_orders:101#27", "label": "v_orders", "item": [ "beer" ], "order_id": [ "101" ] } }, 823 | { "user": { "id": "v_user:farrell#82", "label": "v_user", "user_id": [ "farrell" ], "age": [ "40" ] }, 824 | "order": { "id": "v_orders:102#24", "label": "v_orders", "item": [ "chips" ], "order_id": [ "102" ] } } 825 | ------------------------------------------------- 826 | 827 | 828 | // Why isn't order an embedded array to user ? 829 | // 830 | // Think, 831 | // 832 | // SELECT o.* 833 | // FROM user AS 'u' 834 | // INNER JOIN order AS 'o' ON u.user_id=o.user_id 835 | 836 | 837 | ------------------------------------------------- 838 | // 5.7 839 | 840 | def l_user = "farrell" 841 | // 842 | g.V().has("v_user", "user_id", l_user). 843 | as('a'). 844 | out("e_knows"). 845 | as('b'). 846 | out("e_knows"). 847 | where(eq('a')). // this line 848 | select('a', 'b'). 849 | by(values('user_id')). 850 | by(values('user_id')) 851 | 852 | // { "a": "farrell", "b": "denise" }, 853 | // { "a": "farrell", "b": "kiyu" } 854 | ------------------------------------------------- 855 | 856 | 'this line' .. .. 857 | "each traverser compares the id value of 'b' to the id value of 'a' and if they 858 | aren't equal then it is filtered out 859 | 860 | 861 | 862 | 863 | ------------------------------------------------- 864 | // %md 865 | 866 | ### Back to KV 867 | ------------------------------------------------- 868 | 869 | 870 | 871 | 872 | ------------------------------------------------- 873 | // 6.1 874 | 875 | def l_user = "u1" 876 | // 877 | g.V().has("v_user2", "user_id", l_user). 878 | valueMap(true) 879 | 880 | // { "id": "v_user2:u1#70", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u1" ], "age": [ "17" ] } 881 | ------------------------------------------------- 882 | 883 | 884 | ------------------------------------------------- 885 | // 6.2 886 | 887 | def l_user = "u1" 888 | // 889 | g.V().has("v_user2", "user_id", l_user). 890 | as('a'). 891 | out("e_knows2"). 892 | as('b'). 893 | out("e_knows2"). 894 | where(eq('a')). 895 | select('a', 'b'). 896 | by(values('user_id')). 897 | by(values('user_id')) 898 | 899 | // No data 900 | ------------------------------------------------- 901 | 902 | 903 | ------------------------------------------------- 904 | // 6.3 905 | 906 | def l_user = "u1" 907 | // 908 | g.V().has("v_user2", "user_id", l_user). 909 | out("e_knows2"). 910 | valueMap(true) 911 | 912 | // { "id": "v_user2:u151#66", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u151" ], "age": [ "17" ] }, 913 | // { "id": "v_user2:u192#77", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u192" ], "age": [ "14" ] }, 914 | // { "id": "v_user2:u74#16", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u74" ], "age": [ "15" ] }, 915 | // { "id": "v_user2:u83#24", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u83" ], "age": [ "14" ] } 916 | ------------------------------------------------- 917 | 918 | 919 | ------------------------------------------------- 920 | // 6.4 921 | 922 | def l_user = "u1" 923 | // 924 | g.V().has("v_user2", "user_id", l_user). 925 | in("e_knows2"). 
926 | valueMap(true) 927 | 928 | // { "id": "v_user2:u199#70", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u199" ], "age": [ "15" ] } 929 | ------------------------------------------------- 930 | 931 | 932 | ------------------------------------------------- 933 | // 6.5 934 | 935 | def l_user = "u1" 936 | // 937 | g.V().has("v_user2", "user_id", l_user). 938 | both("e_knows2"). 939 | valueMap(true) 940 | 941 | // { "id": "v_user2:u151#66", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u151" ], "age": [ "17" ] }, 942 | // { "id": "v_user2:u192#77", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u192" ], "age": [ "14" ] }, 943 | // { "id": "v_user2:u74#16", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u74" ], "age": [ "15" ] }, 944 | // { "id": "v_user2:u83#24", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u83" ], "age": [ "14" ] }, 945 | // 946 | // { "id": "v_user2:u199#70", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u199" ], "age": [ "15" ] } 947 | ------------------------------------------------- 948 | 949 | 950 | 951 | 952 | ------------------------------------------------- 953 | // %md 954 | 955 | ### Have at least (n) connections in common 956 | 957 | ### Prototype using smaller/known data set 958 | ------------------------------------------------- 959 | 960 | 961 | 962 | 963 | ------------------------------------------------- 964 | // 7.1 965 | 966 | def l_user = "farrell" 967 | def l_conn = 2 968 | // 969 | g.V().has("v_user", "user_id", l_user). 970 | out("e_knows"). 971 | in("e_knows"). 972 | filter( 973 | has("user_id", neq(l_user)) 974 | ). 975 | dedup(). 976 | valueMap("user_id") 977 | 978 | // { "user_id": [ "hatcher" ] }, 979 | // { "user_id": [ "kiyu" ] } 980 | ------------------------------------------------- 981 | 982 | 983 | Above 984 | Total set 985 | VALUES ('farrell', 'kiyu' ); 986 | VALUES ('farrell', 'denise' ); 987 | VALUES ('farrell', 'morty' ); 988 | VALUES ('farrell', 'hatcher'); 989 | VALUES ('kiyu' , 'farrell'); 990 | VALUES ('denise' , 'farrell'); 991 | VALUES ('dave' , 'farrell'); 992 | VALUES ('kiyu' , 'denise' ); 993 | VALUES ('kiyu' , 'dave' ); 994 | VALUES ('hatcher', 'denise' ); 995 | VALUES ('kiyu' , 'hatcher'); 996 | VALUES ('denise' , 'dave' ); 997 | After 'has/out' 998 | VALUES ('farrell', 'kiyu' ); 999 | VALUES ('farrell', 'denise' ); 1000 | VALUES ('farrell', 'morty' ); 1001 | VALUES ('farrell', 'hatcher'); 1002 | After 'in' 1003 | VALUES ('farrell', 'kiyu' ); 1004 | VALUES ('farrell', 'denise' ); 1005 | VALUES ('farrell', 'morty' ); 1006 | VALUES ('farrell', 'hatcher'); 1007 | VALUES ('kiyu' , 'denise' ); 1008 | VALUES ('hatcher', 'denise' ); 1009 | VALUES ('kiyu' , 'hatcher'); 1010 | After 'filter' 1011 | VALUES ('kiyu' , 'denise' ); 1012 | VALUES ('hatcher', 'denise' ); 1013 | VALUES ('kiyu' , 'hatcher'); 1014 | After 'dedup' // Notice this is on keys .. 1015 | { "user_id": [ "hatcher" ] }, 1016 | { "user_id": [ "kiyu" ] } 1017 | 1018 | 1019 | ------------------------------------------------- 1020 | // 7.2 1021 | 1022 | def l_user = "farrell" 1023 | def l_conn = 2 1024 | // 1025 | g.withSack(1, sum). 1026 | V().has("v_user", "user_id", l_user). 1027 | out("e_knows"). 1028 | in("e_knows"). 1029 | filter( 1030 | sack().is(gte(l_conn)). 1031 | and(). 1032 | has("user_id", neq(l_user)) 1033 | ). 1034 | dedup().
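// withSack(1, sum): every traverser starts with sack = 1, and traversers that
// converge on the same vertex merge their sacks by sum. The
// sack().is(gte(l_conn)) above therefore kept only users reached from
// farrell along at least l_conn distinct out/in "e_knows" paths.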
1035 | valueMap("user_id") 1036 | 1037 | // { "user_id": [ "kiyu" ] } 1038 | ------------------------------------------------- 1039 | 1040 | 1041 | Above 1042 | Total set 1043 | VALUES ('farrell', 'kiyu' ); 1044 | VALUES ('farrell', 'denise' ); 1045 | VALUES ('farrell', 'morty' ); 1046 | VALUES ('farrell', 'hatcher'); 1047 | VALUES ('kiyu' , 'farrell'); 1048 | VALUES ('denise' , 'farrell'); 1049 | VALUES ('dave' , 'farrell'); 1050 | VALUES ('kiyu' , 'denise' ); 1051 | VALUES ('kiyu' , 'dave' ); 1052 | VALUES ('hatcher', 'denise' ); 1053 | VALUES ('kiyu' , 'hatcher'); 1054 | VALUES ('denise' , 'dave' ); 1055 | After 'has/out' 1056 | VALUES ('farrell', 'kiyu' ); 1057 | VALUES ('farrell', 'denise' ); 1058 | VALUES ('farrell', 'morty' ); 1059 | VALUES ('farrell', 'hatcher'); 1060 | After 'in' 1061 | VALUES ('farrell', 'kiyu' ); 1062 | VALUES ('farrell', 'denise' ); 1063 | VALUES ('kiyu' , 'denise' ); 1064 | VALUES ('hatcher', 'denise' ); 1065 | VALUES ('farrell', 'morty' ); 1066 | VALUES ('farrell', 'hatcher'); 1067 | VALUES ('kiyu' , 'hatcher'); 1068 | After 'filter', just not farrell 1069 | VALUES ('kiyu' , 'denise' ); 1070 | VALUES ('hatcher', 'denise' ); 1071 | VALUES ('kiyu' , 'hatcher'); 1072 | After 'filter', 2 or more in common with farrell 1073 | VALUES ('kiyu' , 'denise' ); 1074 | VALUES ('kiyu' , 'hatcher'); 1075 | After 'dedup' // Notice this is on keys .. 1076 | // { "user_id": [ "kiyu" ] } 1077 | 1078 | 1079 | 1080 | 1081 | ------------------------------------------------- 1082 | // %md 1083 | 1084 | ### Back to KV 1085 | ------------------------------------------------- 1086 | 1087 | 1088 | 1089 | 1090 | ------------------------------------------------- 1091 | // 8.1 1092 | 1093 | def l_user = "u1" 1094 | def l_conn = 2 1095 | // 1096 | g.withSack(1, sum). 1097 | V().has("v_user2", "user_id", l_user). 1098 | out("e_knows2"). 1099 | both("e_knows2"). 1100 | filter( 1101 | sack().is(gte(l_conn)). 1102 | and(). 1103 | has("user_id", neq(l_user)) 1104 | ). 1105 | dedup(). 1106 | valueMap("user_id") 1107 | 1108 | // { "user_id": [ "u49" ] } 1109 | ------------------------------------------------- 1110 | 1111 | 1112 | ------------------------------------------------- 1113 | // 8.2 1114 | 1115 | def l_user = "u1" 1116 | // 1117 | g.V().has("v_user2", "user_id", l_user). 1118 | out("e_knows2"). 1119 | valueMap("user_id") 1120 | 1121 | { "user_id": [ "u151" ] }, 1122 | { "user_id": [ "u192" ] }, 1123 | { "user_id": [ "u74" ] }, // <-- 1124 | { "user_id": [ "u83" ] } // <-- 1125 | ------------------------------------------------- 1126 | 1127 | 1128 | ------------------------------------------------- 1129 | // 8.3 1130 | 1131 | def l_user = "u49" 1132 | // 1133 | g.V().has("v_user2", "user_id", l_user). 1134 | both("e_knows2"). 
1135 | valueMap("user_id") 1136 | 1137 | { "user_id": [ "u132" ] }, 1138 | { "user_id": [ "u150" ] }, 1139 | { "user_id": [ "u165" ] }, 1140 | { "user_id": [ "u435" ] }, 1141 | { "user_id": [ "u74" ] }, // <-- 1142 | { "user_id": [ "u926" ] }, 1143 | { "user_id": [ "u116" ] }, 1144 | { "user_id": [ "u122" ] }, 1145 | { "user_id": [ "u135" ] }, 1146 | { "user_id": [ "u47" ] }, 1147 | { "user_id": [ "u54" ] }, 1148 | { "user_id": [ "u80" ] }, 1149 | { "user_id": [ "u83" ] } // <-- 1150 | ------------------------------------------------- 1151 | 1152 | 1153 | 1154 | 1155 | ------------------------------------------------- 1156 | // %md 1157 | 1158 | ### Staying in KV 1159 | 1160 | ### What likes do any two persons share (Do they like the same movies) 1161 | ------------------------------------------------- 1162 | 1163 | 1164 | 1165 | 1166 | ------------------------------------------------- 1167 | // 9.1 1168 | 1169 | def l_user = "u1" 1170 | def l_rate = 8 1171 | // 1172 | g.V().has("v_user2", "user_id", l_user). 1173 | outE("e_rated2"). 1174 | count() 1175 | 1176 | // 32 1177 | ------------------------------------------------- 1178 | 1179 | 1180 | // This traversal requires an index 1181 | ------------------------------------------------- 1182 | // 9.2 1183 | 1184 | def l_user = "u1" 1185 | def l_rate = 8 1186 | // 1187 | g.V().has("v_user2", "user_id", l_user). 1188 | outE("e_rated2"). 1189 | has("rating",gte(l_rate)). 1190 | count() 1191 | 1192 | // 15 1193 | ------------------------------------------------- 1194 | 1195 | 1196 | // Step 1: Users like the same movie 1197 | // User liked same movie as target user 1198 | ------------------------------------------------- 1199 | // 9.3 1200 | 1201 | def l_user = "u1" 1202 | def l_rate = 8 1203 | // 1204 | g.V().has("v_user2", "user_id", l_user). 1205 | outE("e_rated2"). 1206 | has("rating", gte(l_rate)). 1207 | inV(). 1208 | inE("e_rated2"). 1209 | has("rating", gte(l_rate)). 1210 | count() 1211 | 1212 | // 962 1213 | ------------------------------------------------- 1214 | 1215 | 1216 | // Step 2: Same as above, data view 1217 | ------------------------------------------------- 1218 | // 9.4 1219 | 1220 | def l_user = "u1" 1221 | def l_rate = 8 1222 | // 1223 | g.V().has("v_user2", "user_id", l_user). 1224 | outE("e_rated2"). 1225 | has("rating", gte(l_rate)). 1226 | as("a"). 1227 | inV(). 1228 | inE("e_rated2"). 1229 | has("rating", gte(l_rate)). 1230 | as("b"). 1231 | limit(5). 1232 | select("a", "b"). 1233 | by(valueMap(true)).
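// same round-robin by() pattern as 5.6: the by() above renders "a" (u1's own
// rating edge), the by() below renders "b" (a co-rater's edge on the same movie).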
1234 | by(valueMap(true)) 1235 | 1236 | // { 1237 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1238 | // "b": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" } 1239 | // }, 1240 | // { 1241 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1242 | // "b": { "id": "v_user2:u118#25->e_rated2#10->v_movie2:m567#00", "label": "e_rated2", "rating": "9" } 1243 | // }, 1244 | // { 1245 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1246 | // "b": { "id": "v_user2:u119#24->e_rated2#10->v_movie2:m567#00", "label": "e_rated2", "rating": "8" } 1247 | // }, 1248 | // { 1249 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1250 | // "b": { "id": "v_user2:u125#15->e_rated2#10->v_movie2:m567#00", "label": "e_rated2", "rating": "9" } 1251 | // }, 1252 | // { 1253 | // "a": { "id": "v_user2:u1#16->e_rated2#10 ->v_movie2:m567#00", "label": "e_rated2", "rating": "10" }, 1254 | // "b": { "id": "v_user2:u143#15->e_rated2#10->v_movie2:m567#00", "label": "e_rated2", "rating": "8" } 1255 | // } 1256 | // ... 1257 | ------------------------------------------------- 1258 | 1259 | 1260 | // Step 3: Same as 9.3, not the target user, remove 1261 | // duplicate user ids, count 1262 | ------------------------------------------------- 1263 | // 9.5 1264 | 1265 | def l_user = "u1" 1266 | def l_rate = 8 1267 | // 1268 | g.V().has("v_user2", "user_id", l_user). 1269 | outE("e_rated2"). 1270 | has("rating", gte(l_rate)). 1271 | inV(). 1272 | inE("e_rated2"). 1273 | has("rating", gte(l_rate)). 1274 | // 1275 | outV(). 1276 | has("user_id", neq(l_user)). 1277 | dedup(). 1278 | count() 1279 | 1280 | // 599 1281 | ------------------------------------------------- 1282 | 1283 | 1284 | // Step 4: Same as above, data view 1285 | ------------------------------------------------- 1286 | // 9.6 1287 | 1288 | def l_user = "u1" 1289 | def l_rate = 8 1290 | // 1291 | g.V().has("v_user2", "user_id", l_user). 1292 | outE("e_rated2"). 1293 | has("rating", gte(l_rate)). 1294 | inV(). 1295 | inE("e_rated2"). 1296 | has("rating", gte(l_rate)). 1297 | // 1298 | outV(). 1299 | has("user_id", neq(l_user)). 1300 | dedup(). 1301 | limit(5). 1302 | valueMap(true) 1303 | 1304 | // { "id": "v_user2:u1027#65", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u1027" ], "age": [ "22" ] }, 1305 | // { "id": "v_user2:u106#14" , "label": "v_user2", "gender": [ "F" ], "user_id": [ "u106" ], "age": [ "14" ] }, 1306 | // { "id": "v_user2:u168#22" , "label": "v_user2", "gender": [ "M" ], "user_id": [ "u168" ], "age": [ "15" ] }, 1307 | // { "id": "v_user2:u19#77" , "label": "v_user2", "gender": [ "F" ], "user_id": [ "u19" ], "age": [ "13" ] }, 1308 | // { "id": "v_user2:u2#19" , "label": "v_user2", "gender": [ "M" ], "user_id": [ "u2" ], "age": [ "12" ] } 1309 | // ... 1310 | ------------------------------------------------- 1311 | 1312 | 1313 | // Step 5: Other user likes the same movie as 1314 | // target user (n) or more times 1315 | ------------------------------------------------- 1316 | // 9.7 1317 | 1318 | def l_user = "u1" 1319 | def l_rate = 8 1320 | def l_count = 5 1321 | // 1322 | g.withSack(1, sum). 1323 | V().has("v_user2", "user_id", l_user). 1324 | outE("e_rated2"). 1325 | has("rating", gte(l_rate)). 1326 | inV(). 1327 | inE("e_rated2"). 1328 | has("rating", gte(l_rate)). 
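// each traverser here is one (movie u1 liked, someone else's high rating of
// it) pair; outV() below moves to that someone, and the per-path sacks sum
// per user, so gte(l_count) keeps users with l_count or more movies in common.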
1329 | // 1330 | outV(). 1331 | filter( 1332 | sack().is(gte(l_count)). 1333 | and(). 1334 | has("user_id", neq(l_user)) 1335 | ). 1336 | dedup(). 1337 | valueMap(true) 1338 | 1339 | // { "id": "v_user2:u156#69", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u156" ], "age": [ "15" ] }, 1340 | // { "id": "v_user2:u119#78", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u119" ], "age": [ "17" ] }, 1341 | // { "id": "v_user2:u13#17", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u13" ], "age": [ "14" ] }, 1342 | // { "id": "v_user2:u143#65", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u143" ], "age": [ "17" ] }, 1343 | // { "id": "v_user2:u4#67", "label": "v_user2", "gender": [ "F" ], "user_id": [ "u4" ], "age": [ "16" ] }, 1344 | // { "id": "v_user2:u85#22", "label": "v_user2", "gender": [ "M" ], "user_id": [ "u85" ], "age": [ "13" ] } 1345 | ------------------------------------------------- 1346 | 1347 | 1348 | 1349 | 1350 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/refTravSideEffect.html 1351 | https://docs.datastax.com/en/dse/6.7/dse-dev/datastax_enterprise/graph/reference/traversal/refTravAggregate.html 1352 | ------------------------------------------------- 1353 | // %md 1354 | 1355 | ### Staying in KV 1356 | 1357 | ### sideEffect(), and where(without()) 1358 | ------------------------------------------------- 1359 | 1360 | 1361 | ------------------------------------------------- 1362 | // 10.1 1363 | 1364 | def l_user = "u1" 1365 | def l_rate = 8 1366 | // 1367 | g.V().has("v_user2", "user_id", l_user). 1368 | outE("e_rated2"). 1369 | count() 1370 | 1371 | // 32 1372 | ------------------------------------------------- 1373 | 1374 | 1375 | ------------------------------------------------- 1376 | // 10.2 1377 | 1378 | def l_user = "u1" 1379 | def l_rate = 8 1380 | // 1381 | g.V().has("v_user2", "user_id", l_user). 1382 | outE("e_rated2"). 1383 | has("rating",lt(l_rate)). 1384 | count() 1385 | 1386 | // 17 1387 | ------------------------------------------------- 1388 | 1389 | 1390 | ------------------------------------------------- 1391 | // 10.3 1392 | 1393 | def l_user = "u1" 1394 | def l_rate = 8 1395 | // 1396 | g.V().has("v_user2", "user_id", l_user). 1397 | outE("e_rated2"). 1398 | has("rating",gte(l_rate)). 1399 | count() 1400 | 1401 | // 15 1402 | ------------------------------------------------- 1403 | 1404 | 1405 | ------------------------------------------------- 1406 | // 10.4 1407 | 1408 | def l_user = "u1" 1409 | def l_rate = 8 1410 | // 1411 | g.V().has("v_user2", "user_id", l_user). 1412 | sideEffect( 1413 | outE("e_rated2"). 1414 | has("rating",lt(l_rate)). 1415 | aggregate("a") 1416 | ). 1417 | outE("e_rated2"). 1418 | count() 1419 | 1420 | // 32 1421 | ------------------------------------------------- 1422 | 1423 | 1424 | ------------------------------------------------- 1425 | // 10.5 1426 | 1427 | def l_user = "u1" 1428 | def l_rate = 8 1429 | // 1430 | g.V().has("v_user2", "user_id", l_user). 1431 | sideEffect( 1432 | outE("e_rated2"). 1433 | has("rating",lt(l_rate)). 1434 | aggregate("a") 1435 | ). 1436 | outE("e_rated2"). 1437 | where(without("a")). 1438 | count() 1439 | 1440 | // 15 1441 | ------------------------------------------------- 1442 | 1443 | 1444 | ------------------------------------------------- 1445 | // 10.6 1446 | 1447 | def l_user = "u1" 1448 | def l_rate = 8 1449 | // 1450 | g.V().has("v_user2", "user_id", l_user). 1451 | sideEffect( 1452 | outE("e_rated2"). 
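// sideEffect(): run the inner traversal only for its side effect - the
// low-rated edges collect into "a" via aggregate("a") while the outer
// traverser stays parked on u1; where(without("a")) below then drops
// exactly those edges, leaving the 15 rated >= l_rate.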
1453 | has("rating",lt(l_rate)). 1454 | aggregate("a") 1455 | ). 1456 | outE("e_rated2"). 1457 | where(without("a")). 1458 | valueMap(true) 1459 | 1460 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m102#07", "label": "e_rated2", "rating": "8" }, 1461 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m111#05", "label": "e_rated2", "rating": "10" }, 1462 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m160#03", "label": "e_rated2", "rating": "8" }, 1463 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m223#07", "label": "e_rated2", "rating": "8" }, 1464 | // { "id": "v_user2:u1#16->e_rated2#10->v_movie2:m28#62", "label": "e_rated2", "rating": "9" }, 1465 | // ... 1466 | ------------------------------------------------- 1467 | 1468 | 1469 | 1470 | 1471 | ------------------------------------------------- 1472 | // %md 1473 | 1474 | ### Final assembly 1475 | ------------------------------------------------- 1476 | 1477 | 1478 | ------------------------------------------------- 1479 | // 11.1 1480 | 1481 | def l_user = "u1" 1482 | def l_conn = 2 1483 | def l_rate = 8 1484 | 1485 | g.withSack(1,sum). 1486 | V(). 1487 | has("v_user2", "user_id", l_user). 1488 | sideEffect( 1489 | out("e_rated2"). 1490 | aggregate("se_movieSeenByUser") 1491 | ). 1492 | both("e_knows2"). 1493 | filter( 1494 | outE("e_rated2"). 1495 | has("rating",gte(l_rate)). 1496 | inV(). 1497 | inE("e_rated2"). 1498 | has("rating",gte(l_rate)). 1499 | outV(). 1500 | filter( 1501 | has("user_id", l_user). 1502 | and(). 1503 | sack().is(gte(l_conn))) 1504 | ). 1505 | outE("e_rated2"). 1506 | has("rating", gte(l_rate)). 1507 | inV(). 1508 | dedup(). 1509 | where( 1510 | without("se_movieSeenByUser") 1511 | ). 1512 | order(). 1513 | by( 1514 | inE("e_rated2"). 1515 | values("rating"). 1516 | mean(), 1517 | decr). 1518 | limit(5). 1519 | valueMap("movie_id", "title", "year") 1520 | 1521 | // { "movie_id": [ "m278" ], "title": [ "The Simpsons (TV Series)" ], "year": [ "1989" ] }, 1522 | // { "movie_id": [ "m274" ], "title": [ "One Flew Over the Cuckoos Nest" ], "year": [ "1975" ] }, 1523 | // { "movie_id": [ "m13" ], "title": [ "The Pianist" ], "year": [ "2002" ] }, 1524 | // { "movie_id": [ "m163" ], "title": [ "The Good', ' the Bad and the Ugly" ], "year": [ "1966" ] }, 1525 | // { "movie_id": [ "m351" ], "title": [ "Forrest Gump" ], "year": [ "1994" ] } 1526 | ------------------------------------------------- 1527 | 1528 | 1529 | 1530 | 1531 | ------------------------------------------------- 1532 | ------------------------------------------------- 1533 | ------------------------------------------------- 1534 | ------------------------------------------------- 1535 | 1536 | 1537 | // On the subtopic of 'normalization' .. 1538 | 1539 | // Below requires an index 1540 | ------------------------------------------------- 1541 | // 12.1 1542 | 1543 | def l_movie = "m267" 1544 | def l_score = 7 1545 | 1546 | g.V().has("v_movie2", "movie_id", l_movie). 1547 | out("e_belongs_to2"). 1548 | in("e_belongs_to2"). 1549 | dedup(). 1550 | count() 1551 | 1552 | // 317 1553 | ------------------------------------------------- 1554 | 1555 | 1556 | // Below is not done, no data returned 1557 | ------------------------------------------------- 1558 | // 12.2 1559 | 1560 | def l_movie = "m267" 1561 | def l_score = 70 1562 | 1563 | g.V().has("v_movie2", "movie_id", l_movie). 1564 | as("y1", "r1"). 1565 | out("e_belongs_to2"). 1566 | in("e_belongs_to2"). 1567 | dedup(). 1568 | has("movie_id", neq(l_movie)). 1569 | as("y2", "r2"). 
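// (unfinished, per the note above) math() below computes a Euclidean
// distance sqrt((y1-y2)^2 + (r1-r2)^2); its by() modulators bind, in order
// of first appearance, y1/y2 to each movie's year and r1/r2 to each movie's
// mean rating scaled x10. Re-enabling is(lte(l_score)) would keep only
// the "near" (similar) movies.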
1570 | filter( 1571 | math( "sqrt ( (y1 - y2)^2 + (r1 - r2)^2 )" ). 1572 | by(values("year")). 1573 | by(values("year")). 1574 | by(inE("e_rated2").values("rating").mean().math( "ceil (_ * 10)" )). 1575 | by(inE("e_rated2").values("rating").mean().math( "ceil (_ * 10)" )) // . 1576 | // is(lte(l_score)) 1577 | ) 1578 | ------------------------------------------------- 1579 | 1580 | 1581 | 1582 | 1583 | ------------------------------------------------- 1584 | ------------------------------------------------- 1585 | ------------------------------------------------- 1586 | ------------------------------------------------- 1587 | 1588 | 1589 | From Artem 1590 | 1591 | 1592 | def similarityM3(user, minPositiveRating) { 1593 | g.V().has("user","userId",user). 1594 | outE("rated"). 1595 | has("rating",gte(minPositiveRating)). 1596 | inV(). 1597 | inE("rated"). 1598 | has("rating",gte(minPositiveRating)). 1599 | outV(). 1600 | dedup(). 1601 | has("userId",neq(user)). 1602 | toSet() 1603 | } 1604 | 1605 | def similarityM5(user, minPositiveRating, commonMovies) { 1606 | g.withSack(1,sum). 1607 | V(). 1608 | has("user","userId",user). 1609 | both("knows"). 1610 | filter( 1611 | outE("rated"). 1612 | has("rating",gte(minPositiveRating)). 1613 | inV(). 1614 | inE("rated"). 1615 | has("rating",gte(minPositiveRating)). 1616 | outV(). 1617 | filter( 1618 | has("userId",user). 1619 | and(). 1620 | sack().is(gte(commonMovies)) 1621 | ) 1622 | ). 1623 | toSet() 1624 | } 1625 | 1626 | def watchedMovies(user) { 1627 | g.V(). 1628 | has("user","userId",user). 1629 | out("rated"). 1630 | toSet() 1631 | } 1632 | 1633 | def recommendM3(user, minPositiveRating) { 1634 | g.V(similarityM3(user,minPositiveRating)). 1635 | outE("rated").has("rating",gte(minPositiveRating)). 1636 | inV(). 1637 | dedup(). 1638 | is(without(watchedMovies(user))). 1639 | toSet() 1640 | } 1641 | 1642 | def recommendM5(user, minPositiveRating, commonMovies) { 1643 | g.V(similarityM5(user,minPositiveRating,commonMovies)). 1644 | outE("rated").has("rating",gte(minPositiveRating)). 1645 | inV(). 1646 | dedup(). 1647 | is(without(watchedMovies(user))). 1648 | toSet() 1649 | } 1650 | 1651 | 1652 | 1653 | MMMM 1654 | 1655 | 1656 | def m3 = recommendM3("u1", 8) 1657 | 1658 | def m5 = recommendM5("u1", 8, 2) 1659 | 1660 | m3.intersect(m5) 1661 | 1662 | 1663 | MMM 1664 | 1665 | 1666 | def user = "u1" 1667 | def minPositiveRating = 8 1668 | def commonMovies = 2 1669 | 1670 | g.withSack(1,sum).V().has("user","userId",user). 1671 | sideEffect(out("rated").aggregate("watchedMovies")). 1672 | both("knows"). 1673 | filter( 1674 | outE("rated").has("rating",gte(minPositiveRating)).inV(). 1675 | inE("rated").has("rating",gte(minPositiveRating)).outV(). 1676 | filter(has("userId",user).and(). 1677 | sack().is(gte(commonMovies))) 1678 | ). 1679 | outE("rated").has("rating",gte(minPositiveRating)). 1680 | inV(). 1681 | dedup(). 1682 | where(without("watchedMovies")) 1683 | 1684 | 1685 | MMM 1686 | 1687 | 1688 | def user = "u1" 1689 | def minPositiveRating = 8 1690 | def commonMovies = 2 1691 | // Number of final recommendations 1692 | def numRecommendations = 10 1693 | 1694 | def rs1 = 1695 | g.withSack(1,sum).V().has("user","userId",user). 1696 | sideEffect(out("rated").aggregate("watchedMovies")). 1697 | both("knows"). 1698 | filter( 1699 | outE("rated").has("rating",gte(minPositiveRating)).inV(). 1700 | inE("rated").has("rating",gte(minPositiveRating)).outV(). 1701 | filter(has("userId",user).and(). 1702 | sack().is(gte(commonMovies))) 1703 | ). 
1704 | outE("rated").has("rating",gte(minPositiveRating)). 1705 | inV(). 1706 | dedup(). 1707 | where(without("watchedMovies")). 1708 | // Ordering by a movie average rating 1709 | order().by(inE("rated").values("rating").mean(),decr). 1710 | limit(numRecommendations). 1711 | toList() 1712 | 1713 | 1714 | MMM 1715 | 1716 | 1717 | -------------------------------------------------------------------------------- /2019/DDN_2019_31f_KillrVideoDDL.cql: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | // ------------------------------------------------- 6 | 7 | 8 | DROP KEYSPACE IF EXISTS ks_ngkv; 9 | 10 | CREATE KEYSPACE ks_ngkv 11 | WITH replication = {'class': 'SimpleStrategy', 12 | 'replication_factor': 1} 13 | AND graph_engine = 'Native'; 14 | 15 | USE ks_ngkv; 16 | 17 | 18 | // ------------------------------------------------- 19 | 20 | 21 | CREATE TABLE user2 22 | ( 23 | user_id TEXT, 24 | gender TEXT, 25 | age INT, 26 | PRIMARY KEY((user_id)) 27 | ) 28 | WITH VERTEX LABEL v_user2 29 | ; 30 | 31 | CREATE TABLE genre2 32 | ( 33 | genre_id TEXT, 34 | name TEXT, 35 | PRIMARY KEY((genre_id)) 36 | ) 37 | WITH VERTEX LABEL v_genre2 38 | ; 39 | 40 | CREATE TABLE person2 41 | ( 42 | person_id TEXT, 43 | name TEXT, 44 | PRIMARY KEY((person_id)) 45 | ) 46 | WITH VERTEX LABEL v_person2 47 | ; 48 | 49 | CREATE TABLE movie2 50 | ( 51 | movie_id TEXT, 52 | duration INT, 53 | country TEXT, 54 | year INT, 55 | production LIST<TEXT>, // element type assumed; a bare LIST is not valid CQL 56 | title TEXT, 57 | PRIMARY KEY((movie_id)) 58 | ) 59 | WITH VERTEX LABEL v_movie2 60 | ; 61 | 62 | 63 | // ------------------------------------------------- 64 | 65 | 66 | CREATE TABLE screenwriter2 67 | ( 68 | movie_id TEXT, 69 | person_id TEXT, 70 | PRIMARY KEY((movie_id), person_id) 71 | ) 72 | WITH EDGE LABEL e_screenwriter2 73 | FROM v_movie2(movie_id) 74 | TO v_person2(person_id); 75 | CREATE MATERIALIZED VIEW e_screenwriter_bi2 76 | AS SELECT movie_id, person_id 77 | FROM screenwriter2 78 | WHERE 79 | movie_id IS NOT NULL 80 | AND 81 | person_id IS NOT NULL 82 | PRIMARY KEY ((person_id), movie_id); 83 | 84 | CREATE TABLE cinematographer2 85 | ( 86 | movie_id TEXT, 87 | person_id TEXT, 88 | PRIMARY KEY((movie_id), person_id) 89 | ) 90 | WITH EDGE LABEL e_cinematographer2 91 | FROM v_movie2(movie_id) 92 | TO v_person2(person_id); 93 | CREATE MATERIALIZED VIEW e_cinematographer_bi2 94 | AS SELECT movie_id, person_id 95 | FROM cinematographer2 96 | WHERE 97 | movie_id IS NOT NULL 98 | AND 99 | person_id IS NOT NULL 100 | PRIMARY KEY ((person_id), movie_id); 101 | 102 | CREATE TABLE actor2 103 | ( 104 | movie_id TEXT, 105 | person_id TEXT, 106 | id TEXT, 107 | PRIMARY KEY((movie_id), person_id, id) 108 | ) 109 | WITH EDGE LABEL e_actor2 110 | FROM v_movie2(movie_id) 111 | TO v_person2(person_id); 112 | CREATE MATERIALIZED VIEW e_actor_bi2 113 | AS SELECT movie_id, person_id 114 | FROM actor2 115 | WHERE 116 | movie_id IS NOT NULL 117 | AND 118 | person_id IS NOT NULL 119 | AND 120 | id IS NOT NULL 121 | PRIMARY KEY ((person_id), movie_id, id); 122 | 123 | CREATE TABLE director2 124 | ( 125 | movie_id TEXT, 126 | person_id TEXT, 127 | PRIMARY KEY((movie_id), person_id) 128 | ) 129 | WITH EDGE LABEL e_director2 130 | FROM v_movie2(movie_id) 131 | TO v_person2(person_id); 132 | CREATE MATERIALIZED VIEW e_director_bi2 133 | AS SELECT movie_id, person_id 134 | FROM director2 135 | WHERE 136 | movie_id IS NOT NULL 137 | AND 138 | person_id IS NOT NULL 139 | PRIMARY KEY ((person_id), movie_id); 140 | 141 | CREATE TABLE composer2 142
| ( 143 | movie_id TEXT, 144 | person_id TEXT, 145 | PRIMARY KEY((movie_id), person_id) 146 | ) 147 | WITH EDGE LABEL e_composer2 148 | FROM v_movie2(movie_id) 149 | TO v_person2(person_id); 150 | CREATE MATERIALIZED VIEW e_composer_bi2 151 | AS SELECT movie_id, person_id 152 | FROM composer2 153 | WHERE 154 | movie_id IS NOT NULL 155 | AND 156 | person_id IS NOT NULL 157 | PRIMARY KEY ((person_id), movie_id); 158 | 159 | 160 | CREATE TABLE knows2 161 | ( 162 | user_id_s TEXT, 163 | user_id_d TEXT, 164 | PRIMARY KEY((user_id_s), user_id_d) 165 | ) 166 | WITH EDGE LABEL e_knows2 167 | FROM v_user2(user_id_s) 168 | TO v_user2(user_id_d); 169 | CREATE MATERIALIZED VIEW e_knows_bi2 170 | AS SELECT user_id_s, user_id_d 171 | FROM knows2 172 | WHERE 173 | user_id_s IS NOT NULL 174 | AND 175 | user_id_d IS NOT NULL 176 | PRIMARY KEY ((user_id_d), user_id_s); 177 | 178 | CREATE TABLE rated2 179 | ( 180 | user_id TEXT, 181 | movie_id TEXT, 182 | rating INT, 183 | PRIMARY KEY((user_id), movie_id) 184 | ) 185 | WITH EDGE LABEL e_rated2 186 | FROM v_user2(user_id) 187 | TO v_movie2(movie_id); 188 | CREATE MATERIALIZED VIEW e_rated_bi2 189 | AS SELECT user_id, movie_id 190 | FROM rated2 191 | WHERE 192 | user_id IS NOT NULL 193 | AND 194 | movie_id IS NOT NULL 195 | PRIMARY KEY ((user_id), movie_id); 196 | 197 | CREATE TABLE belongs_to2 198 | ( 199 | movie_id TEXT, 200 | genre_id TEXT, 201 | PRIMARY KEY((movie_id), genre_id) 202 | ) 203 | WITH EDGE LABEL e_belongs_to2 204 | FROM v_movie2(movie_id) 205 | TO v_genre2(genre_id); 206 | CREATE MATERIALIZED VIEW e_belongs_to_bi2 207 | AS SELECT movie_id, genre_id 208 | FROM belongs_to2 209 | WHERE 210 | movie_id IS NOT NULL 211 | AND 212 | genre_id IS NOT NULL 213 | PRIMARY KEY ((movie_id), genre_id); 214 | 215 | 216 | // ------------------------------------------------- 217 | 218 | 219 | CREATE SEARCH INDEX ON ks_ngkv.rated2 220 | WITH COLUMNS rating { docValues : true }; 221 | 222 | CREATE SEARCH INDEX ON ks_ngkv.belongs_to2 223 | WITH COLUMNS genre_id { docValues : true }; 224 | 225 | 226 | // I didn't do all of the secondary indexes yet- 227 | // 228 | // Below is the index DDL from Artem's KillrVideo 229 | 230 | 231 | // Define vertex indexes 232 | 233 | // schema.vertexLabel("genre").index("genresByName").materialized().by("name").add() 234 | // schema.vertexLabel("person").index("personsByName").materialized().by("name").add() 235 | // schema.vertexLabel("user").index("usersByAge").secondary().by("age").add() 236 | 237 | // schema.vertexLabel("movie").index("moviesByTitle").materialized().by("title").add() 238 | // schema.vertexLabel("movie").index("search").search().by("title").asText().add() 239 | // schema.vertexLabel("movie").index("moviesByYear").secondary().by("year").add() 240 | 241 | // Define edge indexes 242 | 243 | // schema.vertexLabel("user").index("toMoviesByRating").outE("rated").by("rating").add() 244 | // schema.vertexLabel("movie").index("toUsersByRating").inE("rated").by("rating").add() 245 | 246 | 247 | // ------------------------------------------------- 248 | 249 | 250 | COPY user2 ( 251 | user_id, 252 | gender, 253 | age 254 | ) 255 | FROM '20 KV data as pipe/10 user.pipe' 256 | WITH 257 | DELIMITER = '|' 258 | AND 259 | HEADER = TRUE 260 | ; 261 | 262 | COPY genre2 ( 263 | genre_id, 264 | name 265 | ) 266 | FROM '20 KV data as pipe/11 genre.pipe' 267 | WITH 268 | DELIMITER = '|' 269 | AND 270 | HEADER = TRUE 271 | ; 272 | 273 | COPY person2 ( 274 | person_id, 275 | name 276 | ) 277 | FROM '20 KV data as pipe/12 person.pipe' 278 
| WITH 279 | DELIMITER = '|' 280 | AND 281 | HEADER = TRUE 282 | ; 283 | 284 | COPY movie2 ( 285 | movie_id, 286 | duration, 287 | country, 288 | year, 289 | production, 290 | title 291 | ) 292 | FROM '20 KV data as pipe/13 movie.pipe' 293 | WITH 294 | DELIMITER = '|' 295 | AND 296 | HEADER = TRUE 297 | ; 298 | 299 | 300 | // ------------------------------------------------- 301 | 302 | 303 | COPY screenwriter2 ( 304 | movie_id, 305 | person_id 306 | ) 307 | FROM '20 KV data as pipe/20 screenwriter.pipe' 308 | WITH 309 | DELIMITER = '|' 310 | AND 311 | HEADER = TRUE 312 | ; 313 | 314 | COPY cinematographer2 ( 315 | movie_id, 316 | person_id 317 | ) 318 | FROM '20 KV data as pipe/21 cinematographer.pipe' 319 | WITH 320 | DELIMITER = '|' 321 | AND 322 | HEADER = TRUE 323 | ; 324 | 325 | COPY actor2 ( 326 | movie_id, 327 | person_id, 328 | id 329 | ) 330 | FROM '20 KV data as pipe/22 actor.pipe' 331 | WITH 332 | DELIMITER = '|' 333 | AND 334 | HEADER = TRUE 335 | ; 336 | 337 | COPY director2 ( 338 | movie_id, 339 | person_id 340 | ) 341 | FROM '20 KV data as pipe/23 director.pipe' 342 | WITH 343 | DELIMITER = '|' 344 | AND 345 | HEADER = TRUE 346 | ; 347 | 348 | COPY composer2 ( 349 | movie_id, 350 | person_id 351 | ) 352 | FROM '20 KV data as pipe/24 composer.pipe' 353 | WITH 354 | DELIMITER = '|' 355 | AND 356 | HEADER = TRUE 357 | ; 358 | 359 | 360 | // ------------------------------------------------- 361 | 362 | 363 | COPY knows2 ( 364 | user_id_s, 365 | user_id_d 366 | ) 367 | FROM '20 KV data as pipe/30 knows.pipe' 368 | WITH 369 | DELIMITER = '|' 370 | AND 371 | HEADER = TRUE 372 | ; 373 | 374 | COPY rated2 ( 375 | user_id, 376 | movie_id, 377 | rating 378 | ) 379 | FROM '20 KV data as pipe/31 rated.pipe' 380 | WITH 381 | DELIMITER = '|' 382 | AND 383 | HEADER = TRUE 384 | ; 385 | 386 | COPY belongs_to2 ( 387 | movie_id, 388 | genre_id 389 | ) 390 | FROM '20 KV data as pipe/32 belongs_to.pipe' 391 | WITH 392 | DELIMITER = '|' 393 | AND 394 | HEADER = TRUE 395 | ; 396 | 397 | 398 | 399 | 400 | 401 | -------------------------------------------------------------------------------- /2019/DDN_2019_32_Python68Client.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_32_Python68Client.pdf -------------------------------------------------------------------------------- /2019/DDN_2019_32_Python68Client.py: -------------------------------------------------------------------------------- 1 | 2 | 3 | # 4 | # Imports 5 | # 6 | 7 | from dse.cluster import Cluster, GraphExecutionProfile 8 | from dse.cluster import EXEC_PROFILE_GRAPH_DEFAULT 9 | from dse.cluster import EXEC_PROFILE_GRAPH_ANALYTICS_DEFAULT 10 | # 11 | from dse.graph import GraphOptions, GraphProtocol, graph_graphson3_row_factory 12 | 13 | 14 | ############################################################ 15 | ############################################################ 16 | 17 | 18 | # 19 | # Assuming a DSE at localhost with DSE Search and Graph 20 | # are turned on 21 | # 22 | l_graphName = 'ks_32' 23 | 24 | 25 | l_execProfile1 = GraphExecutionProfile( # OLTP 26 | graph_options=GraphOptions(graph_name=l_graphName, 27 | graph_protocol=GraphProtocol.GRAPHSON_3_0), 28 | row_factory=graph_graphson3_row_factory 29 | ) 30 | l_execProfile2 = GraphExecutionProfile( # OLAP 31 | graph_options=GraphOptions(graph_name=l_graphName, 32 | graph_source='a', 33 | 
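                               # graph_source='a' routes traversals run under
                               # this profile to the analytics (OLAP / Spark)
                               # engine; the OLTP profile above keeps the
                               # default 'g' source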
graph_protocol=GraphProtocol.GRAPHSON_3_0), 34 | row_factory=graph_graphson3_row_factory 35 | ) 36 | 37 | l_cluster = Cluster(execution_profiles = { 38 | EXEC_PROFILE_GRAPH_DEFAULT: l_execProfile1, 39 | EXEC_PROFILE_GRAPH_ANALYTICS_DEFAULT: l_execProfile2}, 40 | contact_points=['127.0.0.1'] 41 | ) 42 | m_session = l_cluster.connect() 43 | 44 | 45 | ############################################################ 46 | ############################################################ 47 | 48 | 49 | # 50 | # Create the DSE server side objects; keyspace, table, 51 | # indexes, .. 52 | # 53 | 54 | def init_db1(): 55 | global m_session 56 | 57 | 58 | print "" 59 | print "" 60 | 61 | l_stmt = \ 62 | "DROP KEYSPACE IF EXISTS ks_32; " 63 | # 64 | l_rows = m_session.execute(l_stmt) 65 | # 66 | print "" 67 | l_stmt2 = ' '.join(l_stmt.split()) 68 | print "Drop Keyspace: " + l_stmt2 69 | 70 | 71 | l_stmt = \ 72 | "CREATE KEYSPACE ks_32 " + \ 73 | " WITH replication = {'class': 'SimpleStrategy', " + \ 74 | " 'replication_factor': 1} " + \ 75 | " AND graph_engine = 'Core'; " + \ 76 | " " 77 | # 78 | l_rows = m_session.execute(l_stmt) 79 | # 80 | print "" 81 | l_stmt2 = ' '.join(l_stmt.split()) 82 | print "Create Keyspace: " + l_stmt2 83 | 84 | 85 | l_stmt = \ 86 | "CREATE TABLE ks_32.grid_square " + \ 87 | " ( " + \ 88 | " square_id TEXT, " + \ 89 | " PRIMARY KEY((square_id)) " + \ 90 | " ) " + \ 91 | "WITH VERTEX LABEL grid_square " + \ 92 | "; " 93 | # 94 | l_rows = m_session.execute(l_stmt) 95 | # 96 | print "" 97 | l_stmt2 = ' '.join(l_stmt.split()) 98 | print "Create Vertex: " + l_stmt2 99 | 100 | 101 | l_stmt = \ 102 | "CREATE SEARCH INDEX ON ks_32.grid_square " + \ 103 | " WITH COLUMNS square_id { docValues : true }; " 104 | # 105 | l_rows = m_session.execute(l_stmt) 106 | # 107 | print "" 108 | l_stmt2 = ' '.join(l_stmt.split()) 109 | print "Create Search Index: " + l_stmt2 110 | 111 | 112 | l_stmt = \ 113 | "CREATE TABLE ks_32.connects_to " + \ 114 | " ( " + \ 115 | " square_id_src TEXT, " + \ 116 | " square_id_dst TEXT, " + \ 117 | " PRIMARY KEY((square_id_src), square_id_dst) " + \ 118 | " ) " + \ 119 | "WITH EDGE LABEL connects_to " + \ 120 | " FROM grid_square(square_id_src) " + \ 121 | " TO grid_square(square_id_dst); " 122 | # 123 | l_rows = m_session.execute(l_stmt) 124 | # 125 | print "" 126 | l_stmt2 = ' '.join(l_stmt.split()) 127 | print "Create Edge: " + l_stmt2 128 | 129 | 130 | l_stmt = \ 131 | "CREATE MATERIALIZED VIEW ks_32.connects_to_bi " + \ 132 | " AS SELECT square_id_src, square_id_dst " + \ 133 | " FROM connects_to " + \ 134 | " WHERE " + \ 135 | " square_id_src IS NOT NULL " + \ 136 | " AND " + \ 137 | " square_id_dst IS NOT NULL " + \ 138 | " PRIMARY KEY ((square_id_dst), square_id_src); " 139 | # 140 | l_rows = m_session.execute(l_stmt) 141 | # 142 | print "" 143 | l_stmt2 = ' '.join(l_stmt.split()) 144 | print "Create Bi-direction to Edge: " + l_stmt2 145 | 146 | 147 | ############################################################ 148 | ############################################################ 149 | 150 | 151 | # 152 | # Load the Vertex and Edge with starter data 153 | # 154 | 155 | def init_db2(): 156 | global m_session 157 | global m_ins2 158 | 159 | 160 | l_squares = [ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ] 161 | 162 | 163 | # 164 | # Insert data into the vertex; ks_32.grid_squares 165 | # 166 | l_ins1 = m_session.prepare( 167 | "INSERT INTO ks_32.grid_square (square_id) VALUES ( ? 
)" 168 | ) 169 | for r in l_squares: 170 | for c in l_squares: 171 | l_data = "x" + str(r) + "-" + str(c) 172 | l_rows = m_session.execute(l_ins1, [ l_data ]) 173 | 174 | 175 | # 176 | # Insert data into the edge; ks_32.connects_to 177 | # 178 | for r in l_squares: 179 | for c in l_squares: 180 | # 181 | # Within 1 row; column current <---> column right 182 | # 183 | if (c < 21): 184 | l_left_ = "x" + str(r) + "-" + str(c ) 185 | l_right = "x" + str(r) + "-" + str(c + 2) 186 | # 187 | l_rows = m_session.execute(m_ins2, [ l_left_, l_right ] ) 188 | l_rows = m_session.execute(m_ins2, [ l_right, l_left_ ] ) 189 | 190 | # 191 | # Within 1 column; row current to row below 192 | # 193 | if (r < 21): 194 | l_above = "x" + str(r ) + "-" + str(c) 195 | l_below = "x" + str(r + 2) + "-" + str(c) 196 | # 197 | l_rows = m_session.execute(m_ins2, [ l_above, l_below ] ) 198 | l_rows = m_session.execute(m_ins2, [ l_below, l_above ] ) 199 | 200 | # 201 | # From the loops above, we end up missing the bottom, 202 | # right-most square. Do manually .. 203 | # 204 | l_rows = m_session.execute(m_ins2, [ "x19-21", "x21-21" ] ) 205 | l_rows = m_session.execute(m_ins2, [ "x21-21", "x19-21" ] ) 206 | l_rows = m_session.execute(m_ins2, [ "x21-19", "x21-21" ] ) 207 | l_rows = m_session.execute(m_ins2, [ "x21-21", "x21-19" ] ) 208 | 209 | 210 | ############################################################ 211 | ############################################################ 212 | 213 | 214 | def run_traversals(): 215 | 216 | l_stmt = "g.V().hasLabel('grid_square')" 217 | 218 | # OLTP 219 | for l_elem in m_session.execute_graph(l_stmt, execution_profile=EXEC_PROFILE_GRAPH_DEFAULT ): 220 | print l_elem 221 | 222 | print "" 223 | print "MMM" 224 | print "" 225 | 226 | # OLAP 227 | for l_elem in m_session.execute_graph(l_stmt, execution_profile=EXEC_PROFILE_GRAPH_ANALYTICS_DEFAULT ): 228 | print l_elem 229 | 230 | 231 | ############################################################ 232 | ############################################################ 233 | 234 | 235 | # 236 | # Our program proper 237 | # 238 | 239 | if __name__=='__main__': 240 | 241 | init_db1() 242 | 243 | # 244 | # Our prepared INSERT(s) and DELETE(s) 245 | # 246 | m_ins2 = m_session.prepare( 247 | "INSERT INTO ks_32.connects_to (square_id_src, " + \ 248 | " square_id_dst) VALUES ( ?, ? ) " 249 | ) 250 | m_del1 = m_session.prepare( 251 | "DELETE FROM ks_32.connects_to WHERE " + \ 252 | " square_id_src = ? AND square_id_dst = ? 
" 253 | ) 254 | 255 | init_db2() 256 | 257 | run_traversals() 258 | 259 | 260 | 261 | -------------------------------------------------------------------------------- /2019/DDN_2019_33_ShortestPoint.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_33_ShortestPoint.pdf -------------------------------------------------------------------------------- /2019/DDN_2019_33_ShortestPoint.tar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_33_ShortestPoint.tar -------------------------------------------------------------------------------- /2019/DDN_2019_34_GremlinPrimer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_34_GremlinPrimer.pdf -------------------------------------------------------------------------------- /2019/DDN_2019_35 Desktop.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_35 Desktop.pdf -------------------------------------------------------------------------------- /2019/DDN_2019_36_cfstats.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2019/DDN_2019_36_cfstats.pdf -------------------------------------------------------------------------------- /2019/README.md: -------------------------------------------------------------------------------- 1 | DataStax Developer's Notebook - Monthly Articles 2019 2 | =================== 3 | 4 | | **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** | 5 | |-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------| 6 | 7 | 8 | This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum. 9 | 10 | 2019 December - - 11 | >Customer: My company is getting ready to go into production with our first (Cassandra) application. We’ve 12 | >noticed that one of our nodes contains way more data than the other nodes, and is way more utilized than 13 | >the other nodes. We’ve found “nodetool cfstats”, along with mention of tombstones, read/write latencies, 14 | >and more, and think we have a problem. Can you help ? 
15 | > 16 | >Daniel: Excellent question ! You’ve got a lot going on above in this problem statement. Net/net, in this 17 | >document we will; explain cfstats, overview a (production readiness) examination, and more. 18 | > 19 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_36_cfstats.pdf) 20 | 21 | 2019 November - - 22 | >Customer: I’m a developer and have little time to learn the complexities of setting up and maintaining 23 | >servers. I get that I can stand up a DataStax Enterprise server in 2 minutes or less, but I have at 24 | >least 10 of these types of challenges to overcome when getting applications out the door. Can you help ? 25 | > 26 | >Daniel: Excellent question ! Well obviously you want some automation. DataStax Desktop was introduced 27 | >this year as a means to simply containerize your DataStax Enterprise (DSE) install. A fat client, DataStax 28 | >Desktop runs on Linux, MacOS and Windows, and front ends Docker and Kubernetes; really then, standing up 29 | >a new, single or set of DSEs is like a 3 button operation. 30 | > 31 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_35%20Desktop.pdf) 32 | 33 | 2019 October - - 34 | >Customer: My company has a number of shortest-path problems, for example; airlines, get me from SFO to 35 | >JFK for passenger and freight routing. I understand graph analytics may be a means to solve this problem. 36 | >Can you help ? 37 | > 38 | >Daniel: Excellent question ! This is the third of three documents in a series answering this question. In 39 | >the first document (August/2019), we set up the DataStax Enterprise (DSE) release 6.8 Python client side 40 | >library, and worked with the driver for both OLTP and OLAP style queries. In the second document in this 41 | >series (September/2019), we delivered a thin client Web user interface that allowed us to interact with 42 | >a (grid maze), prompting and then rendering the results to a DSE Graph shortest path query (traversal). 43 | >In this third and final document in this series, we backfill all of the DSE Graph (Apache Gremlin) traversal 44 | >steps you would need to know to write the shortest path query on your own, without aid. 45 | > 46 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_34_GremlinPrimer.pdf) 47 | > 48 | >[Application program code](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_34_GremlinPrimer.txt) 49 | 50 | 51 | 2019 September - - 52 | >Customer: My company has a number of shortest-path problems, for example; airlines, get me from SFO to 53 | >JFK for passenger and freight routing. I understand graph analytics may be a means to solve this problem. 54 | >Can you help ? 55 | > 56 | >Daniel: Excellent question ! This is the second of three documents in a series answering this question. 57 | >In the first document (August/2019), we set up the DataStax Enterprise (DSE) release 6.8 Python client 58 | >side library, and worked with the driver for both OLTP and OLAP style queries. In this second document, 59 | >we deliver a thin client Web user interface that allows us to interact with a (grid maze), prompting 60 | >and then rendering the results to a DSE Graph shortest path query (traversal).
In the third and final
61 | >document in this series (October/2019), we will backfill all of the DSE Graph (Apache Gremlin) traversal
62 | >steps you would need to know to write the shortest path query on your own, without aid.
63 | >
64 | >All of the source code to the client program, written in Python, is available below as a Linux Tar ball.
65 | >
66 | >[Download Whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_33_ShortestPoint.pdf)
67 | >
68 | >[Application program code](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_33_ShortestPoint.tar)
69 |
70 |
71 | 2019 August - -
72 | >Customer: My company has a number of shortest-path problems, for example; airlines, get me from SFO to
73 | >JFK for passenger and freight routing. I understand graph analytics may be a means to solve this problem.
74 | >Can you help ?
75 | >
76 | >Daniel: Excellent question ! Previously in this document series we have overviewed the topic of graph
77 | >databases (June/2019, updated from January/2019). Also, we have deep-dived on the topic of product
78 | >recommendation engines using Apache Spark (DSE Analytics) machine learning, and also DSE Graph,
79 | >performing a compare/contrast of the analytics each environment offers (July/2019).
80 | >
81 | >In this edition of this document, we will address graph analytics, shortest path. While we previously
82 | >overviewed graph, we’ve never detailed the graph query language titled, Apache Gremlin. Gremlin is a
83 | >large topic, way larger and more capable than SQL SELECT. Thus, we will, in this document, begin a
84 | >series of at least 3 articles, they being;
85 | >
86 | > • Set up a DSE (Graph) version 6.8, Python client for both OLTP and OLAP. (This document)
87 | >
88 | > • Deliver the shortest path solution using DSE Graph with a Python Web client user interface.
89 | >
90 | > • Deliver a part-1 primer on Apache Gremlin, so that you may better understand the query (Gremlin
91 | >traversal) used to calculate shortest path.
92 | >
93 | >
94 | >[Download Whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_32_Python68Client.pdf)
95 | >
96 | >[Application program code](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_32_Python68Client.py)
97 |
98 |
99 | 2019 July - -
100 | >Customer: I'm confused. I saw a presentation at the 2019 DataStax world conference (Accelerate 2019),
101 | >detailing how to deliver a product recommendation engine using DSE Graph. I've also seen DSE articles
102 | >detailing how to deliver a product recommendation engine using DSE Analytics. Can you help ?
103 | >
104 | >Daniel: Excellent question ! As discussed in previous editions of this document, there are 4 primary
105 | >functional areas within DataStax Enterprise (DSE). DSE Analytics can deliver a ‘content-based’ product
106 | >recommendation (aka, product-product). DSE Graph can deliver a ‘collaborative-based’ product recommendation
107 | >engine (aka, user-user). Both DSE Analytics and DSE Graph use DSE Core as their storage engine, and DSE
108 | >Search as their advanced index engine; a full integration, not just a connector.
109 | >
110 | >In this edition of this document we’ll detail all of the code needed to deliver the above, and include
111 | >data. We’ll also use this edition of this document to provide a Graph query primer (Gremlin language
112 | >primer), and answer the nuanced question of; Why Graph ?
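>
>As a small taste of the ‘collaborative-based’ style before you download, below is a minimal sketch of a
>user-user recommendation expressed as a single Gremlin traversal, using the gremlin_python library. The
>endpoint, the person/product schema, the 'bought' edge label, and the property names are illustrative
>assumptions for this sketch, not the exact schema used in the whitepaper.
>
>```python
>from gremlin_python.process.anonymous_traversal import traversal
>from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
>from gremlin_python.process.traversal import P
>
># Connect to a Gremlin Server style endpoint; host, port, and the
># 'g' traversal source alias are assumptions for this sketch.
>conn = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
>g = traversal().withRemote(conn)
>
># "People who bought what Alice bought, also bought ..." - rank the
># products Alice does not yet own by how often her peers bought them.
>recs = (g.V().has('person', 'name', 'alice')
>         .out('bought').aggregate('mine')           # products Alice owns
>         .in_('bought')                             # other buyers of those products
>         .out('bought').where(P.without('mine'))    # their products Alice lacks
>         .groupCount().by('name')                   # rank by purchase frequency
>         .next())
>
>print(sorted(recs.items(), key=lambda kv: -kv[1]))
>conn.close()
>```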
113 | >
114 | >[Download Whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31a_DSE%2C%20Reco%20Engines.pdf)
115 | >
116 | >[PowerPoint (measurably more detailed than above)](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31b_DSE%2C%20Reco%20Engines.pptx)
117 | >
118 | >[Video recording of the PPT above](https://www.youtube.com/watch?v=15xUt1sZ48U&feature=youtu.be)
119 | >
120 | >[Just Grocery Data as Tar](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31c_JustGroceryData.tar)
121 | >
122 | >[All Program Code as Txt](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31d_AllCommands.txt)
123 | >
124 | >[DataStax KillrVideo demo DB data as Tar](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31e_KillrVideoDataAsPipe.tar)
125 | >
126 | >[DataStax KillrVideo demo DB DDL as Txt (vers 6.8, fyi)](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_31f_KillrVideoDDL.cql)
127 |
128 |
129 | 2019 June - -
130 |
131 | >Customer: I saw the January/2019 article where you introduced graph computing using Apache TinkerPop,
132 | >aka, DataStax Enterprise Graph. Now I see that you’ve released an EAP (early access program, preview)
133 | >DataStax Enterprise version 6.8 with significant changes to graph, including a new storage model. I
134 | >figure there’s a bunch of new stuff I need to know. Can you help ?
135 | >
136 | >Daniel: Excellent question ! Yes. In this edition of DataStax Developer’s Notebook (DDN), we’ll update
137 | >the January/2019 document with new version 6.8 DSE capabilities, and do a bit of compare and contrast.
138 | >Keep in mind, as an EAP, version 6.8 is proposed, early preview. Version 6.8 may change drastically.
139 | >
140 | >In this document, we detail that you no longer need GraphFrames; inserting using standard DataFrames
141 | >saves a bunch of processing steps, and still performs just as well. We detail the new version 6.8
142 | >storage model, which is also much simpler than the version 6.7 model. (Everything is stored directly in DSE Core
143 | >tables, and directly supports DSE Core CQL queries.)
144 | >
145 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_30_GraphPrimer%2068.pdf)
146 |
147 |
148 | 2019 May - -
149 |
150 | >Customer: As a developer I’ve been using Redis for 6 years, and now my company tells me I have to move
151 | >all of my work to Apache Kafka. Can you help ?
152 | >
153 | >Daniel: Excellent question ! Management, huh ? We say that because Redis and Kafka are not the same.
154 | >In fact, Redis seems to have really re-energized in the past 4 years, with many strategic enhancements.
155 | >Redis has held the number four spot on the DB-Engines.com database ranking for some time. Kafka, while
156 | >used by nearly everyone, seems to place 60% of its workloads serving mainframe offloads; guaranteed
157 | >message delivery, possibly to multiple consumers. (A scale-out of subscribe in publish/subscribe.)
158 | >
159 | >In this document, we’ll install and configure a single-node (stand-alone) Kafka cluster, learn to write
160 | >and read messages, and install and configure the DataStax Kafka Connector (Kafka Connector). Using the
161 | >Kafka Connector, you can push Kafka messages into DataStax Enterprise and the DataStax Distribution of
162 | >Cassandra (DDAC) without writing any program code. Cool.
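>
>To make the write/read portion concrete before you download, here is a minimal sketch using the
>kafka-python client against a stand-alone broker. The broker address and topic name are assumptions
>for this sketch; the whitepaper's no-code path (the Kafka Connector) needs none of this code.
>
>```python
>from kafka import KafkaProducer, KafkaConsumer
>
># Write a few messages to a topic on a stand-alone broker.
>producer = KafkaProducer(bootstrap_servers='localhost:9092')
>for i in range(3):
>    producer.send('ddn_demo', value=f'message {i}'.encode('utf-8'))
>producer.flush()
>
># Read the messages back; consumer_timeout_ms lets the loop end,
># which keeps this demo from blocking forever.
>consumer = KafkaConsumer('ddn_demo',
>                         bootstrap_servers='localhost:9092',
>                         auto_offset_reset='earliest',
>                         consumer_timeout_ms=5000)
>for message in consumer:
>    print(message.offset, message.value.decode('utf-8'))
>```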
163 | >
164 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_29_Kafka.pdf)
165 |
166 |
167 | 2019 April - -
168 |
169 | >Customer: I work on my laptop using Python, R, and Jupyter Notebooks, doing analytics for my company.
170 | >I’ve been digging on the analytics topics built inside DataStax Enterprise (pre-installed, data co-located
171 | >with the analytics routines), but don’t see how I can make use of any of this. Can you help ?
172 | >
173 | >Daniel: Excellent question ! Aren’t Python and R the number one data science software pairing on the
174 | >planet ? (Regardless, it must be in the top two.) DataStax Enterprise ships pre-configured with a
175 | >Python interpreter, and R. For R, for reasons unknown to us, you must install one new external library.
176 | >
177 | >We do have a bias towards Spark/R, since Spark/R seems to lead in the area of open source parallel
178 | >capable routines. (Speed, performance, better documentation.)
179 | >
180 | >Jupyter is also an excellent choice, especially if you’re Python focused. We’ve not yet looked at
181 | >the Anaconda distribution of Jupyter, which seems very promising. We’ll show a Jupyter install,
182 | >but may ourselves stick with Apache Zeppelin, since Zeppelin seems to come pre-installed/pre-configured
183 | >for so many more languages and options out of the box. (Less work for us.)
184 | >
185 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_28_Jupyter%2C%20R.pdf)
186 |
187 |
188 | 2019 March - -
189 |
190 | >Customer: My company was using application server tiered security, and now needs to implement
191 | >database tier level security. Can you help ?
192 | >
193 | >Daniel: Excellent question ! Obviously security is a broad topic; OS level security (the OS
194 | >hosting DSE), database level security, data in flight, data at rest, and more.
195 | >
196 | >Minimally we’ll overview DSE security, and detail how to implement password protection of same.
197 | >
198 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_27_Security.pdf)
199 |
200 |
201 | 2019 February - -
202 |
203 | >Customer: I need to program an inventory management system, and wish to use the time stamp, time to
204 | >live, and other features found within DSE. Can you help ?
205 | >
206 | >Daniel: Excellent question ! The design pattern you implement differs when you are selling a distinct
207 | >inventory (specifically numbered seats to a concert), or you are selling a true-count, number on hand
208 | >inventory (all items are the same).
209 | >
210 | >Regardless, we will cover all of the relevant topics, and detail how to program same using DSE Core
211 | >and DSE Analytics (Apache Spark).
212 | >
213 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_26_Inventory.pdf)
214 |
215 |
216 | 2019 January - -
217 |
218 | >Customer: Graph, graph, graph; what the heck is up with graph ? I think (hope ?) there’s something graph
219 | >databases do that standard relational databases do not, but I can’t articulate what that function or
220 | >advantage actually is. Can you help ?
221 | >
222 | >Daniel: Excellent question ! Yes, but we’re going to take two editions of this document to do so. Sometimes
223 | >there are nuances when discussing databases; what really is the difference between a data warehouse,
224 | >data mart, data lake, other ?
Why couldn’t you recreate some or most non-relational database function 225 | >using a standard relational database ? 226 | > 227 | >In this edition of DataStax Developer’s Notebook (DDN), we provide a graph database primer; create a 228 | >graph, and load it. In a future edition of this same document, we will actually have the chance to 229 | >provide examples where you might determine that graph databases have an advantage over relational 230 | >databases for certain use cases. 231 | > 232 | >[Download here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/DDN_2019_25_GraphPrimer.pdf) 233 | > 234 | >[Resource Kit](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/41%20Simple%20Customer%20Graph.txt), all of the data and programs used in this edition in ASCII text format. 235 | > 236 | -------------------------------------------------------------------------------- /2020/DDN_2020_37_Parquet.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_37_Parquet.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_38_FileMethods.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_38_FileMethods.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_39_DriverFutures.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_39_DriverFutures.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_40_SSL.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_40_SSL.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_41_GraphQL.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_41_GraphQL.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_42_AstraGeohash.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_42_AstraGeohash.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_42_AstraGeohash_Data.pipe.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_42_AstraGeohash_Data.pipe.gz -------------------------------------------------------------------------------- /2020/DDN_2020_42_AstraGeohash_Programs.tar.gz: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_42_AstraGeohash_Programs.tar.gz
--------------------------------------------------------------------------------
/2020/DDN_2020_43_AstraApiProgramming.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_43_AstraApiProgramming.pdf
--------------------------------------------------------------------------------
/2020/DDN_2020_44_NoSQLBench.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_44_NoSQLBench.pdf
--------------------------------------------------------------------------------
/2020/DDN_2020_44_NoSQLBench.yaml:
--------------------------------------------------------------------------------
1 |
2 |
3 | # Run via,
4 | #
5 | #    nb (file_name)
6 | #    nb (file_name) driver=cql
7 | #
8 | #    nb run workload=(file_name) driver=stdout tags=phase:rampup cycles=10
9 | #
10 | #    nb run workload=(file_name) driver=stdout tags=name:query1 cycles=10
11 |
12 | # Because of the DROP/CREATE KEYSPACE, this file does not run against Astra
13 |
14 |
15 | scenarios:
16 |   default:
17 |     schema: run driver=stdout tags==phase:schema threads==1 cycles==UNDEF
18 |     rampup: run driver=stdout tags==phase:rampup threads==auto cycles=10000000
19 |     main: run driver=stdout tags==phase:main threads==auto cycles=100000
20 |
21 |
22 | bindings:
23 |   colx: Mod(<>L); ToHashedUUID() -> java.util.UUID; ToString() -> String
24 |   col8: FullNames() -> String
25 |
26 |
27 | blocks:
28 |
29 |   - tags:
30 |       phase: schema
31 |     params:
32 |       prepared: false
33 |     statements:
34 |
35 |       - drop_keyspace: |
36 |           DROP KEYSPACE IF EXISTS <>;
37 |         tags:
38 |           name: drop_keyspace
39 |
40 |       - create_keyspace: |
41 |           CREATE KEYSPACE <>
42 |             WITH replication = {'class': 'SimpleStrategy',
43 |             'replication_factor': '1'};
44 |         tags:
45 |           name: create_keyspace
46 |
47 |       - create_table: |
48 |           CREATE TABLE <>.<>
49 |             (
50 |             col1 TEXT PRIMARY KEY,
51 |             col2 TEXT,
52 |             col3 TEXT,
53 |             col4 TEXT,
54 |             col5 TEXT,
55 |             col6 TEXT,
56 |             col7 TEXT,
57 |             col8 TEXT,
58 |             col9 TEXT,
59 |             col0 TEXT
60 |             );
61 |         tags:
62 |           name: create_table
63 |
64 |       - create_index: |
65 |           CREATE CUSTOM INDEX col4_idx
66 |             ON <>.<> (col4) USING 'StorageAttachedIndex';
67 |         tags:
68 |           name: create_index
69 |
70 |
71 |   - tags:
72 |       phase: rampup
73 |     params:
74 |       prepared: true
75 |     statements:
76 |
77 |       - insert: |
78 |           INSERT INTO <>.<> (col1, col4, col8)
79 |             VALUES ( {colx}, {colx}, {col8} );
80 |         tags:
81 |           name: insert
82 |
83 |
84 |   - tags:
85 |       phase: main
86 |     params:
87 |       prepared: true
88 |     statements:
89 |
90 |       - query1: |
91 |           SELECT * FROM <>.<> WHERE col1 = {colx} ;
92 |         tags:
93 |           name: query1
94 |
95 |       - query2: |
96 |           SELECT * FROM <>.<> WHERE col4 = {colx} ;
97 |         tags:
98 |           name: query2
99 |
100 |
101 |
102 |
--------------------------------------------------------------------------------
/2020/DDN_2020_44_NoSQLBench_Slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_44_NoSQLBench_Slides.pdf
-------------------------------------------------------------------------------- /2020/DDN_2020_45_KubernetesOperator.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_45_KubernetesOperator.pdf -------------------------------------------------------------------------------- /2020/DDN_2020_45_KubernetesOperator.tar: -------------------------------------------------------------------------------- 1 | 10-kind-config.yaml0000755000076500000240000000030013716020464012647 0ustar dialoutkind: Cluster 2 | apiVersion: kind.sigs.k8s.io/v1alpha3 3 | networking: 4 | apiServerPort: 45451 5 | nodes: 6 | - role: control-plane 7 | - role: worker 8 | - role: worker 9 | - role: worker 10 | - role: worker 11 | - role: worker 12 | 13 | 11-storageclass-kind.yaml0000755000076500000240000000037313716020464014107 0ustar dialoutapiVersion: storage.k8s.io/v1 14 | kind: StorageClass 15 | metadata: 16 | annotations: 17 | storageclass.kubernetes.io/is-default-class: "true" 18 | name: server-storage 19 | provisioner: rancher.io/local-path 20 | reclaimPolicy: Delete 21 | volumeBindingMode: WaitForFirstConsumer 22 | 23 | 12-install-cass-operator-v1.1yaml0000755000076500000240000005006213716020463015325 0ustar dialout--- 24 | apiVersion: v1 25 | kind: Namespace 26 | metadata: 27 | name: cass-operator 28 | --- 29 | apiVersion: v1 30 | kind: ServiceAccount 31 | metadata: 32 | name: cass-operator 33 | namespace: cass-operator 34 | --- 35 | apiVersion: v1 36 | data: 37 | tls.crt: "" 38 | tls.key: "" 39 | kind: Secret 40 | metadata: 41 | name: cass-operator-webhook-config 42 | namespace: cass-operator 43 | --- 44 | apiVersion: apiextensions.k8s.io/v1beta1 45 | kind: CustomResourceDefinition 46 | metadata: 47 | name: cassandradatacenters.cassandra.datastax.com 48 | spec: 49 | group: cassandra.datastax.com 50 | names: 51 | kind: CassandraDatacenter 52 | listKind: CassandraDatacenterList 53 | plural: cassandradatacenters 54 | shortNames: 55 | - cassdc 56 | - cassdcs 57 | singular: cassandradatacenter 58 | scope: Namespaced 59 | subresources: 60 | status: {} 61 | validation: 62 | openAPIV3Schema: 63 | description: CassandraDatacenter is the Schema for the cassandradatacenters 64 | API 65 | properties: 66 | apiVersion: 67 | description: 'APIVersion defines the versioned schema of this representation 68 | of an object. Servers should convert recognized schemas to the latest 69 | internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#resources' 70 | type: string 71 | kind: 72 | description: 'Kind is a string value representing the REST resource this 73 | object represents. Servers may infer this from the endpoint the client 74 | submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#types-kinds' 75 | type: string 76 | metadata: 77 | type: object 78 | spec: 79 | description: CassandraDatacenterSpec defines the desired state of a CassandraDatacenter 80 | properties: 81 | allowMultipleNodesPerWorker: 82 | description: Turning this option on allows multiple server pods to be 83 | created on a k8s worker node. By default the operator creates just 84 | one server pod per k8s worker node using k8s podAntiAffinity and requiredDuringSchedulingIgnoredDuringExecution. 
85 | type: boolean 86 | canaryUpgrade: 87 | description: Indicates that configuration and container image changes 88 | should only be pushed to the first rack of the datacenter 89 | type: boolean 90 | clusterName: 91 | description: The name by which CQL clients and instances will know the 92 | cluster. If the same cluster name is shared by multiple Datacenters 93 | in the same Kubernetes namespace, they will join together in a multi-datacenter 94 | cluster. 95 | minLength: 2 96 | type: string 97 | configBuilderImage: 98 | description: Container image for the config builder init container. 99 | type: string 100 | managementApiAuth: 101 | description: Config for the Management API certificates 102 | properties: 103 | insecure: 104 | type: object 105 | manual: 106 | properties: 107 | clientSecretName: 108 | type: string 109 | serverSecretName: 110 | type: string 111 | skipSecretValidation: 112 | type: boolean 113 | required: 114 | - clientSecretName 115 | - serverSecretName 116 | type: object 117 | type: object 118 | nodeSelector: 119 | additionalProperties: 120 | type: string 121 | description: 'A map of label keys and values to restrict Cassandra node 122 | scheduling to k8s workers with matchiing labels. More info: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector' 123 | type: object 124 | racks: 125 | description: A list of the named racks in the datacenter, representing 126 | independent failure domains. The number of racks should match the 127 | replication factor in the keyspaces you plan to create, and the number 128 | of racks cannot easily be changed once a datacenter is deployed. 129 | items: 130 | description: Rack ... 131 | properties: 132 | name: 133 | description: The rack name 134 | minLength: 2 135 | type: string 136 | zone: 137 | description: Zone name to pin the rack, using node affinity 138 | type: string 139 | required: 140 | - name 141 | type: object 142 | type: array 143 | replaceNodes: 144 | description: A list of pod names that need to be replaced. 145 | items: 146 | type: string 147 | type: array 148 | resources: 149 | description: Kubernetes resource requests and limits, per pod 150 | properties: 151 | limits: 152 | additionalProperties: 153 | type: string 154 | description: 'Limits describes the maximum amount of compute resources 155 | allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' 156 | type: object 157 | requests: 158 | additionalProperties: 159 | type: string 160 | description: 'Requests describes the minimum amount of compute resources 161 | required. If Requests is omitted for a container, it defaults 162 | to Limits if that is explicitly specified, otherwise to an implementation-defined 163 | value. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' 164 | type: object 165 | type: object 166 | rollingRestartRequested: 167 | description: Whether to do a rolling restart at the next opportunity. 168 | The operator will set this back to false once the restart is in progress. 169 | type: boolean 170 | serverImage: 171 | description: 'Cassandra server image name. 
More info: https://kubernetes.io/docs/concepts/containers/images' 172 | type: string 173 | serverType: 174 | description: 'Server type: "cassandra" or "dse"' 175 | enum: 176 | - cassandra 177 | - dse 178 | type: string 179 | serverVersion: 180 | description: Version string for config builder, used to generate Cassandra 181 | server configuration 182 | enum: 183 | - 6.8.0 184 | - 3.11.6 185 | - 4.0.0 186 | type: string 187 | serviceAccount: 188 | description: The k8s service account to use for the server pods 189 | type: string 190 | size: 191 | description: Desired number of Cassandra server nodes 192 | format: int32 193 | minimum: 1 194 | type: integer 195 | stopped: 196 | description: A stopped CassandraDatacenter will have no running server 197 | pods, like using "stop" with traditional System V init scripts. Other 198 | Kubernetes resources will be left intact, and volumes will re-attach 199 | when the CassandraDatacenter workload is resumed. 200 | type: boolean 201 | storageConfig: 202 | description: Describes the persistent storage request of each server 203 | node 204 | properties: 205 | cassandraDataVolumeClaimSpec: 206 | description: PersistentVolumeClaimSpec describes the common attributes 207 | of storage devices and allows a Source for provider-specific attributes 208 | properties: 209 | accessModes: 210 | description: 'AccessModes contains the desired access modes 211 | the volume should have. More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#access-modes-1' 212 | items: 213 | type: string 214 | type: array 215 | dataSource: 216 | description: This field requires the VolumeSnapshotDataSource 217 | alpha feature gate to be enabled and currently VolumeSnapshot 218 | is the only supported data source. If the provisioner can 219 | support VolumeSnapshot data source, it will create a new volume 220 | and data will be restored to the volume at the same time. 221 | If the provisioner does not support VolumeSnapshot data source, 222 | volume will not be created and the failure will be reported 223 | as an event. In the future, we plan to support more data source 224 | types and the behavior of the provisioner may change. 225 | properties: 226 | apiGroup: 227 | description: APIGroup is the group for the resource being 228 | referenced. If APIGroup is not specified, the specified 229 | Kind must be in the core API group. For any other third-party 230 | types, APIGroup is required. 231 | type: string 232 | kind: 233 | description: Kind is the type of resource being referenced 234 | type: string 235 | name: 236 | description: Name is the name of resource being referenced 237 | type: string 238 | required: 239 | - kind 240 | - name 241 | type: object 242 | resources: 243 | description: 'Resources represents the minimum resources the 244 | volume should have. More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#resources' 245 | properties: 246 | limits: 247 | additionalProperties: 248 | type: string 249 | description: 'Limits describes the maximum amount of compute 250 | resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' 251 | type: object 252 | requests: 253 | additionalProperties: 254 | type: string 255 | description: 'Requests describes the minimum amount of compute 256 | resources required. If Requests is omitted for a container, 257 | it defaults to Limits if that is explicitly specified, 258 | otherwise to an implementation-defined value. 
More info: 259 | https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/' 260 | type: object 261 | type: object 262 | selector: 263 | description: A label query over volumes to consider for binding. 264 | properties: 265 | matchExpressions: 266 | description: matchExpressions is a list of label selector 267 | requirements. The requirements are ANDed. 268 | items: 269 | description: A label selector requirement is a selector 270 | that contains values, a key, and an operator that relates 271 | the key and values. 272 | properties: 273 | key: 274 | description: key is the label key that the selector 275 | applies to. 276 | type: string 277 | operator: 278 | description: operator represents a key's relationship 279 | to a set of values. Valid operators are In, NotIn, 280 | Exists and DoesNotExist. 281 | type: string 282 | values: 283 | description: values is an array of string values. 284 | If the operator is In or NotIn, the values array 285 | must be non-empty. If the operator is Exists or 286 | DoesNotExist, the values array must be empty. This 287 | array is replaced during a strategic merge patch. 288 | items: 289 | type: string 290 | type: array 291 | required: 292 | - key 293 | - operator 294 | type: object 295 | type: array 296 | matchLabels: 297 | additionalProperties: 298 | type: string 299 | description: matchLabels is a map of {key,value} pairs. 300 | A single {key,value} in the matchLabels map is equivalent 301 | to an element of matchExpressions, whose key field is 302 | "key", the operator is "In", and the values array contains 303 | only "value". The requirements are ANDed. 304 | type: object 305 | type: object 306 | storageClassName: 307 | description: 'Name of the StorageClass required by the claim. 308 | More info: https://kubernetes.io/docs/concepts/storage/persistent-volumes#class-1' 309 | type: string 310 | volumeMode: 311 | description: volumeMode defines what type of volume is required 312 | by the claim. Value of Filesystem is implied when not included 313 | in claim spec. This is a beta feature. 314 | type: string 315 | volumeName: 316 | description: VolumeName is the binding reference to the PersistentVolume 317 | backing this claim. 318 | type: string 319 | type: object 320 | type: object 321 | superuserSecretName: 322 | description: This secret defines the username and password for the Cassandra 323 | server superuser. If it is omitted, we will generate a secret instead. 
324 | type: string 325 | required: 326 | - clusterName 327 | - serverType 328 | - serverVersion 329 | - size 330 | - storageConfig 331 | type: object 332 | status: 333 | description: CassandraDatacenterStatus defines the observed state of CassandraDatacenter 334 | properties: 335 | cassandraOperatorProgress: 336 | description: Last known progress state of the Cassandra Operator 337 | type: string 338 | lastRollingRestart: 339 | format: date-time 340 | type: string 341 | lastServerNodeStarted: 342 | description: The timestamp when the operator last started a Server node 343 | with the management API 344 | format: date-time 345 | type: string 346 | nodeReplacements: 347 | items: 348 | type: string 349 | type: array 350 | nodeStatuses: 351 | additionalProperties: 352 | properties: 353 | hostID: 354 | type: string 355 | nodeIP: 356 | type: string 357 | type: object 358 | type: object 359 | superUserUpserted: 360 | description: The timestamp at which CQL superuser credentials were last 361 | upserted to the management API 362 | format: date-time 363 | type: string 364 | type: object 365 | type: object 366 | x-kubernetes-preserve-unknown-fields: true 367 | version: v1beta1 368 | versions: 369 | - name: v1beta1 370 | served: true 371 | storage: true 372 | --- 373 | apiVersion: rbac.authorization.k8s.io/v1 374 | kind: ClusterRole 375 | metadata: 376 | creationTimestamp: null 377 | name: cass-operator-cluster-role 378 | rules: 379 | - apiGroups: 380 | - admissionregistration.k8s.io 381 | resourceNames: 382 | - cassandradatacenter-webhook-registration 383 | resources: 384 | - validatingwebhookconfigurations 385 | verbs: 386 | - create 387 | - get 388 | - update 389 | --- 390 | apiVersion: rbac.authorization.k8s.io/v1 391 | kind: ClusterRoleBinding 392 | metadata: 393 | name: cass-operator 394 | roleRef: 395 | apiGroup: rbac.authorization.k8s.io 396 | kind: ClusterRole 397 | name: cass-operator-cluster-role 398 | subjects: 399 | - kind: ServiceAccount 400 | name: cass-operator 401 | namespace: cass-operator 402 | --- 403 | apiVersion: rbac.authorization.k8s.io/v1 404 | kind: Role 405 | metadata: 406 | name: cass-operator 407 | namespace: cass-operator 408 | rules: 409 | - apiGroups: 410 | - "" 411 | resources: 412 | - pods 413 | - services 414 | - endpoints 415 | - persistentvolumeclaims 416 | - events 417 | - configmaps 418 | - secrets 419 | verbs: 420 | - '*' 421 | - apiGroups: 422 | - "" 423 | resources: 424 | - namespaces 425 | verbs: 426 | - get 427 | - apiGroups: 428 | - apps 429 | resources: 430 | - deployments 431 | - daemonsets 432 | - replicasets 433 | - statefulsets 434 | verbs: 435 | - '*' 436 | - apiGroups: 437 | - monitoring.coreos.com 438 | resources: 439 | - servicemonitors 440 | verbs: 441 | - get 442 | - create 443 | - apiGroups: 444 | - apps 445 | resourceNames: 446 | - cass-operator 447 | resources: 448 | - deployments/finalizers 449 | verbs: 450 | - update 451 | - apiGroups: 452 | - datastax.com 453 | resources: 454 | - '*' 455 | verbs: 456 | - '*' 457 | - apiGroups: 458 | - policy 459 | resources: 460 | - poddisruptionbudgets 461 | verbs: 462 | - '*' 463 | - apiGroups: 464 | - cassandra.datastax.com 465 | resources: 466 | - '*' 467 | verbs: 468 | - '*' 469 | --- 470 | apiVersion: rbac.authorization.k8s.io/v1 471 | kind: RoleBinding 472 | metadata: 473 | name: cass-operator 474 | namespace: cass-operator 475 | roleRef: 476 | apiGroup: rbac.authorization.k8s.io 477 | kind: Role 478 | name: cass-operator 479 | subjects: 480 | - kind: ServiceAccount 481 | name: cass-operator 482 | 
--- 483 | apiVersion: v1 484 | kind: Service 485 | metadata: 486 | labels: 487 | name: cass-operator-webhook 488 | name: cassandradatacenter-webhook-service 489 | namespace: cass-operator 490 | spec: 491 | ports: 492 | - port: 443 493 | targetPort: 443 494 | selector: 495 | name: cass-operator 496 | --- 497 | apiVersion: apps/v1 498 | kind: Deployment 499 | metadata: 500 | name: cass-operator 501 | namespace: cass-operator 502 | spec: 503 | replicas: 1 504 | selector: 505 | matchLabels: 506 | name: cass-operator 507 | template: 508 | metadata: 509 | labels: 510 | name: cass-operator 511 | spec: 512 | containers: 513 | - env: 514 | - name: WATCH_NAMESPACE 515 | valueFrom: 516 | fieldRef: 517 | fieldPath: metadata.namespace 518 | - name: POD_NAME 519 | valueFrom: 520 | fieldRef: 521 | fieldPath: metadata.name 522 | - name: OPERATOR_NAME 523 | value: cass-operator 524 | - name: SKIP_VALIDATING_WEBHOOK 525 | value: "FALSE" 526 | image: datastax/cass-operator:1.1.0 527 | imagePullPolicy: IfNotPresent 528 | livenessProbe: 529 | exec: 530 | command: 531 | - pgrep 532 | - .*operator 533 | failureThreshold: 3 534 | initialDelaySeconds: 5 535 | periodSeconds: 5 536 | timeoutSeconds: 5 537 | name: cass-operator 538 | readinessProbe: 539 | exec: 540 | command: 541 | - stat 542 | - /tmp/operator-sdk-ready 543 | failureThreshold: 1 544 | initialDelaySeconds: 5 545 | periodSeconds: 5 546 | timeoutSeconds: 5 547 | volumeMounts: 548 | - mountPath: /tmp/k8s-webhook-server/serving-certs 549 | name: cass-operator-certs-volume 550 | readOnly: false 551 | serviceAccountName: cass-operator 552 | volumes: 553 | - name: cass-operator-certs-volume 554 | secret: 555 | secretName: cass-operator-webhook-config 556 | --- 557 | apiVersion: admissionregistration.k8s.io/v1beta1 558 | kind: ValidatingWebhookConfiguration 559 | metadata: 560 | name: cassandradatacenter-webhook-registration 561 | webhooks: 562 | - admissionReviewVersions: 563 | - v1beta1 564 | clientConfig: 565 | service: 566 | name: cassandradatacenter-webhook-service 567 | namespace: cass-operator 568 | path: /validate-cassandra-datastax-com-v1beta1-cassandradatacenter 569 | failurePolicy: Ignore 570 | matchPolicy: Equivalent 571 | name: cassandradatacenter-webhook.cassandra.datastax.com 572 | rules: 573 | - apiGroups: 574 | - cassandra.datastax.com 575 | apiVersions: 576 | - v1beta1 577 | operations: 578 | - CREATE 579 | - UPDATE 580 | - DELETE 581 | resources: 582 | - cassandradatacenters 583 | scope: '*' 584 | sideEffects: None 585 | timeoutSeconds: 10 586 | 587 | 13-cassandra-cluster.yaml0000755000076500000240000000131513716023657014116 0ustar dialoutapiVersion: cassandra.datastax.com/v1beta1 588 | kind: CassandraDatacenter 589 | metadata: 590 | name: dc1 591 | spec: 592 | clusterName: cluster1 593 | serverType: cassandra 594 | serverVersion: "4.0.0" 595 | managementApiAuth: 596 | insecure: {} 597 | size: 2 598 | storageConfig: 599 | cassandraDataVolumeClaimSpec: 600 | storageClassName: server-storage 601 | accessModes: 602 | - ReadWriteOnce 603 | resources: 604 | requests: 605 | storage: 5Gi 606 | config: 607 | cassandra-yaml: 608 | authenticator: org.apache.cassandra.auth.PasswordAuthenticator 609 | authorizer: org.apache.cassandra.auth.CassandraAuthorizer 610 | role_manager: org.apache.cassandra.auth.CassandraRoleManager 611 | jvm-options: 612 | initial_heap_size: "800M" 613 | max_heap_size: "800M" 614 | 615 | -------------------------------------------------------------------------------- /2020/DDN_2020_46_BetterVersOf42.pdf: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_46_BetterVersOf42.pdf
--------------------------------------------------------------------------------
/2020/DDN_2020_47_VMs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_47_VMs.pdf
--------------------------------------------------------------------------------
/2020/DDN_2020_48_NodeReplaceWoBootstrap.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2020/DDN_2020_48_NodeReplaceWoBootstrap.pdf
--------------------------------------------------------------------------------
/2020/README.md:
--------------------------------------------------------------------------------
1 | DataStax Developer's Notebook - Monthly Articles 2020
2 | ===================
3 |
4 | | **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** |
5 | |-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------|
6 |
7 |
8 | This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum.
9 |
10 | December 2020 - -
11 | >Customer: Enjoyed the past article on Apache Cassandra and virtualization (VMs). I didn’t see you detail how to recover from a failed
12 | >node (VM) though. Can you help ?
13 | >
14 | >Daniel: Excellent question ! Good catch; an oversight on our part. In this edition of DDN we detail how to implement and test node
15 | >recovery from failure, when using virtual machines.
16 | >
17 | >While this is generally an automatic function of Cassandra, you can, when using network attached storage, perform some manual steps
18 | >to recover nodes much faster. Also, we use these techniques to support development and quality-assurance; we use the same steps for
19 | >'cluster cloning' and 'differentiated data', both topics we overview in this paper.
20 | >
21 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_48_NodeReplaceWoBootstrap.pdf)
22 |
23 | November 2020 - -
24 | >Customer: My company is finally going cloud. We wish to run performance tests and more for both virtual machine hosting, and then also
25 | >containers, Kubernetes. We want to see performance implications and also make ready to update our run-book. Can you help ?
26 | >
27 | >Daniel: Excellent question !
We’ve done each of these expertly; virtualization, and containers. We’ll begin a series of articles in response to
28 | >this/your question. First, here, we’ll detail virtualization. We’ll share a number of techniques we use when automating tests and similar
29 | >when using virtual machines. All of this work will be done on GCP. After this article, on virtualization, we’ll move to containers; a
30 | >series of articles with first an overview, and then a number of recipes when on Kubernetes.
31 | >
32 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_47_VMs.pdf)
33 |
34 | October 2020 - -
35 | >Customer: I love the GraphQL, Python/Flask, OpenStreetView, geo-spatial discussion this series has had of late. I’m having trouble
36 | >putting it all together. Any chance you can put it all in one deliverable ? Can you help ?
37 | >
38 | >Daniel: Excellent question ! In this article, we assemble all of the pieces we’ve recently discussed, putting them all in one
39 | >coordinated deliverable. We’ll detail the data format, start-up scripts, the program proper, and even any HTML related to
40 | >OpenStreetView. (Eg., not Google Maps.)
41 | >
42 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_46_BetterVersOf42.pdf)
43 |
44 | September 2020 - -
45 | >Customer: My company is all in on micro-services, containers and cloud for application development, server hosting including databases,
46 | >you name it. We’ve never hosted Cassandra inside containers, and wonder how best to get started. Can you help ?
47 | >
48 | >Daniel: Excellent question ! DataStax recently produced and open sourced its Kubernetes Operator, which will get you all that you need.
49 | >This operator supports open source Cassandra, DataStax Enterprise, and more.
50 | >
51 | >In the real world, expectedly, you’d use this operator to stand up pods hosting Cassandra on GKE or similar. For better learning and
52 | >debugging, this article will actually do this work on our laptop; greater control, including the freedom to break things for testing, and more.
53 | >
54 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_45_KubernetesOperator.pdf)
55 | >
56 | >[Download YAML files for labs here (TarBall format)](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_45_KubernetesOperator.tar)
57 |
58 | August 2020 - -
59 | >Customer: My company has difficulty moving applications into production, as relates to data at scale. E.g., we program, then unit
60 | >and system test with 5-15 rows of data; then, when we get into production with millions of lines of data, things fail. There
61 | >has to be an easier way to overcome this challenge. Can you help ?
62 | >
63 | >Daniel: Excellent question ! With all of the pressures we face today just to get applications written, unit testing often suffers,
64 | >system testing suffers worse, and then testing applications at scale often never happens. Fortunately, we have an easy solution.
65 | >
66 | >For the past 10 years inside DataStax, we’ve perfected NoSQLBench, our now open source volume data generation
67 | >and testing tool for distributed data platforms. In this article we will overview NoSQLBench, enabling you to see if NoSQLBench can meet your needs too.
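>
>As a quick taste, the YAML file linked below documents its own invocations in its header comments
>(nb run workload=... driver=stdout ...). Below is a minimal sketch of driving one of those dry runs from
>Python; we assume the nb binary is on your PATH and the YAML sits in the current directory.
>
>```python
>import subprocess
>
># Dry-run the rampup phase against stdout - the same command the
># YAML's own header comments document, just driven from Python.
>cmd = ['nb', 'run', 'workload=DDN_2020_44_NoSQLBench.yaml',
>       'driver=stdout', 'tags=phase:rampup', 'cycles=10']
>result = subprocess.run(cmd, capture_output=True, text=True, check=True)
>print(result.stdout)
>```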
68 | >
69 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_44_NoSQLBench.pdf)
70 | >
71 | >[PowerPoint (added detail to the above)](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_44_NoSQLBench_Slides.pdf)
72 | >
73 | >[The final YAML file/solution used in this article](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_44_NoSQLBench.yaml)
74 |
75 | July 2020 - -
76 | >Customer: Okay, so my company is finally ready to "database as a service" (DBaaS). We also want to move to a micro-services
77 | >architecture, and possibly GraphQL. Can you help ?
78 | >
79 | >Daniel: Excellent question ! In this series we’ve previously covered GraphQL, and previously covered geo-spatial queries
80 | >using the DataStax Cassandra as a service titled, Astra, which also acted as our primer on Astra.
81 | >
82 | >In this article, we specifically cover Astra API programming.
83 | >
84 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_43_AstraApiProgramming.pdf)
85 | >
86 | >[Application program data](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_43_NoteBook.tar) in the form
87 | >of a DataStax Studio Notebook, in standard TAR file form.
88 |
89 | June 2020 - -
90 | >Customer: My company is investigating using the DataStax database as a service, titled DataStax Astra (Astra), to aid
91 | >in our application development. I know Astra is exactly equal to Apache Cassandra, which means that the DataStax
92 | >Enterprise DSE Search component is not present.
93 | >
94 | >As such, we lose Solr/Lucene, and any geo-spatial index and query processing support. But, our application needs
95 | >geospatial query support. Can you help ?
96 | >
97 | >Daniel: Excellent question ! You will be surprised how easy this is to address. In this article we detail how you
98 | >deliver geospatial queries using DataStax Astra, or just the DataStax Enterprise (DSE) Core functional component
99 | >(and not use DSE Search).
100 | >
101 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_42_AstraGeohash.pdf)
102 | >
103 | >[Application program code](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_42_AstraGeohash_Programs.tar.gz)
104 | >
105 | >[Application program data](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_42_AstraGeohash_Data.pipe.gz)
106 | >
107 | >(Because of GitHub file size limits, the above data file contains only 250,000 of the promised 334,000 lines of data. Sorry.)
108 | >
109 | >[Demonstration video](https://www.youtube.com/watch?v=RVso51X0A08)
110 |
111 | May 2020 - -
112 | >Customer: My company has got to improve its efficiency and time to delivery when creating business applications on
113 | >Apache Cassandra and DataStax Enterprise. Can you help ?
114 | >
115 | >Daniel: Excellent question ! Since you specifically mentioned application development, we will give focus to API
116 | >endpoint programming; a means to further decouple your application from the database, allowing for greater
117 | >flexibility in deployment, and even increasing performance of Web and mobile applications.
118 | >
119 | >While we might briefly mention REST and gRPC, the bulk of this document will center on GraphQL.
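>
>To preview the shape of the GraphQL work, here is a minimal sketch posting a query over HTTP with the
>Python requests library. The endpoint URL, keyspace, table, auth header, and field names are all
>illustrative assumptions for this sketch, not the whitepaper's exact configuration.
>
>```python
>import requests
>
># A hypothetical GraphQL endpoint fronting a Cassandra keyspace.
>url = 'http://localhost:8080/graphql/my_keyspace'
>
># Ask for two columns from a hypothetical 'users' table, filtered by key.
>query = '''
>{
>  users(value: { user_id: "42" }) {
>    values { user_id name }
>  }
>}
>'''
>
>resp = requests.post(url,
>                     json={'query': query},
>                     headers={'x-cassandra-token': 'REPLACE_ME'})  # placeholder auth
>resp.raise_for_status()
>print(resp.json())
>```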
120 | >
121 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_41_GraphQL.pdf)
122 |
123 | April 2020 - -
124 | >Customer: My company has started using more cloud instances for tasks like proof of concepts, and related.
125 | >We used to just leave these boxes wide open, since they generally contain zero sensitive data. But, things
126 | >being what they are, we feel like we should start securing these boxes. Can you help ?
127 | >
128 | >Daniel: Excellent question ! In the March/2019 edition of this document, we detailed how to implement
129 | >native authentication using DataStax Enterprise (DSE). In this edition, we detail how to implement SSL
130 | >between DSE server nodes (in the event you go multi-cloud), and then also SSL from client (node) to DSE
131 | >cluster.
132 | >
133 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_40_SSL.pdf)
134 |
135 | March 2020 - -
136 | >Customer: As a database application developer, I’ve never previously used a system with a natively asynchronous
137 | >client side driver. What do I need to know ? Can you help ?
138 | >
139 | >Daniel: Excellent question ! Yes, the DataStax Enterprise (DSE) client side drivers offer entirely native
140 | >asynchronous operation; fire and forget, or fire and listen. There are easy means to make the driver and
141 | >any calls you issue block, and behave synchronously, but there’s little fun in that.
142 | >
143 | >The online documentation covers the asynchronous query topic well, so we’ll review that and then extend
144 | >into asynchronous write programming.
145 | >
146 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_39_DriverFutures.pdf)
147 |
148 | February 2020 - -
149 |
150 | >Customer: I’ve read all of the articles and documentation related to DataStax Enterprise (DSE) Graph, but am
151 | >still not certain how these graph queries (traversals) actually execute. To me, this looks much like a SQL
152 | >query processing engine, and I don’t know how or if to index or model this. Can you help ?
153 | >
154 | >Daniel: Excellent question ! In this document we’ll give a brief treatment to graph query processing; how
155 | >graph traversals are actually (executed). For fun, we’ll also talk a little bit about a close (graph)
156 | >neighbor, Neo4J.
157 | >
158 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_38_FileMethods.pdf)
159 |
160 | January 2020 - -
161 |
162 | >Customer: My company maintains a lot of data on Hadoop, in Parquet and other formats, and needs to perform integrated
163 | >reporting with data resident inside DataStax. Can you help ?
164 | >
165 | >Daniel: Excellent question ! Yes. This is like a two-liner solution. We’ll detail all of the concepts and code inside
166 | >this document.
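>
>To ground the "two-liner" claim, below is a minimal PySpark sketch. The Parquet path, keyspace, table,
>and join column are placeholders, and we assume the spark-cassandra-connector is already on the
>classpath (as it is with DSE Analytics / dse spark).
>
>```python
>from pyspark.sql import SparkSession
>
>spark = SparkSession.builder.appName('ddn_parquet_join').getOrCreate()
>
># The promised two lines: one read per source ...
>parquet_df = spark.read.parquet('hdfs:///data/events.parquet')
>cassandra_df = (spark.read.format('org.apache.spark.sql.cassandra')
>                .options(keyspace='my_ks', table='my_table').load())
>
># ... then ordinary Spark SQL for the integrated report.
>parquet_df.join(cassandra_df, 'customer_id').show()
>```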
167 | > 168 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/DDN_2020_37_Parquet.pdf) 169 | -------------------------------------------------------------------------------- /2021/61_DemoProgram.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/61_DemoProgram.tar.gz -------------------------------------------------------------------------------- /2021/DDN_2021_49_KubernetesPrimer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_49_KubernetesPrimer.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_50_KubernetesNodeRecovery.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_50_KubernetesNodeRecovery.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_51_KubernetesClusterCloning.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_51_KubernetesClusterCloning.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_52_KubernetesSnapshots.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_52_KubernetesSnapshots.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_53_MoreContainersHelm.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_53_MoreContainersHelm.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_53_ToolkitVersion2.tar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_53_ToolkitVersion2.tar -------------------------------------------------------------------------------- /2021/DDN_2021_54_AstraSvcBroker.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_54_AstraSvcBroker.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_55_K8ssandra.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_55_K8ssandra.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_56_K8ssandra, Document API.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_56_K8ssandra, Document API.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_57_K8ssandra, GraphQL.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_57_K8ssandra, GraphQL.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_58_KastenVeeam.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_58_KastenVeeam.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_59_DseStargate.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_59_DseStargate.pdf -------------------------------------------------------------------------------- /2021/DDN_2021_60_SnowFlake.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2021/DDN_2021_60_SnowFlake.pdf -------------------------------------------------------------------------------- /2021/README.md: -------------------------------------------------------------------------------- 1 | DataStax Developer's Notebook - Monthly Articles 2021 2 | =================== 3 | 4 | | **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** | 5 | |-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------| 6 | 7 | 8 | This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum. 9 | 10 | December 2021 - - 11 | >Customer: My company uses a ton of Apache Cassandra, and a ton of SnowFlake. We also want to move to writing applications using Node.js/REACT. 12 | >We’re having trouble understanding what each of Cassandra and SnowFlake should be used for together, and what a sample application might 13 | >look like. Can you help ? 14 | > 15 | >Daniel: Excellent question ! We’ll detail a sample application, written in Node.js and REACT, and then deliver an application that uses both 16 | >Apache Cassandra and SnowFlake. 
17 | > 18 | >[View a quick demo of what we're building here](https://youtu.be/uDfVjStGA9o) 19 | > 20 | >[Download December whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_60_SnowFlake.pdf) 21 | > 22 | >[Download Source Code in Tar Format here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_60_SnowFlake.tar) 23 | 24 | November 2021 - - 25 | >Customer: My company has been on DataStax Enterprise for some time. We are super excited by the new, open source StarGate (subsystem), 26 | >and what that provides. It appears as though StarGate only works with open source Apache Cassandra. Can you help ? 27 | > 28 | >Daniel: Excellent question ! We’ll explain what you are seeing, and how to get StarGate to work with DataStax Enterprise. We’ll also 29 | >detail a bit of the landscape: what’s moving around, how, and why. 30 | > 31 | >[Download November whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_59_DseStargate.pdf) 32 | 33 | October 2021 - - 34 | >Customer: My company is investigating a backup and recovery solution for all of our applications running atop Kubernetes. Can you help ? 35 | > 36 | >Daniel: Excellent question ! In the past we’ve referenced the open source Velero project from a VMware acquisition, and we’ve mentioned 37 | >NetApp Astra (a SaaS). This month we detail installation and use of Kasten/Veeam K10. With (Kasten) you can back up and restore databases 38 | >and applications, and also clone them to aid your development efforts. 39 | > 40 | >In this article we back up and recover the DataStax Kubernetes Operator for Apache Cassandra, and back up, restore, and clone Cassandra 41 | >Datacenter objects (entire Cassandra clusters). 42 | > 43 | >[Download October whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_58_KastenVeeam.pdf) 44 | 45 | September 2021 - - 46 | >Customer: My company has enjoyed the last two articles on DataStax K8ssandra, and specifically its StarGate component. 47 | >We’ve seen details on REST and the Document API, but little on GraphQL. Can you help ? 48 | > 49 | >Daniel: Excellent question ! We’ve done a number of articles in this series on GraphQL. Most recently, in October/2020, we 50 | >delivered a geo-spatial thin client Web program using GraphQL against the DataStax database as a service. When using Astra, 51 | >the database is hosted and managed. Also when using Astra, the service endpoints are automatically created and maintained, and 52 | >are, behind the scenes, using K8ssandra and StarGate. 53 | > 54 | >So, in this article, we supply the final and previously missing piece: how to access the GraphQL component of your own hosted 55 | >K8ssandra/StarGate. 56 | > 57 | >[Download September whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_57_K8ssandra%2C%20GraphQL.pdf) 58 | 59 | August 2021 - - 60 | >Customer: My company enjoyed the last article on K8ssandra and StarGate. We are highly interested in the Document API that this 61 | >document referred to: its use, and some of its design elements. Can you help ? 62 | > 63 | >Daniel: Excellent question ! Last month we installed DataStax K8ssandra, which includes the StarGate component. Further, we did 64 | >a (Hello World) using REST, to serve as an install/validation exercise for StarGate; a hedged sketch of that REST exchange appears just below.
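>(A minimal "is StarGate alive ?" check in Python rather than shell; the ports, paths, and cassandra/cassandra credentials were the StarGate defaults at the time, and may differ in your K8ssandra install.)

```python
# Hedged sketch, assuming the default StarGate service ports:
# 8081 for the auth service, 8082 for the REST/schemas API.
import requests

STARGATE = "http://localhost"   # assumed host; substitute your service address

# 1. Trade credentials for an auth token
auth = requests.post(
    f"{STARGATE}:8081/v1/auth",
    json={"username": "cassandra", "password": "cassandra"},
)
token = auth.json()["authToken"]

# 2. Use the token to list keyspaces; a 200 response validates the install
resp = requests.get(
    f"{STARGATE}:8082/v2/schemas/keyspaces",
    headers={"X-Cassandra-Token": token},
)
print(resp.status_code, resp.json())
```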
This month we dive deeper, moving beyond REST and into 65 | >the Document API area of StarGate: create a table, insert documents, run a query. 66 | > 67 | >[Download August whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_56_K8ssandra%2C%20Document%20API.pdf) 68 | 69 | July 2021 - - 70 | >Customer: My company is looking for ways to accelerate our application development, through whatever means. Can you help ? 71 | > 72 | >Daniel: Excellent question ! One of the areas we can look at is programming APIs/gateways. On some level, there are only 73 | >four things you can do with data: insert, update, delete, and select. As such, why shouldn’t all means to execute these statements 74 | >be automatically generated, automatically managed and scaled, and more ? 75 | > 76 | >On February 10, 2021, DataStax released the open source K8ssandra project, which includes these automated functions, and more. 77 | >In this article, we detail an introduction to K8ssandra: installation and use. In subsequent articles, we dive deeper into REST, 78 | >GraphQL, and Document API configuration and use. 79 | > 80 | >[Download July whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_55_K8ssandra.pdf) 81 | 82 | June 2021 - - 83 | >Customer: My company is investigating using the DataStax Astra Service Broker within our Kubernetes systems. Can you help ? 84 | > 85 | >Daniel: Excellent question ! In this document we will install and use most of the early pieces of the DataStax Astra Service Broker: install, 86 | >install verification, connection, yadda. Along the way, we introduce Kubernetes service brokers, broker instances, and (broker instance) 87 | >bindings. 88 | > 89 | >[Download June whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_54_AstraSvcBroker.pdf) 90 | 91 | May 2021 - - 92 | >Customer: My company enjoyed the series of four articles centered on Cassandra atop Kubernetes. But you left Cassandra and the Operator limited to 93 | >just one namespace. We seek to run Cassandra clusters in many concurrent namespaces. Can you help ? 94 | > 95 | >Daniel: Excellent question ! In this edition of this document, we take the work from the previous four articles and move it to a multiple-namespace 96 | >treatment. We’ll detail using the Cassandra Operator across Kubernetes namespaces, and we’ll detail Cassandra cluster cloning across namespaces. Be 97 | >advised, there are relevant limitations with Kubernetes version 1.18 and lower as it relates to (cloning) across namespaces. (Everything is doable, 98 | >it’s just more steps than you might expect.) 99 | > 100 | >[Download May whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_53_MoreContainersHelm.pdf) 101 | > 102 | >[Download version 2.0 of the Toolkit here, in Tar format](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_53_ToolkitVersion2.tar) 103 | > 104 | >[View a quick demo of Cassandra cluster cloning atop Kubernetes here](https://www.youtube.com/watch?v=paly5VVuAYM) 105 | 106 | January 2021 through April 2021 - - 107 | >Customer: My company is moving its operations to the cloud, including cloud native computing and Kubernetes. I believe we can run Apache Cassandra 108 | >on Kubernetes. Can you help ? 109 | > 110 | >Daniel: Excellent question ! Kubernetes, and running Apache Cassandra on Kubernetes, are huge topics; a tiny, illustrative "are my pods up ?" sketch appears just below.
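>(A hedged taste of day one, using the official Kubernetes Python client; the namespace and label selector are assumptions for illustration, not values from the articles.)

```python
# Hedged sketch: confirm that the Cassandra pods an operator created are
# Running. The namespace and label names below are illustrative only.
from kubernetes import client, config   # pip install kubernetes

config.load_kube_config()               # reads your local ~/.kube/config

v1 = client.CoreV1Api()
pods = v1.list_namespaced_pod(
    namespace="cass-operator",
    label_selector="app.kubernetes.io/managed-by=cass-operator",
)
for pod in pods.items:
    print(pod.metadata.name, pod.status.phase)
```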
As such, we’ll begin a four-part series of articles that 111 | >cover most of the day one through day seven topics. We won’t write a general Kubernetes primer of our own, since many other capable Kubernetes primers exist, but we will list our 112 | >favorite resources here and there. 113 | > 114 | > • In the January article, we will get Apache Cassandra up and running on Kubernetes. 115 | > 116 | >[Download January whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_49_KubernetesPrimer.pdf) 117 | > 118 | > • In February, we detail recovery from failed/down Cassandra nodes. 119 | > 120 | >[Download February whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_50_KubernetesNodeRecovery.pdf) 121 | > 122 | > • In March, we detail Cassandra cluster cloning, for QA and development. 123 | > 124 | >[Download March whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_51_KubernetesClusterCloning.pdf) 125 | > 126 | > • In April, we detail Kubernetes snapshotting in general. 127 | > 128 | >[Download April whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_52_KubernetesSnapshots.pdf) 129 | > 130 | >[Download the Toolkit for all 4 months/articles here, in Tar format](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/DDN_2021_KubernetesPrimer_Toolkit.tar) 131 | 132 | 133 | 134 | -------------------------------------------------------------------------------- /2022/DDN_2022_61_SchemaValidation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/farrell0/DataStax-Developers-Notebook/8870d7c5691cc86f6fb8531d2fef4b615363179e/2022/DDN_2022_61_SchemaValidation.pdf -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | DataStax Developer's Notebook - Monthly Articles 2022 2 | =================== 3 | 4 | | **[Monthly Articles - 2022](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/README.md)** | **[Monthly Articles - 2021](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2021/README.md)** | **[Monthly Articles - 2020](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2020/README.md)** | **[Monthly Articles - 2019](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2019/README.md)** | **[Monthly Articles - 2018](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2018/README.md)** | **[Monthly Articles - 2017](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2017/README.md)** | 5 | |-------------------------|--------------------------|--------------------------|--------------------------|--------------------------|--------------------------| 6 | 7 | This is a personal blog where we answer one or more questions each month from DataStax customers in a non-official, non-warranted, non much of anything forum. 8 | 9 | January 2022 - - 10 | >Customer: My company wishes to activate SQL-style data integrity check constraints atop Apache Cassandra. Can you help ? 11 | > 12 | >Daniel: Excellent question ! You can do this, and we’ll detail all of the steps involved below; for context, a naive, purely illustrative application-side sketch also follows.
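>(To be clear, this is not the whitepaper’s mechanism; it is only an application-side stand-in showing what a check constraint must accomplish. The table and column names are hypothetical.)

```python
# Naive, illustrative stand-in for a CHECK constraint: CQL has no native
# CHECK, so the simplest (incomplete) guard validates before the write.
# The whitepaper details the steps of the actual approach.
from cassandra.cluster import Cluster   # pip install cassandra-driver

def insert_order(session, order_id: str, quantity: int) -> None:
    if quantity <= 0:                   # the "check constraint" being enforced
        raise ValueError("quantity must be greater than zero")
    session.execute(
        "INSERT INTO store.orders (order_id, quantity) VALUES (%s, %s)",
        (order_id, quantity),
    )

session = Cluster(["127.0.0.1"]).connect()
insert_order(session, "o-1001", 3)      # succeeds
insert_order(session, "o-1002", 0)      # raises ValueError before any write
```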
13 | > 14 | >[Download whitepaper here](https://github.com/farrell0/DataStax-Developers-Notebook/blob/master/2022/DDN_2022_61_SchemaValidation.pdf) 15 | --------------------------------------------------------------------------------