C
12 |A
23 |B
34 |cassandra.yaml configuration file.
5 | ( ) Enabling full query logging on a node using nodetool ensures the feature will be enabled even when the node is restarted.
6 | (*) Setting a directory for full query log files in cassandra.yaml enables the full query logging feature.
7 |
8 |
9 | >>2. Which of the following is not a command supported by the full query logging tool fqltool? <<
10 | ( ) fqltool dump
11 | (*) fqltool archive
12 | ( ) fqltool replay
13 | ( ) fqltool compare
14 |
15 |
16 | >>3. Full query log files are human-readable <<
17 | ( ) TRUE
18 | (*) FALSE
19 |
20 |
--------------------------------------------------------------------------------
/cassandra-features-4x/cassandra4-migrate-cassandra-3-to-4/step7.md:
--------------------------------------------------------------------------------
1 | In this step, you will verify that the Cassandra node has been upgraded and that the data is still available.
2 |
3 | Verify that the version is 4.0.0
4 | ```
5 | nodetool version
6 | ```{{execute T1}}
7 |
8 | Make sure the node is in the *UP* and *NORMAL* (*UN*) state. The command below filters out lines containing *UN*, so a healthy node should not appear in its output.
9 | ```
10 | nodetool status | grep -v UN
11 | ```{{execute T1}}
12 |
13 | Verify that there are no errors.
14 | ```
15 | grep -e "WARN" -e "ERROR" /usr/share/cassandra/logs/system.log
16 | ```{{execute T1}}
17 |
18 | Open a cql shell.
19 | ```
20 | cqlsh
21 | ```{{execute T1}}
22 |
23 | Use the keyspace.
24 | ```
25 | USE united_states;
26 | ```{{execute T1}}
27 |
28 | Verify that the data has been loaded.
29 | ```
30 | SELECT * FROM cities_by_state;
31 | ```{{execute T1}}
32 |
33 | If you can see the data, you have successfully upgraded from Cassandra 3.11.9 to 4.0.0!
--------------------------------------------------------------------------------
/cql/step10.md:
--------------------------------------------------------------------------------
1 | If you are familiar with SQL, CQL may look quite similar.
2 | Indeed, there are many syntactic similarities between the two languages, but there are also many
3 | important differences. Here are just a few facts about CQL that highlight some of the differences:
4 |
5 | - CQL supports tables with single-row and multi-row partitions
6 | - CQL table primary key consists of a mandatory partition key and an optional clustering key
7 | - CQL does not support referential integrity constraints
8 | - CQL updates or inserts may result in upserts
9 | - CQL queries cannot retrieve data based on an arbitrary table column
10 | - CQL supports no joins or other binary operations
11 | - CQL CRUD operations are executed with a tunable consistency level
12 | - CQL supports lightweight transactions but not ACID transactions
13 |
14 | If some of the above facts do not sound familiar, you know that there is more to learn about CQL!
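The upsert behavior mentioned above is easy to see with two statements; here is a minimal sketch, assuming a hypothetical `users` table that is not part of this scenario:

```sql
-- Hypothetical table for illustration only
CREATE TABLE users (
  email TEXT PRIMARY KEY,
  name TEXT
);

-- INSERT and UPDATE are both upserts in CQL:
-- each writes the row whether or not it already exists.
INSERT INTO users (email, name) VALUES ('jen@example.com', 'Jen');
UPDATE users SET name = 'Jennifer' WHERE email = 'jen@example.com';

-- This UPDATE targets a key that was never inserted,
-- yet it still creates the row.
UPDATE users SET name = 'Joe' WHERE email = 'joe@example.com';
```

Because writes do not first check for row existence, updating a missing row is not an error, which is a key difference from SQL semantics.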
15 |
16 |
--------------------------------------------------------------------------------
/cassandra-data-modeling/investment-data/step3.md:
--------------------------------------------------------------------------------
1 | Execute the CQL script to insert sample data:
2 | ```sql
3 | SOURCE '~/investment_data.cql'
4 | ```{{execute}}
5 |
6 | Retrieve all rows from table `accounts_by_user`:
7 | ```sql
8 | SELECT * FROM accounts_by_user;
9 | ```{{execute}}
10 |
11 | Retrieve all rows from table `positions_by_account`:
12 | ```sql
13 | SELECT * FROM positions_by_account;
14 | ```{{execute}}
15 |
16 | Retrieve all rows from table `trades_by_a_d`:
17 | ```sql
18 | SELECT * FROM trades_by_a_d;
19 | ```{{execute}}
20 |
21 | Retrieve all rows from table `trades_by_a_td`:
22 | ```sql
23 | SELECT * FROM trades_by_a_td;
24 | ```{{execute}}
25 |
26 | Retrieve all rows from table `trades_by_a_std`:
27 | ```sql
28 | SELECT * FROM trades_by_a_std;
29 | ```{{execute}}
30 |
31 | Retrieve all rows from table `trades_by_a_sd`:
32 | ```sql
33 | SELECT * FROM trades_by_a_sd;
34 | ```{{execute}}
--------------------------------------------------------------------------------
/cassandra-fundamentals/cql/step10.md:
--------------------------------------------------------------------------------
1 | If you are familiar with SQL, CQL may look quite similar.
2 | Indeed, there are many syntactic similarities between the two languages, but there are also many
3 | important differences. Here are just a few facts about CQL that highlight some of the differences:
4 |
5 | - CQL supports tables with single-row and multi-row partitions
6 | - CQL table primary key consists of a mandatory partition key and an optional clustering key
7 | - CQL does not support referential integrity constraints
8 | - CQL updates or inserts may result in upserts
9 | - CQL queries cannot retrieve data based on an arbitrary table column
10 | - CQL supports no joins or other binary operations
11 | - CQL CRUD operations are executed with a tunable consistency level
12 | - CQL supports lightweight transactions but not ACID transactions
13 |
14 | If some of the above facts do not sound familiar, you know that there is more to learn about CQL!
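The last point, lightweight transactions, can be sketched with `IF` conditions; the table below is hypothetical and only illustrative:

```sql
-- Hypothetical table for illustration only
CREATE TABLE accounts (
  id TEXT PRIMARY KEY,
  balance DECIMAL
);

-- A lightweight transaction adds a compare-and-set condition:
-- the write is applied only if the IF predicate holds.
INSERT INTO accounts (id, balance) VALUES ('a-1', 100.00) IF NOT EXISTS;
UPDATE accounts SET balance = 50.00 WHERE id = 'a-1' IF balance = 100.00;
```

Each conditional write returns an `[applied]` column indicating whether the condition held; this provides linearizable single-partition updates, not multi-statement ACID transactions.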
15 |
16 |
--------------------------------------------------------------------------------
/cassandra-features-4x/cassandra4-migrate-cassandra-3-to-4/step1.md:
--------------------------------------------------------------------------------
1 | In this step, a script running in the background is installing JDK 8 and Cassandra 3.11.9. The script creates a *single-node* Cassandra cluster. The script performs the following actions:
2 |
3 | 1. Remove JDK 11 (the base image for this exercise has JDK 11 installed by default, and Cassandra 3.x *does not* support JDK 11)
4 | 2. Install JDK 8
5 | 3. Install Cassandra 3.11.9 and configure environment variables
6 | 4. Start Cassandra
7 |
8 | Wait until you see the message *Cassandra setup complete*. (This may take a few minutes.)
9 |
10 | 
11 |
12 | Click to verify that the Cassandra version is 3.11.9.
13 | ```
14 | nodetool version
15 | ```{{execute T1}}
16 |
17 | You should see the correct version.
18 | 
19 |
20 | After verifying the version, clear the screen and continue to the next step.
21 | ```
22 | clear
23 | ```{{execute T1}}
24 |
--------------------------------------------------------------------------------
/cassandra-features-4x/cassandra4-internode-message/step6.md:
--------------------------------------------------------------------------------
1 | Let's use _CQLSH_ to query the virtual tables.
2 |
3 | Notice that each Cassandra node has two local tables.
4 | Let's look at the tables in the second node.
5 |
6 | ```
7 | cqlsh node2
8 | ```{{execute}}
9 |
10 | Here's the query to see the tables' contents.
11 |
12 | ```
13 | SELECT * FROM system_views.internode_inbound;
14 | SELECT * FROM system_views.internode_outbound;
15 | ```{{execute}}
16 |
17 | Let's switch and look at the contents of the tables in the first node.
18 |
19 | ```
20 | QUIT
21 | cqlsh node1
22 | ```{{execute}}
23 |
24 | Now, query the first node's tables.
25 | ```
26 | SELECT * FROM system_views.internode_inbound;
27 | SELECT * FROM system_views.internode_outbound;
28 | ```{{execute}}
29 |
30 | Notice that the tables in the first node show the DC-East datacenter, whereas the tables in the second node showed the DC-West datacenter.
31 |
32 | Exit _CQLSH_.
33 |
34 | ```
35 | QUIT
36 | ```{{execute}}
37 |
--------------------------------------------------------------------------------
/cassandra-data-modeling/shopping-cart-data/step6.md:
--------------------------------------------------------------------------------
1 | Save an active shopping cart with name `My Birthday` and id `4e66baf8-f3ad-4c3b-9151-52be4574f2de`,
2 | and designate a different cart with name `Gifts for Mom` and id `19925cc1-4f8b-4a44-b893-2a49a8434fc8` to be a new active shopping cart for user `jen`:
3 |
4 | cassandra.yaml.
7 |
8 |
9 | >>2. How should you handle sensitive data when sharing audit logs? <<
10 | ( ) Audit logs do not contain sensitive data.
11 | (*) Manually redact sensitive data in the audit logs.
12 | ( ) Use nodetool to redact specific fields in the audit logs.
13 |
14 |
15 | >>3. Which command disables the audit log for the finance keyspace? <<
16 | ( ) nodetool auditlog --ignore finance
17 | ( ) nodetool auditlog --disable --keyspace finance
18 | (*) nodetool enableauditlog --excluded-keyspaces finance
19 |
--------------------------------------------------------------------------------
/cassandra-fundamentals/queries/quiz.md:
--------------------------------------------------------------------------------
1 | Here is a short quiz for you.
2 |
3 | Q1. A table with a composite partition key can be queried using ...
4 |
5 | - [ ] A. only the first column of the partition key
6 | - [ ] B. any subset of partition key columns, as long as the primary key definition order is respected
7 | - [ ] C. all columns of the partition key
8 |
9 | C
12 |B
25 |B
38 | internode_inbound keeps track of inbound messaging metrics, and internode_outbound keeps track of the outbound metrics.
6 | Both tables are in the system_views keyspace.
7 |
8 | Note that these are not real tables.
9 | They merely _appear_ as tables to allow access to the metrics they contain.
10 |
11 | We'll use _CQLSH_ to look at these tables.
12 |
13 | ```
14 | cqlsh node1
15 | ```{{execute}}
16 |
17 | In _CQLSH_, the following command shows what the inbound table looks like.
18 |
19 | ```
20 | DESCRIBE TABLE system_views.internode_inbound;
21 | ```{{execute}}
22 |
23 | Here's the outbound table.
24 |
25 | ```
26 | DESCRIBE TABLE system_views.internode_outbound;
27 | ```{{execute}}
28 |
29 | Notice that these descriptions are embedded within comments.
30 | This is because the tables are virtual and were never actually created.
31 |
32 | Exit _CQLSH_ using the following command.
33 |
34 | ```
35 | QUIT
36 | ```{{execute}}
37 |
--------------------------------------------------------------------------------
/cassandra-features-4x/cassandra4-internode-message/step3.md:
--------------------------------------------------------------------------------
1 | Besides changing the way threads receive messages, Cassandra developers did a lot of cleanup and tuning of the internode message code path.
2 |
3 | As developers work on code and make changes, sometimes the code can become a bit brittle or inefficient.
4 | Developers refer to this as _Technical Debt_.
5 |
6 | It's good to retire technical debt by refactoring or cleaning up the code, and that is exactly what developers did with the internode message code for the 4.X release.
7 | The benefits of retiring technical debt include:
8 | * More efficient code, which means the code requires less processing
9 | * Code that is easier to read and understand so future changes are easier
10 | * Code that is more robust, yielding faster and more predictable response times
11 |
12 | The Cassandra 4.X cleanup includes:
13 | * Protocol improvements that remove redundant information and make the protocol more efficient
14 | * Handling corner cases where code didn't deal gracefully with exceptions
15 | * Buffer optimization that reduces memory requirements due to internode messaging
16 | * Introduction of messaging timeouts under certain conditions
17 | * Optimizations that allow a node to bypass long code paths when sending messages to itself
18 |
19 | The bottom line: these changes make Cassandra faster, more efficient, and more robust!
20 |
--------------------------------------------------------------------------------
/cassandra-features-4x/cassandra4-full-query-logging/index.json:
--------------------------------------------------------------------------------
1 | {
2 | "title": "Full Query Logging",
3 | "description": "New Features in Cassandra 4",
4 | "difficulty": "Easy",
5 | "time": "15 minutes",
6 | "details": {
7 | "assets": {
8 | "host01": [
9 | {"file": "wait.sh", "target": "/usr/local/bin", "chmod": "+x"}
10 | ]
11 | },
12 | "steps": [
13 | {
14 | "title": "Enable Full Query Logging via nodetool",
15 | "text": "step1.md"
16 | },
17 | {
18 | "title": "Create Schema and Perform Queries",
19 | "text": "step2.md"
20 | },
21 | {
        "title": "Use fqltool to Review Full Query Logs",
23 | "text": "step3.md"
24 | },
25 | {
26 | "title": "Configure Full Query Logging via cassandra.yaml",
27 | "text": "step4.md"
28 | },
29 | {
30 | "title": "Test Your Understanding",
31 | "text": "quiz.md"
32 | }
33 | ],
34 | "intro": {
35 | "text": "intro.md",
36 | "courseData": "background.sh",
37 | "code": "foreground.sh"
38 | },
39 | "finish": {
40 | "text": "finish.md"
41 | }
42 | },
43 | "environment": {
44 | "uilayout": "editor-terminal",
45 | "uieditorpath": "/"
46 | },
47 | "backend": {
48 | "imageid": "datastax-oss-cassandra"
49 | }
50 | }
51 |
--------------------------------------------------------------------------------
/cassandra-features-4x/cassandra4-internode-message/step7.md:
--------------------------------------------------------------------------------
1 | Since there are only two nodes, you would expect the number of bytes sent from one node to equal the number of bytes received by the other node.
2 | Let's see if we can demonstrate that.
3 |
4 | We have prepared two files containing the CQL queries for these tables.
5 | We will run these queries on separate nodes nearly simultaneously and look at the results.
6 |
7 | Take a look at the inbound query.
8 |
9 | ```
10 | cat in.cql
11 | ```{{execute}}
12 |
13 | You see we are only looking at two fields: the number of operations and the number of bytes.
14 | Isolating these metrics makes it a little easier to compare the results across nodes.
15 |
16 |
17 | Here's the outbound query.
18 |
19 | ```
20 | cat out.cql
21 | ```{{execute}}
22 |
23 | Here's the command to execute both of these queries on separate nodes nearly simultaneously.
24 |
25 | ```
26 | cqlsh node1 -f in.cql; cqlsh node2 -f out.cql
27 | ```{{execute}}
28 |
29 | Often, the number of bytes written will exceed the number of bytes read.
30 | You can make sense of this by considering the number of operations.
31 | You see that the number of write operations often exceeds the number of read operations (until the read node catches up).
32 |
33 | Re-run the queries (by clicking above) until the number of operations is the same for both nodes.
34 | You see that the number of bytes transferred also matches.
35 |
--------------------------------------------------------------------------------
/cassandra-fundamentals/queries/step1.md:
--------------------------------------------------------------------------------
1 | CQL queries look just like SQL queries. However, while you will see familiar clauses `SELECT`, `FROM`, `WHERE`, `GROUP BY`
2 | and `ORDER BY`, CQL queries are much more restrictive in what goes into those clauses.
3 |
4 | A CQL query can only retrieve data from a single table, so there are no joins, self-joins, nested queries, unions, intersections and so forth.
5 | Moreover, only columns that are declared in a table's `PRIMARY KEY` definition can be used to filter, group or order rows.
6 | The *primary key definition order* must be respected when filtering and grouping: a complete partition key must be used, and
7 | when a clustering key column is used, any preceding clustering columns in the primary key definition must also be used.
8 | When ordering rows, the *clustering order* declared in the table definition must be respected. Ordering only applies to rows within a partition and
9 | can be either preserved or reversed.
10 |
11 | These restrictions ensure that your queries only use efficient data access patterns, which include *retrieving one row*,
12 | *retrieving all rows or a subset of rows from one partition* and *retrieving rows from at most a few partitions*.
13 | The fewer partitions a query touches, the better the performance and throughput you can expect. When studying
14 | our query examples in this tutorial, pay attention to the data access patterns they implement.
15 |
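These restrictions can be made concrete with a small example; the table below is hypothetical and only illustrates which predicates a primary key can and cannot support:

```sql
-- Hypothetical table: one partition per user,
-- rows clustered by year, then title
CREATE TABLE reviews_by_user (
  email TEXT,
  year INT,
  title TEXT,
  rating INT,
  PRIMARY KEY ((email), year, title)
);

-- Valid: complete partition key
SELECT * FROM reviews_by_user WHERE email = 'joe@example.com';

-- Valid: partition key plus clustering columns in definition order
SELECT * FROM reviews_by_user
WHERE email = 'joe@example.com' AND year = 2020;

-- Invalid: skips the preceding clustering column year
-- SELECT * FROM reviews_by_user
-- WHERE email = 'joe@example.com' AND title = 'Up';

-- Invalid: filters on a column outside the primary key
-- SELECT * FROM reviews_by_user WHERE rating = 5;
```

The two invalid queries would be rejected because they do not follow the primary key definition order and would otherwise require a scan across partitions.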
16 |
--------------------------------------------------------------------------------
/cassandra-data-modeling/structure.json:
--------------------------------------------------------------------------------
1 | {
2 | "title": "Data Modeling By Example",
3 | "description": "Learn how to create efficient and scalable Cassandra data models for IoT, e-commerce, finance, and more.",
4 | "items": [
5 | { "path": "sensor-data",
6 | "title": "Sensor Data Modeling",
7 | "description": "Learn how to create a data model for temperature monitoring sensor networks" },
8 | { "path": "messaging-data",
9 | "title": "Messaging Data Modeling",
10 | "description": "Learn how to create a data model for an email system" },
11 | { "path": "music-data",
12 | "title": "Digital Library Data Modeling",
13 | "description": "Learn how to create a data model for a digital music library" },
14 | { "path": "investment-data",
15 | "title": "Investment Portfolio Data Modeling",
16 | "description": "Learn how to create a data model for investment accounts or portfolios" },
17 | { "path": "time-series-data",
18 | "title": "Time Series Data Modeling",
19 | "description": "Learn how to create a data model for time series data" },
20 | { "path": "shopping-cart-data",
21 | "title": "Shopping Cart Data Modeling",
22 | "description": "Learn how to create a data model for online shopping carts" },
23 | { "path": "order-management-data",
24 | "title": "Order Management Data Modeling",
25 | "description": "Learn how to create a data model for an order management system" }
26 | ]
27 | }
--------------------------------------------------------------------------------
/cassandra-data-modeling/order-management-data/step8.md:
--------------------------------------------------------------------------------
1 | Cancel order `113-3827060-8722206` placed by user `joe` on `2020-11-17` at `22:20:43` by updating its status from `pending` to `canceled`:
2 |
3 | Step 1. Update the "source-of-truth" table using a lightweight transaction:
7 |
8 | ```sql
9 | UPDATE orders_by_id
10 | SET order_status = 'canceled'
11 | WHERE order_id = '113-3827060-8722206'
12 | IF order_status = 'pending';
13 | ```{{execute}}
14 |
15 |
16 | Step 2. Update the other tables if and only if the previous transaction was successfully applied:
17 |
18 | ```sql
19 | UPDATE orders_by_user
20 | SET order_status = 'canceled'
21 | WHERE order_id = '113-3827060-8722206'
22 | AND user_id = 'joe'
23 | AND order_timestamp = '2020-11-17 22:20:43';
24 |
25 | INSERT INTO order_status_history_by_id (order_id, status_timestamp, order_status)
26 | VALUES ('113-3827060-8722206',TOTIMESTAMP(NOW()),'canceled');
27 | ```{{execute}}
28 |
29 | Step 3. Optionally, verify the changes:
30 |
31 | ```sql
32 | SELECT order_status
33 | FROM orders_by_id
34 | WHERE order_id = '113-3827060-8722206';
35 |
36 | SELECT order_status
37 | FROM orders_by_user
38 | WHERE order_id = '113-3827060-8722206'
39 | AND user_id = 'joe'
40 | AND order_timestamp = '2020-11-17 22:20:43';
41 |
42 | SELECT order_status
43 | FROM order_status_history_by_id
44 | WHERE order_id = '113-3827060-8722206'
45 | LIMIT 1;
46 | ```{{execute}}
47 |
48 |
--------------------------------------------------------------------------------
14 | **Status:**
15 | Look at the first two characters of the status.
16 | Each character has an individual meaning.
17 | The sequence `UN` means the node's status is `Up` and state is `Normal`.
18 |
19 | ---
20 |
21 | ![Cluster Status](./assets/cluster-status.png)
22 |
23 | Now that the node is running, you will create a keyspace and table.
24 | Start the CQL Shell (*cqlsh*) so you can issue CQL commands.
25 |
26 | ```
27 | cqlsh
28 | ```{{execute}}
29 |
30 | Create the `music` keyspace.
31 |
32 | ```
33 | create KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
34 | ```{{execute}}
35 |
36 | Use the `music` keyspace.
37 |
38 | ```
39 | use music;
40 | ```{{execute}}
41 |
42 | Create the `songs` table.
43 |
44 | ```
45 | CREATE TABLE songs (
46 | artist TEXT,
47 | title TEXT,
48 | year INT,
49 | PRIMARY KEY ((artist), title)
50 | );
51 | ```{{execute}}
52 |
53 | Type `exit` to close *cqlsh*.
54 | ```
55 | exit
56 | ```{{execute}}
57 |
58 | # Summary
59 |
60 | In this step, you have verified that Cassandra is running and created the *music* keyspace and the *songs* table.
--------------------------------------------------------------------------------
/cassandra-fundamentals/cql/background.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | apt-get update
3 | apt install -y openjdk-11-jre-headless
4 | export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
5 | wget https://archive.apache.org/dist/cassandra/4.0.0/apache-cassandra-4.0.0-bin.tar.gz
6 | tar xzf apache-cassandra-4.0.0-bin.tar.gz
7 | sed -i 's/^cluster_name: .*$/cluster_name: "Cassandra Cluster"/g' apache-cassandra-4.0.0/conf/cassandra.yaml
8 | #sed -i "s/^num_tokens:.*$/num_tokens: 1/g" apache-cassandra-4.0.0/conf/cassandra.yaml
9 | #sed -i "s/^# initial_token:.*$/initial_token: -9223372036854775808/g" apache-cassandra-4.0.0/conf/cassandra.yaml
10 | sed -i 's/^endpoint_snitch: .*$/endpoint_snitch: GossipingPropertyFileSnitch/g' apache-cassandra-4.0.0/conf/cassandra.yaml
11 | sed -i 's/^dc=.*$/dc=DC-Houston/g' apache-cassandra-4.0.0/conf/cassandra-rackdc.properties
12 | sed -i "s/^listen_address:.*$/listen_address: 127.0.0.1/g" apache-cassandra-4.0.0/conf/cassandra.yaml
13 | sed -i 's/^rpc_address:.*$/rpc_address: 127.0.0.1/g' apache-cassandra-4.0.0/conf/cassandra.yaml
14 | echo '127.0.0.1 node1' >> /etc/hosts
15 | #echo '[[HOST2_IP]] node2' >> /etc/hosts
16 | sed -i 's/^ - seeds:.*$/ - seeds: "127.0.0.1"/g' apache-cassandra-4.0.0/conf/cassandra.yaml
17 | mv apache-cassandra-4.0.0 /usr/share/cassandra
18 | rm apache-cassandra-4.0.0-bin.tar.gz
19 | echo 'PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"' >> .bashrc
20 | export PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"
21 | source .bashrc
22 | /usr/share/cassandra/bin/cassandra -R
23 | while [ `grep "Starting listening for CQL clients" /usr/share/cassandra/logs/system.log | wc -l` -lt 1 ]; do
24 | sleep 15
25 | done
26 | echo "done" >> /opt/katacoda-background-finished
27 |
--------------------------------------------------------------------------------
/cassandra-fundamentals/queries/step6.md:
--------------------------------------------------------------------------------
1 | Table `ratings_by_user` stores information about movie ratings organized by users,
2 | such that each partition contains all ratings left by one particular user.
3 | This table has multi-row partitions and
4 | the primary key defined as `PRIMARY KEY ((email), title, year)`.
5 | Let's first retrieve all rows from the table to learn what the data looks like and then focus
6 | on predicates that the primary key can support.
7 |
8 | Q1. Retrieve all rows:
9 | ```
10 | SELECT * FROM ratings_by_user;
11 | ```{{execute}}
12 |
13 | Q2. Retrieve one partition:
14 | ```
15 | SELECT * FROM ratings_by_user
16 | WHERE email = 'joe@datastax.com';
17 | ```{{execute}}
18 |
19 | Q3. Retrieve two partitions:
20 | ```
21 | SELECT * FROM ratings_by_user
22 | WHERE email IN ('joe@datastax.com',
23 | 'jen@datastax.com');
24 | ```{{execute}}
25 |
26 | Q4. Retrieve one row:
27 | ```
28 | SELECT * FROM ratings_by_user
29 | WHERE email = 'jim@datastax.com'
30 | AND title = 'Alice in Wonderland'
31 | AND year = 2010;
32 | ```{{execute}}
33 |
34 | Q5 - Q8. Retrieve a subset of rows from a partition:
35 | ```
36 | SELECT * FROM ratings_by_user
37 | WHERE email = 'jim@datastax.com'
38 | AND title = 'Alice in Wonderland'
39 | AND year IN (2010, 1951);
40 | ```{{execute}}
41 | ```
42 | SELECT * FROM ratings_by_user
43 | WHERE email = 'jim@datastax.com'
44 | AND title = 'Alice in Wonderland'
45 | AND year > 1950;
46 | ```{{execute}}
47 | ```
48 | SELECT * FROM ratings_by_user
49 | WHERE email = 'jim@datastax.com'
50 | AND title < 'Charlie and the Chocolate Factory';
51 | ```{{execute}}
--------------------------------------------------------------------------------
/cassandra-data-modeling/sensor-data/background.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | apt-get update
3 | apt install -y openjdk-11-jre-headless
4 | export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
5 | wget https://archive.apache.org/dist/cassandra/4.0.0/apache-cassandra-4.0.0-bin.tar.gz
6 | tar xzf apache-cassandra-4.0.0-bin.tar.gz
7 | sed -i 's/^cluster_name: .*$/cluster_name: "Cassandra Cluster"/g' apache-cassandra-4.0.0/conf/cassandra.yaml
8 | #sed -i "s/^num_tokens:.*$/num_tokens: 1/g" apache-cassandra-4.0.0/conf/cassandra.yaml
9 | #sed -i "s/^# initial_token:.*$/initial_token: -9223372036854775808/g" apache-cassandra-4.0.0/conf/cassandra.yaml
10 | sed -i 's/^endpoint_snitch: .*$/endpoint_snitch: GossipingPropertyFileSnitch/g' apache-cassandra-4.0.0/conf/cassandra.yaml
11 | sed -i 's/^dc=.*$/dc=DC-Houston/g' apache-cassandra-4.0.0/conf/cassandra-rackdc.properties
12 | sed -i "s/^listen_address:.*$/listen_address: 127.0.0.1/g" apache-cassandra-4.0.0/conf/cassandra.yaml
13 | sed -i 's/^rpc_address:.*$/rpc_address: 127.0.0.1/g' apache-cassandra-4.0.0/conf/cassandra.yaml
14 | echo '127.0.0.1 node1' >> /etc/hosts
15 | #echo '[[HOST2_IP]] node2' >> /etc/hosts
16 | sed -i 's/^ - seeds:.*$/ - seeds: "127.0.0.1"/g' apache-cassandra-4.0.0/conf/cassandra.yaml
17 | mv apache-cassandra-4.0.0 /usr/share/cassandra
18 | rm apache-cassandra-4.0.0-bin.tar.gz
19 | echo 'PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"' >> .bashrc
20 | export PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"
21 | source .bashrc
22 | /usr/share/cassandra/bin/cassandra -R
23 | while [ `grep "Starting listening for CQL clients" /usr/share/cassandra/logs/system.log | wc -l` -lt 1 ]; do
24 | sleep 15
25 | done
26 | echo "done" >> /opt/katacoda-background-finished
27 |
--------------------------------------------------------------------------------
/cassandra-data-modeling/investment-data/background.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | apt-get update
3 | apt install -y openjdk-11-jre-headless
4 | export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
5 | wget https://archive.apache.org/dist/cassandra/4.0.0/apache-cassandra-4.0.0-bin.tar.gz
6 | tar xzf apache-cassandra-4.0.0-bin.tar.gz
7 | sed -i 's/^cluster_name: .*$/cluster_name: "Cassandra Cluster"/g' apache-cassandra-4.0.0/conf/cassandra.yaml
8 | #sed -i "s/^num_tokens:.*$/num_tokens: 1/g" apache-cassandra-4.0.0/conf/cassandra.yaml
9 | #sed -i "s/^# initial_token:.*$/initial_token: -9223372036854775808/g" apache-cassandra-4.0.0/conf/cassandra.yaml
10 | sed -i 's/^endpoint_snitch: .*$/endpoint_snitch: GossipingPropertyFileSnitch/g' apache-cassandra-4.0.0/conf/cassandra.yaml
11 | sed -i 's/^dc=.*$/dc=DC-Houston/g' apache-cassandra-4.0.0/conf/cassandra-rackdc.properties
12 | sed -i "s/^listen_address:.*$/listen_address: 127.0.0.1/g" apache-cassandra-4.0.0/conf/cassandra.yaml
13 | sed -i 's/^rpc_address:.*$/rpc_address: 127.0.0.1/g' apache-cassandra-4.0.0/conf/cassandra.yaml
14 | echo '127.0.0.1 node1' >> /etc/hosts
15 | #echo '[[HOST2_IP]] node2' >> /etc/hosts
16 | sed -i 's/^ - seeds:.*$/ - seeds: "127.0.0.1"/g' apache-cassandra-4.0.0/conf/cassandra.yaml
17 | mv apache-cassandra-4.0.0 /usr/share/cassandra
18 | rm apache-cassandra-4.0.0-bin.tar.gz
19 | echo 'PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"' >> .bashrc
20 | export PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"
21 | source .bashrc
22 | /usr/share/cassandra/bin/cassandra -R
23 | while [ `grep "Starting listening for CQL clients" /usr/share/cassandra/logs/system.log | wc -l` -lt 1 ]; do
24 | sleep 15
25 | done
26 | echo "done" >> /opt/katacoda-background-finished
27 |
--------------------------------------------------------------------------------
/cassandra-data-modeling/messaging-data/background.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | apt-get update
3 | apt install -y openjdk-11-jre-headless
4 | export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
5 | wget https://archive.apache.org/dist/cassandra/4.0.0/apache-cassandra-4.0.0-bin.tar.gz
6 | tar xzf apache-cassandra-4.0.0-bin.tar.gz
7 | sed -i 's/^cluster_name: .*$/cluster_name: "Cassandra Cluster"/g' apache-cassandra-4.0.0/conf/cassandra.yaml
8 | #sed -i "s/^num_tokens:.*$/num_tokens: 1/g" apache-cassandra-4.0.0/conf/cassandra.yaml
9 | #sed -i "s/^# initial_token:.*$/initial_token: -9223372036854775808/g" apache-cassandra-4.0.0/conf/cassandra.yaml
10 | sed -i 's/^endpoint_snitch: .*$/endpoint_snitch: GossipingPropertyFileSnitch/g' apache-cassandra-4.0.0/conf/cassandra.yaml
11 | sed -i 's/^dc=.*$/dc=DC-Houston/g' apache-cassandra-4.0.0/conf/cassandra-rackdc.properties
12 | sed -i "s/^listen_address:.*$/listen_address: 127.0.0.1/g" apache-cassandra-4.0.0/conf/cassandra.yaml
13 | sed -i 's/^rpc_address:.*$/rpc_address: 127.0.0.1/g' apache-cassandra-4.0.0/conf/cassandra.yaml
14 | echo '127.0.0.1 node1' >> /etc/hosts
15 | #echo '[[HOST2_IP]] node2' >> /etc/hosts
16 | sed -i 's/^ - seeds:.*$/ - seeds: "127.0.0.1"/g' apache-cassandra-4.0.0/conf/cassandra.yaml
17 | mv apache-cassandra-4.0.0 /usr/share/cassandra
18 | rm apache-cassandra-4.0.0-bin.tar.gz
19 | echo 'PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"' >> .bashrc
20 | export PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"
21 | source .bashrc
22 | /usr/share/cassandra/bin/cassandra -R
23 | while [ `grep "Starting listening for CQL clients" /usr/share/cassandra/logs/system.log | wc -l` -lt 1 ]; do
24 | sleep 15
25 | done
26 | echo "done" >> /opt/katacoda-background-finished
27 |
--------------------------------------------------------------------------------
/cassandra-features-4x/cassandra4-migrate-cassandra-3-to-4/index.json:
--------------------------------------------------------------------------------
1 | {
2 | "private": false,
3 | "title": "Migrate Cassandra 3.x -> 4.x",
4 | "description": "Learn how to perform 'zero-downtime' migration from Cassandra 3.x -> 4.x",
5 | "difficulty": "Intermediate",
6 | "time": "20 minutes",
7 | "details": {
8 | "assets": {
9 | "host01": [
10 | {"file": "wait.sh", "target": "/usr/local/bin", "chmod": "+x"}
11 | ]
12 | },
13 | "steps": [
14 | {
15 | "title": "Create a Cassandra 3.11.9 Cluster",
16 | "text": "step1.md"
17 | },
18 | {
19 | "title": "Populate the Cluster",
20 | "text": "step2.md"
21 | },
22 | {
23 | "title": "Verify that the Cluster is Ready to Upgrade",
24 | "text": "step3.md"
25 | },
26 | {
27 | "title": "Prepare the 3.x Cluster for Migration",
28 | "text": "step4.md"
29 | },
30 | {
31 | "title": "Install Cassandra 4.0",
32 | "text": "step5.md"
33 | },
34 | {
35 | "title": "Start New Node",
36 | "text": "step6.md"
37 | },
38 | {
39 | "title": "Verify New Node",
40 | "text": "step7.md"
41 | },
42 | {
43 | "title": "Test Your Understanding",
44 | "text": "quiz.md"
45 | }
46 | ],
47 | "intro": {
48 | "courseData": "background.sh",
49 | "code": "foreground.sh",
50 | "text": "intro.md"
51 | },
52 | "finish": {
53 | "text": "finish.md"
54 | }
55 | },
56 | "environment": {
57 | "uilayout": "terminal"
58 | },
59 | "backend": {
60 | "imageid": "ubuntu:1804"
61 | }
62 | }
--------------------------------------------------------------------------------
/cassandra-data-modeling/sensor-data/step5.md:
--------------------------------------------------------------------------------
1 | Find hourly average temperatures for every sensor in network `forest-net` and date range [`2020-07-05`,`2020-07-06`] within the week of `2020-07-05`;
2 | order by date (desc) and hour (desc):
3 |
4 |
--------------------------------------------------------------------------------
full_query_logging_options:
7 | log_dir: /tmp/fqllogs
8 |
9 |
10 | # Configurable Properties
11 | Here are the configurable properties for full query logging:
12 |
13 | - `log_dir`: Enable full query logging by setting this property to an existing directory location.
14 | - `roll_cycle`: Sets the frequency at which log segments are rolled - DAILY, HOURLY (the default), or MINUTELY.
15 | - `block`: Determines whether writes to the full query log will block query completion if full query logging falls behind, defaults to true.
16 | - `max_queue_weight`: Sets the maximum size of the in-memory queue of full query logs to be written to disk before blocking occurs, defaults to 256 MiB.
17 | - `max_log_size`: Sets the maximum size of full query log files on disk (default 16 GiB). After this value is exceeded, the oldest log file will be deleted.
18 | - `archive_command`: Optionally, provides a command that will be used to archive full query log files before deletion.
19 | - `max_archive_retries`: Sets a maximum number of times a failed archive command will be retried (defaults to 10)
20 |
21 | # Summary
22 |
23 | In this step, you learned how to enable full query logging in the `cassandra.yaml` file and explored the configurable properties of full query logging.
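Putting the properties above together, here is a sketch of a `cassandra.yaml` fragment with full query logging enabled; the values shown are illustrative defaults, not recommendations:

```yaml
# Sketch of a cassandra.yaml fragment enabling full query logging.
# log_dir must be an existing directory; the sizes are in bytes.
full_query_logging_options:
    log_dir: /tmp/fqllogs
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456   # 256 MiB
    max_log_size: 17179869184     # 16 GiB
    max_archive_retries: 10
```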
24 | 25 | 26 | -------------------------------------------------------------------------------- /cassandra-data-modeling/investment-data/step2.md: -------------------------------------------------------------------------------- 1 | Create table `accounts_by_user`: 2 | ```sql 3 | CREATE TABLE accounts_by_user ( 4 | username TEXT, 5 | account_number TEXT, 6 | cash_balance DECIMAL, 7 | name TEXT STATIC, 8 | PRIMARY KEY ((username),account_number) 9 | ); 10 | ```{{execute}} 11 | 12 | Create table `positions_by_account`: 13 | ```sql 14 | CREATE TABLE positions_by_account ( 15 | account TEXT, 16 | symbol TEXT, 17 | quantity DECIMAL, 18 | PRIMARY KEY ((account),symbol) 19 | ); 20 | ```{{execute}} 21 | 22 | Create table `trades_by_a_d`: 23 | ```sql 24 | CREATE TABLE trades_by_a_d ( 25 | account TEXT, 26 | trade_id TIMEUUID, 27 | type TEXT, 28 | symbol TEXT, 29 | shares DECIMAL, 30 | price DECIMAL, 31 | amount DECIMAL, 32 | PRIMARY KEY ((account),trade_id) 33 | ) WITH CLUSTERING ORDER BY (trade_id DESC); 34 | ```{{execute}} 35 | 36 | Create table `trades_by_a_td`: 37 | ```sql 38 | CREATE TABLE trades_by_a_td ( 39 | account TEXT, 40 | trade_id TIMEUUID, 41 | type TEXT, 42 | symbol TEXT, 43 | shares DECIMAL, 44 | price DECIMAL, 45 | amount DECIMAL, 46 | PRIMARY KEY ((account),type,trade_id) 47 | ) WITH CLUSTERING ORDER BY (type ASC, trade_id DESC); 48 | ```{{execute}} 49 | 50 | Create table `trades_by_a_std`: 51 | ```sql 52 | CREATE TABLE trades_by_a_std ( 53 | account TEXT, 54 | trade_id TIMEUUID, 55 | type TEXT, 56 | symbol TEXT, 57 | shares DECIMAL, 58 | price DECIMAL, 59 | amount DECIMAL, 60 | PRIMARY KEY ((account),symbol,type,trade_id) 61 | ) WITH CLUSTERING ORDER BY (symbol ASC, type ASC, trade_id DESC); 62 | ```{{execute}} 63 | 64 | Create table `trades_by_a_sd`: 65 | ```sql 66 | CREATE TABLE trades_by_a_sd ( 67 | account TEXT, 68 | trade_id TIMEUUID, 69 | type TEXT, 70 | symbol TEXT, 71 | shares DECIMAL, 72 | price DECIMAL, 73 | amount DECIMAL, 74 | PRIMARY KEY 
((account),symbol,trade_id) 75 | ) WITH CLUSTERING ORDER BY (symbol ASC, trade_id DESC); 76 | ```{{execute}} -------------------------------------------------------------------------------- /cassandra-features-4x/cassandra4-internode-message/index.json: -------------------------------------------------------------------------------- 1 | { 2 | "title": "Apache Cassandra™ Internode Messaging Improvements", 3 | "description": "New Features in Cassandra 4", 4 | "difficulty": "Beginner", 5 | "time": "10 minutes", 6 | "details": { 7 | "assets": { 8 | "host01": [ 9 | {"file": "wait.sh", "target": "/usr/local/bin/", "chmod": "+x"}, 10 | {"file": "in.cql", "target": "/root/"}, 11 | {"file": "out.cql", "target": "/root/"} 12 | ], 13 | "host02": [ 14 | {"file": "wait.sh", "target": "/usr/local/bin/", "chmod": "+x"}, 15 | {"file": "in.cql", "target": "/root/"}, 16 | {"file": "out.cql", "target": "/root/"} 17 | ] 18 | }, 19 | "steps": [ 20 | { 21 | "title": "Welcome to Internode Messaging", 22 | "text": "step1.md" 23 | }, 24 | { 25 | "title": "Asynchronous Messages", 26 | "text": "step2.md" 27 | }, 28 | { 29 | "title": "Cleaning Up Technical Debt", 30 | "text": "step3.md" 31 | }, 32 | { 33 | "title": "Check Out the Cluster", 34 | "text": "step4.md" 35 | }, 36 | { 37 | "title": "Internode Metrics and Virtual Tables", 38 | "text": "step5.md" 39 | }, 40 | { 41 | "title": "Review the Metrics", 42 | "text": "step6.md" 43 | }, 44 | { 45 | "title": "Do the Metrics Add Up?", 46 | "text": "step7.md" 47 | }, 48 | { 49 | "title": "Test your understanding", 50 | "text": "quiz.md" 51 | } 52 | ], 53 | "intro": { 54 | "courseData": "background.sh", 55 | "code": "foreground.sh", 56 | "text": "intro.md" 57 | }, 58 | "finish": { 59 | "text": "finish.md" 60 | } 61 | }, 62 | "environment": { 63 | "uilayout": "terminal" 64 | }, 65 | "backend": { 66 | "imageid": "docker-swarm" 67 | } 68 | } 69 | -------------------------------------------------------------------------------- 
/cassandra-data-modeling/time-series-data/index.json: -------------------------------------------------------------------------------- 1 | { 2 | "title": "Time Series Data Modeling Example for Cassandra", 3 | "description": "Explore how time series data can be stored and queried in Cassandra NoSQL database", 4 | "difficulty": "Beginner", 5 | "time": "15 minutes", 6 | "details": { 7 | "assets": { 8 | "host01": [ 9 | {"file": "wait.sh", "target": "/usr/local/bin/", "chmod": "+x"}, 10 | {"file": "time_series_data.tar.gz", "target": "/root/"} 11 | ] 12 | }, 13 | "steps": [ 14 | { 15 | "title": "Create a keyspace", 16 | "text": "step1.md" 17 | }, 18 | { 19 | "title": "Create tables", 20 | "text": "step2.md" 21 | }, 22 | { 23 | "title": "Populate tables using DSBulk", 24 | "text": "step3.md" 25 | }, 26 | { 27 | "title": "Start the CQL shell", 28 | "text": "step4.md" 29 | }, 30 | { 31 | "title": "Design query Q1", 32 | "text": "step5.md" 33 | }, 34 | { 35 | "title": "Design query Q2", 36 | "text": "step6.md" 37 | }, 38 | { 39 | "title": "Design query Q3", 40 | "text": "step7.md" 41 | }, 42 | { 43 | "title": "Design query Q4", 44 | "text": "step8.md" 45 | }, 46 | { 47 | "title": "Design query Q5", 48 | "text": "step9.md" 49 | }, 50 | { 51 | "title": "Design query Q6", 52 | "text": "step10.md" 53 | }, 54 | { 55 | "title": "Design query Q7", 56 | "text": "step11.md" 57 | } 58 | ], 59 | "intro": { 60 | "courseData": "background.sh", 61 | "code": "foreground.sh", 62 | "text": "intro.md" 63 | }, 64 | "finish": { 65 | "text": "finish.md" 66 | } 67 | }, 68 | "environment": { 69 | "uilayout": "terminal" 70 | }, 71 | "backend": { 72 | "imageid": "ubuntu20.04" 73 | } 74 | } 75 | -------------------------------------------------------------------------------- /cassandra-fundamentals/queries/index.json: -------------------------------------------------------------------------------- 1 | { 2 | "title": "Queries in Apache Cassandra™", 3 | "description": "Learn how to retrieve data from 
Cassandra tables", 4 | "difficulty": "Beginner", 5 | "time": "20 minutes", 6 | "details": { 7 | "assets": { 8 | "host01": [ 9 | {"file": "wait.sh", "target": "/usr/local/bin/", "chmod": "+x"} 10 | ] 11 | }, 12 | "steps": [ 13 | { 14 | "title": "Querying tables", 15 | "text": "step1.md" 16 | }, 17 | { 18 | "title": "Syntax", 19 | "text": "step2.md" 20 | }, 21 | { 22 | "title": "Let's get started ...", 23 | "text": "step3.md" 24 | }, 25 | { 26 | "title": "Querying table \"users\"", 27 | "text": "step4.md" 28 | }, 29 | { 30 | "title": "Querying table \"movies\"", 31 | "text": "step5.md" 32 | }, 33 | { 34 | "title": "Querying table \"ratings_by_user\"", 35 | "text": "step6.md" 36 | }, 37 | { 38 | "title": "Querying table \"ratings_by_movie\"", 39 | "text": "step7.md" 40 | }, 41 | { 42 | "title": "Using aggregates and functions", 43 | "text": "step8.md" 44 | }, 45 | { 46 | "title": "Grouping rows", 47 | "text": "step9.md" 48 | }, 49 | { 50 | "title": "Ordering rows", 51 | "text": "step10.md" 52 | }, 53 | { 54 | "title": "Setting limits", 55 | "text": "step11.md" 56 | }, 57 | { 58 | "title": "Test Your Understanding", 59 | "text": "quiz.md" 60 | } 61 | ], 62 | "intro": { 63 | "courseData": "background.sh", 64 | "code": "foreground.sh", 65 | "text": "intro.md" 66 | }, 67 | "finish": { 68 | "text": "finish.md" 69 | } 70 | }, 71 | "environment": { 72 | "uilayout": "terminal" 73 | }, 74 | "backend": { 75 | "imageid": "ubuntu20.04" 76 | } 77 | } 78 | -------------------------------------------------------------------------------- /cassandra-data-modeling/music-data/background.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | apt-get update 3 | apt install -y openjdk-11-jre-headless 4 | export JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64" 5 | tar -xzvf music_data.tar.gz 6 | rm music_data.tar.gz 7 | wget https://downloads.datastax.com/dsbulk/dsbulk.tar.gz 8 | tar -xzvf dsbulk.tar.gz 9 | rm dsbulk.tar.gz 10 | mv dsbulk* 
dsbulk 11 | echo 'PATH="$PATH:/root/dsbulk/bin"' >> .bashrc 12 | export PATH="$PATH:/root/dsbulk/bin" 13 | wget https://archive.apache.org/dist/cassandra/4.0.0/apache-cassandra-4.0.0-bin.tar.gz 14 | tar xzf apache-cassandra-4.0.0-bin.tar.gz 15 | sed -i 's/^cluster_name: .*$/cluster_name: "Cassandra Cluster"/g' apache-cassandra-4.0.0/conf/cassandra.yaml 16 | #sed -i "s/^num_tokens:.*$/num_tokens: 1/g" apache-cassandra-4.0.0/conf/cassandra.yaml 17 | #sed -i "s/^# initial_token:.*$/initial_token: -9223372036854775808/g" apache-cassandra-4.0.0/conf/cassandra.yaml 18 | sed -i 's/^endpoint_snitch: .*$/endpoint_snitch: GossipingPropertyFileSnitch/g' apache-cassandra-4.0.0/conf/cassandra.yaml 19 | sed -i 's/^dc=.*$/dc=DC-Houston/g' apache-cassandra-4.0.0/conf/cassandra-rackdc.properties 20 | sed -i "s/^listen_address:.*$/listen_address: 127.0.0.1/g" apache-cassandra-4.0.0/conf/cassandra.yaml 21 | sed -i 's/^rpc_address:.*$/rpc_address: 127.0.0.1/g' apache-cassandra-4.0.0/conf/cassandra.yaml 22 | echo '127.0.0.1 node1' >> /etc/hosts 23 | #echo '[[HOST2_IP]] node2' >> /etc/hosts 24 | sed -i 's/^ - seeds:.*$/ - seeds: "127.0.0.1"/g' apache-cassandra-4.0.0/conf/cassandra.yaml 25 | mv apache-cassandra-4.0.0 /usr/share/cassandra 26 | rm apache-cassandra-4.0.0-bin.tar.gz 27 | echo 'PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin"' >> .bashrc 28 | export PATH="$PATH:/usr/share/cassandra/bin:/usr/share/cassandra/tools/bin" 29 | source .bashrc 30 | /usr/share/cassandra/bin/cassandra -R 31 | while [ `grep "Starting listening for CQL clients" /usr/share/cassandra/logs/system.log | wc -l` -lt 1 ]; do 32 | sleep 15 33 | done 34 | echo "done" >> /opt/katacoda-background-finished 35 | -------------------------------------------------------------------------------- /cassandra-features-4x/cassandra4-repair-improvements/index.json: -------------------------------------------------------------------------------- 1 | { 2 | "title": "Repair Improvements", 3 | "description": 
"Learn how to manage incremental repair in a Cassandra 4.0 cluster", 4 | "difficulty": "Intermediate", 5 | "time": "25 minutes", 6 | "details": { 7 | "assets": { 8 | "host01": [ 9 | {"file": "wait.sh", "target": "/usr/local/bin/", "chmod": "+x"}, 10 | {"file": "delete_nongases.cql", "target": "/root/"}, 11 | {"file": "elements.csv", "target": "/root/"} 12 | ], 13 | "host02": [ 14 | {"file": "wait.sh", "target": "/usr/local/bin/", "chmod": "+x"}, 15 | {"file": "delete_nongases.cql", "target": "/root/"}, 16 | {"file": "elements.csv", "target": "/root/"} 17 | ] 18 | }, 19 | "steps": [ 20 | { 21 | "title": "Setup & create data", 22 | "text": "step1.md" 23 | }, 24 | { 25 | "title": "Fun with SSTables", 26 | "text": "step2.md" 27 | }, 28 | { 29 | "title": "The need for repair", 30 | "text": "step3.md" 31 | }, 32 | { 33 | "title": "Incremental repair", 34 | "text": "step4.md" 35 | }, 36 | { 37 | "title": "Test Your Understanding", 38 | "text": "quiz.md" 39 | } 40 | ], 41 | "intro": { 42 | "courseData": "background.sh", 43 | "code": "foreground.sh", 44 | "text": "intro.md" 45 | }, 46 | "finish": { 47 | "text": "finish.md" 48 | } 49 | }, 50 | "environment": { 51 | "uilayout": "terminal", 52 | "terminals": [ 53 | {"name": "Node1 Admin", "target": "host01"}, 54 | {"name": "Node1 Console", "target": "host01"}, 55 | {"name": "Node1 CQLSH", "target": "host01"}, 56 | {"name": "Node2 Admin", "target": "host02"}, 57 | {"name": "Node2 Console", "target": "host02"}, 58 | {"name": "Node2 CQLSH", "target": "host02"} 59 | ] 60 | }, 61 | "backend": { 62 | "imageid": "docker-swarm" 63 | } 64 | } 65 | -------------------------------------------------------------------------------- /cassandra-fundamentals/queries/step7.md: -------------------------------------------------------------------------------- 1 | Table `ratings_by_movie` stores information about ratings organized by movies, 2 | such that each partition contains all ratings for one particular movie. 
3 | This table has multi-row partitions and 4 | the primary key defined as `PRIMARY KEY ((title, year), email)`. 5 | Let's first retrieve all rows from the table to learn what the data looks like and then focus 6 | on predicates that the primary key can support. 7 | 8 | Q1. Retrieve all rows: 9 |
5 | **Note:** 6 | Settings in `cassandra.yaml` only take effect after a node start or re-start. 7 |
8 | --- 9 | 10 | Stop the Cassandra service 11 | ``` 12 | service cassandra stop 13 | ```{{execute}} 14 | 15 | Verify that Cassandra has stopped 16 | ``` 17 | nodetool status 18 | ```{{execute}} 19 | 20 | You should see a message like this: 21 |  22 | 23 | Click to open the `/etc/cassandra/cassandra.yaml`{{open}} file in the editor. 24 | 25 | Add the YAML configuration to enable audit logging: 26 |audit_logging_options:27 |
enabled: true28 | 29 | Re-start the Cassandra service 30 | ``` 31 | service cassandra start 32 | ```{{execute}} 33 | 34 | Verify that Cassandra has started 35 | ``` 36 | nodetool status 37 | ```{{execute}} 38 | 39 | --- 40 |
41 | **Note:** 42 | You may need to run `nodetool status` a few times before Cassandra has finished the startup process. 43 |
44 | --- 45 | 46 | Next you will insert another song and verify that the insertion shows up in the audit logs. 47 | 48 | Open cqlsh 49 | ``` 50 | cqlsh 51 | ```{{execute}} 52 | 53 | 54 | Insert another song into the *songs* table. 55 | ``` 56 | use music; 57 | INSERT INTO songs (artist, title, year) VALUES('Paul Simon', 'Kodachrome', 1973); 58 | ```{{execute}} 59 | 60 | Type `exit` to close *cqlsh*. 61 | ``` 62 | exit 63 | ```{{execute}} 64 | 65 | View the audit logs. 66 | ``` 67 | auditlogviewer /var/log/cassandra/audit 68 | ```{{execute}} 69 | 70 | You should now see that Paul Simon's *Kodachrome* has been inserted. 71 | 72 |  73 | 74 | # Summary 75 | 76 | In this step, you modified `cassandra.yaml` and re-started the server to enable audit logging. You then used *auditlogviewer* to verify that the operations you performed were recorded in the audit logs. -------------------------------------------------------------------------------- /cassandra-features-4x/cassandra4-migrate-cassandra-3-to-4/step2.md: -------------------------------------------------------------------------------- 1 | In this step, you will create a keyspace and a table and populate them with some data. 2 | 3 | Click to start a CQL shell (cqlsh) to execute CQL commands in the cluster. 4 | ``` 5 | cqlsh 6 | ```{{execute T1}} 7 | 8 | Create a keyspace. 9 | ``` 10 | CREATE KEYSPACE united_states WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; 11 | ```{{execute T1}} 12 | 13 | Use the keyspace. 14 | ``` 15 | USE united_states; 16 | ```{{execute T1}} 17 | 18 | Create the table. 19 | ``` 20 | CREATE TABLE cities_by_state( 21 | state text, 22 | name text, 23 | population int, 24 | PRIMARY KEY((state), name) 25 | ); 26 | ```{{execute T1}} 27 | 28 | Insert the top 10 (by population) cities in the United States. 
29 | ``` 30 | INSERT INTO cities_by_state (state, name, population) 31 | VALUES ('New York','New York City',8622357); 32 | INSERT INTO cities_by_state (state, name, population) 33 | VALUES ('California','Los Angeles',4085014); 34 | INSERT INTO cities_by_state (state, name, population) 35 | VALUES ('Illinois','Chicago',2670406); 36 | INSERT INTO cities_by_state (state, name, population) 37 | VALUES ('Texas','Houston',2378146); 38 | INSERT INTO cities_by_state (state, name, population) 39 | VALUES ('Arizona','Phoenix',1743469); 40 | INSERT INTO cities_by_state (state, name, population) 41 | VALUES ('Pennsylvania','Philadelphia',1590402); 42 | INSERT INTO cities_by_state (state, name, population) 43 | VALUES ('Texas','San Antonio',1579504); 44 | INSERT INTO cities_by_state (state, name, population) 45 | VALUES ('California','San Diego',1469490); 46 | INSERT INTO cities_by_state (state, name, population) 47 | VALUES ('Texas','Dallas',1400337); 48 | INSERT INTO cities_by_state (state, name, population) 49 | VALUES ('California','San Jose',1036242); 50 | ```{{execute T1}} 51 | 52 | Verify that the data has been loaded. 53 | ``` 54 | SELECT * FROM cities_by_state; 55 | ```{{execute T1}} 56 | 57 | Retrieve all the cities in California. 58 | ``` 59 | SELECT * FROM cities_by_state WHERE state = 'California'; 60 | ```{{execute T1}} 61 | 62 | Exit the CQL shell and clear the screen. 63 | ``` 64 | exit 65 | clear 66 | ```{{execute T1}} 67 | 68 | You have loaded the data; continue to the next step. -------------------------------------------------------------------------------- /cassandra-features-4x/cassandra4-full-query-logging/step2.md: -------------------------------------------------------------------------------- 1 | In this step, you will connect using *cqlsh*, create a keyspace and table, perform some queries, and verify that full query logs are being created. 2 | 3 | Start the CQL Shell (*cqlsh*) so you can issue CQL commands.
4 | 5 | ``` 6 | cqlsh 7 | ```{{execute}} 8 | 9 | Create the `movies` keyspace. 10 | 11 | ``` 12 | create KEYSPACE movies WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; 13 | ```{{execute}} 14 | 15 | Use the `movies` keyspace. 16 | 17 | ``` 18 | use movies; 19 | ```{{execute}} 20 | 21 | Create the `movie_metadata` table. 22 | 23 | ``` 24 | CREATE TABLE movie_metadata( 25 | imdb_id text, 26 | overview text, 27 | release_date text, 28 | title text, 29 | average_rating float, 30 | PRIMARY KEY(imdb_id)); 31 | ```{{execute}} 32 | 33 | Insert a row into the *movie_metadata* table. 34 | ``` 35 | INSERT INTO movie_metadata ( 36 | imdb_id, overview, release_date, title, average_rating 37 | ) VALUES('tt0114709', 'Led by Woody, Andy''s toys live happily in his room until Andy''s birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy''s heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.', '10/30/95', 'Toy Story', 7.7); 38 | ```{{execute}} 39 | 40 | Now let's do a `SELECT`. 41 | 42 | ``` 43 | SELECT * FROM movie_metadata WHERE imdb_id = 'tt0114709'; 44 | ```{{execute}} 45 | 46 | You should see the row you just inserted. 47 | 48 | Type `exit` to close *cqlsh*. 49 | ``` 50 | exit 51 | ```{{execute}} 52 | 53 | Now, let's check the contents of our log directory to see if anything has been created: 54 | 55 | ``` 56 | ls /tmp/fqllogs 57 | ```{{execute}} 58 | 59 | You'll see two files: one with a date timestamp in its name, and another that serves as a directory of all the dated files that have been written. You can try opening these files if you wish, but the contents won't make a lot of sense since they are binary data. Don't worry, Cassandra has a way to read this data.
60 | 61 | # Summary 62 | 63 | In this step, you have created the *movies* keyspace and the *movie_metadata* table, performed some queries, and verified that full query logs were created. -------------------------------------------------------------------------------- /cassandra-features-4x/virtual-tables/step4.md: -------------------------------------------------------------------------------- 1 | Now, we want to look at the clients currently connected to this node through CQL. 2 | This is done by querying the virtual table `system_views.clients`: 3 | 4 | ``` 5 | SELECT port, connection_stage, driver_name, protocol_version, username FROM clients ; 6 | ```{{execute T2}} 7 | 8 | Wait a minute ... who are these clients? 9 | 10 | It turns out that `cqlsh` uses the Python driver. 11 | This driver keeps two connections alive on two different ports 12 | (the port numbers are chosen dynamically). 13 | So you are simply looking at the connections between your own `cqlsh` 14 | and the node. 15 | 16 | Let's create more connections. 17 | First, let's start a Python interpreter console (or _REPL_) and connect to the 18 | node from there. 19 | Go to the third terminal and type 20 | ``` 21 | python3 22 | ```{{execute T3}} 23 | 24 | Next, import the Python drivers and use them to connect to the local node 25 | (which is the default connection, so you don't need to provide IP addresses): 26 | ``` 27 | from dse.cluster import Cluster 28 | cluster = Cluster(protocol_version=4) 29 | session = cluster.connect() 30 | ```{{execute T3}} 31 | 32 | (Note: the drivers, `dse-driver==2.11.1`, have been preinstalled in Python for 33 | this scenario).
34 | 35 | In the Python REPL, try the following loop - which achieves the same effect 36 | as the query you ran earlier in `cqlsh` - **press Enter** to 37 | make it run: 38 | ``` 39 | rows = session.execute('SELECT port, connection_stage, ' 40 | 'driver_name, protocol_version FROM ' 41 | 'system_views.clients') 42 | for row in rows: 43 | print('%5i %8s %36s %2i' % ( 44 | row.port, row.connection_stage, 45 | row.driver_name, row.protocol_version 46 | )) 47 | ```{{execute T3}} 48 | 49 | How many rows are there? Look at the ports used and the protocol versions. 50 | Notice that the latter matches the required version specified a few lines above, 51 | when creating the `Cluster` object (`protocol_version=4`). 52 | 53 | Suppose you want to make sure all your clients have been upgraded to the 54 | more recent protocol (version 5). Check by issuing, in `cqlsh`, 55 | the following command (note its `WHERE` clause): 56 | ``` 57 | SELECT address, protocol_version, username FROM clients WHERE protocol_version < 5 ALLOW FILTERING ; 58 | ```{{execute T2}} 59 | 60 | Recall that for virtual tables there's no need to worry about 61 | full-cluster scans. 62 | -------------------------------------------------------------------------------- /cassandra-features-4x/cassandra4-migrate-cassandra-3-to-4/step3.md: -------------------------------------------------------------------------------- 1 | In this step, we will verify that the Cassandra 3.x cluster is ready to be upgraded. There are 9 factors to consider: 2 | 3 | **Current State** 4 | All nodes in the cluster need to be in an ‘Up and Normal’ state. Check that there are no nodes in the cluster that are in a state different to *Up and Normal*. This command will list any nodes **not** in the *UN* state. 5 | ``` 6 | nodetool status | grep -v UN 7 | ```{{execute T1}} 8 | 9 | **Disk Space** 10 | Verify that each node has at least 50% disk space free.
11 | ``` 12 | df -h 13 | ```{{execute T1}} 14 | 15 | **Errors** 16 | Ensure that there are no unresolved errors on nodes. Take a look at logged warnings as well. 17 | 18 | ``` 19 | grep -e "WARN" -e "ERROR" /usr/share/cassandra/logs/system.log 20 | ```{{execute T1}} 21 | 22 | **Gossip Stable** 23 | Verify all entries in the gossip information output have the gossip state ‘STATUS:NORMAL’. Use the following command to check if there are any nodes that have a status other than ‘NORMAL’. 24 | ``` 25 | nodetool gossipinfo | grep STATUS | grep -v NORMAL 26 | ```{{execute T1}} 27 | 28 | **Dropped Messages** 29 | Establish that no Dropped Message log entries have been recorded on any node in the previous 72 hours. 30 | ``` 31 | nodetool tpstats | grep -A 12 Dropped 32 | ```{{execute T1}} 33 | 34 | **Backups Disabled** 35 | Verify that all automatic backups have been disabled. This includes disabling *Medusa* and any scripts that call `nodetool snapshot` until the upgrade is complete. 36 | 37 | **Repair Disabled** 38 | Verify that *repairs* have been disabled. This includes disabling automated repairs in *Reaper*. 39 | 40 | **Monitoring** 41 | Upgrading may result in a temporary reduction in performance, as it simulates a series of temporary node failures. Understanding how the upgrade impacts the performance of the system, both during and after, is crucial when working through the process. 42 | 43 | **Availability** 44 | Confirm that areas of the application that require Strong Consistency are using the `LOCAL_QUORUM` Consistency Level and a Replication Factor of 3. 45 | 46 | When `LOCAL_QUORUM` is used with a Replication Factor below 3, all nodes must be available for requests to succeed. A rolling restart using this configuration will result in full or partial unavailability while a node is *DOWN*. 47 | 48 | --- 49 | 50 | Clear the screen. 51 | ``` 52 | clear 53 | ```{{execute T1}} 54 | 55 | You are now ready to continue to the next step and begin the upgrade.
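Incidentally, the `LOCAL_QUORUM` arithmetic behind the availability check above is easy to verify. This short Python sketch (an illustration, not part of the upgrade procedure) computes how many replicas a quorum request needs and how many may be down:

```python
# Quorum arithmetic for (LOCAL_)QUORUM consistency - illustration only.

def quorum(rf):
    """Replicas that must respond at (LOCAL_)QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1

def tolerated_down(rf):
    """Replicas that can be down while quorum requests still succeed."""
    return rf - quorum(rf)

for rf in (1, 2, 3):
    print("RF=%d: quorum=%d, tolerates %d node(s) down" % (rf, quorum(rf), tolerated_down(rf)))
# With RF below 3, tolerated_down is 0 - a rolling restart takes requests down with it.
```

With RF=3, quorum is 2, so one node at a time can be restarted safely; with RF=2 or RF=1 any single node going down makes `LOCAL_QUORUM` unreachable, which is exactly why the guidance above asks for a Replication Factor of 3.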
-------------------------------------------------------------------------------- /cassandra-fundamentals/queries/step2.md: -------------------------------------------------------------------------------- 1 | To retrieve data from a table, Cassandra Query Language provides statement `SELECT` with the following simplified syntax: 2 | 3 | ``` 4 | SELECT [DISTINCT] * | 5 | select_expression [AS column_name][ , ... ] 6 | FROM [keyspace_name.] table_name 7 | [WHERE partition_key_predicate 8 | [AND clustering_key_predicate]] 9 | [GROUP BY primary_key_column_name][ , ... ] 10 | [ORDER BY clustering_key_column_name ASC|DESC][ , ... ] 11 | [PER PARTITION LIMIT number] 12 | [LIMIT number] 13 | [ALLOW FILTERING] 14 | ``` 15 | 16 | The `SELECT` clause specifies what to project into a final result. The projection list can include all columns using wildcard `*`, 17 | individual column names, aggregates, such as `COUNT` and `AVG`, and numerous functions that work with write-time timestamps, 18 | TTLs, and values of various data types. It is even possible to create user-defined aggregates and functions using 19 | statements `CREATE AGGREGATE` and `CREATE FUNCTION`. 20 | 21 | The `FROM` clause uses keyspace name and table name to identify an existing table. 22 | If a keyspace name is omitted, the current working keyspace is used. 23 | 24 | The `WHERE` clause supplies partition and row filtering predicates. At the very least, 25 | *all* partition key column values must be provided. Predicates for *one or more* clustering key columns can 26 | further restrict the result, as long as the primary key definition order is respected. All predicates must be *equality* predicates (`=` and `IN`), 27 | except that the last clustering key column predicate can be an *inequality* predicate (`>`, `<`, `>=`, `<=`). 28 | 29 | The `GROUP BY` clause can group rows based on partition and clustering key columns, as long as the primary key definition order is respected.
30 | 31 | The `ORDER BY` clause can return rows from each partition in the clustering order declared in a table definition, or in its reverse. 32 | Even when `ORDER BY` is not used, a query result still preserves the clustering order. 33 | 34 | The `PER PARTITION LIMIT` and `LIMIT` clauses are used to specify the maximum number of rows per partition or overall, respectively, 35 | that can appear in a final result. 36 | 37 | Finally, `ALLOW FILTERING` allows Cassandra to scan data to execute queries. While this relaxes many restrictions on what predicates can be used in the `WHERE` clause, 38 | scanning is a very inefficient access pattern that should not be used in production. Only in rare cases, when a partition key is known, 39 | may scanning rows within one partition be acceptable. Even then, a new table, materialized view, or secondary index should be considered as a better alternative. 40 | As a rule of thumb, you should avoid using `ALLOW FILTERING` in your queries and you can expect us to do the same in our examples.
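To see how these clauses combine, here is an illustrative query against a table with primary key `PRIMARY KEY ((title, year), email)`, like `ratings_by_movie` used later in this scenario. It is a sketch, not a statement to run here: the `rating` column and the literal values are assumed for illustration.

```
SELECT email, rating                 -- projection list
FROM ratings_by_movie                -- table in the current working keyspace
WHERE title = 'Alice in Wonderland'  -- all partition key columns:
  AND year = 2010                    --   equality predicates only
  AND email >= 'a@gmail.com'         -- last clustering column: inequality allowed
ORDER BY email DESC                  -- reverse of the declared clustering order
LIMIT 10;                            -- at most 10 rows overall
```

Dropping the predicates on `title` and `year` would make this query invalid without `ALLOW FILTERING`, since Cassandra could no longer locate the partition.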
-------------------------------------------------------------------------------- /cassandra-data-modeling/music-data/step3.md: -------------------------------------------------------------------------------- 1 | Load data into table `performers`: 2 | ```bash 3 | dsbulk load -url performers.csv \ 4 | -k music_data \ 5 | -t performers \ 6 | -header true \ 7 | -logDir /tmp/logs 8 | ```{{execute}} 9 | 10 | Retrieve some rows from table `performers`: 11 | ```sql 12 | cqlsh -e "SELECT * FROM music_data.performers LIMIT 10;" 13 | ```{{execute}} 14 | 15 | Load data into tables `albums_by_performer`, `albums_by_title` and `albums_by_genre`: 16 | ```bash 17 | dsbulk load -url albums.csv \ 18 | -k music_data \ 19 | -t albums_by_performer \ 20 | -header true \ 21 | -logDir /tmp/logs 22 | 23 | dsbulk load -url albums.csv \ 24 | -k music_data \ 25 | -t albums_by_title \ 26 | -header true \ 27 | -logDir /tmp/logs 28 | 29 | dsbulk load -url albums.csv \ 30 | -k music_data \ 31 | -t albums_by_genre \ 32 | -header true \ 33 | -logDir /tmp/logs 34 | ```{{execute}} 35 | 36 | Retrieve some rows from tables `albums_by_performer`, `albums_by_title` and `albums_by_genre`: 37 | ```sql 38 | cqlsh -e "SELECT * FROM music_data.albums_by_performer LIMIT 5;" 39 | cqlsh -e "SELECT * FROM music_data.albums_by_title LIMIT 5;" 40 | cqlsh -e "SELECT * FROM music_data.albums_by_genre LIMIT 5;" 41 | ```{{execute}} 42 | 43 | Load data into tables `tracks_by_title` and `tracks_by_album`: 44 | ```bash 45 | dsbulk load -url tracks.csv \ 46 | -k music_data \ 47 | -t tracks_by_title \ 48 | -header true \ 49 | -m "0=album_title, \ 50 | 1=album_year, \ 51 | 2=genre, \ 52 | 3=number, \ 53 | 4=title" \ 54 | -logDir /tmp/logs 55 | 56 | dsbulk load -url tracks.csv \ 57 | -k music_data \ 58 | -t tracks_by_album \ 59 | -header true \ 60 | -m "0=album_title, \ 61 | 1=album_year, \ 62 | 2=genre, \ 63 | 3=number, \ 64 | 4=title" \ 65 | -logDir /tmp/logs 66 | ```{{execute}} 67 | 68 | Retrieve some rows from tables 
`tracks_by_title` and `tracks_by_album`: 69 | ```sql 70 | cqlsh -e "SELECT * FROM music_data.tracks_by_title LIMIT 5;" 71 | cqlsh -e "SELECT * FROM music_data.tracks_by_album LIMIT 5;" 72 | ```{{execute}} 73 | 74 | 75 | 76 | -------------------------------------------------------------------------------- /cassandra-features-4x/cassandra4-repair-improvements/step2.md: -------------------------------------------------------------------------------- 1 | We are about to bring the cluster to the conditions that warrant a data 2 | repair; but first, we have to make sure all recently-inserted rows, probably 3 | still lingering in memory (in the memtables), are flushed to disk in the 4 | form of SSTables. 5 | 6 | ### Flushing data 7 | 8 | Each time a table is created, it gets an ID that is used, among other things, 9 | in the name of the directory containing the corresponding data. 10 | To identify the full name of the data directory for `elements`, look at 11 | the result of this command on Node1: 12 | ``` 13 | ls /usr/share/cassandra/data/data/chemistry/ 14 | ```{{execute T3}} 15 | The output will be something similar to 16 | `elements-8f40e960043011ec8f376feadc8291b4`. 17 | 18 | Since we have inserted only a few rows, the data directory is probably 19 | still empty; 20 | this can be verified with (**NOTE**: copy and paste the 21 | actual ID in the command before executing): 22 | ``` 23 | ls /usr/share/cassandra/data/data/chemistry/elements-