├── Book_Version
│   └── readme.md
├── Examples
│   ├── ChristmasTree.kql
│   ├── Readme.md
│   └── StandardColumns.txt
├── Other
│   └── Advanced
│       └── CH3.md
├── README.md
└── Series_Images
    ├── Addicted to KQL Promo Image Smaller.png
    ├── Addicted to KQL Promo Image Smallest.png
    ├── Addicted to KQL Promo Image.png
    ├── Part0.png
    ├── StandardColumns.png
    ├── angelchristmas.png
    ├── christmastree.png
    ├── dovechristmas.png
    ├── readme.md
    ├── seriesimage.png
    ├── seriesimagesmall.png
    └── tree.png

/Book_Version/readme.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Examples/ChristmasTree.kql:
--------------------------------------------------------------------------------
//Hey, look! A Christmas tree made from KQL!

//With a star on top...

let tree_height = 15;
let invisible_space = '\u00AD';
range i from 0 to tree_height*2 step 2
| extend side_width = tree_height + 1 - i/2
| extend side_space = strcat_array(repeat(strcat(invisible_space,''), side_width), " ")
| project Merry_Christmas = case(i != 0, strcat(side_space, "🌲", strcat_array(repeat("-", i-1), ""), @"🌲", side_space), strcat(side_space, " 🌟", side_space))

//What it looks like: https://github.com/rod-trent/AddictedtoKQL/blob/main/Series_Images/christmastree.png

//With an angel on top...

let tree_height = 15;
let invisible_space = '\u00AD';
range i from 0 to tree_height*2 step 2
| extend side_width = tree_height + 1 - i/2
| extend side_space = strcat_array(repeat(strcat(invisible_space,''), side_width), " ")
| project Merry_Christmas = case(i != 0, strcat(side_space, "🌲", strcat_array(repeat("-", i-1), ""), @"🌲", side_space), strcat(side_space, " 👼", side_space))

//What it looks like: https://github.com/rod-trent/AddictedtoKQL/blob/main/Series_Images/angelchristmas.png

//With a dove on top...

let tree_height = 15;
let invisible_space = '\u00AD';
range i from 0 to tree_height*2 step 2
| extend side_width = tree_height + 1 - i/2
| extend side_space = strcat_array(repeat(strcat(invisible_space,''), side_width), " ")
| project Merry_Christmas = case(i != 0, strcat(side_space, "🌲", strcat_array(repeat("-", i-1), ""), @"🌲", side_space), strcat(side_space, " 🕊️", side_space))

//What it looks like: https://github.com/rod-trent/AddictedtoKQL/blob/main/Series_Images/dovechristmas.png
--------------------------------------------------------------------------------
/Examples/Readme.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Examples/StandardColumns.txt:
--------------------------------------------------------------------------------
//From the Addicted to KQL series talking about Standard Columns

SecurityEvent //table name
| project _BilledSize //showing standard column: _BilledSize
--------------------------------------------------------------------------------
/Other/Advanced/CH3.md:
--------------------------------------------------------------------------------
# Chapter 3: Unlocking Insights with Advanced KQL Operators

In today's data-driven landscape, the ability to query vast amounts of information is not just an advantage but a necessity.
As we become more dependent on data to make informed decisions, the tools we use to interrogate this data must evolve in complexity and capability. Enter Kusto Query Language (KQL), a powerful language designed to make querying large and complex datasets both efficient and accessible.

As you may already know, KQL is utilized across various Microsoft services, including Azure Data Explorer, Application Insights, and Log Analytics. While basic KQL operators can handle a wide array of query requirements, the full power of KQL lies in its advanced operators. These provide nuanced control, increased efficiency, and deeper insights into the data being explored.

In this chapter, we will delve into the world of advanced KQL operators. These operators enable intricate manipulations and analyses of data that are simply not possible with the basic operators alone. From pattern recognition to statistical evaluation, advanced KQL operators facilitate a higher level of data understanding.

Some of the topics we'll cover in this chapter include:

- Joins and Data Relationships: How to relate and combine data from different sources or tables.
- Time Series Analysis: Utilizing specific operators that allow for in-depth examination of data across time intervals.
- Pattern Recognition and Machine Learning: Understanding the integration of KQL with machine learning algorithms.
- Custom Functions: Crafting your own functions in KQL for tailored data manipulation.
- Optimization Techniques: Learning the subtle art of query tuning and optimization to handle vast datasets efficiently.

Through a mixture of theory, examples, and real-world scenarios, this chapter will equip you with the knowledge and skills required to harness the full capabilities of advanced KQL operators. Whether you're a data scientist seeking to unearth new insights or an IT professional striving for optimized performance, these operators offer tools to take your querying abilities to the next level.

Prepare to dive into an engaging exploration of the advanced KQL landscape, where data becomes not just a raw resource but a wellspring of knowledge and understanding.

By the end of this chapter, the mysteries of advanced KQL operators will be unraveled, providing you with a robust set of tools to approach data in ways you might never have thought possible.

# Using KQL Variables

## Introduction to KQL Variables

### What are variables in KQL?

KQL variables are used to store and reference values within a query. They act as placeholders that can be assigned different values, such as constants or calculated results, and then used throughout the query. This allows for better organization, readability, and reusability of code.

### Why use variables in KQL queries?

Variables in KQL queries offer several advantages. They allow for the creation of reusable code snippets, promote better code organization, enhance query readability, and facilitate easier maintenance and debugging. Additionally, variables can be used to parameterize queries, making them more flexible and adaptable to different scenarios.

### Benefits of using variables

- Reusability: Variables allow you to define values or functions that can be used multiple times within a query or even across multiple queries.
- Code organization: By using variables, you can break down complex expressions into smaller, more manageable parts, improving the overall structure and organization of your code.
- Readability: Variables make queries easier to read and understand, as they provide descriptive names for values and functions.
- Flexibility: With variables, you can easily modify or update values in a single place, rather than searching for and changing them throughout your query.
- Debugging: Variables can be helpful during the debugging process, as you can easily inspect and analyze their values at different stages of the query execution.
## Creating Constants with let

### Syntax for creating constants

To create a constant variable in KQL, you use the let statement followed by the variable name, an equal sign, and the value you want to assign. Constants are useful for storing values that remain constant throughout the query execution.

### Example: Setting a constant value

Let's say we want to filter our data based on a specific region. We can use a constant variable to store the region name and easily change it whenever needed. Here's an example:

```
let regionName = "Asia";
AppAvailabilityResults
| where Location contains regionName
```

In this example, the regionName variable is set to "Asia". We then use the variable in the where clause to filter the data based on the region.

### Advantages of using constants

- Easy modification: Constants allow you to change a value in a single place, making it simpler to update or modify the query behavior.
- Improved readability: Constants provide descriptive names, enhancing the clarity and understanding of the query logic.
- Code maintenance: By using constants, you reduce the risk of introducing errors during manual value changes, ensuring the consistency and correctness of your queries.

## Calculated Values with let

### Syntax for creating calculated values

In addition to constants, you can use the let statement to create variables that hold calculated values. These values are derived from expressions or functions and can be used in various parts of your query.

### Example: Calculating time differences

Let's say we want to calculate the time difference in seconds between two timestamps. We can use a calculated value to store this result and reuse it throughout the query. Here's an example:

```
let startTime = datetime(2023-06-01);
let endTime = now();
let timeDiffInSeconds = (endTime - startTime) / 1s;
AppAvailabilityResults
| extend ElapsedSeconds = timeDiffInSeconds
```

In this example, we calculate the time difference in seconds between the startTime and endTime variables. We then use the extend operator to create a new column called ElapsedSeconds and assign the calculated value to it.

### Flexibility and reusability of calculated values

Calculated values provide flexibility by allowing you to reuse complex calculations in multiple parts of your query. By storing the result in a variable, you can easily refer to it without having to repeat the calculation logic. This improves query readability and reduces the risk of errors.
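To make that reuse concrete, here is a minimal sketch in which two let-bound values each feed more than one part of the query. It assumes the AppAvailabilityResults table used above, including its boolean Success column; the window and test-count values are arbitrary placeholders:

```
let lookback = 7d;      // used in the filter and in the rate calculation below
let minTests = 10;
AppAvailabilityResults
| where TimeGenerated > ago(lookback)
| summarize Total = count(), Passed = countif(Success == true) by Name
| where Total >= minTests
| extend SuccessRate = Passed * 1.0 / Total
| extend TestsPerDay = Total / (lookback / 1d)   // the same lookback variable, reused
```

Changing the single lookback value adjusts both the filter window and the per-day calculation, which is exactly the maintenance benefit described above.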
## Reusable Functions with let

### Syntax for creating functions

Another powerful feature of the let statement is the ability to define reusable functions. Functions allow you to encapsulate complex logic and reuse it across queries.

### Example: Creating a function to format names

Let's say we frequently need to concatenate two string columns, such as a client's city and state or province, into a single display value. We can create a function to simplify this task. Here's an example:

```
let formatFullName = (firstName:string, lastName:string) {
    strcat(firstName, " ", lastName)
};
AppAvailabilityResults
| project ClientLocation = formatFullName(ClientCity, ClientStateOrProvince)
```

In this example, we define the formatFullName function, which takes two parameters, firstName and lastName, and uses the strcat function to concatenate the two strings with a space in between. We then call the function in the project operator to create a new column called ClientLocation from the client's city and state or province.

### Benefits of using reusable functions

- Code reusability: Functions allow you to define complex logic once and reuse it across multiple queries, improving code efficiency and reducing redundancy.
- Readability and maintainability: Functions make queries more readable by abstracting complex operations into self-contained units. This improves code organization and makes it easier to understand and maintain.
- Modular design: By using functions, you can break down complex queries into smaller, more manageable pieces, promoting a modular and scalable query design.

## Using Multiple Variables in Queries

### Syntax for using multiple variables

KQL allows you to use multiple variables within a query. You can define and reference multiple variables to create more dynamic and flexible queries.

### Example: Filtering data using multiple variables

Let's say we want to filter our data based on multiple criteria, such as country and city. We can use multiple variables to store these values and easily modify them as needed. Here's an example:

```
let country = "United States";
let city = "Washington";
AppAvailabilityResults
| where ClientCountryOrRegion == country and ClientCity == city
```

In this example, we define two variables, country and city, and assign them specific values. We then use these variables in the where clause to filter the data based on the country and city.

### Improved query readability and maintainability

Using multiple variables in queries improves readability by providing descriptive names for different criteria or parameters. It also enhances query maintainability, as you can easily modify the variable values without having to search and update them throughout the query.
## Working with Default Values in Functions

### Syntax for specifying default values

KQL allows you to specify default values for function parameters. This feature provides flexibility and allows you to handle cases where certain parameters are not explicitly provided.

### Example: Using default values in a function

Let's say we have a function that calculates the time difference in days between two timestamps. We can specify a default value for one of the timestamps to handle cases where it is not provided. Here's an example:

```
let timeDiffInDays = (startDate: datetime, endDate: datetime = now()) {
    (endDate - startDate) / 1d
};
MyTable
| extend ElapsedDays = timeDiffInDays(StartTime)
```

In this example, we define the timeDiffInDays function with two parameters: startDate and endDate. We specify a default value of now() for the endDate parameter. If endDate is not provided explicitly when calling the function, it defaults to the current timestamp. We then use the function in the extend operator to calculate the elapsed days between StartTime and the default endDate.

### Considerations and best practices

When using default values in functions, it's important to document the default behavior and ensure it aligns with your intended functionality. Additionally, be mindful of any potential impact on query performance when using dynamic default values.

## Creating Views with let

### Syntax for creating views

In addition to values and functions, the let statement can also be used to create views in KQL. Views are virtual tables based on the result set of a query, providing a convenient way to organize and reuse data.

### Example: Creating a view based on a query

Let's say we frequently need to work with data from a specific region. We can create a view that filters the data based on the region and reuse it in our queries. Here's an example:

```
let AsiaRegion = view () {
    AppAvailabilityResults
    | where Location contains "Asia"
};
AsiaRegion
| project Name, OperationName
```

In this example, we define the AsiaRegion view using the let statement. The view includes a query that filters the data based on the region. We then use the view in a subsequent query to project specific columns from the filtered data.

### Leveraging views for data organization and reuse

Views provide a powerful way to organize and reuse query logic. By encapsulating complex queries into views, you can simplify your subsequent queries and promote code reuse and maintainability.

## Optimizing Queries with Materialization

### Syntax for using the materialize() function

KQL provides the materialize() function to cache the results of a subquery during query execution, improving performance by avoiding redundant computations.

### Example: Caching subquery results for performance

Let's say we have a complex query that involves computing a total count and then using it multiple times. We can use the materialize() function to cache the subquery results and reuse them efficiently. Here's an example:

```
let totalEventsPerDay = AppAvailabilityResults
| summarize TotalEvents = count() by Day = startofday(TimeGenerated);
let cachedResult = materialize(totalEventsPerDay);
cachedResult
| project Day, Percentage = todouble(TotalEvents) * 100 / toscalar(cachedResult | summarize sum(TotalEvents))
```

In this example, we compute the total count of events per day and store it in the totalEventsPerDay variable. We then use the materialize() function to cache the results of the subquery. By doing so, subsequent references to the cachedResult variable reuse the cached data rather than recomputing the summarization, improving query performance.

### Enhancing query performance with materialization

Using the materialize() function can significantly improve query performance by eliminating redundant computations. However, it's important to use it judiciously and consider the trade-off between query performance and memory usage.

## Best Practices for Using Variables in KQL

### Naming conventions for variables

When naming variables in KQL, it's a best practice to use descriptive and meaningful names that convey their purpose or value. This promotes code readability and understanding.

### Avoiding naming conflicts

To avoid naming conflicts, it's important to choose variable names that are unique within the scope of your query. Be mindful of potential clashes with reserved keywords or existing column names.

### Organizing and documenting your variables

To improve code maintainability, organize your variables logically within your query. Additionally, consider documenting the purpose and usage of each variable to facilitate collaboration and future modifications.
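Putting these practices together, the top of a query might look like the following minimal sketch. The window, threshold, and region values are arbitrary placeholders, and a boolean Success column is assumed, as in the earlier AppAvailabilityResults examples:

```
// --- Tunable parameters: adjust these without touching the query logic ---
let lookbackWindow = 24h;      // how far back to scan
let failureThreshold = 5;      // flag tests with more failures than this
let targetRegion = "Asia";     // region of interest; matched against Location
// --- Query logic ---
AppAvailabilityResults
| where TimeGenerated > ago(lookbackWindow)
| where Location contains targetRegion
| summarize Failures = countif(Success == false) by Name
| where Failures > failureThreshold
```

Grouping the tunable values under a commented header keeps every assumption in one place and leaves the query logic untouched when the values change.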
## Conclusion

In this section, we explored the power and versatility of variables in the Kusto Query Language. We learned how to create constants, calculated values, and reusable functions using the let statement. Additionally, we explored the benefits of using multiple variables, creating views, optimizing queries with materialization, and best practices for variable usage. By leveraging variables effectively, you can enhance the readability, maintainability, and performance of your KQL queries.

Now that you have a solid understanding of using variables in KQL, it's time to apply this knowledge to your own queries and unlock the full potential of the language. Experiment with different scenarios, explore advanced features, and continue to refine your skills. Happy querying!

# Uniting Queries with KQL Unions

## Understanding the Union Operator

### Introduction to the Union Operator

The union operator in KQL allows you to combine data from multiple tables into a single result set. Unlike the join operator, which combines columns from matching rows into a single row, the union operator simply appends the rows of one table to another. This is particularly useful when you want to merge datasets that have similar structures but different records.

### Syntax and Parameters

The syntax of the union operator is straightforward. It consists of the keyword union followed by the tables or table references you want to combine. Here is the basic syntax:

```
Table1
| union Table2
```

You can also specify additional parameters to modify the behavior of the union operator. These parameters include kind, withsource, and isfuzzy. Let's explore each parameter in detail:

- kind: This parameter determines how the columns are combined in the result set. The inner option retains only the columns that are common to all input tables, while the outer option includes all columns from any input table. The default is outer.
- withsource: When specified, this parameter adds a column to the output that indicates the source table for each row. It can be useful for tracking the origin of the data in the result set.
- isfuzzy: Setting this parameter to true allows fuzzy resolution of union legs, meaning that even if some of the tables referenced in the union do not exist or are inaccessible, the query still executes against the legs that do resolve. The default is false.
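As a quick illustration of these parameters, the following sketch tags every row with the table it came from and tolerates a missing leg. Sales_2022 and Sales_2023 are hypothetical tables, reused in the examples below:

```
union withsource=SourceTable kind=outer isfuzzy=true Sales_2022, Sales_2023
| summarize Rows = count() by SourceTable
```

If Sales_2023 did not exist, isfuzzy=true would let the query run against Sales_2022 alone instead of failing outright.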
Now that we have a basic understanding of the union operator, let's explore its usage with some practical examples.

## Basic Usage of the Union Operator

### Combining Two Tables

To illustrate the basic usage of the union operator, let's consider a scenario where we have two tables, Sales_2022 and Sales_2023, and we want to combine the sales data from both into a single result set:

```
Sales_2022
| union Sales_2023
```

This query appends the rows of Sales_2023 to the rows of Sales_2022 and returns the combined result set.

### Handling Columns with Different Names

In some cases, the tables you want to union may have columns with different names. The union operator aligns columns by name, not by their position in the query. Consider the following example, where Table1 has the columns Name and Age, while Table2 has the columns FullName and YearsOld:

```
Table1           // columns: Name, Age
| union Table2   // columns: FullName, YearsOld
```

Because none of the column names match, the default outer union returns all four columns (Name, Age, FullName, and YearsOld): rows that come from Table1 have empty values in FullName and YearsOld, and rows from Table2 have empty values in Name and Age. With kind=inner, only the columns common to all input tables are kept, so here none of these four columns would survive. If you want differently named columns to line up, rename them in one table first, for example by projecting FullName as Name and YearsOld as Age in Table2 before the union.

## Advanced Techniques with the Union Operator

### Filtering and Sorting Unioned Data

The union operator allows you to apply filters and sorting to the unioned data. You can use the where clause to filter the rows based on specific conditions, and the order by clause to sort the rows based on one or more columns. Let's see an example:

```
(Table1 | where Category == "Electronics")
| union (Table2 | where Category == "Clothing")
| order by Price desc
```

In this example, we filter Table1 to include only rows where Category is "Electronics", and Table2 to include only rows where Category is "Clothing". We then union the filtered tables and sort the result set in descending order based on the Price column.

### Using Let Statements with Union

You can also use let statements with the union operator to create named variables for the tables you want to union. This can make your query more readable and easier to maintain. Let's see an example:

```
let Table1 = Sales_2022 | where Region == "North";
let Table2 = Sales_2023 | where Region == "South";
Table1
| union Table2
| summarize sum(Revenue) by Region
```

In this example, we use let statements to create the variables Table1 and Table2, which contain the filtered data from the Sales_2022 and Sales_2023 tables, respectively. We then union these tables and summarize the total revenue by region.

## Best Practices for Using the Union Operator

### Avoiding Wildcards in Table References

When specifying table references in the union operator, it is recommended to avoid using wildcards, especially in large databases. Using wildcards can lead to inefficient execution and unpredictable results, as new tables may be added over time. Instead, explicitly list the tables you want to union.
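For example, the wildcard form below silently picks up any future table whose name starts with Sales_, while the explicit form pins the query to exactly the two tables it was written for (again using the hypothetical sales tables):

```
// Fragile: matches Sales_2022, Sales_2023, and any table added later
union Sales_*
```

```
// Preferred: an explicit, predictable table list
union Sales_2022, Sales_2023
```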
### Optimizing Union Performance

To optimize the performance of the union operator, consider the following best practices, applied together in the sketch after this list:

- Reduce the number of columns in the result set by using the project operator to select only the necessary columns.
- Use filters (the where clause) to limit the number of rows processed by the union operator.
- Ensure that the columns being unioned have compatible data types to avoid potential errors.

By following these best practices, you can improve the efficiency and reliability of your union queries.
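Here is a minimal sketch of those practices combined, once more using the hypothetical sales tables: each leg is filtered and trimmed to the needed columns before the union runs:

```
(Sales_2022 | where Region == "North" | project Region, Revenue)
| union (Sales_2023 | where Region == "North" | project Region, Revenue)
| summarize TotalRevenue = sum(Revenue) by Region
```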
## Real-World Example

### Tracking Security Incidents

Suppose you have two tables, SecurityIncidents and SecurityAlerts, containing information about security incidents and alerts in your system. You want to track the number of incidents and alerts reported for each user. Here's how you can use the union operator to accomplish this:

```
let SecurityIncidents = datatable(User: string, IncidentType: string)
[
    "Alice", "Data Breach",
    "Bob", "Unauthorized Access"
];
let SecurityAlerts = datatable(User: string, AlertType: string)
[
    "Alice", "Suspicious Activity",
    "Charlie", "Malware Detection"
];
SecurityIncidents
| union SecurityAlerts
| summarize count() by User
```

In this example, we create two variables, SecurityIncidents and SecurityAlerts, which contain sample data for the respective tables. We then union these tables and count the combined number of incidents and alerts reported for each user.

## Union Operator vs. Join Operator

### Key Differences

While the union and join operators are both used to combine data from multiple tables, they have some key differences:

- The union operator combines rows from different tables into a single result set, while the join operator combines columns from different tables into a single row.
- The union operator does not require a common column between the tables, while the join operator relies on a common column for matching records.
- The union operator appends the rows of one table to another, while the join operator combines rows based on matching values in the common column(s).

### Choosing the Right Operator

The choice between the union and join operators depends on the nature of your data and the desired outcome of your query. If you want to combine rows from different tables or datasets, the union operator is the appropriate choice. On the other hand, if you need to combine columns from different tables based on a common column, the join operator is the way to go.
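To see the difference concretely, compare the two operators over the same pair of small inline tables (the data is invented for illustration):

```
let T1 = datatable(Id: int, Name: string) [1, "Alice", 2, "Bob"];
let T2 = datatable(Id: int, City: string) [1, "Seattle", 3, "Austin"];
T1
| union T2    // 4 rows; columns Id, Name, City, with empties where a source lacks a column
```

```
let T1 = datatable(Id: int, Name: string) [1, "Alice", 2, "Bob"];
let T2 = datatable(Id: int, City: string) [1, "Seattle", 3, "Austin"];
T1
| join kind=inner T2 on Id    // 1 row (Id 1), with the columns of both tables side by side
```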
## Conclusion

The union operator in Azure Data Explorer (Kusto) is a powerful tool for combining data from multiple tables into a single result set. By understanding its syntax, parameters, and best practices, you can leverage the full potential of the union operator in your data analysis and querying tasks. Whether you need to merge datasets, track incidents, or analyze sales data, the union operator provides a flexible and efficient solution.

In this section, we have explored the various aspects of the union operator, including its syntax, parameters, usage examples, and best practices. Armed with this knowledge, you can confidently incorporate the union operator into your KQL queries and unlock new insights from your data.

# The Power of Joining Data

## Understanding the Basics of Joining Data

Before diving into the different flavors of KQL joins, let's start with the basics. The join operator in KQL allows you to merge rows from two or more tables based on matching values in specified columns. This enables you to combine data from different sources and create new relationships between data points.

To perform a join, you need two tables with at least one column containing matching values. The join operator then matches the rows in these tables based on the specified conditions and creates a new table with the merged results.

It's important to note that the join operation in KQL is similar to the join operation in SQL. However, KQL provides additional flavors of joins that offer more flexibility and control over the merging process.

## Joining Tables with the Innerunique Flavor

The innerunique flavor is the default join flavor in KQL. It performs an inner join, but first deduplicates the left table on the join key: only one row is kept for each distinct value of the join column, and that row is then combined with every matching row from the right table. The resulting table includes the columns from both tables.

To illustrate the innerunique join, let's consider a scenario where we have two tables: Fruit and Preparation. The Fruit table contains information about different fruits, including their names and corresponding numbers. The Preparation table contains information about various preparations for these fruits. We want to join these tables based on the common number column.

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    1, "Pear"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    1, "Juice"
];
Fruit
| join kind=innerunique Preparation on number
```

The resulting table includes the columns number, fruit, and preparation, but only one of the two left-side rows for number 1 (for example, "Apple") is kept and joined with both preparations; the duplicate left row is removed. The innerunique join allows us to combine the information from both tables while deduplicating the left side.

## Exploring the Inner Join Flavor

In addition to the innerunique flavor, KQL provides the inner join flavor, which performs a standard inner join on the tables. Only rows with matching values in the specified columns are included in the resulting table, and, unlike innerunique, duplicate rows on the left side are retained.

To demonstrate the inner join flavor, let's continue with our previous example of joining the Fruit and Preparation tables. We will use the same tables and join them based on the number column.

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    1, "Pear"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    1, "Juice"
];
Fruit
| join kind=inner Preparation on number
```

The resulting table includes the columns number, fruit, and preparation, with a row for every match; here that means four rows, since both fruits match both preparations. The inner join allows us to obtain all the common records between the two tables and analyze them together.

## Unleashing the Power of the Leftouter Join

The leftouter join flavor in KQL allows you to include all rows from the left table along with the matching rows from the right table. This means that even if there are no matching values in the specified columns, the rows from the left table will still be included in the resulting table, with empty values in the right table's columns.

To illustrate the power of the leftouter join, let's use the Fruit and Preparation tables again, this time with a fruit that has no matching preparation, joined on the common number column:

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    2, "Pear",
    4, "Banana"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    1, "Juice",
    2, "Juice"
];
Fruit
| join kind=leftouter Preparation on number
```

The resulting table includes the columns number, fruit, and preparation, with all rows from the Fruit table and the matching rows from the Preparation table; "Banana" appears with an empty preparation column. The leftouter join allows us to include all fruits, even those that don't have any preparations associated with them.
## Going Beyond with the Rightouter Join

Similar to the leftouter join, the rightouter join flavor in KQL allows you to include all rows from the right table along with the matching rows from the left table. This means that even if there are no matching values in the specified columns, the rows from the right table will still be included in the resulting table.

To demonstrate the power of the rightouter join, let's continue with the Fruit and Preparation tables, joined on the number column:

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    2, "Pear"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    2, "Juice",
    3, "Dry"
];
Fruit
| join kind=rightouter Preparation on number
```

The resulting table includes the columns number, fruit, and preparation, with all rows from the Preparation table and the matching rows from the Fruit table; "Dry" appears with an empty fruit column. The rightouter join allows us to include all preparations, even if there are no corresponding fruits.

## The Complete Picture with the Fullouter Join

The fullouter join flavor in KQL combines the power of the leftouter and rightouter joins. It includes all rows from both the left and right tables, regardless of matching values in the specified columns. This means that even if there are no matching values, the rows from both tables will still be included in the resulting table.

To illustrate the complete picture provided by the fullouter join, let's use versions of the Fruit and Preparation tables in which each contains a number the other lacks, joined on the common number column:

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    2, "Pear",
    4, "Banana"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    1, "Juice",
    2, "Juice",
    3, "Dry"
];
Fruit
| join kind=fullouter Preparation on number
```

The resulting table includes the columns number, fruit, and preparation, with all rows from both the Fruit and Preparation tables: "Banana" (number 4) appears with an empty preparation, and "Dry" (number 3) appears with an empty fruit. The fullouter join allows us to see the complete picture of all fruits and preparations, regardless of matching values.

## Simplifying with the Leftsemi Join

The leftsemi join flavor in KQL returns only the rows from the left table that have a matching value in the right table. Non-matching rows are excluded, and columns from the right table are not included in the output.

To simplify the join operation with the leftsemi join, let's use the same Fruit and Preparation tables, joined on the common number column:

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    2, "Pear",
    4, "Banana"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    1, "Juice",
    2, "Juice",
    3, "Dry"
];
Fruit
| join kind=leftsemi Preparation on number
```

The resulting table includes only the columns number and fruit, with only the rows from the Fruit table that have matching numbers in the Preparation table ("Apple" and "Pear"). The leftsemi join allows us to simplify the join operation and focus on the relevant rows in the left table.

## Finding Matches with the Rightsemi Join

Similar to the leftsemi join, the rightsemi join flavor in KQL returns only the rows from the right table that have a matching value in the left table. Non-matching rows are excluded, and columns from the left table are not included in the output.

To find matches with the rightsemi join, let's continue with the same tables, joined on the number column:

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    2, "Pear",
    4, "Banana"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    1, "Juice",
    2, "Juice",
    3, "Dry"
];
Fruit
| join kind=rightsemi Preparation on number
```

The resulting table includes only the columns number and preparation, with only the rows from the Preparation table that have matching numbers in the Fruit table. The rightsemi join allows us to find the relevant matches in the right table and focus on those rows.
## Excluding Matches with the Leftanti Join

The leftanti join flavor in KQL returns only the rows from the left table that have no matching value in the right table. Rows with matches are excluded, and columns from the right table are not included in the output.

To exclude matches with the leftanti join, let's use the Fruit and Preparation tables once more, joined on the common number column:

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    2, "Pear",
    4, "Banana"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    1, "Juice",
    2, "Juice",
    3, "Dry"
];
Fruit
| join kind=leftanti Preparation on number
```

The resulting table includes only the columns number and fruit, with only the rows from the Fruit table that do not have matching numbers in the Preparation table; in this case, just "Banana". The leftanti join allows us to exclude the matching rows from the left table and focus on the non-matching rows.

## Filtering Matches with the Rightanti Join

Similar to the leftanti join, the rightanti join flavor in KQL returns only the rows from the right table that have no matching value in the left table. Rows with matches are excluded, and columns from the left table are not included in the output.

To filter matches with the rightanti join, let's continue with the same tables, joined on the number column:

```
let Fruit = datatable(number:int, fruit:string)
[
    1, "Apple",
    2, "Pear",
    4, "Banana"
];
let Preparation = datatable(number:int, preparation:string)
[
    1, "Slices",
    1, "Juice",
    2, "Juice",
    3, "Dry"
];
Fruit
| join kind=rightanti Preparation on number
```

The resulting table includes only the columns number and preparation, with only the rows from the Preparation table that do not have matching numbers in the Fruit table; in this case, just "Dry". The rightanti join allows us to filter out the matching rows from the right table and focus on the non-matching rows.

## Best Practices and Performance Optimization

When working with joins in KQL, it's important to follow best practices to optimize performance and ensure efficient query execution. Here are some tips to keep in mind, applied together in the sketch after this list:

- Choose the appropriate join flavor: Select the join flavor that best suits your specific use case and requirements. Consider factors such as the desired output, data volume, and performance implications.
- Optimize column selection: When performing joins, be mindful of the columns you select in the output. Only include the necessary columns to reduce the amount of data transferred and improve query performance.
- Use appropriate filters: Apply filters to limit the data before performing the join operation. This can significantly reduce the amount of data processed and improve query performance.
- Consider table sizes: Take into account the sizes of the tables involved in the join operation. If one table is significantly smaller than the other, use it as the left table to optimize performance.
- Review and optimize query execution: Monitor and analyze how your queries run to identify potential performance bottlenecks. Consider using join hints or other optimization techniques to improve query performance.

Following these best practices will help you optimize your join operations in KQL and ensure efficient and effective data analysis.
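The following sketch applies several of these tips at once: the smaller, pre-filtered dataset sits on the left, both sides are filtered and trimmed early, and a join hint is supplied. The SecurityIncidents and SecurityAlerts tables (with TimeGenerated, User, and type columns) are hypothetical, and hint.strategy=broadcast is only appropriate when the left side is genuinely small:

```
let RecentIncidents = SecurityIncidents        // assumed small table
| where TimeGenerated > ago(1d)
| project User, IncidentType;
RecentIncidents
| join kind=inner hint.strategy=broadcast (
    SecurityAlerts                             // assumed large table
    | where TimeGenerated > ago(1d)
    | project User, AlertType
) on User
```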
## Conclusion

In this section, we have explored the power and flexibility of KQL joins. We have learned about the various flavors of joins, including innerunique, inner, leftouter, rightouter, fullouter, leftsemi, rightsemi, leftanti, and rightanti joins. Each flavor offers unique capabilities and allows you to merge and analyze your data in different ways.

By understanding the basics of joining data, exploring each join flavor, and following best practices, you now have the knowledge and skills to effectively merge and analyze your data in KQL. Whether you're working with large datasets or performing cross-cluster joins, KQL provides the tools and capabilities to support your data analysis needs.

Remember to optimize your join operations, review query execution, and leverage the power of query-generated tables to unlock the full potential of KQL joins. With these insights and techniques, you can harness the power of joining data and gain valuable insights from your datasets.

Happy joining!

# Using the Externaldata KQL Operator

In today's digital landscape, businesses are increasingly relying on Azure as their infrastructure backbone. The ability to query Azure using the Kusto Query Language (KQL) has become essential for gaining insights into the Azure services organizations utilize. In this section, we will explore the externaldata operator in KQL and how it empowers users to extract valuable information from external storage artifacts, such as Azure Blob Storage and Azure Data Lake Storage. By leveraging the externaldata operator, businesses can unlock the full potential of their data and make informed decisions based on deep analysis and patterns discovered within their Azure environment.

## Understanding the Externaldata Operator

The externaldata operator is a powerful tool within the KQL arsenal. It enables users to retrieve data from external storage artifacts, presenting it as a table whose schema is defined within the query itself. This operator supports a variety of storage services, including Azure Blob Storage and Azure Data Lake Storage, making it versatile and adaptable to different data sources.

## Syntax and Parameters

To utilize the externaldata operator, it is important to understand its syntax and parameters. The basic syntax for the externaldata operator is as follows:

```
externaldata (ColumnName: ColumnType [, ...])
[StorageConnectionString [, ...]]
[with (PropertyName = PropertyValue [, ...])]
```

The operator accepts the following parameters:

- ColumnName and ColumnType: These parameters define the schema of the resulting table, specifying the names and types of the columns.
- StorageConnectionString: This parameter specifies the connection string of the external storage artifact from which the data will be retrieved.
- PropertyName and PropertyValue: These optional parameters allow for additional customization, such as specifying the data format or authentication methods.
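As a minimal sketch of this syntax, the following reads a two-column CSV from blob storage into a query-time table; the storage URL is a placeholder, not a real file:

```
externaldata (Username: string, LoginCount: int)
[@"https://<storageaccount>.blob.core.windows.net/<container>/logins.csv"]
with (format = "csv")
| where LoginCount > 100
```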
## Use Cases for the Externaldata Operator

The externaldata operator can be employed in various scenarios to enhance data analysis and gain valuable insights. Let's explore two sample use cases to demonstrate the versatility of this operator.

### Basic Use Case: Analyzing Processor Utilization

Imagine a scenario where you have a set of servers and applications hosted in Azure, with logs and metrics collected using Azure monitoring services, and you want to identify the machines experiencing high processor utilization. The following query establishes the baseline scenario. Note that it relies on a computer group defined within the workspace (ComputerGroup); the enhanced use case that follows replaces this in-workspace list with externally stored data:

```
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == "vm.azm.ms"
| where Namespace == "Processor"
| where Name == "UtilizationPercentage"
| summarize avg(Val) by bin(TimeGenerated, 5m), Computer
| join kind=leftouter (ComputerGroup) on Computer
| where isnotempty(Computer1)
| sort by avg_Val desc nulls first
```

This query retrieves the average processor utilization for each computer and joins the results with a list of specified computers. The output provides valuable insights into the applications and servers experiencing high processor utilization.

### Enhanced Use Case: Dynamic Thresholds for Processor Utilization

In a more advanced use case, you may want to query logs for selected applications or servers that each have a different processor utilization threshold. This requires updating the thresholds dynamically, without modifying the KQL query itself. To achieve this, you can leverage the externaldata operator in conjunction with an external data file, such as a CSV, to store and retrieve the threshold values. The following query demonstrates the enhanced use case:

```
let Thresholds = externaldata (Computer: string, Threshold: int)
[@"https://raw.githubusercontent.com/rod-trent/SentinelKQL/master/thresholds.csv"]
with (format="csv");
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Origin == "vm.azm.ms"
| where Namespace == "Processor"
| where Name == "UtilizationPercentage"
| join kind=inner (Thresholds) on Computer
| where Val > Threshold
| sort by Val desc nulls first
```

In this query, the externaldata operator retrieves the thresholds from the external CSV file, which contains the computer names and their respective utilization thresholds. The join operation allows for dynamic comparison of the processor utilization values with the corresponding thresholds.
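When developing a query like this, it can help to stand in for the external file with an inline datatable that has the same schema, then swap in externaldata once the storage artifact is in place. A sketch, with invented computer names and thresholds:

```
// Stand-in for the external CSV while prototyping
let Thresholds = datatable (Computer: string, Threshold: int)
[
    "WebSrv01", 85,
    "SqlSrv01", 70
];
InsightsMetrics
| where TimeGenerated > ago(30m)
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| join kind=inner (Thresholds) on Computer
| where Val > Threshold
```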
## Best Practices and Considerations

When utilizing the externaldata operator, it is important to keep a few best practices and considerations in mind:

- Ensure that the external storage artifact is accessible and the connection string is accurate.
- Validate and sanitize the data retrieved from external sources to avoid security risks and maintain data integrity.
- Consider performance implications when working with large datasets. The externaldata operator is optimized for small reference tables rather than large data volumes.
- Familiarize yourself with the available data formats and authentication methods supported by the externaldata operator.

## Conclusion

The externaldata operator in KQL empowers users to harness the power of external storage artifacts in Azure, enabling deep analysis and insights. By utilizing this operator, businesses can extract valuable information, discover patterns, and make informed decisions based on their Azure data. Whether it's analyzing processor utilization or dynamically adjusting thresholds, the externaldata operator offers unparalleled flexibility and versatility. Embrace the power of externaldata in KQL and unlock the full potential of your Azure environment.

Remember, mastering the Kusto Query Language opens the door to endless possibilities and empowers you to extract actionable insights from your data. Stay curious, continue exploring, and make data-driven decisions with confidence.

# Query IP Ranges Using KQL

As businesses and organizations increasingly rely on digital infrastructure, the need to manage and analyze IP addresses becomes crucial. Thankfully, with the power of the Kusto Query Language (KQL), it is now easier than ever to query IP ranges and gain insights from your data. In this section, we will explore the various functions available in KQL to query IP ranges, including ipv4_is_in_range(), ipv4_is_match(), ipv6_compare(), and ipv6_is_match(). We will delve into the syntax, parameters, and examples of each function, equipping you with the knowledge to work effectively with IP ranges in KQL.

## Understanding IP-Prefix Notation

Before we dive into the details of querying IP ranges using KQL, it's essential to understand IP-prefix notation. IP-prefix notation, also known as CIDR notation, is a concise way of representing an IP address and its associated network mask. It consists of the base IP address followed by a slash ("/") and the prefix length.

For IPv4, the prefix length ranges from 0 to 32, while for IPv6 it ranges from 0 to 128. The prefix length denotes the number of leading 1 bits in the netmask and determines the range of IP addresses belonging to the network.

For example, the IP address 192.168.2.0 with a netmask of 255.255.255.0 can be represented in IP-prefix notation as 192.168.2.0/24. In this case, the prefix length is 24, indicating that the first 24 bits of the IP address identify the network, leaving 8 bits for host addresses.
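A quick way to see the notation in action is to test two addresses against the same /24 network; this previews the ipv4_is_in_range() function covered next:

```
print InRange  = ipv4_is_in_range('192.168.2.17', '192.168.2.0/24'),   // true: first 24 bits match
      OutRange = ipv4_is_in_range('192.168.3.17', '192.168.2.0/24')    // false: differs within the first 24 bits
```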
## ipv4_is_in_range() Function

The ipv4_is_in_range() function allows you to check whether an IPv4 address falls within a specified IP range. It takes two parameters: the IPv4 address to check and the IPv4 range in IP-prefix notation. The function returns true if the address is within the range, false if it is not, and null if there is an issue with the IP address conversion.

### Syntax

The syntax for the ipv4_is_in_range() function is as follows:

```
ipv4_is_in_range(Ipv4Address, Ipv4Range)
```

Where:

- Ipv4Address is a string representing the IPv4 address to check.
- Ipv4Range is a string representing the IPv4 range, or a list of ranges, in IP-prefix notation.

### Example

Let's consider an example to understand how the ipv4_is_in_range() function works:

```
datatable(ip_address:string, ip_range:string)
[
    '192.168.1.1', '192.168.1.1',        // Equal IPs
    '192.168.1.1', '192.168.1.255/24',   // 24-bit IP prefix is used for comparison
]
| extend result = ipv4_is_in_range(ip_address, ip_range)
```

The above query checks two cases with the ipv4_is_in_range() function: in the first row, the address is compared with itself, and in the second, the address is checked against a /24 range. Both rows return true.

## ipv4_is_match() Function

The ipv4_is_match() function is used to match and compare two IPv4 strings, taking into account IP-prefix notation and an optional prefix length. It returns true if the two strings match, false if they don't, and null if there is an issue with the IP address conversion.

### Syntax

The syntax for the ipv4_is_match() function is as follows:

```
ipv4_is_match(ip1, ip2[, prefix])
```

Where:

- ip1 and ip2 are strings representing the IPv4 addresses to compare.
- prefix is an optional integer (0 to 32) that represents the number of most significant bits to consider.

### Example

Let's explore an example to understand how the ipv4_is_match() function works:

```
datatable(ip1_string:string, ip2_string:string)
[
    '192.168.1.0',    '192.168.1.0',        // Equal IPs
    '192.168.1.1/24', '192.168.1.255',      // 24-bit IP prefix is used for comparison
    '192.168.1.1',    '192.168.1.255/24',   // 24-bit IP prefix is used for comparison
    '192.168.1.1/30', '192.168.1.255/24',   // 24-bit IP prefix is used for comparison
]
| extend result = ipv4_is_match(ip1_string, ip2_string)
```

In the above example, we compare IP addresses using the ipv4_is_match() function. The results indicate whether the IP addresses match based on the specified IP-prefix notation; when a prefix is specified on both sides, the smaller (less specific) prefix length is applied.
## ipv6_compare() Function

The ipv6_compare() function allows you to compare two IPv6 or IPv4 network address strings, considering IP-prefix notation and an optional prefix length. It returns 0 if the first string is equal to the second string, 1 if it is greater, -1 if it is less, and null if there is an issue with the IP address conversion.

### Syntax

The syntax for the ipv6_compare() function is as follows:

```
ipv6_compare(ip1, ip2[, prefix])
```

Where:

- ip1 and ip2 are strings representing the IPv6 or IPv4 addresses to compare.
- prefix is an optional integer (0 to 128) that represents the number of most significant bits to consider.

### Example

Let's consider an example to understand how the ipv6_compare() function works:

```
datatable(ip1_string:string, ip2_string:string, prefix:long)
[
    '192.168.1.1',    '192.168.1.0',   31,   // 31-bit IP prefix is used for comparison
    '192.168.1.1/24', '192.168.1.255', 31,   // 24-bit IP prefix is used for comparison
    '192.168.1.1',    '192.168.1.255', 24,   // 24-bit IP prefix is used for comparison
]
| extend result = ipv6_compare(ip1_string, ip2_string, prefix)
```

In the above example, we compare IPv4 addresses (handled as IPv6) using the ipv6_compare() function. The result table shows the comparison results based on the specified IP-prefix notation and prefix length; when a prefix appears both in the address string and as an argument, the smaller of the two is applied.

## ipv6_is_match() Function

The ipv6_is_match() function is used to match and compare two IPv6 or IPv4 network address strings, considering IP-prefix notation and an optional prefix length. It returns true if the two strings match, false if they don't, and null if there is an issue with the IP address conversion.

### Syntax

The syntax for the ipv6_is_match() function is as follows:

```
ipv6_is_match(ip1, ip2[, prefix])
```

Where:

- ip1 and ip2 are strings representing the IPv6 or IPv4 addresses to compare.
- prefix is an optional integer (0 to 128) that represents the number of most significant bits to consider.

### Example

Let's explore an example to understand how the ipv6_is_match() function works:

```
datatable(ip1_string:string, ip2_string:string)
[
    // IPv4 addresses are compared as IPv6 addresses
    '192.168.1.1',    '192.168.1.1',        // Equal IPs
    '192.168.1.1/24', '192.168.1.255',      // 24-bit IPv4 prefix is used for comparison
    '192.168.1.1',    '192.168.1.255/24',   // 24-bit IPv4 prefix is used for comparison
    '192.168.1.1/30', '192.168.1.255/24',   // 24-bit IPv4 prefix is used for comparison
    // IPv6 cases
    'fe80::85d:e82c:9446:7994',     'fe80::85d:e82c:9446:7994',       // Equal IPs
    'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998',       // 120-bit IPv6 prefix is used for comparison
    'fe80::85d:e82c:9446:7994',     'fe80::85d:e82c:9446:7998/120',   // 120-bit IPv6 prefix is used for comparison
    'fe80::85d:e82c:9446:7994/120', 'fe80::85d:e82c:9446:7998/120',   // 120-bit IPv6 prefix is used for comparison
    // Mixed cases of IPv4 and IPv6
    '192.168.1.1',      '::ffff:c0a8:0101',   // Equal IPs
    '192.168.1.1/24',   '::ffff:c0a8:01ff',   // 24-bit IP prefix is used for comparison
    '::ffff:c0a8:0101', '192.168.1.255/24',   // 24-bit IP prefix is used for comparison
    '::192.168.1.1/30', '192.168.1.255/24',   // 24-bit IP prefix is used for comparison
]
| extend result = ipv6_is_match(ip1_string, ip2_string)
```

In the above example, we compare IPv6 and IPv4 addresses using the ipv6_is_match() function. The result table showcases the comparison results based on the specified IP-prefix notation and prefix length.

## Conclusion

In this section, we explored the functions available in KQL for querying IP ranges. We learned about ipv4_is_in_range(), ipv4_is_match(), ipv6_compare(), and ipv6_is_match(), understanding their syntax, parameters, and examples. Armed with this knowledge, you can now effectively query and analyze IP ranges in your data using KQL. Remember to leverage IP-prefix notation to represent IP addresses and their associated network masks accurately. Happy querying!
## Conclusion

In this section, we explored the functions available in KQL for querying IP ranges. We learned about ipv4_is_in_range(), ipv4_is_match(), ipv6_compare(), and ipv6_is_match(), understanding their syntax, parameters, and examples. Armed with this knowledge, you can now effectively query and analyze IP ranges in your data using KQL. Remember to leverage IP-prefix notation to represent IP addresses and their associated network masks accurately. Happy querying!

# Using the ipv4_is_private() KQL Function

## Introduction to the ipv4_is_private() Function

The ipv4_is_private() function is a powerful tool in the Kusto Query Language (KQL) that allows us to determine if an IPv4 address belongs to a private network. But what exactly is a private network address, and why is it important to identify one?

### What is a Private Network Address?

A private network address is an IP address that has been reserved for use within private networks. These addresses are not routable over the public internet, meaning they cannot be used to communicate directly with devices outside the private network. Instead, private network addresses are used for internal communication within a specific network.

### Purpose of Private Network Addresses

The purpose of using private network addresses is to conserve public IP address space. With the increasing number of devices connected to the internet, the availability of public IP addresses is limited. By using private network addresses, organizations can create their own internal networks without consuming public IP addresses.

## Understanding Private IPv4 Address Ranges

The Internet Engineering Task Force (IETF) has designated specific IP address ranges as private network addresses. These ranges are reserved and should not be used on the public internet. Let's explore the three primary private IPv4 address ranges:

| IP Address Range | Number of Addresses | Largest CIDR Block (Subnet Mask) |
|-------------------------------|---------------------|----------------------------------|
| 10.0.0.0 – 10.255.255.255 | 16,777,216 | 10.0.0.0/8 (255.0.0.0) |
| 172.16.0.0 – 172.31.255.255 | 1,048,576 | 172.16.0.0/12 (255.240.0.0) |
| 192.168.0.0 – 192.168.255.255 | 65,536 | 192.168.0.0/16 (255.255.0.0) |

Any IP address falling within these ranges is considered a private network address.

## Syntax and Parameters of the ipv4_is_private() Function

To effectively use the ipv4_is_private() function in Kusto Query Language (KQL), it's essential to understand its syntax and parameters.

### Syntax Conventions

The syntax for the ipv4_is_private() function is as follows:

```kusto
ipv4_is_private(ip)
```

The ip parameter represents the IPv4 address that you want to check for private network membership. The function returns a boolean value: true if the IP address belongs to any of the private network ranges, false if it doesn't, and null if the input is not a valid IPv4 address string.

### Parameters of the Function

The ipv4_is_private() function accepts only one parameter:

ip (string): An expression representing an IPv4 address. IPv4 strings can be masked using IP-prefix notation.

## How to Use the ipv4_is_private() Function

To demonstrate the usage of the ipv4_is_private() function, let's look at some examples of checking the membership of IPv4 addresses in private networks.
```kusto
ipv4_is_private('192.168.1.1/24') == true
ipv4_is_private('10.1.2.3/24') == true
ipv4_is_private('202.1.2.3') == false
ipv4_is_private('127.0.0.1') == false
```

In the above examples, we pass different IP addresses to the ipv4_is_private() function. The function returns true if the IP address belongs to any of the private network ranges, and false otherwise. Note that '127.0.0.1' returns false: the loopback address is reserved, but it is not one of the three private network ranges.

### Sample Code and Output

To further illustrate the usage of the ipv4_is_private() function, let's run a query using the Kusto Query Language (KQL):

```kusto
datatable(ip_string:string)
[
    '10.1.2.3',
    '192.168.1.1/24',
    '127.0.0.1',
]
| extend result = ipv4_is_private(ip_string)
```

The above query creates a datatable with three IP addresses. We then extend the datatable with a new column called result, which uses the ipv4_is_private() function to check the private network membership of each IP address. The output indicates whether each IP address belongs to a private network.

## Deep Dive into IP-Prefix Notation

In the context of the ipv4_is_private() function, it's essential to understand IP-prefix notation, also known as CIDR notation. This notation is used to represent IP addresses and their associated network masks.

### Understanding CIDR Notation

CIDR notation is a concise way of representing an IP address and its network mask. The notation uses a forward slash (/) followed by the prefix length, which represents the number of leading 1 bits in the netmask. The prefix length determines the range of IP addresses that belong to the network.

For example, the IP address 192.168.2.0/24 represents the IP address 192.168.2.0 with a netmask of 255.255.255.0. The prefix length in this case is 24, indicating that the first 24 bits of the IP address are fixed, while the last 8 bits can vary.

### IPv4 vs. IPv6 Prefix Lengths

In IPv4, the prefix length ranges from 0 to 32, while in IPv6, it ranges from 0 to 128. The larger the prefix length, the smaller the range of IP addresses that belong to the network. For example, a prefix length of 32 represents a single IPv4 address, while a prefix length of 24 covers 256 addresses.

## Leveraging the ipv4_is_private() Function in Real-World Scenarios

The ipv4_is_private() function can be incredibly useful in various real-world scenarios. Let's explore two common use cases:

### Network Security and Access Control

By leveraging the ipv4_is_private() function, organizations can enhance network security and access control measures. They can validate incoming IP addresses to ensure they belong to the expected private network ranges. This helps prevent unauthorized access attempts and ensures that only trusted IP addresses are allowed to communicate with the internal network.

### Network Monitoring and Troubleshooting

Network administrators can also use the ipv4_is_private() function for network monitoring and troubleshooting purposes. By analyzing network traffic and identifying private network addresses, they can gain insights into the internal communication patterns of their network. This information can be valuable for identifying bottlenecks, diagnosing network issues, and optimizing network performance.
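As a quick illustration of the monitoring use case, the following sketch splits traffic into internal and external sources; the NetworkTraffic table and SourceIp column are hypothetical placeholders for your own data:

```kusto
// Hypothetical table and column names; substitute your own traffic source.
NetworkTraffic
| extend SourceType = iif(ipv4_is_private(SourceIp), "Private", "Public")
| summarize Connections = count() by SourceType
```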
The ipv4_is_private() function in Kusto Query Language (KQL) provides a powerful tool for identifying private network addresses. By leveraging this function, organizations can enhance network security, optimize network performance, and gain valuable insights into their network infrastructure.

## EXTRA: Getting Geolocation from an IP Address Using KQL

In today's digital age, understanding the geographical location of IP addresses has become a crucial aspect of data analysis. Whether it's identifying the origin of network traffic or determining the location of users or devices, having geolocation information can provide valuable insights. In this article, we will explore how to retrieve geolocation information from IP addresses using KQL (Kusto Query Language), a powerful query language used in Azure Data Explorer.

### The geo_info_from_ip_address() Function

KQL provides a function called geo_info_from_ip_address() that allows you to retrieve geolocation information about IPv4 or IPv6 addresses. This function takes an IP address as a parameter and returns a dynamic object containing information about the IP address's whereabouts, if available.

The function returns the following fields:

country: The country name where the IP address is located.

state: The state or subdivision name.

city: The city name.

latitude: The latitude coordinate of the location.

longitude: The longitude coordinate of the location.

It's important to note that IP geolocation is inherently imprecise; the provided location is often near the center of a population area. Therefore, it should not be used to identify specific addresses or households.

### Syntax

The syntax for the geo_info_from_ip_address() function is as follows:

```kusto
geo_info_from_ip_address(IpAddress)
```

The IpAddress parameter is a string representing the IPv4 or IPv6 address for which you want to retrieve geolocation information.

### Examples

Let's explore some examples to understand how the geo_info_from_ip_address() function works.

#### Example 1: Retrieving Geolocation from an IPv4 Address

Suppose we want to retrieve geolocation information from the IPv4 address '20.53.203.50'. We can use the following query:

```kusto
print ip_location=geo_info_from_ip_address('20.53.203.50')
```

The output will be:

| ip_location |
|---|
| {"country": "Australia", "state": "New South Wales", "city": "Sydney", "latitude": -33.8715, "longitude": 151.2006} |

From the output, we can see that the IP address '20.53.203.50' is located in Sydney, New South Wales, Australia, with latitude -33.8715 and longitude 151.2006.

#### Example 2: Retrieving Geolocation from an IPv6 Address

Now, let's retrieve geolocation information from an IPv6 address. Consider the IPv6 address '2a03:2880:f12c:83:face:b00c::25de'.
We can use the following query:

```kusto
print ip_location=geo_info_from_ip_address('2a03:2880:f12c:83:face:b00c::25de')
```

The output will be:

| ip_location |
|---|
| {"country": "United States", "state": "Florida", "city": "Boca Raton", "latitude": 26.3594, "longitude": -80.0771} |

From the output, we can see that the IPv6 address '2a03:2880:f12c:83:face:b00c::25de' is located in Boca Raton, Florida, United States, with latitude 26.3594 and longitude -80.0771.

### Limitations and Considerations

It's important to understand the limitations and considerations when using the geo_info_from_ip_address() function:

IP geolocation is not always accurate and can be affected by various factors such as proxy servers and VPNs. The location provided should be used as a general indication rather than an exact address.

The function utilizes GeoLite2 data created by MaxMind, a leading provider of IP intelligence and online fraud prevention tools. However, the data may not always be up to date, and the accuracy may vary.

The function is built on the MaxMind DB Reader library provided under the ISC license.
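With those caveats in mind, aggregate-level enrichment is still very useful. Here's a minimal sketch that summarizes sign-ins by country; it assumes a SigninLogs table with an IPAddress column, so adapt the names to your own schema:

```kusto
// Assumes SigninLogs exposes the client IP in IPAddress; adjust to your schema.
SigninLogs
| where TimeGenerated > ago(1d)
| extend geo = geo_info_from_ip_address(IPAddress)
| extend Country = tostring(geo.country)
| summarize SignIns = count() by Country
| sort by SignIns desc
```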
### Conclusion

Retrieving geolocation information from IP addresses using KQL can provide valuable insights in data analysis. By leveraging the geo_info_from_ip_address() function, you can enrich your data with geographic context, identifying the origin of network traffic or the location of users or devices. However, it's important to consider the limitations and understand that IP geolocation is inherently imprecise. With KQL and the geo_info_from_ip_address() function, you can unlock the power of geolocation analysis in your data exploration.

To learn more about the syntax and usage of the geo_info_from_ip_address() function, refer to the [official documentation](https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/geo-info-from-ip-address-function).

# Working with Multivalued Strings in KQL

In the world of data analysis, dealing with multivalued strings can be challenging. Fortunately, the Kusto Query Language (KQL) offers powerful operators like mv-expand and parse to help extract and manipulate data from these complex string structures. In this guide, we will explore the functionality of these operators and learn how to effectively work with multivalued strings in KQL.

## Understanding the mv-expand Operator

The mv-expand operator is a versatile tool in KQL that allows you to expand multivalued dynamic arrays or property bags into multiple records. Unlike aggregation operators that pack multiple values into a single array, such as summarize and make_list(), mv-expand generates a new record for each element in the array or property bag. This operator duplicates all non-expanded columns to ensure data consistency in the output.

Syntax:

```kusto
T | mv-expand [bagexpansion=(bag|array)] [with_itemindex=IndexColumnName] ColumnName [to typeof(Typename)] [, ColumnName ...] [limit Rowlimit]
```

The mv-expand operator takes several parameters, including bagexpansion, with_itemindex, ColumnName, Typename, and Rowlimit. These parameters allow you to control the expansion behavior and data types of the expanded columns. You can also limit the number of rows generated from each original row using the limit parameter.

Modes of expansion:

bagexpansion=bag or kind=bag: Property bags are expanded into single-entry property bags. This is the default mode.

bagexpansion=array or kind=array: Property bags are expanded into [key, value] array structures for uniform access to keys and values.

## Examples of mv-expand

Let's dive into some examples to see mv-expand in action:

### Example 1: Single column - array expansion

Suppose we have a datatable with two columns: a (integer) and b (dynamic array).

```kusto
datatable (a: int, b: dynamic)
[
    1, dynamic([10, 20]),
    2, dynamic(['a', 'b'])
]
| mv-expand b
```

Output:

| a | b |
|---|---|
| 1 | 10 |
| 1 | 20 |
| 2 | a |
| 2 | b |

In this example, the b column is expanded, creating a new row for each element in the dynamic array.

### Example 2: Single column - bag expansion

Consider a datatable with two columns: a (integer) and b (dynamic property bag).

```kusto
datatable (a: int, b: dynamic)
[
    1, dynamic({"prop1": "a1", "prop2": "b1"}),
    2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand b
```

Output:

| a | b |
|---|---|
| 1 | {"prop1": "a1"} |
| 1 | {"prop2": "b1"} |
| 2 | {"prop1": "a2"} |
| 2 | {"prop2": "b2"} |

In this example, the b column is expanded, creating new rows with separate single-entry property bags.

### Example 3: Single column - bag expansion to key-value pairs

Let's expand a bag into key-value pairs using the mv-expand operator and extend to create new columns.

```kusto
datatable (a: int, b: dynamic)
[
    1, dynamic({"prop1": "a1", "prop2": "b1"}),
    2, dynamic({"prop1": "a2", "prop2": "b2"})
]
| mv-expand bagexpansion=array b
| extend key = b[0], val = b[1]
```

Output:

| a | b | key | val |
|---|---|---|---|
| 1 | ["prop1","a1"] | prop1 | a1 |
| 1 | ["prop2","b1"] | prop2 | b1 |
| 2 | ["prop1","a2"] | prop1 | a2 |
| 2 | ["prop2","b2"] | prop2 | b2 |

In this example, the bag is expanded into key-value pairs, allowing uniform access to the properties.

### Example 4: Zipped two columns

We can expand two columns in parallel by listing both of them in a single mv-expand.

```kusto
datatable (a: int, b: dynamic, c: dynamic)
[
    1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 4, 3])
]
| mv-expand b, c
```

Output:

| a | b | c |
|---|---|---|
| 1 | {"prop1": "a"} | 5 |
| 1 | {"prop2": "b"} | 4 |
| 1 | | 3 |

In this example, the b and c columns are expanded in parallel (zipped): the first element of b is paired with the first element of c, the second with the second, and so on. Because c has one more element than b, the leftover element is paired with an empty value. To get every combination of the two columns instead, expand them one after the other, as Example 5 shows.
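Before moving on, the with_itemindex option from the syntax above deserves a quick look. This small, self-contained sketch records each element's position in the original array:

```kusto
datatable (a: int, b: dynamic)
[
    1, dynamic([10, 20, 30]),
]
| mv-expand with_itemindex=Index b
```

Each output row carries an Index column (0, 1, 2) alongside the expanded value, which is handy whenever the order of elements matters.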
### Example 5: Cartesian product of two columns

To get a Cartesian product of two expanded columns, expand one after the other.

```kusto
datatable (a: int, b: dynamic, c: dynamic)
[
    1, dynamic({"prop1": "a", "prop2": "b"}), dynamic([5, 6])
]
| mv-expand b
| mv-expand c
```

Output:

| a | b | c |
|---|---|---|
| 1 | {"prop1": "a"} | 5 |
| 1 | {"prop1": "a"} | 6 |
| 1 | {"prop2": "b"} | 5 |
| 1 | {"prop2": "b"} | 6 |

In this example, the b column is expanded first, followed by the expansion of the c column, resulting in the Cartesian product of the two expanded columns.

### Example 6: Convert output

To force the output of mv-expand to a specific type, we can use the to typeof() clause.

```kusto
datatable (a: string, b: dynamic, c: dynamic)
[
    "Constant", dynamic([1, 2, 3, 4]), dynamic([6, 7, 8, 9])
]
| mv-expand b, c to typeof(int)
| getschema
```

Output:

| ColumnName | ColumnOrdinal | DataType | ColumnType |
|---|---|---|---|
| a | 0 | System.String | string |
| b | 1 | System.Object | dynamic |
| c | 2 | System.Object | int |

In this example, the b and c columns are expanded together, and the to typeof(int) clause casts the expanded values of c to the int data type. Note that the clause applies only to the column it immediately follows: b keeps the dynamic column type, while c is typed as int. To convert both columns, add to typeof(int) after each one.

TIP: Name/Alias: Franck Heilmann (franckh)

Why it should be run: For Token Protection in Conditional Access deployments, to minimize the likelihood of user disruption due to application or device incompatibility, we highly recommend doing a staged deployment and actively monitoring the sign-in logs. This query gives admins a per-rule view of the user impact of Token Protection Conditional Access policies.

```kusto
SigninLogs
| where TimeGenerated > ago(7d)
| project Id, ConditionalAccessPolicies, Status, UserPrincipalName
| where ConditionalAccessPolicies != "[]"
| mv-expand todynamic(ConditionalAccessPolicies)
| union (
    AADNonInteractiveUserSignInLogs
    | where TimeGenerated > ago(7d)
    | project Id, ConditionalAccessPolicies, Status, UserPrincipalName
    | where ConditionalAccessPolicies != "[]"
    | mv-expand todynamic(ConditionalAccessPolicies)
    )
| where ConditionalAccessPolicies.enforcedSessionControls contains "Binding" or ConditionalAccessPolicies.enforcedSessionControls contains "SignInTokenProtection"
| where ConditionalAccessPolicies.result != "reportOnlyNotApplied" and ConditionalAccessPolicies.result != "notApplied"
| extend SessionNotSatisfyResult = ConditionalAccessPolicies["sessionControlsNotSatisfied"]
| extend Result = case(SessionNotSatisfyResult contains 'Binding' or SessionNotSatisfyResult contains 'SignInTokenProtection', 'Block', 'Allow')
| extend CADisplayName = ConditionalAccessPolicies.displayName
| extend CAId = ConditionalAccessPolicies.id
| summarize by Id, tostring(CAId), tostring(CADisplayName), UserPrincipalName, Result
| summarize Requests = count(), Block = countif(Result == "Block"), Allow = countif(Result == "Allow"), Users = dcount(UserPrincipalName), BlockedUsers = dcountif(UserPrincipalName, Result == "Block") by tostring(CADisplayName), tostring(CAId)
| extend PctAllowed = round(100.0 * Allow / (Allow + Block), 2)
| sort by Requests desc
```
## Understanding the parse Operator

The parse operator is another powerful tool in KQL that allows you to extract specific parts of a string based on a defined pattern. Unlike regular expressions, which can be complex and challenging to work with, the parse operator provides a simpler and more intuitive approach to string extraction. It is particularly useful when dealing with well-formatted strings that have recurring text patterns.

Syntax:

```kusto
T | parse [kind=simple|regex|relaxed] ColumnName with [*] StringConstant ColumnName [*] ...
```

The parse operator takes the name of the column to parse, followed by the keyword with and the pattern to match within the string. The pattern alternates string constants with the names of the output columns that capture the text between them; a leading or trailing * skips over unmatched text. The optional kind argument controls matching behavior: simple (the default) requires the string delimiters to be present, relaxed fills columns it cannot match with null instead, and regex interprets the pattern as a regular expression.

## Examples of parse

Let's explore some examples to see how the parse operator can be used effectively:

### Example 1: Extracting data from a well-formatted string

Suppose we have a datatable with a column called Name, which always begins with the text GET followed by the requested data.

```kusto
datatable (Name: string)
[
    "GET /api/users",
    "GET /api/products",
    "GET /api/orders"
]
| parse Name with "GET " Data
```

Output:

| Name | Data |
|---|---|
| GET /api/users | /api/users |
| GET /api/products | /api/products |
| GET /api/orders | /api/orders |

In this example, the parse operator extracts the data following the GET text and places it in a new column called Data.

### Example 2: Extracting multiple parts from a string

Consider a datatable with a column called Message, which follows a consistent format for certain categories and levels. We can use parse to extract the ID and duration from the Message column.

```kusto
datatable (Message: string)
[
    "Executed 'Function2' (Failed, Id=123, Duration=500ms)",
    "Executed 'Function2' (Failed, Id=456, Duration=750ms)"
]
| parse Message with "Executed 'Function2' (Failed, Id=" ID ", Duration=" Duration "ms)"
```

Output:

| Message | ID | Duration |
|---|---|---|
| Executed 'Function2' (Failed, Id=123, Duration=500ms) | 123 | 500 |
| Executed 'Function2' (Failed, Id=456, Duration=750ms) | 456 | 750 |

In this example, the parse operator extracts the ID and duration from the Message column using a specific pattern. The extracted values are placed in the ID and Duration columns.

## When to Use mv-expand and parse

The mv-expand operator is ideal for expanding multivalued arrays or property bags into separate records, allowing for more granular analysis and aggregation. It is particularly useful when dealing with structured data that can be expanded into meaningful columns.

On the other hand, the parse operator is handy when you have well-formatted strings with recurring patterns and need to extract specific parts. It simplifies the extraction process and avoids the complexity of regular expressions.

It's important to note that mv-expand and parse work best when the data follows a consistent format. If the data varies significantly, additional filtering or preprocessing may be required to ensure accurate results.
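The two operators also compose well. Here's a small, self-contained sketch (all data is inline, so nothing is assumed about your tables) that expands an array of request lines and then parses each one:

```kusto
datatable (Requests: dynamic)
[
    dynamic(["GET /api/users", "GET /api/orders"]),
    dynamic(["GET /api/products"]),
]
| mv-expand Request = Requests to typeof(string)   // one row per request line
| parse Request with "GET " Path                   // pull the path out of each line
| summarize Hits = count() by Path
```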
## Conclusion

Working with multivalued strings in KQL can be challenging, but operators like mv-expand and parse make it easier to extract and manipulate data from these complex structures. By leveraging these powerful tools, you can expand arrays, extract specific parts of strings, and gain deeper insights from your data. Whether you need to analyze dynamic arrays or extract information from formatted strings, KQL has the operators to help you accomplish your data analysis goals.

Remember to experiment with different scenarios and explore the full capabilities of mv-expand and parse. With practice, you'll become more proficient in working with multivalued strings and unlocking valuable insights from your data.

# Using the base64_decode_tostring() KQL Function

## Introduction to the base64_decode_tostring() Function

### What is base64 Encoding?

Before diving into the details of the base64_decode_tostring() function, it's essential to understand the concept of base64 encoding. Base64 is a binary-to-text encoding scheme that represents binary data in an ASCII string format. It is commonly used to transmit binary data over text-based protocols such as email and HTTP. The scheme uses a set of 64 printable characters, each representing 6 bits of data, so that any sequence of bytes can be written as plain text.

### Understanding UTF-8 Encoding

UTF-8 (Unicode Transformation Format 8-bit) is a variable-width character encoding that can represent any character in the Unicode standard. It is widely used in computer systems for encoding and representing text. UTF-8 uses a variable number of bytes to represent each character, with ASCII characters represented by a single byte. This flexibility allows UTF-8 to support a vast range of characters from different languages and scripts.

## Syntax and Parameters of base64_decode_tostring()

### Syntax Conventions

When using the base64_decode_tostring() function in KQL, it is essential to follow the syntax conventions to ensure accurate and error-free execution. The syntax for the base64_decode_tostring() function is as follows:

```kusto
base64_decode_tostring(base64_string)
```

The function takes a single parameter, base64_string, which is the base64-encoded string that you want to decode into a UTF-8 string.

### Exploring the Parameters

Let's take a closer look at the parameter of the base64_decode_tostring() function:

base64_string: This parameter is of type string and is required. It represents the base64-encoded string that you want to decode into a UTF-8 string.

## Examples and Applications

### Decoding a Simple Base64 String

To illustrate the usage of the base64_decode_tostring() function, let's consider a simple example. Suppose you have a base64-encoded string "S3VzdG8=". Using the function, you can decode it into the corresponding UTF-8 string. Here's how you can do it:

```kusto
print Quine = base64_decode_tostring("S3VzdG8=")
```

The output of the above query will be:

| Quine |
|---|
| Kusto |

In this example, the base64-encoded string "S3VzdG8=" is decoded into the UTF-8 string "Kusto".

### Handling Invalid UTF-8 Encoding

It's important to note that when decoding a base64 string, there might be cases where the resulting UTF-8 encoding is invalid.
In such cases, the base64_decode_tostring() function returns null. Let's consider an example where we try to decode a base64 string generated from invalid UTF-8 encoding:

```kusto
print Empty = base64_decode_tostring("U3RyaW5n0KHR0tGA0L7Rh9C60LA=")
```

The output of the above query will be:

| Empty |
|---|
| null |

In this example, the base64-encoded string "U3RyaW5n0KHR0tGA0L7Rh9C60LA=" represents an invalid UTF-8 encoding, and hence the function returns null.

## Related Functions and Use Cases

### base64_decode_toarray()

In addition to the base64_decode_tostring() function, KQL also provides the base64_decode_toarray() function. This function allows you to decode a base64 string into an array of long values. It can be particularly useful when dealing with binary data or numeric representations encoded in base64 format.

### base64_encode_tostring()

On the other hand, if you need to encode a string into base64 format, you can use the base64_encode_tostring() function. This function takes a UTF-8 string as input and returns the base64-encoded representation of the string.

## Best Practices for Using base64_decode_tostring()

### Performance Considerations

While using the base64_decode_tostring() function, it's important to consider performance implications, especially when dealing with large datasets or frequent decoding operations. Here are a few best practices to optimize performance:

Minimize unnecessary decoding: Only decode base64 strings when necessary. Avoid redundant or excessive decoding operations to improve query performance.

Data type considerations: Ensure that the data type of the base64-encoded string field is consistent throughout your data set. Using consistent data types can facilitate efficient decoding and processing.

Query optimization: Optimize your queries by leveraging query filters and aggregations to reduce the amount of data processed during decoding operations.

### Error Handling and Validation

When working with the base64_decode_tostring() function, it's crucial to handle errors and validate the input data to ensure the integrity of your results. Here are some best practices for error handling and validation:

Error handling: Handle potential errors caused by invalid base64 strings or unexpected input. Because the function returns null on failure, checks such as isnull() or isempty() on the decoded column let you deal with bad input gracefully.

Input validation: Validate the input data to ensure that it adheres to the expected format and encoding. Implement input validation checks to prevent potential issues caused by invalid or malformed data.

## Tips and Tricks for Advanced Usage

### Chaining Functions for Complex Decoding

In some scenarios, you may need to perform complex decoding operations that involve multiple steps or transformations. KQL allows you to chain functions together to achieve these complex decoding tasks. For example, you can combine the base64_decode_tostring() function with other KQL functions to perform additional data manipulations or transformations.
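As a hedged sketch of that chaining idea, the following query encodes an inline value, decodes it, and immediately parses fields out of the result; no tables are assumed:

```kusto
// Self-contained: encode an inline value, decode it, then parse fields from it.
print blob = base64_encode_tostring("user=alice;role=admin")
| extend decoded = base64_decode_tostring(blob)
| parse decoded with "user=" User ";role=" Role
```

The same extend-then-parse pattern applies when the encoded value comes from a real log column.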
### Handling Large Base64 Strings

When dealing with large base64-encoded strings, it's important to consider memory and performance implications. To handle them efficiently, decode only the rows and fields you actually need, and apply filters before the decoding step so that less data has to be processed. Working on the smallest possible slice of data keeps memory usage down and performance up.

## Case Studies: Real-World Examples

### Decoding Base64 Strings in Log Analysis

One common use case for the base64_decode_tostring() function is in log analysis. Many log files contain base64-encoded strings that need to be decoded for further analysis. By using the base64_decode_tostring() function in your log analysis queries, you can easily decode these strings and extract valuable insights from your log data.

### Base64 Decoding in Data Transformation Pipelines

Data transformation pipelines often involve processing data from various sources, including base64-encoded strings. By incorporating the base64_decode_tostring() function into your data transformation pipelines, you can efficiently decode these strings and transform them into a more usable format for downstream processing.

The base64_decode_tostring() function in KQL provides a powerful tool for decoding base64-encoded strings into UTF-8 format. By understanding the syntax, parameters, and best practices for using this function, you can unlock its full potential in your data exploration and analysis tasks. Whether you're working with log data, data transformation pipelines, or any other use case, the base64_decode_tostring() function will undoubtedly prove to be a valuable asset in your data processing toolkit. So, start leveraging its capabilities today and take your data analysis to new heights!

# Working with JSON

## Querying JSON Data

Once you've ingested JSON data, you can unleash the power of Kusto Query Language (KQL) to query and analyze the data.

### Extracting JSON Properties

To extract specific JSON properties, you can use the extract_json function. This function allows you to extract values from JSON properties based on a JSONPath-like expression. Let's consider an example:

```kusto
SensorData
| extend Name = extract_json("$.name", Data)
| extend Index = extract_json("$.index", Data)
```

In this example, we extract the "name" and "index" properties from the "SensorData" table's "Data" column using the extract_json function. This enables you to work with specific JSON properties in your queries.

### Filtering JSON Data

When querying JSON data, you can apply filters to narrow down your results. For example, if you want to retrieve data where the "Temperature" column is above a certain threshold, you can use the where operator:

```kusto
SensorData
| where Temperature > 25
```

This query filters the "SensorData" table to only include records where the "Temperature" column is greater than 25. By applying filters, you can focus on the specific data that meets your criteria.
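If the temperature lives inside the JSON payload rather than in its own column, extract it first and convert the type before filtering. A sketch against the same hypothetical SensorData table, assuming a numeric "temperature" property inside Data:

```kusto
SensorData
| extend Temperature = todouble(extract_json("$.temperature", Data))  // assumed property name
| where Temperature > 25
```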
### Aggregating JSON Data

KQL allows you to aggregate JSON data using various aggregation functions. For example, you can calculate the average temperature and humidity for each device in the "SensorData" table:

```kusto
SensorData
| summarize AvgTemperature = avg(Temperature), AvgHumidity = avg(Humidity) by DeviceId
```

By using the summarize operator with aggregation functions like avg, you can derive meaningful insights from your JSON data. Aggregating data helps you understand trends and patterns within your dataset.

## Best Practices for Optimizing JSON Processing

To optimize JSON processing, it's essential to follow best practices that improve query performance and reduce resource consumption. Here are some key recommendations:

### Early Filtering

When working with CPU-intensive functions like parsing JSON or XML, it's best to apply filtering conditions early in your query. By filtering out irrelevant records before executing CPU-intensive functions, you can significantly improve performance. For example:

```kusto
SensorData
| where EventID == 8002
| where EventData !has "%SYSTEM32"
| extend Details = parse_xml(EventData)
| extend FilePath = tostring(Details.UserData.RuleAndFileData.FilePath)
| extend FileHash = tostring(Details.UserData.RuleAndFileData.FileHash)
| where FileHash != "" and FilePath !startswith "%SYSTEM32"
| summarize count() by FileHash, FilePath
```

In this query, the where conditions are applied before parsing XML, filtering out irrelevant records early in the process.

### Use Effective Aggregation Functions

When aggregating JSON data, choose the most efficient aggregation functions for your specific use case. KQL provides functions like max, sum, count, and avg that have low CPU impact. Utilize these functions whenever possible. Additionally, consider using functions like dcount, which provide approximate distinct count values without counting each value individually.

### Avoid Full JSON Parsing

Full parsing of complex JSON objects can consume significant CPU and memory resources. In cases where you only need a few parameters from the JSON data, it's more efficient to parse them as strings using the parse operator or other text parsing techniques. This approach can provide a significant performance boost, especially when dealing with large JSON datasets.
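A hedged sketch of that idea, again using the hypothetical SensorData table: instead of calling a full JSON parser on the whole Data payload, pre-filter on the raw text and pull the single property out as a string. The "status" property and its exact formatting are assumptions for illustration:

```kusto
SensorData
| where Data has "status"                       // cheap text pre-filter before any parsing
| parse Data with * '"status":"' Status '"' *   // assumes the property is serialized without spaces
| summarize count() by Status
```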
## Advanced JSON Processing Techniques

KQL offers advanced JSON processing capabilities beyond basic querying and filtering. Let's explore some of these techniques:

### Handling JSON Arrays

When working with JSON arrays, you can use operators like mv-expand to expand array elements into separate records. This allows you to perform operations on individual array elements. For example:

```kusto
SensorData
| mv-expand Data
| extend Name = extract_json("$.name", Data)
```

In this query, the mv-expand operator expands the JSON array elements in the "Data" column, enabling you to extract specific properties from each array element.

### Working with Nested JSON Objects

KQL supports querying and manipulating nested JSON objects. You can access nested properties using dot notation or JSONPath-like expressions. For example:

```kusto
SensorData
| extend NestedProperty = Data.NestedObject.NestedProperty
```

This query accesses the "NestedProperty" within a nested JSON object in the "Data" column.

### Joining JSON Data

KQL allows you to join JSON data from multiple tables using the join operator. This enables you to combine data from different JSON sources based on common properties. For example:

```kusto
Table1
| join kind=inner (Table2) on $left.CommonProperty == $right.CommonProperty
```

By joining JSON data, you can perform more complex analysis and derive insights from multiple data sources.

## Conclusion

Optimizing JSON processing is crucial for efficient data analysis and improved query performance.

# Time Series Analysis

In the era of cloud services and IoT devices, businesses generate massive amounts of telemetry data. This data holds valuable insights that can be leveraged to monitor service health, track physical production processes, and identify usage trends. However, analyzing this data can be challenging without the right tools and techniques. This is where time series analysis comes into play. By utilizing the power of the Kusto Query Language (KQL), businesses can unlock the full potential of their time series data.

In this section, we will delve into the world of time series analysis using KQL. We will explore the process of creating and analyzing time series, and highlight the key functions and operators that KQL offers for time series manipulation. By the end, you will have a solid understanding of how to harness the power of KQL to gain valuable insights from your time series data.

## Time Series Creation: Transforming Data into Actionable Insights

The first step in time series analysis is to transform your raw telemetry data into a structured format that is suitable for analysis. KQL provides the make-series operator to simplify this process. It allows you to create a set of time series by partitioning the data based on specific dimensions and aggregating the values within each partition.

Let's consider an example using the Perf table, which contains performance counter records. To create time series of the record count partitioned by performance object, we can use the following KQL query:

```kusto
let min_t = toscalar(Perf | summarize min(TimeGenerated));
let max_t = toscalar(Perf | summarize max(TimeGenerated));
Perf
| make-series num=count() default=0 on TimeGenerated from min_t to max_t step 1h by ObjectName
| render timechart
```

In this query, we use the make-series operator to create a set of time series, with each series representing the record count at regular intervals of 1 hour. The by ObjectName clause partitions the data so that each performance object gets its own series. The resulting time series can be visualized using the render timechart command.

## Time Series Analysis Functions: Unveiling Patterns and Anomalies

Once you have created the time series, KQL provides a range of functions to process and analyze them. These functions enable you to identify patterns, detect anomalies, and perform regression analysis on your time series data.
### Filtering: Smoothing the Noise

Filtering is a common practice in time series analysis to remove noise and highlight underlying trends. KQL offers two filtering functions: series_fir() and series_iir().

series_fir(): This function applies a finite impulse response (FIR) filter to the time series. It is useful for calculating moving averages and detecting changes in the time series.

series_iir(): This function applies an infinite impulse response (IIR) filter to the time series. It is commonly used for exponential smoothing and cumulative sum calculations.

To demonstrate the filtering capabilities of KQL, let's apply a moving average filter to our time series:

```kusto
let min_t = toscalar(Perf | summarize min(TimeGenerated));
let max_t = toscalar(Perf | summarize max(TimeGenerated));
Perf
| make-series num=count() default=0 on TimeGenerated from min_t to max_t step 1h by ObjectName
| extend ma_num=series_fir(num, repeat(1, 5), true, true)
| render timechart
```

In this example, we use the series_fir() function to calculate a five-bin moving average of the count. The repeat(1, 5) argument builds a filter of five equal coefficients; the first true normalizes the coefficients so they sum to one, and the second true centers the filter window over each point. The resulting time series can be visualized using the render timechart command.

## Conclusion: Unleash the Power of Time Series Analysis with KQL

Time series analysis is a powerful technique that allows businesses to gain valuable insights from their telemetry data. By leveraging the capabilities of the Kusto Query Language (KQL), you can transform raw data into actionable insights, identify patterns, detect anomalies, and make informed decisions based on the trends in your time series data.

Now armed with this knowledge, you can unlock the power of time series analysis with KQL and gain valuable insights that drive your business forward. So go ahead, dive into your time series data, and discover the hidden patterns and anomalies that will propel your organization to new heights.

# Exploring the Power of Regular Expressions in KQL

## Understanding Regular Expressions and their Syntax

A regular expression is a sequence of characters that defines a pattern to be searched for within a piece of text. It provides a flexible and concise way to describe complex search patterns. In KQL, regular expressions are written as string literals, most conveniently as verbatim strings prefixed with @ so that backslashes don't need to be doubled. For example, to match values that end with the character "a", you can apply the pattern @".*a$" with the matches regex operator.

Characters that carry special meaning in regular expressions must be escaped with a backslash (\). For instance, to match a string that ends with a dollar sign ($), the pattern would be @".*\$".

When several strings must appear in a specific sequence, chain them together in the pattern. For example, to find the values "513", "10", and "512" occurring in that order within the TargetAttributeValue field, a pattern such as @"513.*10.*512" does the job.

## The Power of Regular Expressions in Microsoft Sentinel

In Microsoft Sentinel, regular expression queries are incredibly useful for filtering and searching events that match specific patterns.
However, it's important to note that regular expression queries utilize more system resources compared to other types of queries, as they can't leverage the efficient data structures available in the index. Therefore, it's crucial to narrow the breadth of the search as much as possible by using time range and non-regex criteria terms.

## Leveraging the RE2 Syntax and Microsoft's RE2 Library

To make the most of regular expressions in KQL, it's essential to familiarize yourself with the RE2 syntax. The RE2 syntax is the foundation for regex queries in Microsoft Sentinel. You can find a comprehensive guide to the RE2 syntax in the RE2 Syntax Wiki.

Additionally, Microsoft provides a dedicated RE2 library for Azure Data Explorer (ADX), which includes a wide range of regex functions and operators. This library allows you to perform advanced pattern matching and extraction operations in KQL. You can find detailed information about the RE2 library in the Microsoft Documentation.

By leveraging the RE2 syntax and Microsoft's RE2 library, you can harness the full power of regular expressions in KQL, enabling you to perform complex searches and data extractions with ease.

## Testing Regular Expressions in KQL

To ensure the accuracy and effectiveness of regex patterns, it's crucial to test them before incorporating them into your queries. While various online tools are available for regex testing, you can also test your regex patterns directly within the KQL query window.

Here's an example of how you can test a regex pattern in KQL:

```kusto
let Regex = @"(?i)attrib.*\+h\\";
let TestString = "attribute +h\\";
print(iif(TestString matches regex Regex, true, false))
```

In this example, the Regex variable holds the regex pattern, and the TestString variable contains the string you want to test against the pattern. The print statement checks if the TestString matches the regex pattern and returns either true or false.

By testing your regex patterns in KQL, you can ensure their accuracy and reliability before utilizing them in your production queries.

## Enhancing Detection Rules and Migrating from Other SIEM Tools

Regular expressions are integral to creating effective detection rules in Microsoft Sentinel. When creating regex-based detection rules or migrating from other SIEM tools, it's crucial to thoroughly test your regex patterns. While regex queries provide powerful filtering capabilities, they also consume more system resources. Therefore, it's essential to strike a balance by combining time range and non-regex criteria terms to optimize performance.

To ensure a smooth transition to Microsoft Sentinel, it's recommended to test and validate your regex patterns to ensure they work seamlessly within the Microsoft Sentinel environment. This will help you maintain the integrity and effectiveness of your detection rules while leveraging the power of regular expressions.
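Beyond the boolean matches regex test, the extract() function pulls a capture group out of the text, which is often what a detection rule really needs. A small self-contained sketch (the input line and field layout are illustrative):

```kusto
print line = "user=alice id=42"
| extend User = extract(@"user=(\w+)", 1, line),        // capture group 1
         Id = toint(extract(@"id=(\d+)", 1, line))      // capture and convert
```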
TIP: Explore recent behavior of a specific service principal

Name (Alias): Kristopher Bash (krbash)

Why it should be run: This query looks at Microsoft Graph API requests in the past 3 days for a specific service principal. To characterize the types of requests the service principal is used for, the query summarizes the count of requests for combinations of the HTTP request method and the segments of the RequestUri that identify the target of the operation. The URI is parsed by first cleaning the RequestUri string for consistency, then extracting the alphabetic segments that follow a '/', and concatenating those segments. This transforms a RequestUri such as https://graph.microsoft.com/beta/users/{id}/manager?$select=displayName into users/manager.

Query:

```kusto
MicrosoftGraphActivityLogs
| where TimeGenerated > ago(3d)
| where ServicePrincipalId == '9d6399dd-e9f6-4271-b3cb-c26e829ea3cf'
| extend path = replace_string(replace_string(replace_regex(tostring(parse_url(RequestUri).Path), @'(\/)+', '//'), 'v1.0/', ''), 'beta/', '')
| extend UriSegments = extract_all(@'\/([A-z2]+|\$batch)($|\/|\(|\$)', dynamic([1]), tolower(path))
| extend OperationResource = strcat_array(UriSegments, '/')
| summarize RequestCount = count() by RequestMethod, OperationResource
```

## Conclusion

Regular expressions are a powerful tool for searching and filtering patterns within text data. In Microsoft Sentinel, the combination of regular expressions and KQL allows operators to extract valuable insights from deployed Azure resources. By understanding the syntax and performance characteristics of regular expressions, leveraging the RE2 library, and testing patterns in KQL, you can harness the full potential of regular expressions in your Microsoft Sentinel workflows. Whether you're creating detection rules or exploring log data, regular expressions in KQL provide a robust and flexible solution for pattern matching and data analysis.

# Using the bin() KQL Function

When analyzing data, looking at individual data points rarely tells the whole story. Data analysts often need to aggregate data and calculate summary statistics to gain meaningful insights. One powerful tool for data aggregation in the Kusto Query Language (KQL) is the bin() function. We will explore the various applications of the bin() function and learn how to leverage its capabilities to analyze and summarize data effectively.

## What is the bin() Function?

The bin() function in KQL is used to round values down to a specific bin size. It is commonly used in combination with the summarize by operator to group scattered data points into specific values. The syntax of the bin() function is as follows:

```kusto
bin(value, roundTo)
```

Here, value represents the data point that needs to be rounded down, and roundTo indicates the bin size that divides the value. The bin() function returns the nearest multiple of roundTo below value. Null values, a null bin size, or a negative bin size will result in null.
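Before the larger examples that follow, a one-line query shows the rounding behavior on a number, a timespan, and a datetime:

```kusto
print bin(4.5, 1), bin(16m, 1h), bin(datetime(1970-05-11 13:45:07), 1d)
```

This returns 4, 00:00:00, and 1970-05-11T00:00:00 respectively: each value is rounded down to the nearest multiple of its bin size.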
## Numeric Binning with the bin() Function

One common use case of the bin() function is to perform numeric binning. Let's consider an example to understand this better. Suppose we have a dataset that contains the sales revenue for each day. We want to group the revenue into bins based on a specific bin size, such as $1000. We can achieve this using the bin() function in combination with the summarize by operator. Here's an example query:

```kusto
datatable(Date: datetime, Revenue: real)
[
    datetime(2023-01-01), 1200.50,
    datetime(2023-01-02), 2500.75,
    datetime(2023-01-03), 1800.25,
    datetime(2023-01-04), 3100.80,
    datetime(2023-01-05), 900.10
]
| summarize TotalRevenue = sum(Revenue) by bin(Revenue, 1000)
```

The bin() function divides the revenue values into bins of size $1000. The summarize operator calculates the total revenue for each bin. The output of this query will provide insights into the distribution of revenue across different bins, helping us identify trends and patterns in the data.

## Timespan Binning with the bin() Function

In addition to numeric values, the bin() function can also be applied to timespan data. Let's say we have a dataset that contains the duration of phone calls made by customers. We want to group the call durations into specific time intervals, such as 5 minutes. We can achieve this using the bin() function with timespan values. Here's an example query:

```kusto
datatable(CallDuration: timespan)
[
    time(00:02:30),
    time(00:07:45),
    time(00:04:20),
    time(00:10:15),
    time(00:01:30)
]
| summarize Count = count() by bin(CallDuration, 5m)
```

In this query, the bin() function divides the call durations into bins of 5 minutes. The summarize operator calculates the count of calls for each bin. This analysis can help us identify the distribution of call durations and uncover any patterns or anomalies in the data.

## Datetime Binning with the bin() Function

The bin() function is also useful for binning datetime values. Consider a scenario where we have a dataset that contains the timestamps of customer orders. We want to group the orders into specific time intervals, such as daily bins. We can accomplish this using the bin() function with datetime values. Here's an example query:

```kusto
datatable(OrderTime: datetime)
[
    datetime(2023-01-01 10:00:00),
    datetime(2023-01-01 14:30:00),
    datetime(2023-01-02 11:45:00),
    datetime(2023-01-02 13:15:00),
    datetime(2023-01-03 09:20:00)
]
| summarize Count = count() by bin(OrderTime, 1d)
```

In this query, the bin() function divides the order timestamps into daily bins. The summarize operator calculates the count of orders for each bin. This analysis provides insights into the daily order volume, helping us understand customer behavior and plan inventory accordingly.

## Pad a Table with Null Bins

In some cases, there may be missing data points for certain bins in a table. To ensure a complete representation of all bins, we can pad the table with null values for the missing bins. Let's consider an example where we have a dataset of website visits, and we want to analyze the number of visits for each day of the week.
Here's an example query:

```kusto
let visits = datatable(Date: datetime, Visits: int)
[
    datetime(2023-01-01), 500,
    datetime(2023-01-03), 800,
    datetime(2023-01-04), 600,
    datetime(2023-01-06), 1200,
    datetime(2023-01-07), 900
];
range Date from datetime(2023-01-01) to datetime(2023-01-07) step 1d
| join kind=leftouter (
    visits
    | summarize Visits = sum(Visits) by bin(Date, 1d)
    ) on Date
| project Date, Visits
| order by Date asc
```

In this query, we first use the range operator to generate a row for every day in the desired date range. We then use a left outer join to attach the summarized visit counts, bucketed with the bin() function, to that complete set of days. Days with no recorded visits appear with an empty Visits value, ensuring that all days of the week are represented in the output even if there were no visits on certain days.

## Conclusion

The bin() function in KQL is a powerful tool for data aggregation and summarization. It allows us to group data into specific bins based on numeric, timespan, or datetime values. By leveraging the bin() function in combination with other operators like summarize, we can gain valuable insights from our data and uncover trends and patterns. Whether it's analyzing sales revenue, call durations, or customer orders, the bin() function provides a flexible and efficient way to aggregate and summarize data in KQL.

So, the next time you're working with data in KQL, remember to harness the power of the bin() function to unlock deeper insights and make more informed decisions based on your data.

# Understanding Functions in Kusto Query Language

## Introduction to Functions

In Kusto, functions are reusable subqueries or query parts that can be defined as part of the query itself or stored as part of the database metadata. Functions are invoked through a name, provided with input arguments, and produce a single value based on their body. They can be categorized into two types: built-in functions and user-defined functions.

## Built-in Functions in Kusto

Built-in functions are hard-coded functions defined by Kusto and cannot be modified by users. These functions provide a wide range of functionalities, such as mathematical operations, string manipulations, date and time calculations, and aggregations. Kusto provides a comprehensive library of built-in functions that can be directly used in queries.

## User-defined Functions in Kusto

User-defined functions are created by users and can be divided into two types: stored functions and query-defined functions.

### Stored Functions

Stored functions are user-defined functions that are stored and managed as database schema entities, similar to tables. They can be used across multiple queries and provide a way to encapsulate complex logic or calculations. To create a stored function, the .create function command is used.
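As a hedged sketch of the stored-function workflow (the function name and body here are illustrative, and running the command requires the appropriate database permissions):

```kusto
.create function with (docstring = "Sign-ins for one user", folder = "Helpers")
GetUserSignIns(user: string) {
    SigninLogs
    | where UserPrincipalName == user
}
```

Once created, the function can be called from any query against that database, for example GetUserSignIns('someone@contoso.com').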
### Query-defined Functions

Query-defined functions are user-defined functions that are defined and used within the scope of a single query. These functions are created using the let statement and are not stored as separate entities in the database schema. Query-defined functions are useful when a specific calculation or subquery needs to be reused multiple times within a single query.

## Syntax and Naming Conventions for User-defined Functions

User-defined functions in Kusto follow a specific syntax and set of naming conventions. The function name must follow the same identifier naming rules as other entities in Kusto, and it must be unique within its scope of definition.

let function_name = (input_arguments) {
    // Function body
};

The function name is followed by a set of parentheses enclosing the input arguments, and the function body is defined within curly braces. The input arguments can be scalar or tabular, and their types need to be specified. Scalar arguments can also have default values.

## Scalar Functions in KQL

Scalar functions are user-defined functions that have zero input arguments, or whose input arguments are all scalar values. These functions produce a single scalar value and can be used wherever a scalar expression is allowed. Scalar functions can only refer to tables and views in the accessible schema, and they can use the row context in which they are defined.

## Tabular Functions in KQL

Tabular functions, on the other hand, accept one or more tabular input arguments and zero or more scalar input arguments. They produce a single tabular value as output. Tabular functions are useful when working with complex data structures or when multiple rows of data need to be returned.

## Creating and Declaring User-defined Functions

To create a user-defined function in Kusto, the let statement is used. The let statement allows us to define variables and functions within a query. Here's an example of creating a simple user-defined function:

let addNumbers = (a: int, b: int) {
    a + b
};

In this example, the function addNumbers takes two integer input arguments, a and b, and returns their sum. The function can be invoked by calling its name and passing the required arguments.

## Invoking User-defined Functions

User-defined functions can be invoked within a query by calling their name and providing the required arguments. The invocation syntax varies depending on whether the function expects scalar or tabular arguments.

### Invoking Functions without Arguments

To invoke a function that doesn't require any arguments, simply call the function's name followed by parentheses:

let helloWorld = () {
    "Hello, World!"
};
print helloWorld()

In this example, the function helloWorld doesn't require any arguments. It returns the string "Hello, World!" when invoked.

### Invoking Functions with Scalar Arguments

For functions that expect scalar arguments, the arguments are provided within the parentheses when invoking the function:

let addNumbers = (a: int, b: int) {
    a + b
};
print addNumbers(5, 3)

In this case, the function addNumbers expects two integer arguments, a and b. When invoked with the values 5 and 3, it returns the sum of the two numbers, which is 8.
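### Invoking Functions with Tabular Arguments

For functions that expect a tabular argument, the invoke operator passes the table piped into it as the function's first tabular parameter. Here's a minimal sketch; the topValues function and its inline sample data are hypothetical:

// A tabular function: takes a table T with a Value column, plus a scalar n
let topValues = (T: (Value: long), n: long) {
    T
    | top n by Value
};
// invoke passes the piped-in table as the first (tabular) argument
datatable(Value: long) [5, 1, 9, 3, 7]
| invoke topValues(3)

In this sketch, the datatable flows into topValues as the tabular parameter T, and 3 is passed as the scalar parameter n, so the query returns the rows with the three largest values: 9, 7, and 5.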
## Default Values in Functions

User-defined functions in Kusto can have default values for their scalar input arguments. Default values are specified after the argument type and are used when the argument is not provided during the function invocation. Here's an example:

let greetUser = (name: string = "Guest") {
    strcat("Hello, ", name, "!")
};
print greetUser()

In this case, the function greetUser has a default value for the name argument, which is "Guest". If the function is invoked without providing a value for name, it uses the default value and returns the string "Hello, Guest!".

By understanding the concept of functions in Kusto Query Language and how to create and use them effectively, you can enhance the reusability and organization of your queries. Functions provide a way to encapsulate complex logic and calculations, making your queries more efficient and maintainable. Experiment with different types of functions and explore the vast library of built-in functions to unleash the full power of KQL in your data analysis workflows.

Remember, functions are just one aspect of KQL, and there is much more to explore and learn. As you delve deeper into Kusto, your ability to leverage its capabilities will expand, enabling you to extract valuable insights from your data with ease. Happy querying!

# How to Use the KQL Materialize Function

In the world of data analysis and query optimization, finding efficient ways to speed up queries and improve performance is crucial. One powerful tool that can help achieve this is the KQL Materialize Function. In this guide, we will explore the ins and outs of using the Materialize Function in Kusto Query Language (KQL) and how it can significantly enhance query execution speed.

## What is the KQL Materialize Function?

The Materialize Function in KQL is designed to capture the value of a tabular expression for the duration of a query execution. By caching the results of a tabular expression, the Materialize Function allows you to reference the cached data multiple times without the need for recalculation. This can be particularly useful when dealing with heavy calculations or non-deterministic expressions.

### Syntax and Parameters

The syntax of the Materialize Function in KQL is straightforward:

materialize(expression)

The only parameter required by the Materialize Function is the tabular expression that you want to evaluate and cache during query execution. The expression can be any valid KQL query or operation that generates a tabular result.

## Advantages of Using the Materialize Function

The Materialize Function offers several advantages that can significantly improve query performance and optimize resource usage. Let's explore some key benefits:

### Speeding up Queries with Heavy Calculations

Queries that involve complex calculations can be time-consuming, especially when the same calculations need to be repeated multiple times within the query. By using the Materialize Function, you can avoid redundant calculations by evaluating the expression only once and referencing the cached result throughout the query. This can lead to substantial time savings, especially for queries with computationally intensive operations.
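To see the shape of this pattern, consider the following minimal sketch. MyLogs and its Level column are hypothetical stand-ins for any table that feeds an expensive aggregation:

// The summarize runs once; its cached result serves both references below
let expensiveAgg = materialize(
    MyLogs
    | summarize Count = count() by Level);
expensiveAgg
| extend Pct = Count * 100.0 / toscalar(expensiveAgg | summarize sum(Count))

Without materialize(), the engine would evaluate the summarize twice: once for the outer expression and once inside toscalar().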
### Efficient Evaluation of Non-Deterministic Expressions

Non-deterministic expressions, such as those involving the rand() or dcount() functions, can produce different results each time they are evaluated. In such cases, it is crucial to evaluate the expression only once and use the same result throughout the query. The Materialize Function allows you to achieve this by caching the tabular expression and referencing it multiple times. This ensures consistent results and avoids unnecessary recalculation.

### Reduced Resource Consumption

By caching the results of a tabular expression, the Materialize Function reduces the overall resource consumption during query execution. Instead of recalculating the expression each time it is referenced, the cached result is used, resulting in lower CPU and memory usage. This can be particularly beneficial in scenarios where large datasets or complex calculations are involved, as it helps optimize resource allocation and improves query performance.

## Performance Improvement Examples

To better understand how the Materialize Function can improve query performance, let's explore a couple of examples:

### Example 1: Speeding up Queries with Heavy Calculations

Suppose you have a query that performs heavy calculations on a tabular expression and uses the result multiple times. Without using the Materialize Function, the query would recalculate the expression for each reference, leading to redundant computations. However, by applying the Materialize Function, you can evaluate the expression once and reference the cached result, significantly reducing the query execution time.

NOTE: Run the following against a database that contains the StormEvents sample table, such as the Samples database on the Azure Data Explorer help cluster.

let _detailed_data = materialize(StormEvents | summarize Events=count() by State, EventType);
_detailed_data
| summarize TotalStateEvents=sum(Events) by State
| join (_detailed_data) on State
| extend EventPercentage = Events*100.0 / TotalStateEvents
| project State, EventType, EventPercentage, Events
| top 10 by EventPercentage

In this example, the _detailed_data tabular expression is defined using the Materialize Function. As a result, the expression is calculated only once, improving the overall query performance.

### Example 2: Efficient Evaluation of Non-Deterministic Expressions

Consider a scenario where you need to generate a set of random numbers and perform various calculations on the set. Without using the Materialize Function, each reference to the random number set would generate a new set of numbers, resulting in different results for each reference. However, by applying the Materialize Function, you can generate the random number set once and use the same set throughout the query, ensuring consistent results.

let randomSet = materialize(range x from 1 to 3000000 step 1 | project value = rand(10000000));
randomSet
| summarize Dcount=dcount(value);
randomSet
| top 3 by value;
randomSet
| summarize Sum=sum(value)

In this example, the randomSet tabular expression is generated using the Materialize Function. The same set of random numbers is used in multiple calculations, ensuring consistent results and optimizing query performance.
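One detail worth noting about this example: the three statements after the let are separated by semicolons, so they are submitted to the engine as a single query. Since materialize() caches results only for the duration of a single query execution, this is what allows all three calculations to share the same cached randomSet; run separately, each would regenerate the random numbers.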
## Using Materialize() in Let Statements

The Materialize Function can also be used in conjunction with let statements to give cached results a name. This can be particularly useful when you want to reference the cached result multiple times within the same query. Let's take a look at an example:

let materializedData = materialize(AppAvailabilityResults | where TimeGenerated > ago(1d));
union (materializedData | where AppRoleName !has "somestring" | summarize dcount(ClientOS)),
      (materializedData | where AppRoleName !has "somestring" | summarize dcount(ClientCity))

In this example, the materializedData tabular expression is created using the Materialize Function. The cached result is then referenced twice within the union statement, allowing for efficient evaluation and improved query performance.

## Best Practices for Using Materialize()

To make the most out of the Materialize Function in KQL, consider following these best practices:

Push operators that reduce the materialized dataset: Whenever possible, apply operators that reduce the size of the materialized dataset. For example, if every use of the materialized expression applies the same filter, push that filter inside the materialize() call. This helps optimize query performance by minimizing the amount of data that needs to be cached and processed.

Use materialize with join or union operations: If your query involves join or union operations with mutual subqueries, make use of the Materialize Function. By materializing the shared subqueries, you can execute them once and reuse the results throughout the query. This can lead to significant performance improvements, especially when dealing with large datasets.

Name cached results in let statements: When using the Materialize Function in let statements, give the cached result a name. This makes it easier to reference the cached data multiple times within the same query and improves query readability.

## Common Mistakes to Avoid

While using the Materialize Function can greatly enhance query performance, it's important to be aware of common mistakes that can hinder its effectiveness. Here are some pitfalls to avoid:

Exceeding the cache size limit: The Materialize Function has a cache size limit of 5 GB per cluster node. Keep this in mind when designing queries that use the Materialize Function. If the cache reaches its limit, the query will abort with an error.

Overusing materialize: While the Materialize Function can improve performance, it should be used judiciously. Applying materialization to every expression in a query can lead to excessive resource consumption and may not always result in performance gains. Analyze your query requirements and apply materialization selectively where it provides the most benefit.

Neglecting query semantics: When using the Materialize Function, it's essential to ensure that the semantics of your query are preserved. Be mindful of the operators and filters applied to the materialized expression, as they may affect the final results. Carefully review your query to ensure that the desired semantics are maintained.
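To tie these recommendations together, here's a short, hypothetical sketch (MyLogs and its columns are stand-ins) that reduces the dataset before caching it, names the cached result in a let statement, and reuses it across a union:

// Filters and column pruning are pushed inside materialize(),
// so only the reduced dataset is cached
let recentErrors = materialize(
    MyLogs
    | where TimeGenerated > ago(1d) and Level == "Error"
    | project TimeGenerated, Source);
// The named, cached result is reused by both branches of the union
union (recentErrors | summarize Errors = count() by Source),
      (recentErrors | summarize Oldest = min(TimeGenerated))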
## Conclusion

The KQL Materialize Function is a powerful tool that can significantly enhance query performance and optimize resource usage. By caching the results of a tabular expression, the Materialize Function allows for efficient evaluation of heavy calculations and non-deterministic expressions. It reduces redundant computation, speeds up query execution, and improves resource allocation.

When using the Materialize Function, remember to follow best practices, such as pushing operators that reduce the materialized dataset and using it with join or union operations. Avoid common mistakes like exceeding the cache size limit and neglecting query semantics.

By harnessing the power of the Materialize Function, you can unlock the full potential of Kusto Query Language and take your data analysis and query optimization to new heights.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# Addicted to KQL - the blog series, the book, the video channel, the merch store
This repository contains the code, queries, and eBook included as part of the Addicted to KQL series. The series is a continuing effort to discuss and educate about the power and simplicity of the Kusto Query Language.

WARNING: This is an advanced KQL series. For beginner topics, don't start here. Instead, see the original Must Learn KQL series. Come back when you're done. We'll be waiting for you.

The series has its own shortlink. To return here, just remember the easy URL: https://aka.ms/Addicted2KQL
## Table of Contents
The following are links to the entire series so far:
(links go live when each part/chapter is released)

* Addicted to KQL Part 0: The Wit and Wisdom of Standard Columns in Azure Monitor Logs - Posted March 16, 2022
* Addicted to KQL Part 1: Parsing Unruly Data
  * Addicted to KQL Part 1.a: Access sub-columns using the bag_unpack plugin - Posted April 18, 2022 by Gary Bushey
* Addicted to KQL Part 2: Repeatable Repercussion - Building Functions
* Addicted to KQL Part 3: Deep dive into Join
* Addicted to KQL Part 4: REGEX
* Addicted to KQL Part 5: Using External Data Sources
* Addicted to KQL Part 6: Time Series - Azure KQL – Time After Time - Posted May 16, 2022 by Gary Bushey
* Addicted to KQL Part 7: Working with IP Addresses - Posted May 21, 2022 by Gary Bushey
* Addicted to KQL Part 8: Fine-Tuning KQL Query Performance: Best Practices - Posted April 25, 2025
* Addicted to KQL Part 9: Using KQL for Hunting Operations
* Addicted to KQL Part 10: Leveraging KQL to Analyze Malware Trends and Identify Recurring Threats - Posted April 28, 2025
* Addicted to KQL Part 11: Using KQL to Identify Suspicious Behavior - Posted April 29, 2025

NOTE: The series is currently being developed. The TOC may change dramatically prior to launch.
--------------------------------------------------------------------------------
/Series_Images/Addicted to KQL Promo Image Smaller.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/Addicted to KQL Promo Image Smaller.png
--------------------------------------------------------------------------------
/Series_Images/Addicted to KQL Promo Image Smallest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/Addicted to KQL Promo Image Smallest.png
--------------------------------------------------------------------------------
/Series_Images/Addicted to KQL Promo Image.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/Addicted to KQL Promo Image.png
--------------------------------------------------------------------------------
/Series_Images/Part0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/Part0.png
--------------------------------------------------------------------------------
/Series_Images/StandardColumns.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/StandardColumns.png
--------------------------------------------------------------------------------
/Series_Images/angelchristmas.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/angelchristmas.png
--------------------------------------------------------------------------------
/Series_Images/christmastree.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/christmastree.png
--------------------------------------------------------------------------------
/Series_Images/dovechristmas.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/dovechristmas.png
--------------------------------------------------------------------------------
/Series_Images/readme.md:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/Series_Images/seriesimage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/seriesimage.png
--------------------------------------------------------------------------------
/Series_Images/seriesimagesmall.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/seriesimagesmall.png
--------------------------------------------------------------------------------
/Series_Images/tree.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rod-trent/AddictedtoKQL/e3daf065454a9de79139edbc19faef52ee2c2ce9/Series_Images/tree.png
--------------------------------------------------------------------------------