├── .DS_Store
├── Archive
    └── OCSF Schema Collaboration_ Initial Decisions.pdf
├── Articles
    ├── Defining and Using Observables.md
    ├── Patching Core Using Extensions.md
    ├── Profiles are Powerful.md
    └── Representing Process Parentage.md
├── Contributors.md
├── FAQs
    ├── How to Model Alerts in OCSF.md
    ├── README.md
    └── Schema FAQ.md
├── LICENSE
├── README.md
├── Understanding OCSF.md
└── Understanding OCSF.pdf


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ocsf/ocsf-docs/100ce76a6a2657fadf5a11cedf8a7a84ac6dd76c/.DS_Store


--------------------------------------------------------------------------------
/Archive/OCSF Schema Collaboration_ Initial Decisions.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ocsf/ocsf-docs/100ce76a6a2657fadf5a11cedf8a7a84ac6dd76c/Archive/OCSF Schema Collaboration_ Initial Decisions.pdf


--------------------------------------------------------------------------------
/Articles/Defining and Using Observables.md:
--------------------------------------------------------------------------------
  1 | # Defining and Using Observables
  2 | Rick Mouritzen,
  3 | August 2024
  4 | 
  5 | Observables provide a way to enrich OCSF events so that important data can be easily found and queried without having to walk through the rich and potentially deeply nested structure of an event. Observables are _not_ meant as a place to put information that is not present elsewhere in an event. They are, then, an optional query optimization that is common enough to warrant direct support in the OCSF schema. As an example, if one was looking across all events for the presence of a set of IP addresses known to be indicators of compromise, instead of manually looking for all occurrences of all attributes of type `ip_t`, or worse, all fields ending with `_ip`, one could query each event's `observables` array for `type_id` 2 and the set of IP addresses.
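
As a rough illustration of the query described above, the following Python sketch filters an event's `observables` array for IP address observables (`type_id` 2) whose value is in a set of known-bad addresses. The function name `find_ioc_ips` and the sample event are hypothetical and not part of OCSF or any OCSF tooling.

```python
def find_ioc_ips(event, ioc_ips, ip_type_id=2):
    """Return the sorted IOC IP addresses present in an event's observables.

    Assumes the event is a decoded JSON object (a dict) whose optional
    "observables" array holds objects with "type_id" and "value" fields.
    """
    return sorted({
        obs["value"]
        for obs in event.get("observables", [])
        if obs.get("type_id") == ip_type_id and obs.get("value") in ioc_ips
    })


# Hypothetical event, reduced to the fields this sketch needs.
event = {
    "observables": [
        {"name": "src_endpoint.ip", "type_id": 2, "value": "10.0.0.5"},
        {"name": "dst_endpoint.ip", "type_id": 2, "value": "192.168.0.3"},
        {"name": "actor.user.email_addr", "type_id": 5, "value": "a@example.com"},
    ]
}

hits = find_ioc_ips(event, {"10.0.0.5", "203.0.113.7"})
```

Only the flat `observables` array is inspected; the nested event structure never has to be walked.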
  6 | 
  7 | Observables are defined in event class and object definitions in the OCSF metaschema. This is done by associating a data type, data attribute, event class, or object with an observable `type_id`. 
  8 | 
  9 | ## Defining Observables
 10 | The following ways to define observables are supported:
 11 | 
 12 | 1. Observable by dictionary type. All attributes of this type become observables. The generated observables include the attribute values.
 13 | 2. Observable by dictionary attribute. All instances of this attribute become observables. The generated observables include the attribute values.
 14 | 3. Observable by object. All attributes of this object type become observables. The generated observables do _not_ include a value.
 15 | 4. Observable by event class attribute. The attribute in the event class or its subtypes becomes an observable. Note that these are attributes defined at the base of the event, and not in nested structures. Also note that the attribute type can be any valid attribute type: a primitive, a primitive subtype, or an object.
 16 | 5. Observable by object attribute. The attribute in all instances of the object or its subtypes becomes an observable. Note that these are attributes defined at the base of the object, and not in nested structures. Also note that the attribute type can be any valid attribute type: a primitive, a primitive subtype, or an object.
 17 | 6. Observable by class-specific attribute path. Attributes on the specified attribute path become observables. (The plural wording is used because a path element may refer to an array, resulting in multiple observables.)
 18 | 
 19 | In all cases, the definition of an observable is an integer of OCSF type `integer_t`, a 32-bit signed integer. This number becomes the `observable` object's `type_id` value. The only values with a special meaning are the typical OCSF enum integer values of `0` (Unknown) and `99` (Other). There are no other special meanings or special ranges of values.
 20 | 
 21 | As with most things in OCSF, these definitions can be in the base of the core schema, one of the core schema extensions, or any other private extension (those extensions outside of the core schema).
 22 | 
 23 | As a historical note, definition types 1 and 3 have been in use since schema version 1.0, and the rest have been available since 1.2.
 24 | 
 25 | ### Definition Example: Observable by Dictionary Type
 26 | Defining an observable by dictionary type is, naturally, done in a metaschema `dictionary.json` file. The definition is done by adding an `observable` field to a type definition.
 27 | 
 28 | This, along with defining observable objects (which are also a kind of type), is one of the broadest ways to define observables. All attributes of this type (regardless of attribute name) become observables.
 29 | 
 30 | Example in a `dictionary.json` file:
 31 | ```jsonc
 32 | {
 33 |   "name": "dictionary",
 34 |   // ... other dictionary fields (caption, description, attributes)
 35 |   "types": {
 36 |     // ... other "types" fields (caption, description)
 37 |     "attributes": {
 38 |       // ... other types
 39 |       "email_t": {
 40 |         
 41 |         // This is the observable definition for the email_t type, a subtype of string_t
 42 |         "observable": 5,
 43 | 
 44 |         "caption": "Email Address",
 45 |         "description": "Email address. For example: <code>john_doe@example.com</code>.",
 46 |         "regex": "^[a-zA-Z0-9!#$%&'*+-/=?^_`{|}~.]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$",
 47 |         "type": "string_t",
 48 |         "type_name": "String"
 49 |       },
 50 |       // ...
 51 |     }
 52 |   }
 53 | }
 54 | ```
 55 | 
 56 | ### Definition Example: Observable by Dictionary Attribute
 57 | Defining an observable by dictionary attribute is also done in a metaschema `dictionary.json` file. The definition is done by adding an `observable` field to an attribute definition. 
 58 | 
 59 | Definitions done this way limit the creation of observables to instances of this specific attribute, regardless of where it is used.
 60 | 
 61 | Example in a `dictionary.json` file:
 62 | ```jsonc
 63 | {
 64 |   "name": "dictionary",
 65 |   // ... other dictionary fields (caption, description, types)
 66 |   "attributes": {
 67 |     // ... other attributes
 68 |     "cmd_line": {
 69 |       "caption": "Command Line",
 70 |       // ... other attribute fields
 71 | 
 72 |       // This is the observable definition for the cmd_line attribute
 73 |       "observable": 13
 74 |     },
 75 |     // ...
 76 |   },
 77 |   // ...
 78 | }
 79 | ```
 80 | 
 81 | ### Definition Example: Observable by Object
 82 | Defining an observable by object is done in a metaschema object definition file. The definition is done by adding an `observable` field directly in the object definition object. This also works for object definitions that extend another, including the special patch case. For both regular and patch extends cases, the observable definition adds or replaces any existing object-level observable definition.
 83 | 
 84 | This, along with defining observables by dictionary type, is one of the broadest ways to define observables. All attributes of this object type (regardless of attribute name) become observables.
 85 | 
 86 | Example in object definition file `objects/container.json`:
 87 | ```jsonc
 88 | {
 89 |   "name": "container",
 90 | 
 91 |   // This is the observable definition for this object
 92 |   "observable": 27,
 93 | 
 94 |   // ... other object definition fields
 95 | }
 96 | ```
 97 | 
 98 | ### Definition Example: Observable by Event Class Attribute / Observable by Object Attribute
 99 | Observables can be defined in event class or object attributes. These are very similar and so are described together here. Defining observables this way limits the scope where these observables will occur. Note that definitions of this type are for attributes directly defined in the event class or object, not for attributes in a nested structure.
100 | 
101 | Defining observables in event class or object attributes works just as with any other attribute definition at these levels: the fields defined override the same field defined in the dictionary or any event classes / objects the current item is derived from.
102 | 
103 | Definitions of this type work for event class and object `extends` definitions both for the normal case (subclass / subtype) as well as the "patch extends" case, however "hidden" event classes and objects are not supported due to the potential of colliding observable `type_id` values. See [Appendix 2: Hidden Types](#appendix-2-hidden-types) for more about this issue.
104 | 
105 | Example in object definition file `objects/cve.json`:
106 | ```jsonc
107 | {
108 |   "name": "cve",
109 |   // ... other object definition fields
110 |   "attributes": {
111 |     // ... other attributes
112 |     "uid": {
113 |       "caption": "CVE ID",
114 |       // ... other attribute fields
115 | 
116 |       // This defines uid as an observable when directly inside a cve object
117 |       "observable": 18
118 |     },
119 |     // ...
120 |   },
121 |   // ...
122 | }
123 | ```
124 | 
125 | Event classes work identically: attribute observables are defined inside attribute details.
126 | 
127 | ### Definition Example: Observable by Class-Specific Attribute Path
128 | This last observable definition type allows defining an observable for an attribute inside a nested structure for an event class, though it can also be used for attributes directly defined in the class (in this case being an alternative to defining event class attribute observables).
129 | 
130 | This type of definition is done by adding a top-level `observables` field to the event class definition whose value is a JSON object that maps from attribute paths to observable `type_id` values.
131 | 
132 | Class-specific attribute path definitions also work for event class `extends` definitions both for the normal case (subclass / subtype) as well as the "patch extends" case. In the extends cases, the class-specific observable definitions replace a prior definition or add a new definition. 
133 | 
134 | Hidden event classes are not supported due to the potential of colliding observable `type_id` values. See [Appendix 2: Hidden Types](#appendix-2-hidden-types) for more about this issue. 
135 | 
136 | The attribute paths use a simple dotted notation. See [Appendix 1: Attribute Paths](#appendix-1-attribute-paths) for details about how attribute paths are defined in OCSF. Notably these attribute paths do _not_ contain array references. In this context it means all array items in the attribute path are considered.
137 | 
138 | The following is an example showing the definition pattern for observables by class-specific attributes in an example "Bag" event class that holds an array of items, where each item has a `type_id` field that we want to become observables.
139 | 
140 | Example in event class definition file `events/bag.json`:
141 | ```jsonc
142 | {
143 |   "name": "bag",
144 |   // ... other event class definition fields
145 |   "attributes": {
146 |     // ... other attributes
147 |     
148 |     // Here we add the items array: an array of item objects
149 |     "items": {
150 |       "requirement": "required"
151 |     }
152 |   },
153 | 
154 |   // Here we define the class-specific attribute observable as type_id 20
155 |   "observables": {
156 |     "items.type_id": 20
157 |   }
158 | }
159 | ```
160 | 
161 | This definition causes the `type_id` value of each item object to become an observable, but _only_ for events of the `bag` event class, and nowhere else.
162 | 
163 | ### Definition Nuances
164 | Defining observables has a few nuances, described in the next few sections.
165 | 
166 | #### Definition Nuance: Avoid Collisions
167 | When defining a new observable, be mindful of collisions. One can use the OCSF Server running with the core extensions plus any additional extensions that may be used in a specific environment, and then view all of the existing observable `type_id` values on the `observable` object page (https://schema.ocsf.io/objects/observable). Also note that for additions to the core schema, the commit process detects collisions by running the [OCSF Server](https://github.com/ocsf/ocsf-server) and running the [OCSF Validator](https://github.com/ocsf/ocsf-validator).
168 | 
169 | Developers of private extensions should be extra wary to avoid collisions. Unlike most other unique identifier integer values in extensions, observable values are _not_ modified with tricky multiplication. Consider using high integer numbers related to the extension's `uid`. For example, for extension `uid` `999`, the observable numbers could start from `999000` (extension `uid` times `1000` plus observable number). This sort of precaution could become important as OCSF gains acceptance across the industry, and publishing and accepting OCSF events generated with private extensions begins to occur.
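
The numbering suggestion above (extension `uid` times `1000` plus observable number) can be sketched in Python as follows. This is only an illustration of the suggested convention, not something the OCSF Server does for you; the function name and the `block_size` parameter are hypothetical. The upper bound check reflects that `type_id` is an `integer_t`, a 32-bit signed integer.

```python
INT32_MAX = 2**31 - 1  # type_id is an integer_t: a 32-bit signed integer


def extension_observable_type_id(extension_uid, observable_number, block_size=1000):
    """Compute an observable type_id in a block reserved for one extension.

    Follows the suggested convention: extension uid * 1000 + observable number.
    """
    if not 0 <= observable_number < block_size:
        raise ValueError("observable_number must fit inside the extension's block")
    type_id = extension_uid * block_size + observable_number
    if type_id > INT32_MAX:
        raise ValueError("type_id does not fit in a 32-bit signed integer")
    return type_id


first_id = extension_observable_type_id(999, 0)
later_id = extension_observable_type_id(999, 42)
```

With this convention, extension `uid` `999` owns the values `999000` through `999999`.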
170 | 
171 | #### Definition Nuance: Precedence and the "Use the Most General" Rule
172 | The definition types, as listed in the [Defining Observables](#defining-observables) section, establish a precedence with 1 being the most general to 6 being the most specific.
173 | 
174 | In cases where an OCSF event has an attribute that is affected by more than one observable definition, _the most general_ should be used.
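
A minimal Python sketch of this rule, assuming each applicable definition is represented as a `(kind, type_id)` pair. The kind names and the `select_definition` function are hypothetical labels for the six definition types listed in Defining Observables; they are not identifiers used by the metaschema or by OCSF tooling.

```python
# Precedence numbers mirror the numbered list in "Defining Observables":
# 1 is the most general, 6 is the most specific.
PRECEDENCE = {
    "dictionary_type": 1,
    "dictionary_attribute": 2,
    "object": 3,
    "class_attribute": 4,
    "object_attribute": 5,
    "class_attribute_path": 6,
}


def select_definition(candidates):
    """Pick the most general definition among those affecting one attribute."""
    return min(candidates, key=lambda candidate: PRECEDENCE[candidate[0]])


chosen = select_definition([
    ("class_attribute_path", 20),
    ("dictionary_type", 2),
])
```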
175 | 
176 | #### Definition Nuance: Extends
177 | Observable definitions can be overridden in extensions (via `extends`) in event class and object definitions, including the special "patch" type of extends. This is also mentioned above, though repeated here to emphasize that this is a general rule for all types of observable definitions.
178 | 
179 | #### Definition Nuance: Hidden Types Are Not Supported
180 | Defining observables in definitions of hidden event classes or objects is not supported as this would lead to colliding observable `type_id` values. See [Appendix 2: Hidden Types](#appendix-2-hidden-types) for more information. This is also mentioned above, though repeated here to emphasize that this is a general rule for all types of observable definitions.
181 | 
182 | #### Definition Nuance: Profiles
183 | Profile definitions can override attribute observables (as with any other attribute field), though cannot modify object or class-specific attribute path observables (the top-level `observable` and `observables` definitions). In other words, with regard to observables, profiles can only affect observables by attributes.
184 | 
185 | > [!NOTE]
186 | > Implementation-wise, this restriction may not be difficult to overcome, however it does become difficult to conceptualize and visualize the effect of a profile if it can affect a top-level property of an event class or object this way. Constraints, another top-level event class / object concept, are similarly not controllable by profiles.
187 | 
188 | #### Definition Nuance: New Object Observables Discouraged
189 | The OCSF community currently discourages defining objects as observables. The object observable merely indicates the presence of the object in the event on a specific path. A second query would be needed to interrogate the object. It is not terribly useful.
190 | 
191 | #### Definition Nuance: Observable Values Are Strings
192 | The `observable` object's `value` field is used for attributes that are primitive types (strings, numbers, booleans), subtypes of primitive types, and arrays of primitive types or primitive subtypes. The type of the `observable` object's `value` field is always of type `string_t`, and so string conversion may be needed. The `value` field is not populated for objects or arrays of objects.
193 | 
194 | For more, see [Populating The Value Field](#populating-the-value-field).
195 | 
196 | #### Definition Nuance: Event Class Observables Are Not Supported
197 | Defining an event class as an observable is not supported. This would be essentially redundant with the `type_uid` field, which already uniquely identifies an OCSF event's event class. In other words, a query for an event class's observable ID might as well query for the event class's `type_uid`. Further, the OCSF community is trying to move away from observables by object, which this would be similar to, since these do not populate the `observable` object's `value` field.
198 | 
199 | The only thing this sort of definition would enable would be to detect events that are in an event class inheritance subtree at a finer grain than categories; a fairly esoteric use-case.
200 | 
201 | ## Using Observables
202 | Observables can be generated automatically, though of course can also be created manually. Creating observables automatically requires walking an event's structure along with a compiled schema, looking for observable definitions at each level of the structure, as well as at each leaf (each primitive value).
203 | 
204 | A concrete example of this code can be found in the [`Schema` class](https://github.com/ocsf/ocsf-java-tools/blob/main/ocsf-schema/src/main/java/io/ocsf/schema/Schema.java) in the [`ocsf/ocsf-java-tools` repo](https://github.com/ocsf/ocsf-java-tools). Look at the `enrich(Map<String, Object>, boolean, boolean)` method and follow it along. As an aside, this same class and approach can be used to add enum sibling values, and indeed can (and should) be done in the same pass as adding observables.
205 | 
206 | Whether creating observables manually -- as part of the mapping process -- or automatically, care must be taken while populating the `name` and `value` fields.
207 | 
208 | ### Populating Observable Path References
209 | The `observable` object's `name` attribute is an attribute path reference. See [Appendix 1: Attribute Paths](#appendix-1-attribute-paths) for details.
210 | 
211 | ### Populating the Value Field
212 | The `observable` object's `value` field should be populated for all primitive types (strings, numbers, booleans), _and_ for arrays of primitive types (see next paragraph). The `value` field is specifically not meant to be used for observable objects, nor for arrays of objects.
213 | 
214 | For arrays of primitive types, one `observable` object should be created for each element of the array with the `value` field being set to the array element's value.
215 | 
216 | #### All Observable Values Are Strings
217 | The `observable` object's `value` is defined as a type `string_t`. A primitive value that is not a string (either `string_t` or a subtype of `string_t`) must be converted to string.
218 | 
219 | Suggested conversions of non-string values:
220 | * `integer_t` and `long_t`: base 10 string.
221 | * `float_t`: base 10 string using common standard library conversions, including exponential notation and "NaN".
222 | * `boolean_t`: the strings `"true"` and `"false"`.
223 | * Null should not be converted to string, but rather the encoding's equivalent of `null`. In other words, a null remains a null. In JSON encoded events, use `"value": null` and not `"value": "null"`.
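
The suggested conversions above can be sketched in Python as follows. This is one possible reading of the suggestions, not a normative implementation: the function name `observable_value` is hypothetical, and Python's `repr` is used here as the "common standard library conversion" for floats.

```python
import math


def observable_value(value):
    """Convert a primitive value to the string form used in observable.value."""
    if value is None:
        return None  # a null remains a null; never the string "null"
    if isinstance(value, bool):
        # Checked before int because bool is a subclass of int in Python.
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)  # base 10 string
    if isinstance(value, float):
        # repr() gives the shortest round-tripping form, using exponential
        # notation where needed; NaN is spelled "NaN" as suggested above.
        return "NaN" if math.isnan(value) else repr(value)
    if isinstance(value, str):
        return value
    raise TypeError("observable values are only populated for primitive types")


converted = [observable_value(v) for v in (443, True, 1.5, "10.0.0.5", None)]
```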
224 | 
225 | > [!NOTE]
226 | > About `null`, it's weird. Don't overthink it. OCSF does not have a null type. In practice this means OCSF does not distinguish between a field that has a `null` value and a missing field. For observables, when creating them for primitive fields (like strings and numbers), if the field's value is `null`, then you may either set the `observable` object's `value` to `null` or not set the `value` field -- the meaning of each is equivalent. (This is not true in general. For those of us who remember the XML era, distinguishing `null` from missing was one of the consistently annoying edge cases you'd have to always keep in mind.)
227 | 
228 | ## Appendix 1: Attribute Paths
229 | Attribute paths occur in two places: in the `observable` object's `name` attribute as a path reference to a field in the event, and in class-specific attribute observable definitions. In both cases the paths are the same. (Note: this is the only use of a JSON path-like capability in OCSF.)
230 | 
231 | The general pattern is dot-separated attribute names, for example `foo.bar`. Using the dot (".") as a separator works well because OCSF does not use dots in attribute names. There is no special notation for arrays, so these paths only tell us that a reference is for _one of_ the items along a path that includes one or more arrays.
232 | 
233 | These attribute path references are similar (at least in spirit) to [JSON Pointer](https://www.rfc-editor.org/rfc/rfc6901), [JSONPath](https://www.rfc-editor.org/rfc/rfc9535), and the syntax used by the [`jq` command-line tool](https://github.com/jqlang/jq), though simpler and notably without array notation.
234 | 
235 | Let's say we have an event with a nested structure as follows:
236 | ```jsonc
237 | {
238 |   // ... other event fields
239 |   "devices": [
240 |     {
241 |       "hostname": "mercury",
242 |       "network_interfaces": [
243 |         {
244 |           "ip": "10.0.0.5"
245 |         }
246 |       ]
247 |     },
248 |     {
249 |       "hostname": "venus",
250 |       "network_interfaces": [
251 |         {
252 |           "ip": "192.168.0.3"
253 |         },
254 |         {
255 |           "ip": "10.100.0.42"
256 |         }
257 |       ]
258 |     }
259 |   ]
260 | }
261 | ```
262 | 
263 | In this example, `ip` is an observable by dictionary type with `type_id` of `2`. The following shows what the observables for this event look like:
264 | 
265 | ```jsonc
266 | {
267 |   // ... other event fields
268 |   "devices": [
269 |     // ... as above
270 |   ],
271 |   "observables": [
272 |     {
273 |       "name": "devices.network_interfaces.ip",
274 |       "type_id": 2,
275 |       "value": "10.0.0.5"
276 |     },
277 |     {
278 |       "name": "devices.network_interfaces.ip",
279 |       "type_id": 2,
280 |       "value": "192.168.0.3"
281 |     },
282 |     {
283 |       "name": "devices.network_interfaces.ip",
284 |       "type_id": 2,
285 |       "value": "10.100.0.42"
286 |     }
287 |   ]
288 | }
289 | ```
290 | 
291 | Notice that the attribute path in each `name` field is the same; the positions in the `devices` and `network_interfaces` arrays are not included.
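
To make the array handling concrete, here is a small Python sketch that resolves a dotted attribute path against an event, fanning out over every array along the way, and builds one observable per value found. It is a simplified illustration rather than the approach used in `ocsf-java-tools`; the function names are hypothetical, and the `type_id` is passed in by hand instead of being looked up in a compiled schema.

```python
def values_at_path(event, path):
    """Yield every value reachable via a dotted attribute path.

    Arrays are traversed implicitly, so one path can match many values.
    """
    def walk(current, remaining):
        if isinstance(current, list):
            for item in current:  # no array notation: consider every item
                yield from walk(item, remaining)
        elif not remaining:
            yield current
        elif isinstance(current, dict) and remaining[0] in current:
            yield from walk(current[remaining[0]], remaining[1:])

    yield from walk(event, path.split("."))


def observables_for_path(event, path, type_id):
    """Build one observable object per value found on the path."""
    return [
        {"name": path, "type_id": type_id, "value": value}
        for value in values_at_path(event, path)
    ]


event = {
    "devices": [
        {"hostname": "mercury", "network_interfaces": [{"ip": "10.0.0.5"}]},
        {
            "hostname": "venus",
            "network_interfaces": [{"ip": "192.168.0.3"}, {"ip": "10.100.0.42"}],
        },
    ]
}

observables = observables_for_path(event, "devices.network_interfaces.ip", 2)
```

The three resulting observables share the same `name` and differ only in `value`, as in the example event above.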
292 | 
293 | ## Appendix 2: Hidden Types
294 | It's a bit tedious to keep saying "event classes and objects". In Computer Science terms, these are both abstract data types, and specifically in object-oriented programming terms, their definitions are like classes. The OCSF terminology is a bit loose here. In this section, event class definitions and object definitions will simply be called types. Just note that OCSF also has primitive types (unstructured types) such as `string_t`, including subtypes of these primitive types like `email_t`.
295 | 
296 | > [!NOTE]
297 | > Hidden types work like "inheritance for implementation". They exist so commonalities can be placed in shared definitions, however these definitions are removed from the final compiled schema. Hidden event class definitions occur for definitions _without_ a `uid`, other than the special `base_event` definition which doesn't have a `uid` defined, but ends up with an effective `uid` of `0`. Hidden object definitions occur for object definitions where the `name` value has a leading underscore, for example `_hidden`.
298 | 
299 | The net effect of hidden types is that each type that is derived from a hidden type gets all of the inherited information as if it was copied in by hand, essentially replicating the information in the hidden type. For observables, this would mean the `type_id` values would be replicated, and thus cause collisions among the types derived from the hidden type. (The only case where this wouldn't happen would be a hidden event class or object with only a single derivation. This isn't a useful case in practice, however.)
300 | 
301 | This hidden type observable collision is detected and blocked for each of these cases:
302 | * A hidden event class definition with one or more attributes that define `observable`.
303 | * A hidden object definition with one or more attributes that define `observable`.
304 | * A hidden event class definition with class-specific attribute path observables defined via the top-level `observables` field.
305 | * A hidden object definition defining itself as an observable via the top-level `observable` field.
306 | 
307 | ### Example Hidden Object
308 | This is a concrete example using a hidden object that tries to define itself as an observable by object type. The other cases work similarly.
309 | 
310 | Let's say we have a hidden object definition with `name` `_foo`, as well as `bar` and `baz` object definitions that extend the hidden `_foo` object. Now let's say we want all instances of the `_foo` object to be an observable with `type_id` of `42`, so we add `"observable": 42,` to the `_foo` object's definition. Let's show this more concretely:
311 | 
312 | `objects/_foo.json` (by convention, file names match the `name` attribute in the definition):
313 | ```jsonc
314 | {
315 |   "name": "_foo",
316 |   "observable": 42,
317 |   // ... other object definition fields
318 | }
319 | ```
320 | 
321 | `objects/bar.json`
322 | ```jsonc
323 | {
324 |   "name": "bar",
325 |   "extends": "_foo",
326 |   // ... other definition fields
327 | }
328 | ```
329 | 
330 | `objects/baz.json`
331 | ```jsonc
332 | {
333 |   "name": "baz",
334 |   "extends": "_foo",
335 |   // ... other definition fields
336 | }
337 | ```
338 | 
339 | After compilation, `_foo` disappears, and both `bar` and `baz` are defined as observables by object with the `type_id` value `42`. This is a collision and if done manually would be flagged as an error between `bar` and `baz`.
340 | 
341 | What actually happens in this case is that when a hidden type definition like `_foo` is encountered, the OCSF Server and OCSF Validator ensure that it does not attempt to define observables of any kind.
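
The kind of check described above can be sketched in Python for the object case. This is not the OCSF Server or OCSF Validator implementation, only an illustration of the rule under the assumption that a definition has been loaded as a dict; the function name and error messages are hypothetical.

```python
def hidden_object_observable_errors(definition):
    """Return error messages if a hidden object definition defines observables.

    Hidden objects are recognized by a leading underscore in "name".
    """
    name = definition.get("name", "")
    if not name.startswith("_"):
        return []  # not hidden: observable definitions are allowed

    errors = []
    if "observable" in definition:
        errors.append(f"hidden object {name} defines itself as an observable")
    for attr_name, attr in definition.get("attributes", {}).items():
        if isinstance(attr, dict) and "observable" in attr:
            errors.append(f"hidden object {name} defines observable on attribute {attr_name}")
    return errors


foo_errors = hidden_object_observable_errors({"name": "_foo", "observable": 42})
bar_errors = hidden_object_observable_errors({"name": "bar", "extends": "_foo"})
```

Here `_foo` is rejected before compilation, so `bar` and `baz` never inherit a colliding `type_id`.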
342 | 


--------------------------------------------------------------------------------
/Articles/Patching Core Using Extensions.md:
--------------------------------------------------------------------------------
  1 | # Patching the Core Schema With Extensions
  2 | Paul Agbabian,
  3 | August 2024
  4 | 
  5 | Extensions have been around since the earliest days of OCSF.  We knew that it was impossible to cover every type of event in a standard way, and that vendors will have special event classes, attributes and objects that only pertain to their products.  Customers may have their own enrichment pipelines with specific attributes that need to be type-checked, etc.
  6 | 
  7 | A somewhat subtle and hidden feature of the extension mechanism is how the core schema itself can be patched, meaning added to, without having to create new classes and new objects.  That is to say, an extension can have new attributes and new objects added to existing core classes without having to create an extension class.  An extension can add new attributes to a core object or class without having to create a new object or class.  An extension can even create new data types for new attributes that can be added to existing core classes and objects.
  8 | 
  9 | Before we learn how to patch the core schema, we will review standard schema extensions.  You will see later that patching the core schema is much simpler than having to create a standard extension, if all you want to do is add some attributes and constraints to existing classes or objects.
 10 | 
 11 | ## Standard Approach for Extensions
 12 | 
 13 | Here is an example of a normal extension in a folder named `ocsf-extension` that adds an attribute, `a1` of type `string_t` via a new object that extends the core `metadata` object:
 14 | 
 15 | #### Extension example registration in `ocsf-extension/extension.json` file.  The name, version and uid are example names.
 16 | 
 17 | ```json
 18 | {
 19 |     "caption": "Generic Extension",
 20 |     "name": "extension",
 21 |     "version": "1.1.0",
 22 |     "uid": 500
 23 | }
 24 |   
 25 | ```
 26 | 
 27 | The `uid` and `version` are arbitrary here. For a real extension, you should request a unique extension ID via a Pull Request to the ocsf-schema repository.  This prevents collisions with other extensions.  Note that having a reserved public extension ID does not mean your actual extensions need be made public.
 28 | 
 29 | #### Dictionary entry in `ocsf-extension/dictionary.json` file.
 30 | 
 31 | ```json
 32 | {
 33 |     "caption": "Generic Extension Attribute Dictionary",
 34 |     "description": "The Attribute Dictionary defines schema attributes and includes references to the events and objects in which they are used.",
 35 |     "name": "dictionary",
 36 |     "attributes": {
 37 |       "a1": {
 38 |         "caption": "An attribute",
 39 |         "description": "A generic extension attribute.",
 40 |         "is_array": false,
 41 |         "type": "string_t"
 42 |       }
 43 |     }
 44 | }
 45 | ```
 46 | 
 47 | #### Object entry `extra_metadata.json` in `ocsf-extension/objects` subdirectory.
 48 | ```json
 49 | {
 50 |     "caption": "Extra Metadata",
 51 |     "description": "The Generic Extension Extra Metadata object.",
 52 |     "name": "extra_metadata",
 53 |     "extends": "metadata",
 54 |     "attributes": {
 55 |       "a1": {
 56 |         "requirement": "recommended"
 57 |       }
 58 |     }
 59 |   }
 60 |   
 61 | ```
 62 | 
 63 | This metaschema code will add a new object `extra_metadata` to the schema with a new attribute `a1` when the schema server loads and compiles the extension `ocsf-extension`.  If you are running your own schema server, you can do this at startup with the environment variable, or you can issue a reload at the Elixir command prompt: `Schema.reload(["extensions", "../ocsf-extension"])`, for this example, if your extension folder is parallel to the OCSF Schema repo folder.
 64 | 
 65 | If you do not include a caption or a description, the default is to use the caption or description of the object being extended.  The same holds for extension classes.
 66 | 
 67 | But, you may ask, how do I use this new `extra_metadata` object I created in my extension?  Although it is now visible to the schema server when loaded, it can only be used within a new extension class.  You would need to create a new class or extend an existing class, likely in the core schema, so that you can add it to that class.  Let's say you want to create a new Base class, and add it to that class.
 68 | 
 69 | First, you would need to add an attribute, for example `new_metadata`, to the extension dictionary of type `extra_metadata`:
 70 | 
 71 | ```json
 72 | {
 73 |     "caption": "Attribute Dictionary",
 74 |     "description": "The Attribute Dictionary defines schema attributes and includes references to the events and objects in which they are used.",
 75 |     "name": "dictionary",
 76 |     "attributes": {
 77 |         ...
 78 |       "new_metadata": {
 79 |         "caption": "New Metadata",
 80 |         "description": "A new metadata object that can work with a new base class.",
 81 |         "type": "extra_metadata"
 82 |       }
 83 |     }
 84 | }
 85 | ```
 86 | 
 87 | There are two ways of creating a new base class: either extend the core Base event class, or create a new base class for your extension.
 88 | 
 89 | To create a new base class, you would write something like the following metaschema code.  This example copies some of the core Base class attributes but not all, for example purposes.  Note the `new_metadata` attribute as well as the `"uid": 0` statement.  As this is an extension, you must give the class an ID, which will be used to calculate a concrete class ID based on the extension master ID, in this case 500.  The core schema Base event class defaults to 0, as the core schema doesn't have an extension ID.  In most cases it is likely you will not be starting with a new base class in your extension, but rather extending the core Base event class, or some other core event class.
 90 | 
 91 | ```json
 92 | {
 93 |     "caption": "New Base Event",
 94 |     "category": "other",
 95 |     "uid": 0,
 96 |     "description": "The new base event is a generic and concrete event. It also defines a set of attributes available in most event classes. As a generic event that does not belong to any event category, it could be used to log events that are not otherwise defined by the schema.",
 97 |     "name": "new_base_event",
 98 |     "attributes": {
 99 |       "$include": [
100 |         "includes/classification.json",
101 |         "includes/occurrence.json"
102 |       ],
103 |       "message": {
104 |         "group": "primary",
105 |         "requirement": "recommended"
106 |       },
107 |       "new_metadata": {
108 |         "group": "context",
109 |         "requirement": "required"
110 |       },
111 |       "raw_data": {
112 |         "group": "context",
113 |         "requirement": "optional"
114 |       },
115 |       "severity": {
116 |         "group": "classification",
117 |         "requirement": "optional"
118 |       },
119 |       "severity_id": {
120 |         "group": "classification",
121 |         "requirement": "required"
122 |       },
123 |       "status": {
124 |         "group": "primary",
125 |         "requirement": "recommended"
126 |       },
127 |       "status_code": {
128 |         "group": "primary",
129 |         "requirement": "recommended"
130 |       },
131 |       "status_detail": {
132 |         "group": "primary",
133 |         "requirement": "recommended"
134 |       },
135 |       "status_id": {
136 |         "group": "primary",
137 |         "requirement": "recommended"
138 |       },
139 |       "unmapped": {
140 |         "group": "context",
141 |         "requirement": "optional"
142 |       }
143 |     }
144 |   }
145 |   
146 | ```
147 | 
148 | If you really just wanted to add the `new_metadata` attribute to the existing Base event class, you could extend the Base event class in your extension, rather than create a new one:
149 | 
150 | ```json
151 | {
152 |     "caption": "Extended Base Event",
153 |     "category": "other",
154 |     "uid": 1,
155 |     "description": "The Extended Base event adds new attributes to the core Base event.",
156 |     "extends": "base_event",
157 |     "name": "extended_base",
158 | 
159 |     "attributes": {
160 |         "new_metadata": {
161 |             "group": "context",
162 |             "requirement": "required"
163 |           }    
164 |     }
165 | }
166 | 
167 | ```
168 | 
169 | To keep things separated, this extended class uses a different `uid` value.  Note that only the `new_metadata` attribute was needed with this approach, as all of the standard Base event class attributes will still be present.
170 | 
171 | However, now you will have two metadata attributes, the core `metadata` attribute of type `metadata` as well as the extended metadata object `extra_metadata`.  Since `extra_metadata` extended `metadata` you now have two versions of most of the `metadata` attributes.  This is probably not what you wanted.  Maybe that's why you created a new class and copied most but not all of the attributes, so that you would only have one extended metadata object in the class.  You just wanted to add some extra attributes to the `metadata` object in the core schema's Base event class.  That's where Patching Extensions comes in.
172 | 
173 | ## A Patching Approach for Extensions
174 | 
175 | Let's now say that we just want the attribute `a1` to be directly added to the core `metadata` object.
176 | 
177 | ```
178 | {
179 |     "caption": "Metadata Extension",
180 |     "description": "The Generic Extension metadata object.",
181 |     "extends": "metadata",
182 |     "attributes": {
183 |       "a1": {
184 |         "requirement": "recommended"
185 |       }
186 |     }
187 | }
188 | ```
189 | The key enabling feature is the omission of the `name` field.  This indicates to the schema compilation process that no new class or object should be generated.
190 | 
191 | The metaschema code above will add a new attribute `a1` directly into the existing `metadata` object when the schema server loads the extension `ocsf-extension` and compiles the schema.  In this case, the caption and description in the extension object will be ignored by the server but are useful for documentation purposes.
192 | 
193 | I think you will agree this is a much simpler way to add attributes to an object in the core schema.  When your extension is loaded and compiled, they will be added to the core schema.  *There is no need to create a new class to add your attributes - they will be added directly into core.*
194 | 
195 | The same approach can be used to add attributes directly to a core event class.  You would just extend the core class without declaring a new `name` and add the attributes from your extension dictionary.
196 | 
197 | ```json
198 | {
199 |     "caption": "Patched Event",
200 |     "description": "The Extended Base event adds new attributes to the core Base event.",
201 |     "extends": "base_event",
202 | 
203 |     "attributes": {
204 |         "a1": {
205 |             "group": "context",
206 |             "requirement": "recommended"
207 |           }
208 |     }
209 | }
210 | 
211 | ```
212 | 
213 | The core Base event class will now have a new attribute, `a1`, from the `ocsf-extension` extension dictionary added directly to the class, rather than to the `metadata` object (you would need to remove the `metadata` patch extension; otherwise you would also have `a1` in the object).  Pretty simple.  Again, note that `name` is omitted and the `caption` and `description` are useful for documentation only - they will not be used in the schema.
214 | 
215 | More generally, a patching extension will do the following things:
216 | - Profiles are merged
217 | - Attributes are merged
218 | - Class and object level observable definitions override / replace
219 | - Constraints override / replace
220 | - Caption and description are not patched but are good for documentation
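
The merge-versus-replace rules in the list above can be illustrated with a small Python sketch. This is not the schema server's implementation: `apply_patch`, the `observable` key name, and the toy `metadata` definition (including its constraint) are assumptions made only to mirror the metaschema JSON shown in this article.

```python
import copy


def apply_patch(base: dict, patch: dict) -> dict:
    """Return a copy of `base` with a name-less patch applied (illustrative only)."""
    result = copy.deepcopy(base)

    # Profiles are merged: keep the existing order and append new names.
    for profile in patch.get("profiles", []):
        if profile not in result.setdefault("profiles", []):
            result["profiles"].append(profile)

    # Attributes are merged: new attributes are added, existing ones updated.
    for name, details in patch.get("attributes", {}).items():
        result.setdefault("attributes", {}).setdefault(name, {}).update(details)

    # Observable definitions and constraints override / replace.
    # ("observable" is an assumed key name for this sketch.)
    for key in ("observable", "constraints"):
        if key in patch:
            result[key] = copy.deepcopy(patch[key])

    # Caption and description are deliberately left untouched.
    return result


# Toy stand-in for a core object, NOT the real core `metadata` definition.
metadata = {
    "caption": "Metadata",
    "name": "metadata",
    "attributes": {"version": {"requirement": "required"}},
    "constraints": {"at_least_one": ["version", "uid"]},
}
patch = {
    "caption": "Metadata Extension",
    "extends": "metadata",
    "attributes": {
        "a1": {"requirement": "recommended"},
        "a2": {"requirement": "recommended"},
    },
    "constraints": {"just_one": ["a1", "a2"]},
}

patched = apply_patch(metadata, patch)
print(sorted(patched["attributes"]))
print(patched["constraints"])
print(patched["caption"])
```

In this sketch the patched object keeps its original caption, gains `a1` and `a2` alongside `version`, and ends up with only the `just_one` constraint, because constraints are replaced rather than merged.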
221 | 
222 | ## Constraints in Patching Extensions
223 | 
224 | As of OCSF Schema Server version 2.72.0, patching extensions can add overriding constraints to core classes and objects.  This was always possible with ordinary extension classes and objects, but the server did not support it for patching extensions until 2.72.0.  An example of this for `metadata` is shown below with two attributes, `a1` from the prior examples and `a2`, so that a constraint on the two can be added.
225 | 
226 | ```
227 | {
228 |     "caption": "Attribute Dictionary",
229 |     "description": "The Attribute Dictionary defines schema attributes and includes references to the events and objects in which they are used.",
230 |     "name": "dictionary",
231 |     "attributes": {
232 |       "a1": {
233 |         "caption": "An attribute",
234 |         "description": "A generic extension attribute.",
235 |         "is_array": false,
236 |         "type": "string_t"
237 |       },
238 |       "a2": {
239 |         "caption": "A second attribute",
240 |         "description": "A second generic extension attribute",
241 |         "is_array": false,
242 |         "type": "integer_t"
243 |       }
244 |     }
245 | }
246 | ```
247 | 
248 | ```
249 | {
250 |     "description": "The Generic Extension metadata object.",
251 |     "extends": "metadata",
252 |     "attributes": {
253 |       "a1": {
254 |         "requirement": "recommended"
255 |       },
256 |       "a2": {
257 |         "requirement": "recommended"
258 |       }
259 |     },
260 | 
261 |     "constraints": {
262 |         "just_one": [
263 |           "a1",
264 |           "a2"
265 |         ]
266 |     }
267 |  }
268 | ```
269 | 
270 | Note that if the parent class or object already has constraints, they will be removed and replaced unless they are also included in the extension.  Also note that multiple constraints may be applied, as long as they don't conflict.  For example:
271 | 
272 | ```json
273 | {
274 |     "caption": "Patched Event",
275 |     "description": "The Extended Base event adds new attributes to the core Base event.",
276 |     "extends": "base_event",
277 | 
278 |     "attributes": {
279 |         "a1": {
280 |             "group": "context",
281 |             "requirement": "recommended"
282 |           },
283 |           "a2": {
284 |             "group": "context",
285 |             "requirement": "recommended"
286 |           },
287 |           "b1": {
288 |             "group": "context",
289 |             "requirement": "recommended"
290 |           },
291 |           "b2": {
292 |             "group": "context",
293 |             "requirement": "recommended"
294 |           }    
295 |     },
296 |     "constraints": {
297 |         "just_one": [
298 |           "a1",
299 |           "a2"
300 |         ],
301 |         "at_least_one": [
302 |             "b1",
303 |             "b2"
304 |         ]
305 |     }
306 | 
307 | }
308 | ```
309 | 
310 | As can be seen in the above example, the same capability is possible when patching event classes; you can add constraints in the patching extension class and they will replace any constraints of the class being extended.
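
To make the effect of these constraints concrete, here is a minimal validation sketch in Python. It assumes that `just_one` means exactly one of the listed attributes is present and that `at_least_one` means one or more are present; `check_constraints` is a hypothetical helper, not part of the schema server or any OCSF validator.

```python
def check_constraints(event: dict, constraints: dict) -> list[str]:
    """Return a list of violated constraints (empty when the event passes)."""
    violations = []
    for kind, names in constraints.items():
        # An attribute counts as present when it exists and is not null.
        present = [n for n in names if event.get(n) is not None]
        if kind == "at_least_one" and len(present) < 1:
            violations.append(f"at_least_one of {names} is required")
        elif kind == "just_one" and len(present) != 1:
            violations.append(f"just_one of {names} is required, found {present}")
    return violations


constraints = {"just_one": ["a1", "a2"], "at_least_one": ["b1", "b2"]}

ok_event = {"a1": "x", "b1": "y", "b2": "z"}
bad_event = {"a1": "x", "a2": 2}  # both a1 and a2 set, neither b1 nor b2

print(check_constraints(ok_event, constraints))
print(check_constraints(bad_event, constraints))
```

The second event fails both checks, which shows why a new patching constraint deserves care: events that validated against the core class before the patch may stop validating afterwards.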
311 | 
312 | ## Conclusion
313 | Patching extensions are a powerful yet simple way to add attributes to existing core schema classes and objects rather than having to introduce new extension classes with distinct names and IDs.  With the most recent OCSF Schema Server update, patching extensions also support constraints which replace any constraints of the extended class or object.  Existing queries will not need to change with the caveat that new patching constraints need to be carefully considered as they can change the validation of the core classes.
314 | 


--------------------------------------------------------------------------------
/Articles/Profiles are Powerful.md:
--------------------------------------------------------------------------------
  1 | # Profiles are Powerful
  2 | Paul Agbabian
  3 | 
  4 | I’ve mentioned OCSF Profiles in blogs, but I want to go into more detail here, as they are becoming more important and sometimes misunderstood as to how they can be constructed.  There are four ways of modeling using profiles:
  5 | 
  6 | 1. Augmentation profiles
  7 | 2. Native profiles
  8 | 3. Partially native profiles
  9 | 4. Hybrid profiles
 10 | 
 11 | An OCSF Profile is a framework construct that cuts across categories and classes to augment classes and objects with focused ‘mix-in’ attributes that better describe aspects of activities and findings in certain situations.  Rather than have an explosion of classes that combine attributes for these situations, profiles are an elegant way of reusing the semantics of fundamental classes without extending them with new classes. If you are a Java or C++ developer, they will resemble implementing additional interfaces on top of a class, and similarly, in OCSF, Profiles are an event type that cuts across event classes.
 12 | 
 13 | Hence a profile is two things: a mix-in attribute set and an alternate typing of the event class or object where it is registered.  This is accomplished via a “profiles” array at the head of the class or object.  The OCSF schema server will take care of filtering or augmenting classes and objects appropriately.  In this way, a related set of attributes can be added selectively independent of class or category when its type cross-cuts the structural taxonomy.  For example, the Host profile can be applied to the Network Activity category classes for host-based network activity coming from an EDR security agent.  Querying on events WHERE “host” IN metadata.profiles[] retrieves all events that carry the Host profile, from both the System Activity category and the Network Activity classes.
 14 | 
 15 | ```
 16 | {
 17 |  "description": "The attributes that identify host/device attributes.",
 18 |  "meta": "profile",
 19 |  "caption": "Host",
 20 |  "name": "host",
 21 |  "annotations": {
 22 |    "group": "primary"
 23 |  },
 24 |  "attributes": {
 25 |    "device": {
 26 |      "requirement": "recommended"
 27 |    },
 28 |    "actor": {
 29 |      "requirement": "optional"
 30 |    }
 31 |  }
 32 | }
 33 | ```
 34 | 
 35 | ## Augmentation Profiles
 36 | 
 37 | The most common way of designing and using a profile is to define it in the metaschema profiles folder via a profile name and the profile attributes, as above; then declare the profile in the class or object, and finally include the profile to bring in its attributes, as below; the attributes will be added when the profile is applied to an event class or object.  
 38 | 
 39 | ```
 40 | {
 41 |  "caption": "Network",
 42 |  "category": "network",
 43 |  "description": "Network event is a generic event that defines a set of attributes available in the Network category.",
 44 |  "extends": "base_event",
 45 |  "name": "network",
 46 |  "profiles": [
 47 |    "host",
 48 |    "network_proxy",
 49 |    "security_control",
 50 |    "load_balancer"
 51 |  ],
 52 |  "attributes": {
 53 |    "$include": [
 54 |      "profiles/host.json",
 55 |      "profiles/network_proxy.json",
 56 |      "profiles/security_control.json",
 57 |      "profiles/load_balancer.json"
 58 |    ],
 59 | ...
 60 | ```
 61 | 
 62 | This is the augmentation profile approach.  When the profile is enabled in the schema browser, the respective classes and objects are augmented with the profile attributes, and schema samples will include the profile name in the metadata.profiles[] array, effectively typing the event or object as a kind of the profile.  
 63 | 
 64 | ```
 65 | {
 66 |   "type_name": "Network Activity: Open",
 67 |   "activity_id": 4,
 68 |   "type_uid": 400104,
 69 |   "class_uid": 4001,
 70 |   "category_uid": 4,
 71 |   "class_name": "Network Activity",
 72 |   "metadata": {
 73 |     "version": "1.1.0",
 74 |     "profiles": ["host"]
 75 |   },
 76 |   "category_name": "Network Activity",
 77 |   ...
 78 | ```
 79 | 
 80 | All events matching the profile will be returned if an event is queried by its profile name, irrespective of class or category.  However, there are three other ways to use profiles in the schema.
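
As a rough illustration of querying by profile, the following Python sketch filters a list of already-parsed events on `metadata.profiles`. The `events_with_profile` helper and the sample events are hypothetical; a real deployment would express the same test in its own query language.

```python
def events_with_profile(events, profile):
    """Yield events whose metadata.profiles array contains `profile`."""
    for event in events:
        profiles = (event.get("metadata") or {}).get("profiles") or []
        if profile in profiles:
            yield event


# Hypothetical, heavily trimmed events for illustration.
events = [
    {"class_uid": 4001, "class_name": "Network Activity",
     "metadata": {"version": "1.1.0", "profiles": ["host"]}},
    {"class_uid": 1007, "class_name": "Process Activity",
     "metadata": {"version": "1.1.0", "profiles": ["host", "security_control"]}},
    {"class_uid": 4001, "class_name": "Network Activity",
     "metadata": {"version": "1.1.0"}},
]

matches = [e["class_uid"] for e in events_with_profile(events, "host")]
print(matches)
```

The filter never looks at `class_uid` or `category_uid`, which is the point: the profile acts as a second, cross-cutting type for the event.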
 81 | 
 82 | ## Native Profiles
 83 | 
 84 | The second approach is where the attributes of a profile definition are already natively defined within the event class or object.  Think of this as the built-in or native profile approach.  For the profiles system and typing to be consistent, those classes and objects must declare the profile within the class as with the augmentation approach. Still, there is no need to include the profile in the attributes section since those attributes (in the case of the Host profile, actor and device) are already defined there.
 85 | 
 86 | ```
 87 | {
 88 |  "caption": "System Activity",
 89 |  "category": "system",
 90 |  "extends": "base_event",
 91 |  "name": "system",
 92 |  "profiles": [
 93 |    "host",
 94 |    "security_control"
 95 |  ],
 96 |  "attributes": {
 97 |    "$include": [
 98 |      "profiles/security_control.json"
 99 |    ],
100 |    "actor": {
101 |      "group": "primary",
102 |      "requirement": "required"
103 |    },
104 |    "device": {
105 |      "group": "primary",
106 |      "requirement": "required"
107 |    }
108 |  }
109 | }
110 | ...
111 | ```
112 | 
113 | ## Partially Native Profiles
114 | 
115 | What happens when only some of the attributes of the profiles are native to an event class or object?  This is the partially native profile approach.  Using the augmentation profile approach, where the profile is $included into the class or object, the schema server will remove the native attributes when the profile is not applied, which isn’t what you would want.  For these cases, a “profile”: null statement should be added to the potentially affected native attribute, which tells the server to leave it alone regardless of the profile application.  In the example below, actor is native to the Authentication class, but device is not.  When the profile is applied, only device will be added, and when not applied, actor will stay put.
116 | 
117 | ```
118 | {
119 |  "caption": "Authentication",
120 |  "extends": "iam",
121 |  "name": "authentication",
122 |  "uid": 2,
123 |  "profiles": [
124 |    "host"
125 |  ],
126 |  "attributes": {
127 |    "$include": [
128 |      "profiles/host.json"
129 |    ],
130 |    "actor": {
131 |      "description": "The actor that requested the authentication.",
132 |      "group": "context",
133 |      "profile": null
134 |    },
135 | ...
136 | ```
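
The behavior described above can be modeled with a short Python sketch. It is only a reading of this article, not the schema server's actual logic: `visible_attributes` and its argument shapes are assumptions. An attribute is kept when it carries `"profile": null`, when it has no profile association at all, or when the profile it is associated with is applied.

```python
def visible_attributes(class_attrs, included_profiles, enabled):
    """Return the sorted attribute names visible for a set of applied profiles.

    class_attrs:       attribute name -> attribute dict as written in the class
    included_profiles: profile name -> set of attribute names it defines
    enabled:           set of profile names currently applied
    """
    # Attributes brought in by $include are associated with their profile.
    owner = {}
    for profile, names in included_profiles.items():
        for name in names:
            owner[name] = profile

    visible = []
    for name in set(class_attrs) | set(owner):
        attr = class_attrs.get(name, {})
        # An explicit "profile" entry (None or a name) wins over $include.
        tag = attr["profile"] if "profile" in attr else owner.get(name)
        if tag is None or tag in enabled:
            visible.append(name)
    return sorted(visible)


host = {"host": {"device", "actor"}}
with_null = {"actor": {"group": "context", "profile": None}}
without_null = {"actor": {"group": "context"}}

print(visible_attributes(with_null, host, enabled=set()))
print(visible_attributes(with_null, host, enabled={"host"}))
print(visible_attributes(without_null, host, enabled=set()))
```

With the `null` marker, `actor` survives whether or not the Host profile is applied; without it, `actor` disappears together with `device` when the profile is off.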
137 | 
138 | ## Hybrid Profiles
139 | 
140 | Finally, what if a class or object wants to be considered as part of the profile family but wants to add new attributes that are only relevant to the one particular class or object?  This may sound a bit esoteric, but it has already been used in the resource_details object for the Cloud profile. When the Cloud profile is applied to classes with attributes of the resource_details object type, for example, API Activity, the cloud_partition and region attributes defined within the object are added, but only when the Cloud profile is applied to the class.  The event now includes the api and cloud attributes, while the resource_details object of the class adds the other two attributes - effectively creating a custom hybrid profile.
141 | 
142 | If you $included the profile attributes, as with the augmented profile, you would also get the Cloud profile’s attributes in the object as well as the class. You don’t want to duplicate those attributes applied by the profile to the class into the objects too.  To make the object’s native attributes aware of the profile (such that the server switches them on, and the event validator won’t complain), you add “profile”: <profile name> within your object’s attribute clause, as well as the usual declaration within the profiles array at the head of the class or object.
143 | 
144 | The example below assigns the Cloud profile to the specific native attributes cloud_partition and region of the Resource Details object.  These attributes are not part of the Cloud profile definition, so only this specific object will include them when the Cloud profile is applied to its enclosing class.  In this way, applying a profile can add its attributes to a class, and different attributes can be added to an object within that class.
145 | 
146 | ```
147 | {
148 |  "caption": "Resource Details",
149 |  "extends": "_resource",
150 |  "name": "resource_details",
151 |  "profiles": ["cloud"],
152 |  "attributes": {
153 |    "agent_list": {
154 |      "requirement": "optional"
155 |    },
156 |    "cloud_partition": {
157 |      "profile": "cloud",
158 |      "requirement": "optional"
159 |    },
160 |    "owner": {
161 |      "description": "The service or user account that owns the resource.",
162 |      "requirement": "recommended"
163 |    },
164 |    "region": {
165 |      "description": "The cloud region of the resource.",
166 |      "profile": "cloud",
167 |      "requirement": "optional"
168 |    },
169 | ...
170 | ```
171 | 


--------------------------------------------------------------------------------
/Articles/Representing Process Parentage.md:
--------------------------------------------------------------------------------
  1 | # Representing Process Parentage
  2 | Mitchell Wasson
  3 | February 2025
  4 | 
  5 | Effectively representing endpoint process parentage is frequently discussed, because the OCSF schema has several fields that support this use case (`actor.process`, `process.parent_process`, `process.lineage` and `process.ancestry`).
  6 | This article clarifies and expands on those discussions to provide prescriptive guidance for representing process parentage within the OCSF schema.
  7 | 
  8 | ## Actor/Creator or Parent
  9 | 
 10 | [Confusion on this topic](https://github.com/ocsf/ocsf-schema/discussions/1194) arises from the fact that the OCSF schema enables the simultaneous expression of `.actor.process` and `.process.parent_process` in `Process Activity: Launch` events.
 11 | People usually wonder whether they should put the launched process's parent in `.actor.process`, `.process.parent_process` or both.
 12 | Again, prescriptive guidance will be provided, but it is important to understand the difference between the actor (aka the "creator") and the parent in a process launch/creation event.
 13 | 
 14 | The creator is the process that initiated the creation of a new process with the endpoint operating system.
 15 | The parent is the process that the newly created process inherits properties from according to operating system rules.
 16 | The creator and the parent are _usually_ the same process.
 17 | Additionally, on many platforms they are guaranteed to always be the same.
 18 | 
 19 | However, this guarantee is not present on Windows.
 20 | [Pavel Yosifovich's blog on "Parent Process vs. Creator Process"](https://scorpiosoftware.net/2021/01/10/parent-process-vs-creator-process/) shows exactly how one can create a process on Windows with a parent different from the creator.
 21 | This is a straightforward technique and has many legitimate use cases.
 22 | 
 23 | Note this situation shouldn't be confused with creating a process through a layer of indirection or communicating with another common process that will create a process for you.
 24 | The above distinction of creator vs parent applies to mechanisms natively supported through operating system APIs that can't be modelled otherwise.
 25 | Process creation through a layer of process indirection (e.g. starting a shell to start your program) is modelled through two `Process Activity: Launch` events.
 26 | Asking another process to create a process is modelled through some sort of communication event and a single `Process Activity: Launch` event.
 27 | 
 28 | For `Process Activity: Launch` events, one should set both `.actor.process` and `.process.parent_process` if the ability to know both is present.
 29 | This will provide the visibility to know when they differ.
 30 | However, your endpoint software must be aware of this difference in order to effectively populate both locations.
 31 | If your endpoint software only reports on parent, then only set `.process.parent_process`.
 32 | However, if your query patterns demand that `.actor.process` be set, you can duplicate the parent information there knowing that this information would be the same the majority of the time anyway.
 33 | Inform your downstream data consumers this approach is being taken so they do not rely on being able to detect when creator and parent differ.
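
For data consumers, the check for a creator that differs from the parent could look like the following Python sketch. The helper name `creator_differs_from_parent` is hypothetical, and the event is trimmed to the fields needed. The sketch prefers `uid` and falls back to `pid`, which is weaker because PIDs can be reused.

```python
def creator_differs_from_parent(event: dict):
    """Return True or False when both processes are known, else None."""
    creator = (event.get("actor") or {}).get("process")
    parent = (event.get("process") or {}).get("parent_process")
    if not creator or not parent:
        return None  # one side is missing, nothing to compare
    for key in ("uid", "pid"):  # prefer uid, fall back to pid
        if key in creator and key in parent:
            return creator[key] != parent[key]
    return None


launch = {
    "type_uid": 100701,
    "actor": {"process": {"pid": 61320,
                          "uid": "2b2de22a-2818-8819-8bdb-67379aa6b98b"}},
    "process": {
        "pid": 14148,
        "parent_process": {"pid": 7956,
                           "uid": "2d7576cb-b5f9-8eee-b209-2022749157d1"},
    },
}

print(creator_differs_from_parent(launch))
print(creator_differs_from_parent({"process": {"pid": 14148}}))
```

A `False` result is only meaningful if the producer populates the two locations independently. If parent information is copied into `.actor.process`, the comparison always reports no difference.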
 34 | 
 35 | Note that there is no explicit attribute for creator inside the process object.
 36 | As mentioned above, the creator and the parent will be the same process the majority of the time.
 37 | We currently believe it is sufficient to only provide one location in which this difference is expressed: the `Process Activity: Launch` event.
 38 | 
 39 | Depending on the situation, the creator may be desired instead of the parent.
 40 | When a data consumer is specifically interested in the creator, they should consult the `Process Activity: Launch` event for the process in question.
 41 | It is possible to add attributes for creator to the process object in the future.
 42 | However, the added value will need to be carefully weighed against the incurred bloat.
 43 | 
 44 | ## Extended Ancestry
 45 | 
 46 | OCSF primarily models process ancestry through process object recursion with the `process.parent_process` attribute.
 47 | In theory this recursion ends once you get to the root of the ancestry tree.
 48 | In practice, endpoint software must stop closer to the process in question.
 49 | 
 50 | Current guidance is to only populate `process.parent_process` for the top-level process object in an event.
 51 | This guidance is given in order to prevent deep nesting in events.
 52 | Additionally, there are diminishing returns to going further and further up the process tree.
 53 | 
 54 | The `parent_process` attribute in the top-level process object should be set if possible as the primary mechanism for communicating ancestry.
 55 | Parent process and all the fields in the process object are often critical context in security investigations.
 56 | 
 57 | When going beyond immediate parent, the OCSF 1.4 `process.ancestry` attribute should be used.
 58 | This attribute provides the ability to supply references to processes going up the process ancestry tree (e.g. parent, grandparent, great grandparent, ...).
 59 | The process entity objects in this array contain a small subset of process object attributes.
 60 | These fields are meant to enable a lookup of full process details and enable a basic preview of the process.
 61 | It is left up to the implementer to determine how far back to report process ancestry.
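
One way an implementer might cap how far back ancestry is reported is sketched below in Python. The process table, its `parent_pid` key, and the `build_ancestry` helper are all hypothetical; the ordering (parent first, then grandparent, and so on) follows the description above.

```python
def build_ancestry(processes, parent_pid, max_depth=3):
    """Walk up a process table and return at most `max_depth` ancestors.

    processes:  pid -> dict with "pid", "uid", "cmd_line" and "parent_pid"
                (a hypothetical in-memory table kept by the endpoint agent)
    parent_pid: pid of the immediate parent of the process being reported
    """
    ancestry = []
    seen = set()  # guards against cycles caused by stale or reused pids
    pid = parent_pid
    while pid is not None and pid not in seen and len(ancestry) < max_depth:
        proc = processes.get(pid)
        if proc is None:
            break  # ancestor unknown, stop here
        seen.add(pid)
        # Keep only a small preview subset of the process attributes.
        ancestry.append({k: proc[k] for k in ("cmd_line", "pid", "uid") if k in proc})
        pid = proc.get("parent_pid")
    return ancestry


table = {
    43548: {"pid": 43548, "cmd_line": "C:\\windows\\System32\\cmd.exe",
            "uid": "3831f89a-5b2c-8fc2-8396-794f0d877672", "parent_pid": 10464},
    10464: {"pid": 10464, "uid": "9263fade-d82f-8780-95da-203a95408580",
            "parent_pid": 7956},
    7956: {"pid": 7956, "cmd_line": "C:\\windows\\Explorer.EXE",
           "uid": "2d7576cb-b5f9-8eee-b209-2022749157d1", "parent_pid": None},
}

print([p["pid"] for p in build_ancestry(table, 43548, max_depth=2)])
print([p["pid"] for p in build_ancestry(table, 43548, max_depth=10)])
```

The depth limit is a policy choice: a small value keeps events compact, while a larger one gives analysts more of the tree without a lookup.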
 62 | 
 63 | Prior to OCSF 1.4, the `process.lineage` field (now deprecated) enabled a preview of ancestry.
 64 | 
 65 | ## `Process Activity: Launch` Sample Event
 66 | 
 67 | Here is a `Process Activity: Launch` event that adheres to the above guidance on Actor/Creator, Parent and Ancestry.
 68 | Note that the creator and parent are different.
 69 | 
 70 | ```json
 71 | {
 72 |   "activity_id": 1,
 73 |   "activity_name": "Launch",
 74 |   "actor": {
 75 |     "process": {
 76 |       "ancestry": [
 77 |         {
 78 |           "cmd_line": "C:\\windows\\System32\\cmd.exe",
 79 |           "created_time": 1738156431386,
 80 |           "pid": 43548,
 81 |           "uid": "3831f89a-5b2c-8fc2-8396-794f0d877672"
 82 |         },
 83 |         {
 84 |           "cmd_line": "\"C:\\Program Files\\WindowsApps\\Microsoft.WindowsTerminal_1.21.3231.0_x64__8wekyb3d8bbwe\\WindowsTerminal.exe\" ",
 85 |           "created_time": 1737662946236,
 86 |           "pid": 10464,
 87 |           "uid": "9263fade-d82f-8780-95da-203a95408580"
 88 |         },
 89 |         {
 90 |           "cmd_line": "C:\\windows\\Explorer.EXE",
 91 |           "created_time": 1737662110682,
 92 |           "pid": 7956,
 93 |           "uid": "2d7576cb-b5f9-8eee-b209-2022749157d1"
 94 |         }
 95 |       ],
 96 |       "cmd_line": "cuckoo.exe  7956 powershell -command \"echo Hello.\"",
 97 |       "created_time": 1738271124947,
 98 |       "file": {
 99 |         "hashes": [
100 |           {
101 |             "algorithm": "SHA-256",
102 |             "algorithm_id": 3,
103 |             "value": "3dab543070797f16f874caf518807ca9d3e81376daf9eccce34d69bec2e8a7a5"
104 |           }
105 |         ],
106 |         "name": "cuckoo.exe",
107 |         "path": "C:\\Users\\test\\source\\repos\\cuckoo\\x64\\Release\\cuckoo.exe",
108 |         "size": 169472,
109 |         "type": "Regular File",
110 |         "type_id": 1
111 |       },
112 |       "name": "cuckoo.exe",
113 |       "pid": 61320,
114 |       "uid": "2b2de22a-2818-8819-8bdb-67379aa6b98b"
115 |     }
116 |   },
117 |   "category_name": "System Activity",
118 |   "category_uid": 1,
119 |   "class_name": "Process Activity",
120 |   "class_uid": 1007,
121 |   "device": {
122 |     "hostname": "test hostname",
123 |     "type": "Desktop",
124 |     "type_id": 2
125 |   },
126 |   "message": "Process 61320 (cuckoo.exe) created process 14148 (powershell.exe) as a child of process 7956 (explorer.exe).",
127 |   "metadata": {
128 |     "product": {
129 |       "name": "A creator-aware endpoint security product"
130 |     },
131 |     "version": "1.4.0"
132 |   },
133 |   "process": {
134 |     "ancestry": [
135 |       {
136 |         "cmd_line": "C:\\windows\\Explorer.EXE",
137 |         "created_time": 1737662110682,
138 |         "pid": 7956,
139 |         "uid": "2d7576cb-b5f9-8eee-b209-2022749157d1"
140 |       }
141 |     ],
142 |     "cmd_line": "powershell -command \"echo Hello.\"",
143 |     "created_time": 1738271124954,
144 |     "file": {
145 |       "hashes": [
146 |         {
147 |           "algorithm": "SHA-256",
148 |           "algorithm_id": 3,
149 |           "value": "3247bcfd60f6dd25f34cb74b5889ab10ef1b3ec72b4d4b3d95b5b25b534560b8"
150 |         }
151 |       ],
152 |       "name": "powershell.exe",
153 |       "path": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
154 |       "size": 450560,
155 |       "type": "Regular File",
156 |       "type_id": 1
157 |     },
158 |     "name": "powershell.exe",
159 |     "parent_process": {
160 |       "cmd_line": "C:\\windows\\Explorer.EXE",
161 |       "created_time": 1737662110682,
162 |       "file": {
163 |         "hashes": [
164 |           {
165 |             "algorithm": "SHA-256",
166 |             "algorithm_id": 3,
167 |             "value": "6b45e1b4d3af9ae92c6aec095579571317924f50d12f13f0ed6a82b91c6fab83"
168 |           }
169 |         ],
170 |         "name": "explorer.exe",
171 |         "path": "C:\\Windows\\explorer.exe",
172 |         "size": 5575536,
173 |         "type": "Regular File",
174 |         "type_id": 1
175 |       },
176 |       "name": "explorer.exe",
177 |       "pid": 7956,
178 |       "uid": "2d7576cb-b5f9-8eee-b209-2022749157d1"
179 |     },
180 |     "pid": 14148,
181 |     "uid": "a5f0e1f1-8e89-8b58-98ea-3d5ba1a3d07e"
182 |   },
183 |   "severity": "Informational",
184 |   "severity_id": 1,
185 |   "time": 1738271124958,
186 |   "type_name": "Process Activity: Launch",
187 |   "type_uid": 100701
188 | }
189 | ```
190 | 
191 | ## Pre-existing Processes
192 | 
193 | Endpoint security software may report on the existence of processes that were created while the security software was not running.
194 | For example, some processes will be created before endpoint security software is started at system boot.
195 | Care should be taken to correctly represent process ancestry in these situations.
196 | 
197 | First, the type of event used to report process existence is important.
198 | `Process Query: Query` events should be used when the existence of a process is reported, and `Process Activity: Launch` events should be used when a directly observed process creation is reported.
199 | If these two situations can't be distinguished in an endpoint dataset, then one may use `Process Activity: Launch` events for both cases.
200 | 
201 | Communicating this distinction through `Process Query: Query` and `Process Activity: Launch` event types is greatly preferred,
202 | because more information is available to endpoint security software if it directly observes a process creation.
203 | Most importantly, information on process creator is only available at process creation time.
204 | Additionally, if reporting on an already existing process, that process's parent may have terminated already and there will be little information available about it.
205 | Using two different event types allows one to easily communicate different schemas to data consumers based on event type.
206 | 
207 | When reporting on already existing processes, `.actor.process` should be omitted, because no information on the process creator is available.
208 | `Process Query: Query` does not have the `actor` attribute by default, so this advice only applies when the Host profile is enabled.
209 | `.process.parent_process` should be set and populated if the reported process's parent hasn't terminated yet.
210 | If the parent process has terminated, then it is likely that only the parent's PID will be available.
211 | In this situation `.process.parent_process` may be set with only the `pid` attribute supplied, or the PID can be reported in the `process.ancestry` array.
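
The event-type choice described in this section can be summarized in a small Python sketch. `process_report_event` is a hypothetical producer-side helper. The class and activity identifiers are the ones used in this article's sample events, and `type_uid` is computed as `class_uid * 100 + activity_id`, which matches the `100701` value in the Launch sample.

```python
def process_report_event(process: dict, observed_creation: bool, creator=None) -> dict:
    """Build the skeleton of a process event (illustrative, not complete)."""
    if observed_creation:
        class_uid, class_name, activity_name = 1007, "Process Activity", "Launch"
    else:
        class_uid, class_name, activity_name = 5015, "Process Query", "Query"
    activity_id = 1
    event = {
        "activity_id": activity_id,
        "activity_name": activity_name,
        "class_uid": class_uid,
        "class_name": class_name,
        "type_uid": class_uid * 100 + activity_id,
        "process": process,
    }
    # Creator information only exists for directly observed creations.
    if observed_creation and creator is not None:
        event["actor"] = {"process": creator}
    return event


# Pre-existing process whose parent has already terminated: only the
# parent's PID is known, so parent_process carries just `pid`.
existing = process_report_event(
    {"pid": 8732, "name": "svchost.exe", "parent_process": {"pid": 864}},
    observed_creation=False,
)
launched = process_report_event(
    {"pid": 14148, "name": "powershell.exe", "parent_process": {"pid": 7956}},
    observed_creation=True,
    creator={"pid": 61320, "name": "cuckoo.exe"},
)

print(existing["type_uid"], "actor" in existing)
print(launched["type_uid"], "actor" in launched)
```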
212 | 
213 | ## `Process Query: Query` Sample Event
214 | 
215 | Here is a `Process Query: Query` event that adheres to the above guidance on pre-existing processes where creator information is not available.
216 | 
217 | ```json
218 | {
219 |   "activity_id": 1,
220 |   "activity_name": "Query",
221 |   "category_name": "Discovery",
222 |   "category_uid": 5,
223 |   "class_name": "Process Query",
224 |   "class_uid": 5015,
225 |   "message": "Process 8732 (svchost.exe) is pre-existing.",
226 |   "metadata": {
227 |     "product": {
228 |       "name": "A creator-aware endpoint security product"
229 |     },
230 |     "version": "1.4.0"
231 |   },
232 |   "process": {
233 |     "ancestry": [
234 |       {
235 |         "cmd_line": "C:\\Windows\\system32\\services.exe",
236 |         "created_time": 1738348318013,
237 |         "pid": 864,
238 |         "uid": "88b1f102-1c47-8051-8f81-728b01bd7343"
239 |       },
240 |       {
241 |         "cmd_line": "wininit.exe",
242 |         "created_time": 1738348317989,
243 |         "pid": 952,
244 |         "uid": "06b14610-be3d-8a4a-b7c1-e41c25f0fbe3"
245 |       }
246 |     ],
247 |     "cmd_line": "C:\\Windows\\System32\\svchost.exe -k swprv",
248 |     "created_time": 1738349093722,
249 |     "file": {
250 |       "hashes": [
251 |         {
252 |           "algorithm": "SHA-256",
253 |           "algorithm_id": 3,
254 |           "value": "6fc3bf1fdfd76860be782554f8d25bd32f108db934d70f4253f1e5f23522e503"
255 |         }
256 |       ],
257 |       "name": "svchost.exe",
258 |       "path": "C:\\Windows\\System32\\svchost.exe",
259 |       "size": 57528,
260 |       "type": "Regular File",
261 |       "type_id": 1
262 |     },
263 |     "name": "svchost.exe",
264 |     "parent_process": {
265 |       "cmd_line": "C:\\Windows\\system32\\services.exe",
266 |       "created_time": 1738348318013,
267 |       "file": {
268 |         "hashes": [
269 |           {
270 |             "algorithm": "SHA-256",
271 |             "algorithm_id": 3,
272 |             "value": "1efd9a81b2ddf21b3f327d67a6f8f88f814979e84085ec812af72450310d4281"
273 |           }
274 |         ],
275 |         "name": "services.exe",
276 |         "path": "C:\\Windows\\System32\\services.exe",
277 |         "size": 716544,
278 |         "type": "Regular File",
279 |         "type_id": 1
280 |       },
281 |       "name": "services.exe",
282 |       "pid": 864,
283 |       "uid": "88b1f102-1c47-8051-8f81-728b01bd7343"
284 |     },
285 |     "pid": 8732,
286 |     "uid": "52bc2416-2c6b-811c-a031-43ba9b58ec7e"
287 |   },
288 |   "query_result": "Exists",
289 |   "query_result_id": 1,
290 |   "severity": "Informational",
291 |   "severity_id": 1,
292 |   "time": 1738792775574,
293 |   "type_name": "Process Query: Query",
294 |   "type_uid": 501501
295 | }
296 | ```
297 | 


--------------------------------------------------------------------------------
/Contributors.md:
--------------------------------------------------------------------------------
  1 | # Contributors
  2 | | Organization |
  3 | | ------------ |
  4 | 3OPS
  5 | Accenture
  6 | Akamai
  7 | Amazon
  8 | Anomali
  9 | Apple
 10 | Aqua Security
 11 | Aquia
 12 | Norwegian Labor Inspection Authority
 13 | Arctic Wolf
 14 | University of Arizona
 15 | Arklay
 16 | Atlassian
 17 | AT&T
 18 | IBM
 19 | Autodesk
 20 | Automax
 21 | Avalor
 22 | Aviz Networks
 23 | Axiad
 24 | Barracuda
 25 | Bestgate Engineering
 26 | Beyond Identity
 27 | Block
 28 | Blue Cycle
 29 | BlueVoyant
 30 | blyx
 31 | Bricklayer AI
 32 | Broadcom
 34 | Cargill
 35 | Carnegie Mellon CERT
 36 | Cisco
 37 | Cloud Software Group
 38 | CloudFabrix
 39 | Cloudflare
 40 | Comcast
 41 | ComplianceCow
 42 | Cornell University
 43 | Crash Override
 44 | Cribl
 45 | Crogl
 46 | CrowdStrike
 47 | Crystal Matrix Software
 48 | Canadian Centre for Cyber Security
 49 | CyberActive
 50 | CyberArk
 51 | Cyber Sanik
 52 | Cybersixgill
 53 | Cyware
 54 | Darktrace
 55 | Databahn
 56 | Databricks
 57 | Decathlon
 58 | Deepwatch
 59 | Deloitte
 60 | Devo
 61 | Disney
 62 | DNIF
 63 | DoDIIS
 64 | DTEX Systems
 65 | Duo Security
 66 | AI EdgeLabs
 67 | Elastic
 68 | Elysium Analytics
 69 | Ermetic
 70 | eSentire
 71 | F5
 72 | Grab
 73 | Grafana
 74 | Hi Bob
 75 | Hunters
 76 | Bosch
 77 | Infopercept
 78 | Infosys
 79 | Innotac
 80 | Interpublic Group
 81 | Intuit
 82 | iSenpai
 83 | ITV
 84 | Jamf
 85 | Johns Hopkins Applied Physics Laboratory
 86 | JupiterOne
 87 | Keos Technology
 88 | KKR
 89 | Kyndryl
 90 | Lacework
 91 | Laminar Security
 92 | Leidos
 93 | LimaCharlie
 94 | MalwareBytes
 95 | ManTech
 96 | Mekanoid Corporation
 97 | MetricStream
 98 | Metron Security
 99 | Microsoft
100 | MITRE
101 | Monad
102 | MongoDB
103 | NETSCOUT
104 | Netskope
105 | Networkology
106 | Nokia
107 | Northeastern University
108 | NowSecure
109 | NXLog
110 | OISF
111 | Okta
112 | OLX
113 | Orca Security
114 | OSRS Group
115 | Own Company
116 | Palo Alto Networks
117 | Palosade
118 | Panther
119 | Planned Systems International
120 | Praxis Engineering
121 | Priam Cyber AI
122 | PrimeOrbit
123 | Prowler
124 | PWC
125 | Qualys
126 | QueryAI
127 | Rapid7
128 | RedBear
129 | Red Canary
130 | Reddit
131 | Red Panda
132 | Devoteam Revolve
133 | Ripjar
134 | Rooted Insights
135 | Sailpoint
136 | Salesforce
137 | Sandia National Laboratories
138 | SAS
139 | Scanner
140 | Secureworks
141 | Security Compass
142 | SecurityScorecard
143 | Securonix
144 | SeeMetrics
145 | Sekoia
146 | SentinelOne
147 | sFractal Consulting
148 | Sinko Sinko
149 | Skyscanner
150 | Snowflake
151 | SOC Prime
152 | Sophos
153 | Southern Company
154 | Splunk
155 | Stellar Cyber
156 | Stripe
157 | Sumo Logic
158 | Swimlane
159 | Symmetry Systems
160 | Syncbak
161 | Synqly
162 | SysDig
163 | Tanium
164 | Tarsal
165 | Tech Mahindra
166 | Tenable
167 | Tenzir
168 | Torq
169 | Trellix
170 | Trend Micro
171 | TUI
172 | University of Cincinnati
173 | Uptycs
174 | Vectra
175 | Veeam
176 | Venafi
177 | Verica
178 | Veriti
179 | Vipre
180 | VMWare
181 | WithSecure
182 | Wiz
183 | ZeroFox
184 | Zscaler
185 | Zurich Services
186 | 


--------------------------------------------------------------------------------
/FAQs/How to Model Alerts in OCSF.md:
--------------------------------------------------------------------------------
 1 | # How to Model Alerts with OCSF
 2 | 
 3 | Through version 1.3 of OCSF the concept of an alert, or alertable signal, was not explicitly defined. In practice any event might be considered an alert, if it was deemed important and in many cases worthy of some additional form of notification. All OCSF events have a required `severity_id` attribute. Elevated values such as `Major` or `Critical` could be interpreted to be alertable signals, indirectly. Examples of alertable signals then might be events with high severity, elevated risk scores, detected malware, MITRE ATT&CK annotations, and rule or policy violations.  Other alertable signals might be the creation of a Finding based on some type of analysis.
 4 | 
 5 | `Security Control` profile events, available since version 1.0, are often alerts (the profile was factored out from a series of 'Detection' events). More directly, `Detection Findings` when created might be considered alertable signals. However, there was no way to express the intent of the event producer or event mapper that the event should be interpreted as an alert. And yet alerts are extremely important and prevalent in security event processing.
 6 | 
 7 | In OCSF version 1.4, an explicit alertable signal is an event with the `is_alert` attribute set to `true`.  This is a newer attribute that is not required.  The intent of `is_alert = true` is to signal that the event may require immediate attention by its consumer, which might be a stream processor, a SIEM system or the product’s management console where an analyst can be notified, tickets created or the events prioritized.
 8 | 
 9 | Not all OCSF events are potential alertable signals, i.e. not all carry the `is_alert` attribute.  As of this writing, only the `Detection Finding` and `Data Security Finding` event classes and the `Security Control` profile carry the `is_alert` boolean attribute.  Note the `Security Control` profile may be applied to many activity-oriented classes, as well as the two aforementioned `Finding` classes; hence, when it is applied, any instances of these classes can be explicit alertable signals.
10 | 
11 | 
12 | ## Detection Finding and Security Control Alerts
13 | 
14 | There is a fundamental difference between `Detection Finding` (or `Data Security Finding`) events and `Security Control` profile augmented events (aside from when the profile is applied to `Detection Finding` or `Data Security Finding`).  
15 | 
16 | ### What is a Security Control?
17 | 
18 | `Security Control` profile attributes represent the augmentation of ordinary activities monitored by a technical security control program or sensor, or an access control system. This profile has been available since version 1.0 but has been expanded to address access control events and risk scoring.
19 | 
20 | An Intrusion Detection System (IDS), Intrusion Prevention System (IPS) sensor, firewall, anti-malware agent, or Data Loss Prevention (DLP) agent are security controls.  Role Based Access Control (RBAC) enforcement points are security controls. They monitor normal activities watching for suspicious or malicious activity, or policy violations. These controls usually emit single events that include the attempted activity (the class's `activity_id`) along with the control’s judgements (for example risk level or MITRE Techniques), and disposition of what was done in real time.  
21 | 
22 | Examples include a file open activity that was blocked, malware that was detected and the file quarantined, a file access that was denied due to policy, and a firewall rule violation where the connection was denied. These events are generally consumed by an incident management system or a SIEM in the form of alerts. In these example cases, the `is_alert` attribute should be set to `true` and the `severity_id` set to an appropriate level.  For example, if malware was detected but blocked, the severity might be low but the event may still be considered an alertable signal.
23 | 
24 | ### What is a Detection Finding?
25 | 
26 | `Detection Finding` is a complex class that evolved from the original `Security Finding` in version 1.0 and has a lifecycle: the `activity_id`s are Create, Update, Close. `Detection Finding`s are best suited to systems that consume other events, perform some analysis on them, detect something suspicious or malicious and then have further investigation and workflow, compiling evidence, updating the finding, possibly including it into an `Incident Finding`, assigning it to an analyst, and ultimately closing it out. An MSSP or a SIEM might consume alerts and convert them into `Detection Finding`s. 
27 | 
28 | The `Analytic` object is a required attribute of `Detection Finding`, which describes the type of analysis done on the event or events. `related_events` refer to other events that are relevant to the Finding and were analyzed by the analytic algorithm, for example machine learning or anomaly detection.  
29 | 
30 | Examples of products that create `Detection Finding`s would be User and Entity Behavioral Analysis (UEBA) systems, SIEMs, and Endpoint Detection and Response (EDR) systems. EDRs are systems having both an agent that can monitor activities as well as an analysis system that can apply analytics to the activity events to determine a detection. In some cases the producing agent can make the determination at the point of the threat, while in other cases the associated analysis system will do so.  Hence an EDR agent or similar may need to apply the `Security Control` profile to the `Detection Finding` class to produce a detection but also a disposition.
31 | 
32 | When a Detection Finding is created (`activity_id = 1 Create`), `is_alert` may be set to `true` to indicate the detection is an alertable signal to a system, e.g. a ticketing system.  However, other lifecycle Finding events such as `Update` and `Close` activities are not likely to be considered alertable signals. `is_alert` would be set to `false` or omitted.
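
The lifecycle rule above can be sketched in Python. This is a minimal sketch under stated assumptions: the helper `with_alert_flag` is hypothetical, the values 2 and 3 for `Update` and `Close` are assumed to follow `Create = 1`, and the sketch treats every created finding as alertable even though the text only says `is_alert` *may* be set.

```python
# Activity identifiers for the finding lifecycle.
# Create = 1 is stated above; Update = 2 and Close = 3 are assumed here.
CREATE, UPDATE, CLOSE = 1, 2, 3


def with_alert_flag(finding):
    """Return a copy of a Detection Finding with `is_alert` set.

    Assumption: a producer that wants every newly created finding to be
    an alertable signal; Update and Close events are not flagged.
    """
    flagged = dict(finding)
    flagged["is_alert"] = flagged.get("activity_id") == CREATE
    return flagged


created = with_alert_flag({"class_name": "Detection Finding", "activity_id": CREATE})
closed = with_alert_flag({"class_name": "Detection Finding", "activity_id": CLOSE})
```

A producer could equally omit `is_alert` on `Update` and `Close` events instead of setting it to `false`; both are allowed by the text above.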
33 | 
34 | ## Conclusion
35 | In conclusion, one can use the `is_alert` attribute and set it to `true` when applying the `Security Control` profile to monitored activities for alertable actions and dispositions.  Set the `is_alert` attribute to `true` with a `Detection Finding` or `Data Security Finding` event when they are created to signal an alertable detection. Apply the `Security Control` profile to `Detection Finding` or `Data Security Finding` when the analytic on multiple events happens at a control or enforcement point.


--------------------------------------------------------------------------------
/FAQs/README.md:
--------------------------------------------------------------------------------
  1 | # Frequently Asked Questions
  2 | 
  3 | ## What is OCSF?
  4 | Open Cybersecurity Schema Framework (OCSF)
  5 | is an open-source effort to create a common schema
  6 | for security events across the cybersecurity ecosystem.
  7 | 
  8 | See [this whitepaper](https://github.com/ocsf/ocsf-docs/blob/main/Understanding%20OCSF.pdf)
  9 | for more info.
 10 | 
 11 | ## What Problems does OCSF solve for?
 12 | One of the primary challenges of cybersecurity analytics
 13 | is that there is no common and agreed-upon format
 14 | and data model for logs and alerts.
 15 | As a result, pretty much everyone in the space creates
 16 | and uses their own format and data model
 17 | (i.e., sets of fields).  
 18 | There are *many* such models that exist,
 19 | including some open ones like
 20 | STIX, OSSEM, and the Sigma taxonomy.
 21 | The challenge to date is that none of these
 22 | models have become widely adopted by practitioners
 23 | for logging and event purposes,
 24 | and thus it requires a lot of manual work
 25 | in order to derive value.
 26 | This poses a challenge to
 27 | detection engineering, threat hunting,
 28 | and analytics development,
 29 | not to mention AI – as Rob Thomas said,
 30 | “There is no AI without IA”.
 31 | Despite the issues this causes in the industry,
 32 | there has been no significant progress on the problem space,
 33 | because until now there has been a lack of a “critical mass”
 34 | of major players willing to tackle the problem head-on, and
 35 | with efforts like this, timing is everything.
 36 | With OCSF,
 37 | we are now at a moment where we have 
 38 | that critical mass as well
 39 | as a real willingness to tackle these challenges.
 40 | 
 41 | ## How can I contribute to OCSF?
 42 | See the
 43 | [OCSF Contribution Guide](https://github.com/ocsf/ocsf-schema/blob/main/CONTRIBUTING.md)
 44 | 
 45 | ## What is the OCSF Governance Model?
 46 | See [OCSF Governance](https://github.com/ocsf/governance/blob/main/Governance.md)
 47 | 
 48 | ## How does OCSF relate to STIX?
 49 | OCSF and STIX are compatible and complementary.  While STIX is focused on threat intelligence, campaigns and actors, OCSF is focused on events representing the activities on computer systems, networks and cloud platforms that may have security implications.  Observables represented in OCSF can be matched with IOCs from STIX, for example, to determine whether a threat or malicious actor has compromised a system or enterprise environment.
 50 | 
 51 | Structured Threat Information Expression (STIX™)
 52 | is an open-source language and serialization format
 53 | used to exchange cyber threat intelligence (CTI).
 54 | For more info on STIX, see
 55 | [this info](https://oasis-open.github.io/cti-documentation/stix/intro.html)
 56 | or the
 57 | [spec itself](https://docs.oasis-open.org/cti/stix/v2.1/csprd01/stix-v2.1-csprd01.html)
 58 | 
 59 | ## How does OCSF relate to the Sigma taxonomy?
 60 | Sigma is a SIEM language format for detection rules.
 61 | Sigma rules can be written against OCSF events and complement OCSF.  The
 62 | essence of Sigma is the logic of what to look for
 63 | within events to yield security findings.
 64 | 
 65 | See
 66 | [Sigma Taxonomy](https://github.com/SigmaHQ/sigma/wiki/Taxonomy)
 67 | for more info on it.
 68 | 
 69 | ## How does OCSF relate to Kestrel?
 70 | OCSF and Kestrel are complementary, solving different problems.
 71 | 
 72 | The Kestrel Threat Hunting Language
 73 | provides an abstraction for threat hunters
 74 | to focus on what to hunt instead of how to hunt.
 75 | See their
 76 | [repo](https://github.com/opencybersecurityalliance/kestrel-lang)
 77 | for more information.
 78 |  
 79 | ## How does OCSF relate to OSSEM?
 80 | 
 81 | Open Source Security Events Metadata (OSSEM)
 82 | is a community-led project focused
 83 | primarily on the documentation,
 84 | standardization and modeling of security event logs.
 85 | See [OSSEM repo](https://github.com/OTRF/OSSEM).
 86 | 
 87 | ## How does OCSF relate to OpenC2?
 88 | OCSF and OpenC2 are complementary.
 89 | 
 90 | OpenC2 is a standardized language
 91 | for the command and control of technologies
 92 | that provide or support cyber defenses.
 93 | By providing a common language
 94 | for machine-to-machine communication,
 95 | OpenC2 is vendor and application agnostic,
 96 | enabling interoperability
 97 | across a range of cyber security tools and applications.
 98 | The use of standardized interfaces and protocols
 99 | enables interoperability of different tools,
100 | regardless of the vendor that developed them,
101 | the language they are written in
102 | or the function they are designed to fulfill.
103 | For more info on OpenC2, see
104 | [info](https://openc2.org/).
105 | 
106 | 
107 | 


--------------------------------------------------------------------------------
/FAQs/Schema FAQ.md:
--------------------------------------------------------------------------------
  1 | # Schema FAQ
  2 | This document answers common questions about how to use the OCSF Schema.
  3 | 
  4 | ## How do I create a typical OCSF event?
  5 | A data producer or data mapper should first determine which event class best suits the event.  Start with the OCSF category to narrow down the choices.  For example, an endpoint security product would likely choose an event class from the System Activity category, such as File System Activity for an AV product.  Every event class has an `activity_id` enumeration which narrows down the intended activity of the event.  Sometimes these are simple CRUD activities, but often they are more specific to the class, such as `Logon` for the `Authentication` class in the `Identity and Access Management` category.
  6 | 
  7 | Since endpoint security products typically send alert events when malware is detected, the producer or mapper would apply the Security Control profile, which adds important attributes to the File System Activity event class, e.g. a Malware object, a MITRE ATT&CK object, the disposition etc.  These profiles have their own attributes that must be populated.
  8 | 
  9 | If your endpoint security product also has network security capabilities, you would choose an event class from the Network Activity category, for example the general Network Activity event class.  Given that the endpoint product will have information about the host system, you would apply the Host profile, as well as the Security Control profile.  The Host profile includes attributes about the device and the actor (e.g. process or user) on the host.
 10 | 
 11 | Every OCSF event must have all of its event class Required attributes populated, and should have its Recommended attributes populated, if possible.  This includes any of the embedded objects, such as the Malware, Process and Device objects above.
 12 | 
 13 | All OCSF events have a set of required classification attributes from the Base Event class: the `class_uid`, the `category_uid`, the `activity_id`, and the derived `type_uid`.  Their associated `*_name` attributes are optional.
 14 | 
 15 | In addition to the classification attributes, a number of other Base Event class attributes are required and must be populated: the `time`, `metadata`, and `severity_id` attributes.  The `metadata` attribute is an object that itself requires the `product` and associated `version` of the reporting event, as well as the version of the OCSF schema adhered to with the event.
 16 | 
 17 | Note that the product should be the originating event producer (i.e. not the mapping system, nor any intermediary event processing systems) in order to best represent the origin of the event.  The `time` should be the time that the event actually occurred assuming that information is known, or the earliest possible time available to the event producer or mapper.
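
The required Base Event attributes described above can be sketched in Python. This is an illustrative sketch only: the helper names are hypothetical, `type_uid` is derived here as `class_uid * 100 + activity_id`, and the sample values (5015, 5, 1) are those of a `Process Query: Query` event. Class-specific required attributes are deliberately left out.

```python
import time


def derive_type_uid(class_uid, activity_id):
    """Derive type_uid from the class and activity identifiers."""
    return class_uid * 100 + activity_id


def minimal_event(class_uid, category_uid, activity_id, severity_id,
                  product_name, schema_version, event_time_ms=None):
    """Return a dict holding only the Base Event attributes named above."""
    if event_time_ms is None:
        # Fall back to "now" when the time of occurrence is not known.
        event_time_ms = int(time.time() * 1000)
    return {
        "class_uid": class_uid,
        "category_uid": category_uid,
        "activity_id": activity_id,
        "type_uid": derive_type_uid(class_uid, activity_id),
        "time": event_time_ms,
        "severity_id": severity_id,
        "metadata": {
            "product": {"name": product_name},  # the originating producer
            "version": schema_version,          # OCSF schema version
        },
    }


event = minimal_event(5015, 5, 1, 1, "Example Product", "1.4.0",
                      event_time_ms=1738792775574)
```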
 18 | 
 19 | Although the `observables` array attribute is optional, populating it can make things easier for event consumers and analysts.  Each Observable object surfaces an important attribute of the event in a common location in a simple tuple: name, value, type.  For example, if the event class has a `device` `user` and `process` populated, an array of three Observable objects will refer to them in a common location to all OCSF events.
 20 | 
 21 | ---
 22 | 
 23 | ## How would I populate the `observables` array?
 24 | 
 25 | There are three important attributes of the `Observable` object, and the Base Event class allows for an array of these objects with the `observables` attribute: `name`, `type_id` and `value`. The first two are required attributes, while `value` is optional.  Why it is optional will become clear soon.  There can be multiple observables within an event, even of the same type. This is why `observables` is an array.
 26 | 
 27 | The required `name` attribute of the `Observable` object should be the fully qualified attribute name within the event.  E.g. `fingerprint.value` or `actor.process.file` or `actor.process.file.name`.  In other words, `observable.name` is the locator of that observable within the instance of the event.  Note that the observable attribute can be a scalar, like `device.ip`, or it can be an object, like `actor.process.file`.  
 28 | 
 29 | When the `type_id` of the observable indicates that the observable's `name` attribute is of object type, e.g. Fingerprint, the observable's `value` attribute is not populated.  When the `type_id` indicates the observable's `name` is a scalar, e.g. File Hash or File Name, then the observable's `value` should be populated with the value of that attribute, that is, a copy of the value from the event.
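
The rules above can be sketched in Python. This is a minimal sketch with hypothetical helper names. The `type_id` values are supplied by the caller: 2 is used for an IP address, while 7 (File Name) and 24 (File object) are assumptions that should be checked against the schema. Converting scalar values with `str()` is also an assumption of this sketch.

```python
def resolve(event, dotted_name):
    """Follow a fully qualified attribute name such as 'device.ip'."""
    current = event
    for part in dotted_name.split("."):
        current = current[part]
    return current


def make_observable(event, name, type_id):
    """Build one Observable: `value` is copied only for scalar attributes."""
    observable = {"name": name, "type_id": type_id}
    target = resolve(event, name)
    if not isinstance(target, (dict, list)):
        observable["value"] = str(target)
    return observable


event = {
    "device": {"ip": "10.0.0.5"},
    "actor": {"process": {"file": {"name": "svchost.exe"}}},
}
event["observables"] = [
    make_observable(event, "device.ip", 2),                 # IP address
    make_observable(event, "actor.process.file.name", 7),   # assumed type_id
    make_observable(event, "actor.process.file", 24),       # assumed type_id
]
```

The third observable has no `value` because its `name` points at an object; a consumer follows the `name` back into the event to read it.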
 30 | 
 31 | ---
 32 | 
 33 | ## When should I use a Finding event class?
 34 | 
 35 | A Finding in OCSF represents the result of some type of enrichment, correlation, aggregation, analysis or other processing of one or more events or alerts, producing a derived insight.  Most security events and alerts are activity events with a disposition (e.g. Blocked), for example when using the Security Control profile.  Findings in OCSF are not always alerts themselves, although alerts may be triggered by findings or findings might be added to an incident further downstream.
 36 | 
 37 | For example, an email security product may determine that a user has been phished or an email attachment is malicious.  It would send an email activity event (from its standpoint an alert) containing the user and sender, supplemented by the Security Control profile with a disposition of Blocked, and information about the Malware, to its management console which in turn sends it to a SIEM.  
 38 | 
 39 | The SIEM might receive other related events or alerts, for example for other users in the same circumstance or for general email activity from the same sender.  The SIEM might enrich the events with information from a Threat Intelligence Platform or threat feed pertaining to the email sender.  The result of the aggregation and enrichment would constitute an OCSF Finding.  The SIEM might create an incident that includes or refers to the finding, in the event that there are remediation steps required.
 40 | 
 41 | Note that in a more complex processing architecture, there may be layered findings.  That is, the original event may go to product A which eventually triggers a finding. Product B meanwhile may take in a lot of other events and findings (including those from product A) and make its own findings. In the example above, the originating email alert might have been a finding from the producer's standpoint if the event was enriched by its management system before being collected by the SIEM, which then produced a more complete finding.
 42 | 
 43 | ---
 44 | 
 45 | ## When should I use metadata.correlation_uid?
 46 | 
 47 | When an event producer or mapper emits multiple events that have some grouping characteristic, or similarity of any form, it should populate the `metadata.correlation_uid` attribute with a constant identifier.  This allows consumers and analysts of the set of events to more easily aggregate and correlate the events.
 48 | 
 49 | A simple example would be a vulnerability scanner that emits events at the start of a scan of a system, at the end of the scan, and separate events for each vulnerability discovered.  If these are separate events, they would all have their `metadata.correlation_uid` set to the same value.
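
The scanner example can be sketched in Python. This is a minimal sketch: the helper `tag_with_correlation_uid` and the event contents are hypothetical, and a random UUID is only one possible choice of constant identifier.

```python
import uuid


def tag_with_correlation_uid(events, correlation_uid=None):
    """Return copies of `events` that share one metadata.correlation_uid."""
    uid = correlation_uid or str(uuid.uuid4())
    tagged = []
    for event in events:
        copy = dict(event)
        # Merge into a new metadata dict so the original event is untouched.
        copy["metadata"] = {**event.get("metadata", {}), "correlation_uid": uid}
        tagged.append(copy)
    return tagged


scan_events = [
    {"message": "scan started", "metadata": {"version": "1.4.0"}},
    {"message": "vulnerability found", "metadata": {"version": "1.4.0"}},
    {"message": "scan finished", "metadata": {"version": "1.4.0"}},
]
tagged_events = tag_with_correlation_uid(scan_events)
```

Because the helper returns copies, it also fits the case where an intermediary adds correlation information after collection without modifying the original events.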
 50 | 
 51 | It is possible for an intermediary system to determine the grouping characteristic as well, populating the attribute after collection of the events, although when OCSF events are immutable a copy of the original events would be made with added correlation information.  See the next question.
 52 | 
 53 | ---
 54 | 
 55 | ## Can Finding events be correlated with each other too?
 56 | 
 57 | Yes, they are also events with a base class metadata object that can follow the same pattern.
 58 | E.g. a SIEM that creates findings may have enough knowledge and state to tie multiple findings together with a `metadata.correlation_uid`.
 59 | 
 60 | ---
 61 | 
 62 | ## How do I use the Actor object?
 63 | The Actor object is intended for use in event classes when there is knowledge of an entity that is initiating or causing some action on another entity.  
 64 | For example, a process deleting a file is the actor in a File System Activity event.
 65 | 
 66 | From a structural standpoint, the `actor` attribute avoids name collisions with the other end of the activity in cases where a process acts on another process, as those attribute names would be in contention at the same level within the class.
 67 | 
 68 | Currently the Actor object has a `process` and `user` attribute, where one or the other is in the role of the actor in the activity.  It also has Optional attributes for Session, `authorizations`, `idp`, and `invoked_by`.
 69 | 
 70 | The `idp` is populated in IAM category event classes, when the actor's identity provider is known and logged with Authentication and related events.
 71 | 
 72 | The `authorizations` attribute is an array of information pertaining to what permissions and privileges the actor has at the time of the event, if known.
 73 | 
 74 | The `invoked_by` attribute is populated with the name of the service or application through which the actor's activity was initiated.
 75 | 
 76 | ---
 77 | 
 78 | ## When should I use the session attribute?
 79 | The `session` attribute is usually paired with the `user` attribute.  A Session object has information pertaining to a particular user session through which the activity was initiated.  User is an entity object that isn't always associated with a session, and isn't always an actor, hence Session isn't part of the User object, but is included with the Actor object for actor semantics.
 80 | 
 81 | Related to this, the `process` attribute of type Process has a User object which represents the user's account that the process runs under or is impersonating.  Hence, the Process object also has a `session` attribute paired with its `user` attribute.
 82 | 
 83 | Often, User and Session objects will be paired in many event classes.
 84 | 
 85 | ---
 86 | 
 87 | ## When should I use the unmapped attribute?
 88 | The `unmapped` attribute is a catchall for event producers and mappers when there is data that doesn't populate the more specific attributes of the class.  For example, product specific data that is extracted into fields and values from a log that aren't mapped.
 89 | 
 90 | `unmapped` is best used by a mapper who is mapping events from multiple vendors, where each vendor may have unique fields not common to other vendors for the same type of data source.
 91 | 
 92 | However, using `unmapped` is not recommended for event producers.  A native event producer should extend the schema to properly capture the data that can't be mapped.  For product specific data, an extension is preferred, using either a vendor developed profile, or in some cases a new event class if the core event class doesn't adequately represent the event due to data that can't be naturally mapped, or activities not captured by the core class.
 93 | 
 94 | ---
 95 | 
 96 | ## unmapped is of Object type.  What does that mean and is it different from JSON or a String type?
 97 | 
 98 | Object is the empty complex data type from which all OCSF objects extend with JSON formatted attributes, requirements, and descriptions.  Think of `unmapped` as if it were an OCSF object that you created on the fly.  In the Java programming language, it would be like an inner class that doesn't need to be declared externally or globally.  That is to say, it is used within the instance of an OCSF class only, and not part of the schema.
 99 | 
100 | JSON is more free form data, hence the `data` attribute is of type JSON.  It can be anything encapsulated within JSON and does not need to look like an OCSF object. It should not be used for unmapped extracted fields, but rather other data that may be captured with the event.  It is used, for example, within the Enrichment object (the `enrichments` array attribute of the Base Event) to augment one or more of the mapped or unmapped attributes.
101 | 
102 | A String type is reserved for unformatted text, such as the `raw_data` attribute of the Base Event class. Binary data is Base64 encoded in an attribute of `bytestring_t` type, currently not used in the core schema but may be used in extensions or within the `unmapped` object.
103 | 
104 | ---
105 | 
106 | ## When should I use Authorize Session from Identity and Access Management vs. Web Resource Access Activity from the Application category?
107 | These two event classes are complementary.  Changes to a security principal's permissions, privileges, or roles are authorization activities, while the access of web resources by a security principal is logged as Web Resource Access Activity.  IAM category authorization or change events are independent of a particular resource access, while enforcement of authorization restrictions is made at access time and is logged as such. For example, when a new Logon session is created, authorization checks are made and if logged, belong in the Authorize Session class.  However, when the user or process that has those permissions accesses a web resource, and it is granted or denied, the Web Resource Access Activity class is used.
108 | 
109 | ---
110 | 
111 | ## When should I use HTTP Activity vs. Web Resource Access Activity?
112 | HTTP Activity is information focused on the network protocol, and not the gating of the resource.  While access to a resource is often requested via a web service or REST APIs, the HTTP Activity is the protocol activity for that access, not the activity of the gating service to the resource, which might be via the HTTP server nevertheless.  And of course access activity in general is not uniquely via HTTP: Kerberos and LDAP servers grant and deny access to resources over their respective protocols.
113 | 
114 | ---
115 | 
116 | ## Can you explain Profiles to me?
117 | Profiles in OCSF are a way to uniformly add a set of attributes to one or more event classes or objects.  Event classes provide the basic structure and type of an event, while objects provide the structure of complex types. Their definitions can indicate that additional attributes may be included with an event instance via profiles specified with the class or object definition.  In effect, adding a profile or profiles to the definition gives you the permission to dynamically include those attributes.  When constructing an event, you would add an OCSF profile name to the `metadata.profiles` array to mix-in the additional attributes with the event.
118 | 
119 | An event that has that profile applied is then a kind of that profile, as well as a kind of the event class.  For example, if the `Host` profile was applied to the `HTTP Activity` class to add the `actor.process` making a request, the event would be queryable either via `metadata.profiles[]` as `Host` or via `class_name` as `HTTP Activity`.  When querying by `Host`, other events from `System Activity` with the same actor could also be returned.
120 | 
121 | Not all of the attributes from the profile need be added together.  For example, a profile with attributes A, B, C can be defined within the definition of class D and object E.  Class D can include A and B, while object E can include attribute C.  You can also build in a profile by adding the attributes of the profile directly into your class and referencing the profile in your class definition.  In this case, as with class and object extensions, the profile-defined requirements, group, or description can be overridden within the definition of the class or object, although this is not recommended.  Only the attribute data type and constraints cannot be overridden.
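
As a rough illustration of the query behavior described above, the following Python sketch filters a few hand-written events by profile and by class.  The event dictionaries are heavily simplified and hypothetical, the helper names are invented for this example, and the exact string stored in `metadata.profiles` is assumed here to be `host`.

```python
# Simplified, hypothetical events; real OCSF events carry many more attributes.
events = [
    {
        "class_name": "HTTP Activity",
        "metadata": {"profiles": ["host"]},
        "actor": {"process": {"name": "curl"}},
    },
    {
        "class_name": "Process Activity",
        "metadata": {"profiles": ["host"]},
        "actor": {"process": {"name": "curl"}},
    },
    {
        "class_name": "HTTP Activity",
        "metadata": {"profiles": []},
    },
]


def with_profile(events, profile):
    """Return events whose metadata.profiles array contains the profile."""
    return [
        e for e in events
        if profile in e.get("metadata", {}).get("profiles", [])
    ]


def with_class(events, class_name):
    """Return events that belong to the given event class."""
    return [e for e in events if e.get("class_name") == class_name]


# Querying by profile cuts across classes; querying by class ignores profiles.
host_events = with_profile(events, "host")
http_events = with_class(events, "HTTP Activity")
```

In this sketch the profile query returns both the HTTP Activity event and the Process Activity event that carry the actor, while the class query returns both HTTP Activity events whether or not the profile was applied.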
122 | 
123 | ---
124 | 
125 | ## Is there a similarity between OCSF and LDAP (and X.500)?
126 | Yes, there is, although OCSF is considerably simpler.  At a fundamental level LDAP consists of attributes and object classes, while OCSF consists of attributes and event classes.  Attributes in LDAP have syntaxes and in OCSF have data types (OCSF objects are complex data types).  An event class is similar to an LDAP structural object class; it defines the basic structure of an event, as the LDAP object class defines the structure of an entry.  Like LDAP, an OCSF event class can be constructed by extending a super class to inherit attributes.  And an OCSF profile is similar to an LDAP auxiliary class, which can be applied to a structural object class so that an entry can mix in additional attributes, independent of the structural hierarchy of the entry.
127 | 
128 | ---
129 | 
130 | ## How should the attribute suffixes `_uid` and `_id` be used and what are "siblings?"
131 | These are naming conventions rather than metaschema or data type validation factors.  `_id` is the convention for OCSF enumerated attributes.  These attributes can be integer data types, or string data types, although OCSF favors integer data types with string labels.  Every integer enum attribute SHOULD have standard values of `0` for `Unknown` and `99` for `Other`.  There is no requirement that the integers stay within those bounds, or that they increment by `1`.  Every enum attribute SHOULD have a string sibling attribute of the same name but without the `_id` suffix. A sibling is declared within the attribute definition of the enum attribute.  When the logged value is not mappable within the enum listed values, `Other` can be set and a source specific label can populate the sibling attribute.  The exception to this convention is when an enum attribute mirrors an external standard, for example with the `dns_query` object's `opcode_id` which mirrors the values requested from a resolver.  It is recommended that the sibling attribute is populated with the enum label so that human queries can be made against a more easily remembered string, rather than a number.
132 | 
133 | `_uid` suffix attributes are for unique identifier values within the schema, or external identifier values, e.g. coming from a public cloud resource or similar entity.  For this reason `_uid` suffix attributes are usually strings, in order to accommodate any type of alphanumeric format, but they MAY be integers (or longs).  Within OCSF Classification attributes, `_uid` attributes are integers or longs (see `class_uid` or `type_uid`). The sibling for `_uid` attributes is an attribute of the same base name with the `_name` suffix (see `class_name` or `type_name`).  The exception for Classification attributes is `activity_id`, which is an enum rather than a singular identifier.  However, its sibling also has the `_name` suffix: `activity_name`, following the convention for `_uid` attributes of the Classification group.
134 | 
135 | Note that sibling string attributes can be used standalone, i.e. without an associated enum or unique identifier.
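
The enum and sibling convention can be sketched in Python as follows.  This is a minimal illustration rather than a reference mapper: the `map_enum` helper is hypothetical, the label table is a subset assumed for the example, and treating a missing source value as `Unknown` is a choice made here for the sketch.

```python
OTHER = 99
UNKNOWN = 0

# Illustrative subset of labels, assumed for this example only.
SEVERITY_LABELS = {0: "Unknown", 1: "Informational", 2: "Low", 3: "Medium", 4: "High", 99: "Other"}


def map_enum(source_value, labels, base_name):
    """Return the `<base_name>_id` enum attribute and its string sibling."""
    # Case-insensitive lookup from label text to enum integer (excluding Other).
    lookup = {label.lower(): value for value, label in labels.items() if value != OTHER}
    key = (source_value or "").strip().lower()
    if not key:
        # Nothing was logged: fall back to Unknown.
        return {f"{base_name}_id": UNKNOWN, base_name: labels[UNKNOWN]}
    if key in lookup:
        enum_id = lookup[key]
        # Populate the sibling with the enum label for easier human queries.
        return {f"{base_name}_id": enum_id, base_name: labels[enum_id]}
    # Not mappable: use Other and keep the source-specific label in the sibling.
    return {f"{base_name}_id": OTHER, base_name: source_value}


mapped = map_enum("high", SEVERITY_LABELS, "severity")
unmapped = map_enum("Catastrophic", SEVERITY_LABELS, "severity")
missing = map_enum(None, SEVERITY_LABELS, "severity")
```

Here `mapped` carries `severity_id` 4 with the sibling `High`, while `unmapped` carries `severity_id` 99 with the source label preserved in `severity`.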
136 | 
137 | ---
138 | 
139 | ## How is backwards compatibility managed?
140 | OCSF follows the [semver](https://semver.org/) versioning scheme.
141 | 
142 | From the semver documentation:
143 | 
144 | > Given a version number MAJOR.MINOR.PATCH, increment the:
145 | > 
146 | > 1. MAJOR version when you make incompatible API changes
147 | > 2. MINOR version when you add functionality in a backward compatible manner
148 | > 3. PATCH version when you make backward compatible bug fixes
149 | 
150 | In practical terms for OCSF users, this means:
151 |  - PATCH version increments may change documentation values like `description`.
152 |  - MINOR version increments may add new schemata like attributes or events.
153 |  - MAJOR version increments may add and remove anything.
154 | 
155 | So any version in the 1.x line should be backwards-compatible with previous 1.x versions.
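
A consumer could apply this rule with a check along the following lines.  This is a sketch under stated assumptions: the function names are hypothetical, version strings are assumed to be plain `MAJOR.MINOR.PATCH` with no pre-release suffix, and "compatible" is taken to mean that the newer schema shares the major version of the older one.

```python
def parse_version(version):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_backwards_compatible(old, new):
    """True when schema version `new` should remain compatible with `old`.

    Assumes semver semantics: same MAJOR version, and `new` is not older
    than `old`.
    """
    old_v, new_v = parse_version(old), parse_version(new)
    return old_v[0] == new_v[0] and new_v >= old_v


same_line = is_backwards_compatible("1.0.0", "1.3.0")
major_bump = is_backwards_compatible("1.3.0", "2.0.0")
downgrade = is_backwards_compatible("1.3.0", "1.0.0")
```

Under these assumptions an event written against 1.0.0 should still be readable with a 1.3.0 schema, while no such expectation holds across a major version change or when going back to an older minor version.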
156 | 
157 | ---
158 | 
159 | ## What changes are not backwards compatible?
160 | 
161 | 1. The removal of an event, object, attribute, data type, or enum member.
162 |     1. The `name` of an event or object is missing in the NEW schema.
163 |     2. The dictionary key of an attribute or data type (its implied `name`) is missing in the NEW schema.
164 |     3. The dictionary key of an enum member (its value) is missing in the NEW schema.
165 |     4. The `uid` or `class_uid` of an event is missing in the NEW schema.
166 | 2. Renaming an event, object, attribute, or enum member.
167 |     1. A special case of removal in which the same `caption` belongs to an element with a different `name`, key, or `class_uid`; or the same `class_uid` belongs to an event with a different name.
168 | 3. Changing the data type of an attribute **unless**:
169 |     1. The data type is changing from `int` to `long`. This exception is allowed on the basis that *nearly* all encodings use variable lengths by default, meaning that data written in nearly all encodings as an `int` can be safely interpreted as a `long`.
170 |     2. Changing a scalar type when the underlying type (e.g. `string_t`) remains the same and there are no constraints on the new type.
171 | 4. Changing the `requirement` value of an attribute from `optional` or `recommended` to `required`.
172 | 5. Making the `constraints` of a data type more restrictive.
173 | 6. Adding a `required` attribute to an existing event or object.
174 | 7. Changing the `caption` of an event, enum member, or category.
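
Some of the rules above can be checked mechanically.  The sketch below compares two versions of a single class's attributes and reports removals, disallowed type changes, tightened requirements, and newly added required attributes.  The dictionary layout, attribute names, and the `breaking_changes` helper are hypothetical and do not reflect the actual schema definition files; renames, captions, enum members, and data type constraints are not covered.

```python
def breaking_changes(old_attrs, new_attrs):
    """List breaking changes between two versions of one class's attributes.

    Both arguments map attribute name -> {"type": ..., "requirement": ...}.
    """
    problems = []
    for name, old in old_attrs.items():
        new = new_attrs.get(name)
        if new is None:
            problems.append(f"removed attribute: {name}")
            continue
        # A type change is breaking unless it widens int to long.
        if old["type"] != new["type"] and (old["type"], new["type"]) != ("int", "long"):
            problems.append(f"type changed: {name} {old['type']} -> {new['type']}")
        # optional/recommended -> required is breaking.
        if old["requirement"] != "required" and new["requirement"] == "required":
            problems.append(f"requirement tightened: {name}")
    for name, new in new_attrs.items():
        # Adding a required attribute to an existing class is breaking.
        if name not in old_attrs and new["requirement"] == "required":
            problems.append(f"new required attribute: {name}")
    return problems


old = {
    "port": {"type": "int", "requirement": "optional"},
    "name": {"type": "string_t", "requirement": "recommended"},
    "tag": {"type": "string_t", "requirement": "optional"},
}
new = {
    "port": {"type": "long", "requirement": "optional"},
    "name": {"type": "string_t", "requirement": "required"},
    "owner": {"type": "string_t", "requirement": "required"},
}
report = breaking_changes(old, new)
```

In this example the `int` to `long` change on `port` is accepted, while the tightened `name`, the removed `tag`, and the new required `owner` are each reported.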
175 | 
176 | ---
177 | 
178 | ## When should I use `status` and when should I use `state` when adding to the schema?
179 | 
180 | The convention we try to stick to when authoring OCSF classes and objects is to use `status_id` and its sibling `status` for the result of an activity, usually as a class attribute, and to use `state_id` and its sibling `state` for the state of an object. The latter might sound obvious, but it may not be obvious that `status` should not be used for objects. The reasoning is that an object exists independent of time or of an activity or action, and therefore it has a state. It could just as easily have had a status over an indeterminate period of time, but we have tried to distinguish between the two situations by reserving `status` for the point-in-time result of an activity or action.
181 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                                  Apache License
  2 |                            Version 2.0, January 2004
  3 |                         http://www.apache.org/licenses/
  4 | 
  5 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  6 | 
  7 |    1. Definitions.
  8 | 
  9 |       "License" shall mean the terms and conditions for use, reproduction,
 10 |       and distribution as defined by Sections 1 through 9 of this document.
 11 | 
 12 |       "Licensor" shall mean the copyright owner or entity authorized by
 13 |       the copyright owner that is granting the License.
 14 | 
 15 |       "Legal Entity" shall mean the union of the acting entity and all
 16 |       other entities that control, are controlled by, or are under common
 17 |       control with that entity. For the purposes of this definition,
 18 |       "control" means (i) the power, direct or indirect, to cause the
 19 |       direction or management of such entity, whether by contract or
 20 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
 21 |       outstanding shares, or (iii) beneficial ownership of such entity.
 22 | 
 23 |       "You" (or "Your") shall mean an individual or Legal Entity
 24 |       exercising permissions granted by this License.
 25 | 
 26 |       "Source" form shall mean the preferred form for making modifications,
 27 |       including but not limited to software source code, documentation
 28 |       source, and configuration files.
 29 | 
 30 |       "Object" form shall mean any form resulting from mechanical
 31 |       transformation or translation of a Source form, including but
 32 |       not limited to compiled object code, generated documentation,
 33 |       and conversions to other media types.
 34 | 
 35 |       "Work" shall mean the work of authorship, whether in Source or
 36 |       Object form, made available under the License, as indicated by a
 37 |       copyright notice that is included in or attached to the work
 38 |       (an example is provided in the Appendix below).
 39 | 
 40 |       "Derivative Works" shall mean any work, whether in Source or Object
 41 |       form, that is based on (or derived from) the Work and for which the
 42 |       editorial revisions, annotations, elaborations, or other modifications
 43 |       represent, as a whole, an original work of authorship. For the purposes
 44 |       of this License, Derivative Works shall not include works that remain
 45 |       separable from, or merely link (or bind by name) to the interfaces of,
 46 |       the Work and Derivative Works thereof.
 47 | 
 48 |       "Contribution" shall mean any work of authorship, including
 49 |       the original version of the Work and any modifications or additions
 50 |       to that Work or Derivative Works thereof, that is intentionally
 51 |       submitted to Licensor for inclusion in the Work by the copyright owner
 52 |       or by an individual or Legal Entity authorized to submit on behalf of
 53 |       the copyright owner. For the purposes of this definition, "submitted"
 54 |       means any form of electronic, verbal, or written communication sent
 55 |       to the Licensor or its representatives, including but not limited to
 56 |       communication on electronic mailing lists, source code control systems,
 57 |       and issue tracking systems that are managed by, or on behalf of, the
 58 |       Licensor for the purpose of discussing and improving the Work, but
 59 |       excluding communication that is conspicuously marked or otherwise
 60 |       designated in writing by the copyright owner as "Not a Contribution."
 61 | 
 62 |       "Contributor" shall mean Licensor and any individual or Legal Entity
 63 |       on behalf of whom a Contribution has been received by Licensor and
 64 |       subsequently incorporated within the Work.
 65 | 
 66 |    2. Grant of Copyright License. Subject to the terms and conditions of
 67 |       this License, each Contributor hereby grants to You a perpetual,
 68 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 69 |       copyright license to reproduce, prepare Derivative Works of,
 70 |       publicly display, publicly perform, sublicense, and distribute the
 71 |       Work and such Derivative Works in Source or Object form.
 72 | 
 73 |    3. Grant of Patent License. Subject to the terms and conditions of
 74 |       this License, each Contributor hereby grants to You a perpetual,
 75 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 76 |       (except as stated in this section) patent license to make, have made,
 77 |       use, offer to sell, sell, import, and otherwise transfer the Work,
 78 |       where such license applies only to those patent claims licensable
 79 |       by such Contributor that are necessarily infringed by their
 80 |       Contribution(s) alone or by combination of their Contribution(s)
 81 |       with the Work to which such Contribution(s) was submitted. If You
 82 |       institute patent litigation against any entity (including a
 83 |       cross-claim or counterclaim in a lawsuit) alleging that the Work
 84 |       or a Contribution incorporated within the Work constitutes direct
 85 |       or contributory patent infringement, then any patent licenses
 86 |       granted to You under this License for that Work shall terminate
 87 |       as of the date such litigation is filed.
 88 | 
 89 |    4. Redistribution. You may reproduce and distribute copies of the
 90 |       Work or Derivative Works thereof in any medium, with or without
 91 |       modifications, and in Source or Object form, provided that You
 92 |       meet the following conditions:
 93 | 
 94 |       (a) You must give any other recipients of the Work or
 95 |           Derivative Works a copy of this License; and
 96 | 
 97 |       (b) You must cause any modified files to carry prominent notices
 98 |           stating that You changed the files; and
 99 | 
100 |       (c) You must retain, in the Source form of any Derivative Works
101 |           that You distribute, all copyright, patent, trademark, and
102 |           attribution notices from the Source form of the Work,
103 |           excluding those notices that do not pertain to any part of
104 |           the Derivative Works; and
105 | 
106 |       (d) If the Work includes a "NOTICE" text file as part of its
107 |           distribution, then any Derivative Works that You distribute must
108 |           include a readable copy of the attribution notices contained
109 |           within such NOTICE file, excluding those notices that do not
110 |           pertain to any part of the Derivative Works, in at least one
111 |           of the following places: within a NOTICE text file distributed
112 |           as part of the Derivative Works; within the Source form or
113 |           documentation, if provided along with the Derivative Works; or,
114 |           within a display generated by the Derivative Works, if and
115 |           wherever such third-party notices normally appear. The contents
116 |           of the NOTICE file are for informational purposes only and
117 |           do not modify the License. You may add Your own attribution
118 |           notices within Derivative Works that You distribute, alongside
119 |           or as an addendum to the NOTICE text from the Work, provided
120 |           that such additional attribution notices cannot be construed
121 |           as modifying the License.
122 | 
123 |       You may add Your own copyright statement to Your modifications and
124 |       may provide additional or different license terms and conditions
125 |       for use, reproduction, or distribution of Your modifications, or
126 |       for any such Derivative Works as a whole, provided Your use,
127 |       reproduction, and distribution of the Work otherwise complies with
128 |       the conditions stated in this License.
129 | 
130 |    5. Submission of Contributions. Unless You explicitly state otherwise,
131 |       any Contribution intentionally submitted for inclusion in the Work
132 |       by You to the Licensor shall be under the terms and conditions of
133 |       this License, without any additional terms or conditions.
134 |       Notwithstanding the above, nothing herein shall supersede or modify
135 |       the terms of any separate license agreement you may have executed
136 |       with Licensor regarding such Contributions.
137 | 
138 |    6. Trademarks. This License does not grant permission to use the trade
139 |       names, trademarks, service marks, or product names of the Licensor,
140 |       except as required for reasonable and customary use in describing the
141 |       origin of the Work and reproducing the content of the NOTICE file.
142 | 
143 |    7. Disclaimer of Warranty. Unless required by applicable law or
144 |       agreed to in writing, Licensor provides the Work (and each
145 |       Contributor provides its Contributions) on an "AS IS" BASIS,
146 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 |       implied, including, without limitation, any warranties or conditions
148 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 |       PARTICULAR PURPOSE. You are solely responsible for determining the
150 |       appropriateness of using or redistributing the Work and assume any
151 |       risks associated with Your exercise of permissions under this License.
152 | 
153 |    8. Limitation of Liability. In no event and under no legal theory,
154 |       whether in tort (including negligence), contract, or otherwise,
155 |       unless required by applicable law (such as deliberate and grossly
156 |       negligent acts) or agreed to in writing, shall any Contributor be
157 |       liable to You for damages, including any direct, indirect, special,
158 |       incidental, or consequential damages of any character arising as a
159 |       result of this License or out of the use or inability to use the
160 |       Work (including but not limited to damages for loss of goodwill,
161 |       work stoppage, computer failure or malfunction, or any and all
162 |       other commercial damages or losses), even if such Contributor
163 |       has been advised of the possibility of such damages.
164 | 
165 |    9. Accepting Warranty or Additional Liability. While redistributing
166 |       the Work or Derivative Works thereof, You may choose to offer,
167 |       and charge a fee for, acceptance of support, warranty, indemnity,
168 |       or other liability obligations and/or rights consistent with this
169 |       License. However, in accepting such obligations, You may act only
170 |       on Your own behalf and on Your sole responsibility, not on behalf
171 |       of any other Contributor, and only if You agree to indemnify,
172 |       defend, and hold each Contributor harmless for any liability
173 |       incurred by, or claims asserted against, such Contributor by reason
174 |       of your accepting any such warranty or additional liability.
175 | 
176 |    END OF TERMS AND CONDITIONS
177 | 
178 |    APPENDIX: How to apply the Apache License to your work.
179 | 
180 |       To apply the Apache License to your work, attach the following
181 |       boilerplate notice, with the fields enclosed by brackets "[]"
182 |       replaced with your own identifying information. (Don't include
183 |       the brackets!)  The text should be enclosed in the appropriate
184 |       comment syntax for the file format. We also recommend that a
185 |       file or class name and description of purpose be included on the
186 |       same "printed page" as the copyright notice for easier
187 |       identification within third-party archives.
188 | 
189 |    Copyright [yyyy] [name of copyright owner]
190 | 
191 |    Licensed under the Apache License, Version 2.0 (the "License");
192 |    you may not use this file except in compliance with the License.
193 |    You may obtain a copy of the License at
194 | 
195 |        http://www.apache.org/licenses/LICENSE-2.0
196 | 
197 |    Unless required by applicable law or agreed to in writing, software
198 |    distributed under the License is distributed on an "AS IS" BASIS,
199 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 |    See the License for the specific language governing permissions and
201 |    limitations under the License.
202 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # OCSF Documentation
2 | The ocsf-docs repository is intended to be the location where relevant proposals, documentation or other descriptive information for the schema are stored.
3 | Documents such as Understanding OCSF.pdf are point-in-time snapshots of current work before public release.
4 | Over time, documents will be organized based on version of schema.
5 | 
6 | Common questions are answered in the FAQs folder of this repo.  Schema specific questions are answered in FAQs/Schema FAQ.md
7 | 
8 | The governance repo holds the governance material.
9 | 


--------------------------------------------------------------------------------
/Understanding OCSF.md:
--------------------------------------------------------------------------------
  1 | 
  2 | 
  3 | # Understanding the Open Cybersecurity Schema Framework
  4 | 
  5 | Author: Paul Agbabian
  6 | 
  7 | Date: September 2024
  8 | 
  9 | Status: RFC - Corresponds to schema version 1.0.0
 10 | 
 11 | Version: 1.16
 12 | 
 13 | ## Introduction to the Framework and Schema
 14 | 
 15 | This document describes the Open Cybersecurity Schema Framework (OCSF) and its taxonomy, including the core cybersecurity event schema built with the framework.[^1]
 16 | 
 17 | The framework is made up of a set of data types and objects, an attribute dictionary, and the taxonomy.  It is not restricted to the cybersecurity domain nor to events; however, the initial focus of the framework has been a schema for cybersecurity events.  A schema browser for the schema can be found at [schema.ocsf.io](https://schema.ocsf.io).
 18 | 
 19 | OCSF is agnostic to storage format, data collection and ETL processes.  The core schema is intended to be agnostic to implementations.  The schema framework definition files and the resulting normative schema are written as JSON.
 20 | 
 21 | ### Personas
 22 | 
 23 | There are four personas that are users of the framework and the schema built with the framework.  
 24 | 
 25 | The _author_ persona creates or extends the schema.  The _producer_ persona generates events natively into the schema, or via a translation from another schema.  The _mapper_ persona translates or creates events from another source to the schema.  The _analyst_ persona is the end user who searches the data, writes rules or analytics against the schema, or creates reports from the schema.  The analyst may also be considered the _consumer_ persona.
 26 | 
 27 | For example, a vendor may write a translation from some external source format into the schema but also extend the schema to accommodate source specific attributes or operations.  The vendor is operating as both the mapper and author personas.  A SOC analyst that collects the data in a SIEM system writes rules against the events and searches events during investigation.  The SOC analyst is operating as the analyst persona.  Finally, a vendor that emits events natively in OCSF form, even if translated, is a data producer.
 28 | 
 29 | ### Taxonomy Constructs
 30 | 
 31 | There are 5 fundamental constructs of the OCSF taxonomy: 
 32 | 
 33 | 1. Data Types, Attributes and Arrays
 34 | 2. Event Class
 35 | 3. Category
 36 | 4. Profile
 37 | 5. Extension
 38 | 
 39 | The scalar data types are defined on top of primitive data types such as strings, integers, floating point numbers and booleans.  Examples of scalar data types are Timestamp, IP Address, MAC Address, and User Name.
 40 | 
 41 | An _attribute_ is a unique identifier name for a specific validatable data type, either scalar or complex.
 42 | 
 43 | Complex data types are termed objects.  An _object_ is a collection of contextually related attributes, usually representing an entity, and may include other objects. Each object is also a data type in OCSF.  Examples of object data types are Process, Device, User, Malware and File.
 44 | 
 45 | _Arrays_ support any of the data types.  
 46 | 
 47 | Most scalar data types have constraints on their valid values or ranges, for example Enum integer types are constrained to a specific set of integer values.  Enum integer typed attributes are an important part of the framework constructs and used in place of strings where possible to ensure consistency.  
 48 | 
 49 | Complex data types, or objects, can also be validated based on their particular structure and attribute requirements.  Attribute requirements are discussed in a subsequent section.
 50 | 
 51 | Appendix A and B describe the OCSF Guidelines and data types respectively.[^2]
 52 | 
 53 | The _attribute dictionary_ of all available attributes, and their types are the building blocks of the framework.  Event classes are particular sets of attributes from the dictionary.
 54 | 
 55 | Events in OCSF are represented by _event classes_ which structure a set of attributes that attempt to describe the semantics of the event in detail.  An individual event is an instance of an event class.  Event classes have schema-unique IDs.  Individual events may have globally unique IDs.
 56 | 
 57 | Each event class is grouped by category, and has a unique `category_uid` attribute value which is the category identifier.  Categories also have friendly name captions, such as System Activity, Network Activity, Findings, etc.  Event classes are grouped into categories for a number of purposes: a container for a particular event domain, documentation convenience and search, reporting, storage partitioning or access control to name a few.  
 58 | 
 59 | _Profiles_ overlay additional related attributes into event classes and objects, allowing for cross-category event class augmentation and filtering.  Event classes register for profiles that when optionally applied, can be mixed into event classes and objects, by a producer or mapper.  For example, System Activity event classes may also include attributes for malware detection or vulnerability information when an endpoint security product is the data source.  Network Activity event classes from a host computer may carry the device, process and user associated with the activity.  A Security Control profile or Host profile can be applied in these cases, respectively.
 60 | 
 61 | Finally, _extensions_ allow the schema to be extended using the framework without modification of the core schema.  New attributes, objects, event classes, categories and profiles are all available to extensions.  Existing profiles can be applied to extensions, and new extension profiles can be applied to core event classes and objects as well as to other extensions.
 62 | 
 63 | The [schema browser](https://schema.ocsf.io) visually represents the categories, event classes, dictionary, data types, profiles and extensions in a navigable portal. The schema for an event class, and an event example for a class can be generated via menu options of the browser, which also serves as a validation server via the server APIs, whose documentation is also available from the browser.
 64 | 
 65 | #### Comparison with MITRE ATT&CK[^3] Framework
 66 | 
 67 | The MITRE ATT&CK Framework is widely used in the cybersecurity domain.  While the purpose and content type of the two frameworks are different but complementary, there are some similarities with OCSF’s taxonomy that may be instructive to those with familiarity with ATT&CK.
 68 | 
 69 | Categories are similar to Tactics, which have unique IDs.  Event Classes are similar to Techniques, which have unique IDs.  Profiles are similar to Matrices[^4], which have unique names.  Type IDs are similar to Procedures which have unique IDs.  Profiles can filter the Event Classes and Categories similar to how Matrices filter Techniques and Tactics.
 70 | 
 71 | Differences from MITRE ATT&CK are that in OCSF, Event Classes are in only one Category, while MITRE ATT&CK Techniques can be part of multiple Tactics.  Similarly MITRE ATT&CK Procedures can be used in multiple Techniques.  MITRE ATT&CK<sup>TM</sup> has Sub-techniques while OCSF does not have Sub-Event Classes.[^5]
 72 | 
 73 | OCSF is open and extensible by vendors and end customers, while the content within MITRE ATT&CK<sup>TM</sup> is released by MITRE.
 74 | 
 75 | ## Attributes
 76 | 
 77 | Attributes and the dictionary are the building blocks of a schema.  This section discusses OCSF attribute conventions, requirements, groupings, constraints, and some of the special attributes used in the core cybersecurity schema.
 78 | 
 79 | In general, an attribute from the dictionary has the same meaning everywhere it is used in a schema.  Some attributes can have a meaning that is overloaded depending on the event class context where they are used.  In these cases the description of the attribute will be generic and include a ‘see specific usage’ instruction to override its description within the event class context rather than in the dictionary.
 80 | 
 81 | ### Conventions
 82 | 
 83 | OCSF adheres to naming conventions in order to more easily identify attributes with similar semantics.  These conventions take the form of standard suffixes and prefixes.  The standard suffixes are:
 84 | 
 85 | ```
 86 | _id, _ids, _uid, _uuid, _ip, _name, _info, _detail, _time, _dt, _process, _ver, _list
 87 | ```
 88 | 
 89 | #### Arrays
 90 | 
 91 | Attribute names used for arrays end with `s`.  For example `category_ids`.  A MITRE ATT&CK<sup>TM</sup> array is named `attacks`.
 92 | 
 93 | #### Unique IDs
 94 | 
 95 | Attribute names for classification values that are unique within the schema end with `_uid`.  Schema classification attributes that have the `_uid` suffix are integers, preset by the schema definition (i.e. they must be populated as defined by the schema).
 96 | 
 97 | Certain schema-unique attributes that also have a friendly name or caption have the same prefix but by convention use the `_name` suffix.  For example, `class_uid` and `class_name`, or `category_uid` and `category_name`.
 98 | 
 99 | Other attributes with the `_uid` suffix convention may be strings or integers, depending on their purpose, although the majority are strings.
100 | 
101 | A `uid` core attribute is used wherever a producer or mapper populates an identifier for an entity object.  Entity objects also have a corresponding `name` attribute by convention.  Both are of type string (`string_t`).
102 | 
103 | Attribute names for values that are globally unique, such as GUIDs, end with `_uuid`.  They do not have friendly names.
104 | 
105 | #### Enum Attributes
106 | 
107 | Attributes that are of an Enum integer type end with `_id`.  Enum constant identifiers are integers from a defined set where each has a friendly name label.  Arrays of enum attributes end with `_ids`.
108 | 
109 | By convention, every Enum type has two common values with integer value 0 for `Unknown` and 99 for `Other`.  
110 | 
111 | If a source event has missing values that are required by the event class for that event, an `Unknown` value should be set for Enum types which is also the default.  
112 | 
113 | If a mapped event attribute does not have a defined enumeration value corresponding to a value of the event, `Other` is used which indicates that a sibling string attribute is populated with the custom attribute value.  The sibling string attribute has the same name, minus the suffix.  For example, `activity_id` and `activity`, or `severity_id` and `severity`.
114 | 
115 | Sibling string attributes are optional, but if the enum value is `Other` (`99`) then the sibling string **must** be populated with the custom label (i.e. not “Other”).
116 | 
117 | For all defined enumeration integer values, including `Unknown`, the enum label text for the item **may** populate the sibling string attribute.  That is, both the integer value and the string attribute are set.  If the Enum attribute is required, then both the integer attribute and the sibling string attribute **should** be populated. Attribute requirements are discussed in a subsequent section.
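
The rule for `Other` can be expressed as a small validation check.  The following Python sketch is illustrative only: the `check_enum_sibling` helper and the sample events are hypothetical, and it covers just the **must** rule for value `99`, not the **may** and **should** recommendations.

```python
OTHER = 99


def check_enum_sibling(event, enum_name):
    """Return a list of problems for one `_id` enum attribute and its sibling."""
    # The sibling has the same name minus the `_id` suffix.
    sibling_name = enum_name[: -len("_id")]
    sibling = event.get(sibling_name)
    problems = []
    if event.get(enum_name) == OTHER:
        if not sibling:
            problems.append(f"{sibling_name} must be populated when {enum_name} is 99")
        elif sibling == "Other":
            problems.append(f"{sibling_name} must carry the custom label, not 'Other'")
    return problems


ok = check_enum_sibling({"severity_id": 99, "severity": "Catastrophic"}, "severity_id")
empty = check_enum_sibling({"severity_id": 99}, "severity_id")
lazy = check_enum_sibling({"severity_id": 99, "severity": "Other"}, "severity_id")
defined = check_enum_sibling({"severity_id": 2}, "severity_id")
```

Only the second and third events are flagged; a defined enum value without a sibling passes, since the sibling is optional in that case.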
118 | 
119 | ### Attribute Requirement Flags
120 | 
121 | Attributes in the context of an event class have a requirement flag that depends on the semantics of the event class.  Attributes themselves do not have a requirement flag; one applies only within the context of an event class.[^6]
122 | 
123 | The requirement flags are:
124 | 
125 | * Required
126 | * Recommended
127 | * Optional
128 | 
129 | Event classes are designed so that the most essential attributes are required, to give enough meaning and context to the information reported by the data source.  If an attribute is required, then a consumer of the event can count on the attribute being present, and its value populated.  If a required attribute cannot be populated for a particular event class, a default value is defined by the event class, usually `Unknown`.[^7]  
130 | 
131 | Recommended attributes should be populated, although this is not possible in all cases; unlike required attributes, they are not subject to validation and do not have default values.
132 | Optional attributes may be populated to add context when data sources emit richer information.
133 | Data onboarders should place more weight on recommended attributes than on optional attributes.
134 | 
135 | Some event classes may specify constraints on recommended attributes.
136 | 
137 | ### Constraints
138 | 
139 | A _Constraint_ is a documented rule subject to validation that requires at least one of the specified recommended attributes of a class to be populated.  Constraints are used in classes where there are attributes that cannot be required in all use cases, but in order to have unambiguous meaning, at least one of the attributes in the constraint is required.  Attributes in a constraint must be Recommended.
140 | 
141 | The two constraints are: `at_least_one` and `just_one`.  These will be explained further in the section on Event Classes.
142 | 
143 | ### Attribute Groups
144 | 
145 | Attributes are grouped for documentation purposes into _Primary_, _Classification_, _Occurrence_, and _Context_ groups.  Classification and Occurrence groupings are independent of event class and are defined with the attribute in the dictionary.  Primary and Context attributes’ groupings are based on their usage within a given event class.
146 | 
147 | Each event class has primary attributes, the attributes that are indicative of the event semantics in all use cases.  Primary attributes are typically Required, or Recommended per event class, based on their use in each class.  Primary attributes in the Base Event class apply to all event classes.
148 | 
149 | Attributes that are important for the taxonomy of the framework are designated as Classification attributes.  The classification attributes are marked as Required as part of the Base Event class.  Their values are nominally `Unknown` or `Other` and will be overridden within specific event classes.
150 | 
151 | Attributes that are related to time and time ranges are designated as Occurrence attributes.  The occurrence attributes may be marked with any requirement level, depending on their usage within an event class.
152 | 
153 | Attributes that are used for variations on typical use cases, to enhance the meaning or enrich the content of an event are designated as Context attributes.  The context attributes may be marked with any requirement level, but most often are marked as Optional.
154 | 
155 | ### Timestamp and Datetime Attributes
156 | 
157 | Representing time values is one of the most important aspects of OCSF, as it is for any event schema.  Time attributes associated with events need to be captured in a number of places throughout the schema, for example when a file was opened or when a process started and stopped.  There are also times that are directly related to the event stream, for example event creation, collection, processing, and logging.  The nominal data type for these attributes is `timestamp_t`, based on Unix time expressed as the number of milliseconds since the Unix epoch.  The `datetime_t` data type represents times in human-readable RFC 3339 form.
158 | 
159 | The Date/Time profile, when applied, adds a sibling attribute of data type `datetime_t` wherever a `timestamp_t` attribute appears in the schema.
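
As a rough sketch of how the two representations relate, the Python function below converts a `timestamp_t` value (milliseconds since the Unix epoch) into an RFC 3339 string in UTC.  The function name `timestamp_to_datetime` is an assumption made for this example and is not defined by OCSF.

```python
from datetime import datetime, timezone

def timestamp_to_datetime(timestamp_ms):
    """Convert milliseconds since the Unix epoch to an RFC 3339 UTC string."""
    # Integer arithmetic avoids floating point rounding of the millisecond part.
    seconds, millis = divmod(timestamp_ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    dt = dt.replace(microsecond=millis * 1000)
    # isoformat() emits "+00:00" for UTC; RFC 3339 also allows the shorter "Z".
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")

print(timestamp_to_datetime(1659056959885))
```

A mapper applying the Date/Time profile could use a conversion like this to populate each `datetime_t` sibling from its `timestamp_t` attribute.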
160 | 
161 | The following terms are used below:
162 | 
163 | Event Producer -- the system (application, services, etc.) that generates events.  Related to the producer persona.
164 | 
165 | Event Consumer -- the system that receives the events generated by the event producer.  Related to the analyst persona.
166 | 
167 | Event Processor -- a system that processes and logs, including an ETL chain, the events received by the event consumer.  Related to the mapper and analyst personas.
168 | 
169 | The core time attributes may be present in all events as they are from the Base Event class.  They are:
170 | 
171 | * `original_time: string` \
172 | The original event time, as created by the event producer as part of the Metadata object of the Base Event class. The time format is not specified by OCSF and as such is a non-validated string. The time could be UTC time in milliseconds (1659378222123), ISO 8601 (2019-09-07T15:50-04:00), or any other value (12/13/2021 10:12:55 PM).
173 | * `time: timestamp_t` \
174 | The normalized event occurrence time. Normalized time means that the original event time, `original_time`, has been converted to the OCSF `timestamp_t` and corrected for the clock skew of the source, if any, and for batch submission delay.
175 | * `processed_time: timestamp_t` \
176 | The time when the event (or batch of events) was sent by the event processor to the event consumer. The processed time can be used to determine the clock skew at the earliest known event source. Clock skew occurs when the UTC clock time on one computer differs from the UTC clock time on another computer.  It is assumed that the transport latency is very small compared to the clock skew, therefore if the `processed_time` is very close to the `logged_time`, no correction should be made, notwithstanding any known hops.
177 | * `logged_time: timestamp_t` \
178 | The time when the event consumer logged the event. It must be equal to or greater than the normalized event occurrence time.
179 | * `modified_time: timestamp_t` \
180 | The time when the event was last updated or enriched.  It must be equal to or greater than the normalized event occurrence time. It could be less than, equal to, or greater than the `logged_time`.
181 | * `start_time/end_time: timestamp_t` \
182 | The start and end event times of the Base Event class are used when the event represents some activity that happened over a time range, for example a vulnerability or virus scan, or a discovery run. The other use case is event aggregation. Aggregation is a mechanism that allows a number of events of the same event type to be summarized into one for more efficient processing, for example netflow events.  In this use case, the `count` integer attribute is also populated.
183 | 
184 | #### Time Zone
185 | 
186 | The time zone where the event occurred is represented by the `timezone_offset` attribute of data type Integer.  Although time attributes other than the pass-through attribute `original_time` are in UTC, most security use cases benefit from knowing what time of day the event occurred at the event source.
187 | 
188 | `timezone_offset` is the number of minutes that the reported event time is ahead or behind UTC, in the range -1,080 to +1,080.  It is a recommended attribute of the Base Event class.
189 | 
190 | ### Metadata
191 | 
192 | Metadata is an object referenced by the required Base Event attribute `metadata`.  As its name implies, the attribute is populated with data outside of the source event.  Some of the attributes of the object are optional, such as `logged_time` and `uid`, while the `version` attribute, which carries the schema version for the event, is required.  It is expected that a logging system _may_ assign the `logged_time` and `uid` at storage time.
193 | 
194 | Metadata attributes such as `modified_time` and `processed_time` are optional.  `modified_time` is populated when an event has been enriched or mutated in some way before analysis or storage.  `processed_time` is populated typically when an event is collected and submitted to a logging system.[^8]
195 | 
196 | **Version.**  The OCSF core schema version follows the Semantic Versioning Specification (SemVer), e.g. `0.99.0`, and indicates to consumers of the event which attributes may be found in the event and what the class and category structure is.  The convention is that, after `1.0.0`, the major version (the first part) remains the same while versions of the schema remain backwards compatible with previous versions of the schema and framework.  As new classes, attributes, objects and profiles are added to the schema, the minor version (the second part) increases.  The third part is reserved for corrections that don’t break the schema, for example documentation or caption changes.
197 | 
198 | Extensions, discussed later, have their own versions and can change at their own pace but must remain compatible and consistent with the major version of the core schema that they extend.  The optional `extension` attribute of type Schema Extension carries the version of an extension.
199 | 
200 | ### Observables
201 | 
202 | Observable is an object referenced by the primary Base Event class array attribute `observables`.  It is populated from other attributes produced or mapped from the source event.  An Observable object surfaces attribute information in one place irrespective of event class, while the security relevant indicators that populate the observable may occur in many places across event classes.  In effect it is an array of summaries of those attributes regardless of where they stem from in the event based on their data type or object type (e.g. `ip_address`, `process`, `file`, etc).
203 | 
204 | For example, an IP address may populate multiple attributes: `public_ip`, `intermediate_ips`, `ip` (as part of objects Endpoint, Device, Network Proxy, etc.).  An analyst may be interested to know if a particular IP address is present anywhere in any event.  Searching for the IP address value from the Base Event `observables` attribute surfaces any of these events more easily than remembering all of the attributes across all event classes that may have an IP address.
205 | 
206 | There are three important attributes in the Observable object: `name`, `value`, and `type_id`.  For scalar attributes within an event, all three observable attributes are populated, where the `type_id` declares what the type of attribute is, the `name` is the fully qualified attribute name within the event, and `value` is the value of that attribute.
207 | 
208 | For complex (object type) attributes, Observable.`name` is the pointer or reference to the attribute, but as an object has more than one value, Observable.`value` is not populated.
209 | 
210 | ```json
211 | "observables": [
212 |      {
213 |           "name": "actor.process.name",
214 |           "type": "Process Name",
215 |           "type_id": 9,
216 |           "value": "Notepad.exe"
217 |      },
218 |      {
219 |           "name": "tls.ja3_hash",
220 |           "type": "Fingerprint",
221 |           "type_id": 30
222 |      },
223 |      {
224 |           "name": "file.name",
225 |           "type": "File Name",
226 |           "type_id": 7,
227 |           "value": "Notepad.exe"
228 |      }
229 | ]
230 | ```
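
To illustrate why the `observables` array simplifies searching, here is a minimal Python sketch that filters events by observable type and value.  The helper name `events_with_observable` and the sample events are hypothetical, and the attribute path `src_endpoint.ip` is only illustrative; `type_id` 2 is assumed here to denote an IP address observable.

```python
def events_with_observable(events, type_id, values):
    """Return the events whose observables contain any of the given values."""
    wanted = set(values)
    return [
        event for event in events
        if any(
            obs.get("type_id") == type_id and obs.get("value") in wanted
            for obs in event.get("observables", [])
        )
    ]

events = [
    {"observables": [{"name": "src_endpoint.ip", "type": "IP Address",
                      "type_id": 2, "value": "103.216.221.19"}]},
    {"observables": [{"name": "file.name", "type": "File Name",
                      "type_id": 7, "value": "Notepad.exe"}]},
    {},  # an event without observables is simply skipped
]

hits = events_with_observable(events, 2, ["103.216.221.19"])
print(len(hits))
```

The query never needs to know which attribute of which event class carried the address; it inspects only the `observables` array.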
231 | 
232 | ### Enrichments
233 | 
234 | Enrichment is an object referenced by the Base Event array attribute `enrichments`.  An Enrichment object describes additional information added to the event during collection or event processing but before an immutable operation such as storage of the event.  An example would be looking up location data on an IP address, or IOCs against a domain name or file hash.
235 | 
236 | Because enriching data can be extremely open-ended, the object uses generic string attributes along with a JSON `data` attribute that holds an arbitrary enrichment in a form known to the processing system.  Similar to the Observable object, `name` and `value` attributes are required to point to the event class attribute that is being enriched.  Unlike Observable, there is no predefined set of attributes that are tagged for enrichment, therefore only a recommended `type` attribute is specified (i.e. there is no `type_id` Enum).
237 | 
238 | Also unlike Observable, which is synchronized with the time of the event, it is assumed that there is some latency between the event time and the time the event is enriched, hence the Base Event class `metadata`.`modified_time` should be populated at the time of enrichment.
239 | 
240 | For example:
241 | 
242 | ```json
243 | "metadata": {
244 |     "logged_time": 1659056959885,
245 |     "modified_time": 1659056959885,
246 |     "processed_time": 1659056959885,
247 |     "sequence": 69,
248 |     "uid": "1310fc5c-0edb-11ed-88fc-0242ac110002",
249 |     "version": "1.0.0"
250 | },
251 | "enrichments": [
252 |      {
253 |           "data": {
254 |                "hash": "0c5ad1e8fe43583e279201cdb1046aea742bae59685e6da24e963a41df987494"
255 |           },
256 |           "name": "ip",
257 |           "provider": "media.defense.gov",
258 |           "type": "IP Address",
259 |           "value": "103.216.221.19"
260 |      },
261 |      {
262 |           "data": {
263 |                "yara_rule": "rule \"wellmail_unique_strings\"{...}"
264 |           },
265 |           "name": "ip",
266 |           "provider": "media.defense.gov",
267 |           "type": "IP Address",
268 |           "value": "103.216.221.19"
269 |      }
270 | ]
271 | ```
272 | 
273 | ## Event Classes
274 | 
275 | **Events are represented by instances of Event Classes**, which are particular sets of attributes and objects representing a log line or telemetry submission at a point in time.  Event classes have semantics that describe what happened: either a particular activity, disposition or both.  
276 | 
277 | It is the intent of the schema to allow for the mapping of any raw event to a single event class.  This is achieved by careful design using composition rather than a multiple inheritance approach.  In order to completely capture the information in a rich data source, many attributes may be required.
278 | 
279 | Unfortunately, not every data source emits the same information for the same observed behavior.  In the interest of consistency, accuracy and precision, the schema event classes specify which dictionary attributes are essential (recommended or required), while others are optional as not all are needed across different data sources.  Attribute requirements, aside from Classification attributes from the Base Event class, are always within the scope of the event class definition and not tied to the attributes themselves.
280 | 
281 | By convention, all event classes extend the Base Event class.  Attributes of the Base Event class can be present in any event class and are termed Base Attributes.
282 | 
283 | ### Base Event Class Attributes
284 | 
285 | The Base Event class has required, recommended, and optional attributes that apply to all core schema classes.  The required attributes must be populated for every core schema event.  Optional Base Event class attributes may be included in any event class, along with event class-specific optional attributes.  Individual event classes will include their own required and recommended attributes.
286 | 
287 | Examples of required base attributes are `class_uid`, `category_uid`, `activity_id`, `severity_id`.
288 | 
289 | Examples of recommended base attributes are `timezone_offset`, `status_id`, `message`.
290 | 
291 | Examples of optional base attributes are `activity_name`, `start_time`, `end_time`, `count`, `duration`, `unmapped`.
292 | 
293 | **Each event class has a unique `class_uid` attribute value** which is the event class identifier.  It is a required attribute whose value overrides the nominal Base Event class value of `0`.  Event class friendly names are defined by the schema, optionally populate the `class_name` attribute and are descriptive of the specific class, such as File System Activity or Process Activity.
294 | 
295 | **Every event class has a `category_uid` attribute value** which indicates which OCSF Category the class belongs to.  An event class may be of only one category.  Category friendly names are defined by the schema, optionally populate the `category_name` attribute and are descriptive of the specific category the class belongs to, such as System Activity or Network Activity.
296 | 
297 | **Every event class has an `activity_id` Enum attribute**, constrained to the values appropriate for each event class.  The semantics of the class are further defined by the `activity_id` attribute, such as Open for File System Activity or Launch for Process Activity.  By convention, `activity_id` Enum labels are present tense imperatives.  The Enum label optionally may populate the `activity_name` attribute, which is a sibling to the `activity_id` Enum attribute but as a Classification group attribute, follows the `_name` suffix convention.
298 | 
299 | ### Special Base Attributes
300 | 
301 | There are a few base attributes that are worth calling out specifically.  These are the `unmapped` attribute, the `raw_data` attribute and the `type_uid` attribute.
302 | 
303 | While most if not all fields from a raw event can be parsed and tokenized, not all are mapped to the schema.  The fields that are not mapped may be included with the event in the optional `unmapped` attribute.
304 | 
305 | The `raw_data` optional attribute holds the event data as received from the source.  It is unparsed and represented as a String type.
306 | 
307 | The `type_uid` required attribute is constructed by the combination of the event class of the event (`class_uid`) and its activity (`activity_id`).  It is unique across the schema, hence it has a `_uid` suffix.  The `type_uid` friendly name, `type_name`, is a way of identifying the event in a more readable and complete way.  It too is a combination of the names of the two component parts.
308 | 
309 | The value is calculated as `class_uid * 100 + activity_id`.  For example:
310 | 
311 | `type_uid` = `3001 * 100 + 1 = 300101`
312 | 
313 | `type_name` = “Authentication: Logon”
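
The calculation can be expressed as a short Python sketch.  The function names `type_uid` and `split_type_uid` are assumptions for this example; the inverse relies on `activity_id` values staying below 100, which holds for the conventional range of 0 to 99.

```python
def type_uid(class_uid, activity_id):
    """Combine the class identifier and activity identifier into a type_uid."""
    return class_uid * 100 + activity_id

def split_type_uid(uid):
    """Recover (class_uid, activity_id) from a type_uid."""
    return divmod(uid, 100)

print(type_uid(3001, 1))       # 300101
print(split_type_uid(100111))  # (1001, 11)
```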
314 | 
315 | A snippet of a File System Activity event example with random values is shown below[^9]:
316 | 
317 | ```json
318 | {
319 |      "activity_id": 11,
320 |      "activity_name": "Decrypt",
321 |      "actor": {},
322 |      "category_name": "System Activity",
323 |      "category_uid": 1,
324 |      "class_name": "File System Activity",
325 |      "class_uid": 1001,
326 |      "device": {},
327 |      "end_time": 1685403212867,
328 |      "file": {},
329 |      "message": "entry queue amateur",
330 |      "metadata": {},
331 |      "observables": [],
332 |      "severity": "Low",
333 |      "severity_id": 2,
334 |      "start_time": 1685403212792,
335 |      "status": "img logs grove",
336 |      "status_detail": "barrier filled clothes",
337 |      "time": 1685403212834,
338 |      "type_name": "File System Activity: Decrypt",
339 |      "type_uid": 100111
340 | }
341 | ```
342 | 
343 | ### Constraints
344 | 
345 | As discussed in a previous section, an event class can have constraints that are more versatile than simple Required attribute requirements.  When at least one of a set of recommended attributes must be present, the class can assert the `at_least_one` constraint:
346 | 
347 | ```json
348 | "constraints": {
349 |      "at_least_one": [
350 |           "ip",
351 |           "mac",
352 |           "name",
353 |           "hostname"
354 |      ]
355 | }
356 | ```
357 | 
358 | Or the `just_one` constraint:
359 | 
360 | ```json
361 | "constraints": {
362 |      "just_one": [
363 |           "privileges",
364 |           "group"
365 |      ]
366 | }
367 | ```
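
A validator could check these constraints along the following lines.  This is a sketch under stated assumptions rather than the behavior of any particular OCSF tool: the `check_constraints` helper is hypothetical, an attribute counts as populated when it is present and not `None`, and `just_one` is interpreted as exactly one populated attribute.

```python
def check_constraints(obj, constraints):
    """Return a list of violation messages; an empty list means all rules hold."""
    violations = []
    for rule, names in constraints.items():
        populated = [name for name in names if obj.get(name) is not None]
        if rule == "at_least_one" and not populated:
            violations.append(f"at_least_one of {names} must be populated")
        elif rule == "just_one" and len(populated) != 1:
            violations.append(
                f"just_one of {names} must be populated, found {len(populated)}"
            )
    return violations

at_least_one = {"at_least_one": ["ip", "mac", "name", "hostname"]}
just_one = {"just_one": ["privileges", "group"]}

print(check_constraints({"hostname": "host-1"}, at_least_one))
print(check_constraints({"privileges": ["read"], "group": {}}, just_one))
```

The first call reports no violations because one of the listed attributes is populated; the second reports one because both attributes of a `just_one` constraint are present.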
368 | 
369 | ### Associations
370 | 
371 | Attributes within an event class are sometimes associated with each other and in some cases only one of them is present in the event while another may be looked up at processing or storage time.  OCSF denotes this within a class definition via the association construct:
372 | 
373 | ```json
374 | "associations": {
375 |      "actor.user": [
376 |           "src_endpoint"
377 |      ],
378 |      "dst_endpoint": [
379 |           "user"
380 |      ],
381 |      "src_endpoint": [
382 |           "actor.user"
383 |      ],
384 |      "user": [
385 |           "dst_endpoint"
386 |      ]
387 | }
388 | ```
389 | 
390 | In this example from the Authentication class, the `user` as actor associates with its endpoint, the `src_endpoint` attribute of the class, while the target `user` associates with its endpoint, the `dst_endpoint` of the class.  Note that the associations in this class are bi-directional, which is common, although uni-directional associations are also possible in other situations.
391 | 
392 | The construct may be useful for automated processing systems where a lookup service is available for an attribute that isn’t or can’t be populated via the source event producer.  In these cases the `processor_time` should be populated at the time of the association, as with other types of enrichments.
393 | 
394 | ## Categories
395 | 
396 | **A Category organizes event classes that represent a particular domain.**  For example, a category can include event classes for different kinds of events that may be found in an access log, or audit log, or network and system events.  Each category has a unique `category_uid` attribute value which is the category identifier.  Category IDs also have `category_name` friendly name attributes, such as System Activity, Network Activity, Audit, etc.
397 | 
398 | An example of categories with some of their event classes is shown in the below table.
399 | 
400 | | **System Activity**       | **Network Activity**  | **Identity & Access Management** | **Findings**     | **Discovery**         | **Application Activity**     |
401 | | ------------------------- | --------------------- | -------------------------------- | ---------------- | --------------------- | ---------------------------- |
402 | | File System Activity      | Network Activity      | Account Change                   | Security Finding | Device Inventory Info | Web Resources Activity       |
403 | | Kernel Extension Activity | HTTP Activity         | Authentication                   |                  | Device Config State   | Application Lifecycle        |
404 | | Kernel Activity           | DNS Activity          | Authorize Session                |                  |                       | API Activity                 |
405 | | Memory Activity           | DHCP Activity         | Entity Management                |                  |                       | Web Resource Access Activity |
406 | | Module Activity           | RDP Activity          | User Access Management           |                  |                       |                              |
407 | | Scheduled Job Activity    | SMB Activity          | Group Management                 |                  |                       |                              |
408 | | Process Activity          | SSH Activity          |                                  |                  |                       |                              |
409 | |                           | FTP Activity          |                                  |                  |                       |                              |
410 | |                           | Email Activity        |                                  |                  |                       |                              |
411 | |                           | Network File Activity |                                  |                  |                       |                              |
412 | |                           | Email File Activity   |                                  |                  |                       |                              |
413 | |                           | Email URL Activity    |                                  |                  |                       |                              |
414 | 
415 | 
416 | Finding the right granularity of categories is an important modeling topic.  Categorization is weakly structural while event classification is strongly structural (i.e. it defines the particular attributes, their requirements, and specific Enum values for the event class).
417 | 
418 | Many events produced in a cloud platform can be classified as network activity. Similarly, many host system events include network activity.  The key question to ask is, do the logs from these services and hosts provide the same context or information? Would there be a family of event classes that makes sense in a single category?  For example, does the NLB Access log provide context/info similar to a Flow log?  Does network traffic from a host provide similar information to a firewall or router?  Are they structured in the same fashion? Do they share attributes?  Would we obscure the meaning of these logs if we normalize them under the same category? Would the resultant category make sense on its own or would it lose its contextual meaning altogether?
419 | 
420 | Using profiles, some of these overlapping categorical scenarios can be handled without new partially redundant event classes.
421 | 
422 | ## Profiles
423 | 
424 | **Profiles are overlays on event classes and objects,** effectively a dynamic mix-in class of attributes with their requirements and constraints.  While event classes specialize their category domain, a profile can augment existing event classes with a set of attributes independent of category.  Attributes that must or may occur in any event class are members of the Base Event class.  Attributes that are specialized for selected classes are members of a profile.
425 | 
426 | Multiple profiles can be added to an event class via an array of profile values in the optional `profiles` attribute of the Base Event class.  This mix-in approach allows for reuse of event classes vs. creating new classes one by one that include the same attributes.  Event classes and instances of events that support the profile can be filtered via the `profiles` attribute across all categories and event classes, forming another dimension of classification.
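
Filtering on the `profiles` attribute can be sketched as follows.  The helper `events_with_profile` is hypothetical, and the profile identifier strings `host` and `security_control` are placeholders assumed for this example rather than values confirmed by the text above.

```python
def events_with_profile(events, profile):
    """Return events that list the given profile, regardless of class or category."""
    return [
        event for event in events
        if profile in (event.get("profiles") or [])
    ]

events = [
    {"class_uid": 1001, "profiles": ["host", "security_control"]},
    {"class_uid": 3001, "profiles": ["host"]},
    {"class_uid": 1001},  # no profiles attribute: no profile attributes added
]

matches = events_with_profile(events, "security_control")
print([event["class_uid"] for event in matches])
```

Because the filter ignores `class_uid` and `category_uid`, it cuts across the class hierarchy, which is the extra dimension of classification described above.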
427 | 
428 | For example, a `Security Controls` profile that adds MITRE ATT&CK<sup>TM</sup> Attack and Malware objects to System Activity classes avoids having to create a new event class, or many classes, with all of the same attributes as the System Activity classes.  A query for events of the class will return all the events, with or without the security information, while a query for just the profile will return events across all event classes that support the `Security Controls` profile.  A `Host` profile can add `Device` and `Actor` objects to Network Activity event classes when the network activity log source is a user’s computer.  Note that the `Actor` object includes `Process` and `User` objects, so a Host profile can include all of these when applied.  A Cloud profile could mix in cloud-platform-specific information onto Network Activity events.
429 | 
430 | The `profiles` attribute is an optional array attribute of the Base Event class.  The absence of the `profiles` attribute means, as would be expected, that no profile attributes are added.  Attributes defined with a profile have requirements that cannot be overridden, since profiles are themselves optional; it is assumed that the application of a profile is because those attributes are desired and can be populated.
431 | 
432 | However, some classes, such as System Activity classes, build in the attributes of a profile; for example, the `Host` profile attributes `device` and `actor` are defined in the class.  When a class definition includes the profile attributes, it still registers for that profile in the class definition so as to match any searches across events for that profile. In this case, the class-defined attribute requirement definitions take precedence.
433 | 
434 | Core schema profiles for `Security Control`, `Host`, `Cloud`, `Container` and `Linux` (for the Linux extension described later) are shown in the below table with their attributes.
435 | 
436 | | **Security Control**         | **Host** | **Cloud** | **Container** | **Linux** |
437 | | :--------------------------- | -------- | --------- | ------------- | --------- |
438 | | attacks                      | actor    | api       | container     | group     |
439 | | disposition_id / disposition | device   | cloud     | namespace_pid | euid      |
440 | | malware                      |          |           |               | egid      |
441 | |                              |          |           |               | auid      |
442 | 
443 | A special `Date/Time` profile adds `Datetime` typed time attributes in every class where there is a `Timestamp` time attribute.  This allows for human-readable RFC 3339 strings paired with epoch UTC integer values.
444 | 
445 | Other profiles could be product oriented, such as Firewall, IDS, VA, DLP etc. if they need to add attributes to existing classes.  They can also be more general, platform oriented, such as for Mac, Linux or Windows environments.
446 | 
447 | The core schema comes with a Linux profile via the Linux platform extension.
448 | 
449 | Vendors can add profiles via extensions.  For example, Splunk Technical Add-ons might define a profile that could be added to all events with Splunk’s standard `source`, `sourcetype`, `host` attributes.
450 | 
451 | ### Disposition
452 | 
453 | The `disposition_id` attribute of the Security Control profile indicates the outcome or state of the event class’ activity at the time of event capture and is an Enum with a standard set of values, such as Blocked, Quarantined, Deleted, Delayed.  
454 | 
455 | Only event classes that register for the profile may have a `disposition_id` but all have an `activity_id`. A typical use of `disposition_id` is when a security protection product detects a threat and blocks it.  The activity might have been a file open, but if the file was infected, the disposition would be that the file open was blocked.  As of this writing, `disposition_id` is added to core schema classes only via the Security Controls profile.
456 | 
457 | ### Profile Application Examples 
458 | 
459 | Using example categories and event classes from a preceding section, examples of how profiles might be applied to event classes are shown below.
460 | 
461 | #### System Activity
462 | 
463 | The event classes **would** all include the Host profile and **may** include the Security Controls or Cloud profile.
464 | 
465 | #### Network Activity
466 | 
467 | The event classes **may** include the Host profile and **may** include the Security Controls or Cloud profile.
468 | 
469 | #### Identity & Access Management
470 | 
471 | The event classes **would** include the Host profile (due to `actor.user`), **may** include the Cloud profile, and **would not** include the Security Control profile.
472 | 
473 | ### Personas and Profiles
474 | 
475 | The personas called out in an earlier section (producer, author, mapper, and analyst) can each consider profiles from a different perspective.
476 | 
477 | Producers, who can also be authors, can add profiles to their events when the events will include the additional information the profile adds.  For example, a vendor may have certain system attributes that are added via an extension profile.  A network vendor that can detect malware would apply the Security Controls profile to their events.  An endpoint security vendor can apply the Host, User and Security Controls profiles to network events.
478 | 
479 | Authors define profiles, and the profiles are applicable to specific classes, objects or categories.
480 | 
481 | Mappers can add the profile ID and associated attributes to specific events mapped to logs in much the same way producers would apply profiles.
482 | 
483 | Analysts, e.g. end users, can use the browser to select applicable profiles at the class level.  They can use the profile identifier in queries for hunting, and can use the profile identifiers for analytics and reporting. For example, show all malware alerts across any category and class.
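
The analyst use case above (selecting events by profile regardless of category or class) can be sketched in Python. This is a minimal illustration, not a reference implementation: it assumes each event is a plain dictionary whose applied profile names are listed in a `metadata.profiles` array, and the profile name `security_control`, the helper names, and the sample `class_uid` values are hypothetical.

```python
from typing import Any, Iterable


def has_profile(event: dict[str, Any], profile: str) -> bool:
    """Return True if the event lists `profile` among its applied profiles.

    Assumption: profile names live in event["metadata"]["profiles"].
    """
    profiles = event.get("metadata", {}).get("profiles") or []
    return profile in profiles


def filter_by_profile(events: Iterable[dict[str, Any]], profile: str) -> list[dict[str, Any]]:
    """Select events carrying the given profile, across any category or class."""
    return [event for event in events if has_profile(event, profile)]


# Hypothetical sample events; class_uid values are illustrative only.
events = [
    {"class_uid": 1001, "metadata": {"profiles": ["host", "security_control"]}},
    {"class_uid": 4001, "metadata": {"profiles": ["security_control"]}},
    {"class_uid": 3002, "metadata": {"profiles": ["host"]}},
    {"class_uid": 4001, "metadata": {}},
]

matches = filter_by_profile(events, "security_control")
print([event["class_uid"] for event in matches])
```

Because the filter keys on the profile rather than on a class or category, the same query applies unchanged as new event classes register for the profile.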
484 | 
485 | ## Extensions
486 | 
487 | OCSF schemas can be extended by adding new attributes, objects, categories, profiles and event classes.  A schema is the aggregation of core schema entities and extensions.  
488 | 
489 | Extensions allow a particular vendor or customer to create a new schema or augment an existing schema.[^10]  Extensions can also be used to factor out non-essential schema domains, keeping a schema small.  Extensions to the core schema use the framework in the same way as a new schema, optionally creating categories, profiles or event classes from the dictionary.  Extensions can add new attributes to the dictionary, including new objects.  Extended attribute names can be the same as core schema names, but this is not a good practice for a number of reasons.  As with categories, event classes and profiles, extensions have unique IDs within the framework as well as versioning.[^11]
490 | 
491 | As of this writing, two platform extensions augment the core schema: Linux and Windows.  The Linux extension adds a profile, while the Windows extension adds three classes to the System Activity category.
492 | 
493 | Another use of extensions to the core schema is the development of new schema artifacts, which later may be promoted into the core schema or to a platform extension.  Extensions can also add vendor-specific artifacts alongside the core schema.  In this case, a best practice is to prefix the schema artifacts with a short identifier associated with the extension range registered.[^12]  Lastly, as mentioned above, entirely new schemas can be constructed as extensions.
494 | 
495 | Examples of new experimental categories, new event classes that contain some new attributes and objects are shown in the table below with a `Dev` extension superscript convention.  In the example, extension classes were added to the core Findings category, and three extension categories were added, Policy, Remediation and Diagnostic, with extension classes.
496 | 
497 | | **Findings**                         | **Policy<sup>Dev</sup>**                   | **Remediation<sup>Dev</sup>**                 | **Diagnostic<sup>Dev</sup>** |
498 | | ------------------------------------ | ------------------------------------------ | --------------------------------------------- | ---------------------------- |
499 | | Incident Creation<sup>Dev</sup>      | Clipboard Content Protection<sup>Dev</sup> | File Remediation<sup>Dev</sup>                | CPU Usage<sup>Dev</sup>      |
500 | | Incident Associate<sup>Dev</sup>     | Compliance<sup>Dev</sup>                   | Folder Remediation<sup>Dev</sup>              | Memory Usage<sup>Dev</sup>   |
501 | | Incident Closure<sup>Dev</sup>       | Compliance Scan<sup>Dev</sup>              | Startup Application Remediation<sup>Dev</sup> | Throughput<sup>Dev</sup>     |
502 | | Incident Update<sup>Dev</sup>        | Content Protection<sup>Dev</sup>           | User Session Remediation<sup>Dev</sup>        |                              |
503 | | Email Delivery Finding<sup>Dev</sup> | Information Protection<sup>Dev</sup>       |                                               |                              |
504 | 
505 | 
506 | A brief discussion of how to extend the schema is found in Appendix C.
507 | 
508 | ## Appendix A - Guidelines and Conventions
509 | 
510 | ### Guidelines for attribute names
511 | 
512 | * Attribute names must be a valid UTF-8 sequence. 
513 | * Attribute names must be all lower case. 
514 | * Combine words using underscore. 
515 | * No special characters except underscore. 
516 | * Reserved attributes are prefixed with an underscore.
517 | * Use present tense unless the attribute describes historical information. 
518 | * `activity_id` enum labels should be present tense.  For example, `Delete`.  `disposition_id` enum labels should be past tense.  For example, `Blocked`.
519 | * Use singular and plural names properly to reflect the attribute content.  \
520 | For example, use `events_per_sec` rather than `event_per_sec`. 
521 | * When an attribute represents multiple entities, the attribute name should be pluralized and the value type should be an array.  \
522 | Example: `process.loaded_modules` holds multiple values, a list of loaded module names. 
523 | * Avoid repetition of words where possible.  \
524 | Example: `device.device_ip` should be `device.ip`. 
525 | * Avoid abbreviations when possible.  \
526 | Some exceptions can be made for well-accepted abbreviations. Example: `ip`, or `os`. 
527 | * For vendor extensions to the dictionary, prefix attribute names with a 3-letter moniker in order to avoid name collisions.  Examples: `aws_finding`, `spk_context_ids`.
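
The mechanical parts of these guidelines (lower case, underscore-separated words, no other special characters, optional leading underscore for reserved attributes) can be checked with a short script. The following is a sketch only: the function names are hypothetical, it assumes digits are acceptable in names, and it treats doubled or trailing underscores as invalid, which the guidelines do not state explicitly. Tense, pluralization, and repetition still need human review.

```python
import re

# Optional leading underscore (reserved attributes), then lower-case
# alphanumeric words joined by single underscores.
# Assumption: digits are allowed; doubled or trailing underscores are not.
_ATTRIBUTE_NAME = re.compile(r"_?[a-z0-9]+(?:_[a-z0-9]+)*")


def is_valid_attribute_name(name: str) -> bool:
    """Check a single (non-dotted) attribute name against the pattern above."""
    return _ATTRIBUTE_NAME.fullmatch(name) is not None


def has_vendor_prefix(name: str, moniker: str) -> bool:
    """Check that a vendor extension attribute starts with its 3-letter moniker."""
    return len(moniker) == 3 and name.startswith(moniker + "_")


for candidate in ("events_per_sec", "_reserved", "DeviceIp", "device-ip", "aws_finding"):
    print(candidate, is_valid_attribute_name(candidate))
```

A check like this fits naturally into a pre-commit hook or CI step for an extension's dictionary, where naming drift is cheapest to catch.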
528 | 
529 | ## Appendix B - Data Types
530 | 
531 | Refer to [https://schema.ocsf.io/data_types](https://schema.ocsf.io/data_types) for the OCSF data types and their validation constraints.
532 | 
533 | ## Appendix C - Schema Construction and Extension
534 | 
535 | The OCSF schema repository can be found at [https://github.com/ocsf/ocsf-schema](https://github.com/ocsf/ocsf-schema).
536 | 
537 | The repository is structured as follows:
538 | 
539 | | **File or Folder** | **Purpose**                                                                           |
540 | | ------------------ | :------------------------------------------------------------------------------------ |
541 | | categories.json    | the schema categories are defined and must be present for classes to be in a category |
542 | | dictionary.json    | the schema dictionary is where all attributes must be defined                         |
543 | | version.json       | the schema semver version, every change to the schema requires this file be updated   |
544 | | enums/             | the schema enum definitions, optional if enums are shared                             |
545 | | events/            | the schema event classes                                                              |
546 | | extensions/        | the schema extensions, a similar structure is set per extension                       |
547 | | includes/          | the schema shared files                                                               |
548 | | objects/           | the schema object definitions                                                         |
549 | | profiles/          | the schema profiles                                                                   |
550 | 
551 | For information and examples about how to add to the schema, see [CONTRIBUTING.md](https://github.com/ocsf/ocsf-schema/blob/a46b6df1d60ad052739caa96c29109e9b233ef82/CONTRIBUTING.md) in the OCSF GitHub.
552 | 
553 | ### Extending the Schema
554 | 
555 | To extend the schema, create a new directory with a unique extension name (e.g. `dev`) in the `extensions` directory. The directory structure is the same as the top-level repository structure above, and it may contain the following files and subdirectories, depending on what type of extension is desired:
556 | 
557 | | **File or Folder** | **Purpose**                                                           |
558 | | ------------------ | --------------------------------------------------------------------- |
559 | | categories.json    | Create to define a new event category to reserve a range of class IDs |
560 | | dictionary.json    | Create to define new attributes                                       |
561 | | events/            | Create to define new event classes                                    |
562 | | objects/           | Create to define new objects                                          |
563 | | profiles/          | Create to define new profiles                                         |
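
As a rough illustration of the layout in the table above, the following Python sketch creates an empty extension skeleton. It is not part of the OCSF tooling: the function name is hypothetical, the `{}` file contents are placeholders rather than valid schema definitions, and a real extension would create only the files it needs and may require files not listed in the table. The demonstration writes to a temporary directory rather than a real checkout.

```python
import json
import tempfile
from pathlib import Path


def scaffold_extension(repo_root: Path, name: str) -> Path:
    """Create extensions/<name>/ with the files and folders listed in the table."""
    ext_dir = repo_root / "extensions" / name
    # Folders for new event classes, objects, and profiles.
    for folder in ("events", "objects", "profiles"):
        (ext_dir / folder).mkdir(parents=True, exist_ok=True)
    # Placeholder JSON files; existing files are left untouched.
    for file_name in ("categories.json", "dictionary.json"):
        path = ext_dir / file_name
        if not path.exists():
            path.write_text(json.dumps({}, indent=2) + "\n", encoding="utf-8")
    return ext_dir


# Demonstration against a throwaway directory standing in for the repository root.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_extension(Path(tmp), "dev")
    print(sorted(entry.name for entry in created.iterdir()))
```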
564 | 
565 | In order to reserve an ID space, and make your extension public, add a UID to your extension name in the OCSF Extensions Registry [here](https://github.com/ocsf/ocsf-schema/blob/main/extensions.md) to avoid collisions with core or other extension schemas.  For example, the dev extension would have a row in the table as follows:
566 | 
567 | | **Extension Name** | **Type** | **UID** | **Notes**                         |
568 | | ------------------ | -------- | ------- | --------------------------------- |
569 | | Development        | dev      | 999     | The development schema extensions |
570 | 
571 | New categories and event classes will have their unique IDs offset by the UID.
572 | 
573 | More information about extending existing schema artifacts can be found at [extending-existing-class.md](https://github.com/ocsf/ocsf-schema/blob/a46b6df1d60ad052739caa96c29109e9b233ef82/doc/extending-existing-class.md).
574 | 
575 | <!-- Footnotes themselves at the bottom. -->
576 | ## Notes
577 | 
578 | [^1]:
579 |      OCSF includes concepts and portions of the ICD Schema, developed by Symantec, a division of Broadcom, which has been generalized and made open under the Apache 2 license with their permission.
580 | 
581 | [^2]:
582 |      For the most up-to-date guidelines and data types, refer to the schema browser at [https://schema.ocsf.io](https://schema.ocsf.io).
583 | 
584 | [^3]:
585 |      MITRE ATT&CK<sup>TM</sup>: https://attack.mitre.org/
586 | 
587 | [^4]:
588 |      MITRE ATT&CK<sup>TM</sup> Matrix: https://attack.mitre.org/matrices/enterprise/
589 | 
590 | [^5]:
591 |      The internal source definition of an OCSF schema can be hierarchical but the resulting compiled schema does not expose sub classes.
592 | 
593 | [^6]:
594 |      Event class validation is enforced via the required attributes, in particular the classification attributes, which by necessity need to be kept to a minimum, as well as attribute data type validation and the event class structure.
595 | 
596 | [^7]:
597 |      Required attributes that cannot be populated due to information missing from a data source must be carried with the event as _unknown_ values - asserting that the information was missing.
598 | 
599 | [^8]:
600 |      Note that a non-trivial difference between the `processed_time` and the `logged_time` in UTC may indicate a clock synchronization problem with the source of the event (but not necessarily the actual source of the event if there is an intermediate collection system or forwarder).
601 | 
602 | [^9]:
603 |      Objects have been collapsed to save space.  You can generate full examples with dummy data at [https://schema.ocsf.io/doc/swagger.json](https://schema.ocsf.io/doc/swagger.json) or from within the browser.
604 | 
605 | [^10]:
606 |      An extension does not need to extend the core schema base class if it is a new schema.
607 | 
608 | [^11]:
609 |      Reserved identifier ranges are registered within a file in the project GitHub repository.  Extended events should populate the `metadata.version` attribute with the extended schema version.
610 | 
611 | [^12]:
612 |      The Schema Browser will label extensions with a superscript.
613 | 


--------------------------------------------------------------------------------
/Understanding OCSF.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ocsf/ocsf-docs/100ce76a6a2657fadf5a11cedf8a7a84ac6dd76c/Understanding OCSF.pdf


--------------------------------------------------------------------------------