├── README.md ├── data-message ├── README.md └── docs │ └── sdmx-csv-field-guide.md └── metadata-message └── docs └── sdmx-csv-field-guide.md /README.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | This repository is used for maintaining the SDMX-CSV data message specifications. 4 | 5 | This includes: 6 | 7 | - Normative documentation and samples for the SDMX-CSV data message. 8 | - [Wiki](https://github.com/sdmx-twg/sdmx-csv/wiki) for additional information 9 | -------------------------------------------------------------------------------- /data-message/README.md: -------------------------------------------------------------------------------- 1 | # Overview 2 | 3 | This repository is used for maintaining the SDMX-CSV data message specifications. 4 | 5 | This includes: 6 | 7 | - Normative documentation and samples for the SDMX-CSV data message. 8 | - [Wiki](https://github.com/sdmx-twg/sdmx-csv/wiki) for additional information -------------------------------------------------------------------------------- /metadata-message/docs/sdmx-csv-field-guide.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | SDMX-CSV Data Message is an SDMX data exchange format based on the [RFC 4180](https://tools.ietf.org/html/rfc4180). CSV is a widely used standardised and simple format to exchange data supported by many tools. 4 | 5 | SDMX-CSV integrates with other specifications, i.e.: 6 | - The SDMX API RESTful specification (e.g. 
content negotiation with mime-type to get SDMX-CSV representations, specific formats for responses, language selection through HTTP content negotiation) 7 | - The [RFC 4180](https://tools.ietf.org/html/rfc4180) specification 8 | 9 | ## RFC 4180: A common format for CSV files 10 | 11 | In order to benefit from best practices, SDMX-CSV is based on the rules defined in the [RFC 4180](https://tools.ietf.org/html/rfc4180), which defines a common format and MIME Type for CSV files. It is advised to read the (very short) RFC for a full list of requirements but, in a nutshell, the RFC defines rules such as: 12 | - How the CSV file should be structured (the RFC specifies that all records must have an identical structure (determined column number), like when using an SDMX "flat" representation for data); 13 | - When double-quotes should be used and how to escape them when needed; 14 | - How spaces should be handled: Spaces are considered part of a field and should not be ignored; 15 | - Which mime type should be used; 16 | - What is the default character set, etc. 17 | 18 | # Design principles for SDMX-CSV 2.1 Metadata Messages (aligned with SDMX 3.1) 19 | 20 | - In order to ensure the identifiability of the metadata contained in the message, the header row containing the column headers is mandatory and its content is well-defined. 21 | - An SDMX-CSV referential metadata message contains metadata attribute values for one or more metadatasets reported for one or more metadataflows or metadata provision agreements. 22 | - After the mandatory header row, each row contains the information related to one specific metadataset attached to one or more identifiable artefacts (targets). 23 | - In [RFC 4180](https://tools.ietf.org/html/rfc4180), csv stands for "comma-separated values". However, while SDMX-CSV uses indeed the "comma" (%x2C) as the default field separator, it adopts the wider interpretation of csv as "character-separated values". 
It is recommended for implementers to provide SDMX-CSV messages according to the locale of the user (e.g. as indicated in the HTTP Accept-Language header). It means that e.g. the semi-colon ‘;’ (as used typically in specific regions or countries) is acceptable as separator. See also the examples below. Note that the separator used in a message can be determined by retrieving the character that follows the header field of the first column, which may be extended by a squared bracket term (see below). 24 | 25 | ## Columns 26 | 27 | - The first column is always used for the underlying type of structure by which the metadataset is defined: metadataflow or metadata provision agreement. 28 | - The next one or two columns are always used for the related structure identification. 29 | - The next one or two columns are used for the metadataset identification. 30 | - The next column, previously used for the action to be performed for the metadataset, is deprecated. Instead use the appropriate HTTP method when submitting the message to an SDMX Rest API as documented [here](https://github.com/sdmx-twg/sdmx-rest/blob/complement-maintenance-doc/doc/maintenance.md#maintaining-reference-metadata). 31 | - The next column is used for indicating if the metadataset includes only a subset of the available languages. If false (the default), then the value is `0`, otherwise `1`. E.g., an SDMX Rest GET query with an HTTP header `Accept-Language` may result in a metadataset containing only a subset of the languages. If such a metadataset is again submitted to an SDMX Rest web service, then only the included languages are added or updated in the target SDMX system but other languages are not changed. 32 | - The next column is used for the structure types of all targets of the metadataset. 33 | - The next one or two columns are used for the identification of all targets of the metadataset. 34 | - Each metadata attribute of the included metadataset(s) is represented in one or two columns. 
SDMX web services should return the columns in the metadata attribute order as defined in (each of) the underlying Metadata Structure Definition(s), thus in case of data defined by different metadata structures: first the metadata attributes of the first metadata structure, then the remaining metadata attributes of the second metadata structure and so forth. However, any order of these columns is valid for metadata uploads to SDMX-consuming systems. 35 | - Implementers have the possibility to add any other custom columns as required, e.g. publicationPeriod, publicationYear, reportingBegin, reportingEnd, prepared, etc. 36 | - In the context of appending or deleting metadata, certain columns may be omitted, see below. 37 | 38 | ## Column headers (first row) 39 | 40 | - The header field of the first column always starts with the term `MDSTRUCTURE`. 41 | - This field must be extended with a sub-field delimiter encapsulated in squared brackets "[]", e.g. `MDSTRUCTURE[;]`, in case the message contains metadatasets with multiple targets or with multi-instance or multi-language metadata attributes. 42 | - The header field of the second column always contains the term `MDSTRUCTURE_ID`. 43 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the structure identification column containing the term `MDSTRUCTURE_NAME`. 44 | - The header field of the next column always contains the term `METADATASET_ID`. 45 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the metadataset identification column containing the term `METADATASET_NAME`. 46 | - The header field of the next column may contain the term `ACTION`. If this deprecated column is present, it is to be ignored. 47 | - The header field of the next column may contain the term `IS_PARTIAL_LANGUAGE`. 48 | - The header field of the next column contains the term `TARGET_TYPES`. 
49 | - The header field of the next column contains the term `TARGET_IDS`. 50 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the target identification column containing the term `TARGET_NAMES`. 51 | - The other columns for components contain: 52 | - Default: The ID of the metadata attribute reported in that column prefixed by all corresponding nested parent metadata attributes separated by a dot "." in the form *METADATA_ID[.METADATA_ID]+*, e.g. `ATTRIBUTE_GRANDPARENT_ID.ATTRIBUTE_PARENT_ID.ATTRIBUTE_CHILD_ID`. Additional pairs of squared brackets `[]` are added at the end of the IDs of those metadata attributes that have multiple instances, e.g. `CONTACT[].NAME`, `CONTACT[].PHONE[]` or `CONTACT.PHONE[]`, and/or that contain localised values. In the latter case the brackets encapsulate the ISO 2-letter language codes that can be encountered in that column, separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. Example of a localised child attribute: `PROCESS.STEP[en;fr]`, and for multiple instances: `PROCESS.STEP[][en;fr]`. 53 | - If option `labels=both` (see *[here](#optional-parameters)*): The full ID (as described above under 'Default') and the localised name of the metadata attribute reported in that column separated by the term ": ", e.g. `ATTRIBUTE_ID: ATTRIBUTE_NAME`. 54 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the metadata attribute identification column containing the localised name of the metadata attribute reported in the previous column. 55 | - Any other custom column contains a custom but unique term, e.g. `publicationPeriod`. 
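The header rules above can be illustrated with a small parser. This is only a sketch, not part of the specification: it assumes the default `labels=id` form, assumes the sub-field separator is never a dot, and the names `Segment`, `subfield_separator` and `parse_attribute_header` are hypothetical helpers invented for this illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical helper types and names, introduced only for illustration.
@dataclass
class Segment:
    attr_id: str                      # ID of one attribute in the dotted path
    multi_instance: bool = False      # True if the ID is followed by "[]"
    languages: list = field(default_factory=list)  # codes from e.g. "[en;fr]"

# First header field: "MDSTRUCTURE", optionally followed by "[<separator>]".
_FIRST = re.compile(r"MDSTRUCTURE(?:\[(.)\])?")
# One path segment: ID, optional "[]", optional "[<language codes>]".
# The set of characters allowed in an ID is an assumption of this sketch.
_SEGMENT = re.compile(r"([A-Za-z0-9_@$-]+)(\[\])?(?:\[([^\[\]]+)\])?")

def subfield_separator(first_header):
    """Return the sub-field separator declared in the first header field, or None."""
    match = _FIRST.fullmatch(first_header)
    if match is None:
        raise ValueError(f"unexpected first header field: {first_header!r}")
    return match.group(1)

def parse_attribute_header(header, sep):
    """Split a header such as 'PROCESS.STEP[][en;fr]' into its path segments."""
    segments = []
    for part in header.split("."):
        match = _SEGMENT.fullmatch(part)
        if match is None:
            raise ValueError(f"unexpected attribute header segment: {part!r}")
        raw_langs = match.group(3)
        if raw_langs is None:
            langs = []
        elif sep:
            langs = raw_langs.split(sep)
        else:
            langs = [raw_langs]
        segments.append(Segment(match.group(1), match.group(2) is not None, langs))
    return segments
```

For instance, `parse_attribute_header("PROCESS.STEP[][en;fr]", subfield_separator("MDSTRUCTURE[;]"))` yields two segments, the second flagged as multi-instance with the languages `en` and `fr`. Headers produced with `labels=both` carry an additional `": name"` suffix that this sketch does not handle.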
56 | 57 | ## Column content (all rows after header) 58 | 59 | - The first column contains: `metadataflow` or `metadataprovision`, depending on the type of artefact for which the metadata contained in the message are defined: metadataflow or metadata provision agreement. 60 | - The second column contains: 61 | - Default: The structure identification information in the form *AGENCY:ARTEFACT_ID(VERSION)* (1), e.g. `ESTAT:MDF(1.6.0)`. 62 | - If option `labels=both` (see *[here](#optional-parameters)*): The structure identification information and its localised name separated by the term ": ", e.g. `ESTAT:MDF(1.6.0): Metadataflow name`. 63 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the structure identification column with the structure's localised name, e.g. `Metadataflow name`. 64 | - The next column contains the metadataset identification information in the form *AGENCY:ARTEFACT_ID(VERSION)*(1), e.g. `AGENCY:MD_SET(1.0.0)`. 65 | - If option `labels=both` (see *[here](#optional-parameters)*): The ID and the localised name of the metadataset separated by the term ": ", e.g. `ESTAT:MD_SET(1.0.0): Metadataset 1`. 66 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the metadataset identification column with the metadataset's localised name, e.g. `Metadata set name`. 67 | - The next column, if present, containing one of the action types, is deprecated and ignored. 68 | - The next column, if present, contains `1` or `0`, indicating whether the metadataset contains only a subset of all available languages. `1` stands for a partial language subset, and `0` for the complete set of available languages (default). 69 | - The next column contains the types of all the targets of the metadataset according to the resource names defined for Structural Metadata Queries, e.g. `dataflow`. Multiple targets are separated by the special sub-field separation character, e.g. 
`;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. Example for multiple target types: `dataflow;dataflow`. 70 | - The next column contains the identification information of all the targets of the metadataset in the form *AGENCY:ARTEFACT_ID(VERSION)* (1), separated by the sub-field separation character, e.g. `AGENCY:DF1(1.0.0);AGENCY:DF2(1.0.0)`. 71 | - If option `labels=both` (see *[here](#optional-parameters)*): The column contains the ID and the localised name of the targets separated by the term ": ", e.g. `AGENCY:DF(1.0.0): Dataflow name` or `AGENCY:DF1(1.0.0): Dataflow 1 name;AGENCY:DF2(1.0.0): Dataflow 2 name`. 72 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the target identification column with the target's localised name, e.g. `Dataflow name` or `Dataflow 1 name;Dataflow 2 name`. 73 | - The other columns for metadata attributes contain: 74 | - Default: The ID(s) (if coded) or value(s) (if non-coded) for the metadata attribute reported in that column, e.g. `A`, `A;B` or `"
An XHTML text
"`. 75 | - If option `labels=both` (see *[here](#optional-parameters)*): The ID(s) and their localised name(s) for the metadata attribute separated by the term ": " (if coded) or the value(s) (if non-coded) for the metadata attribute reported in that column, e.g. `A: A value name`, `A: A value name;B: B value name` or `"An XHTML text
"`. 76 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the metadata attribute identification column containing the localised name, e.g. `A value name` or `A value name;B value name`, of the metadata attribute value reported in the previous column. It is empty if the value has no localised name. 77 | - All string/textual values (complete string between column-separating characters including IDs or language codes) should always be encapsulated in quotation marks; they must be if they contain commas or inner quotation marks. Quotation marks in strings/textual values must always be escaped by doubling the quotes. 78 | - When metadata from different metadata structures are present then the columns not related to the attribute's metadata structure are to be left empty. 79 | - The other custom columns contain any potentially localised custom content. 80 | 81 | ## Localisation 82 | 83 | - HTTP content negotiation, see [RFC 2616 - HTTP 1.1 Header Field Definitions](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) 84 | - Always use this mime-type in the Accept header: `application/vnd.sdmx.metadata+csv; version=2.0.0`. 85 | - The client can indicate preferred languages through the Accept-Language header, e.g. `fr, en-gb;q=0.8, en;q=0.7`. 86 | - Always localise all artefact names according to the preferred language. The first best language match according to the user’s preferred language choices in the HTTP Accept-Language header (or if that is not available then according to the system's default language order) is to be used for each localisable name element. The message does however not indicate the returned language per localisable name element. In case that there is no such language match for a particular localisable name element, it is optional to return the element in a system-default language or alternatively to not return the name element. 
87 | **It is recommended to indicate all languages used anywhere in the message for localised name elements through the HTTP Content-Language response header (languages of the intended audience).** 88 | Note: For multi-language metadata attribute values, all language versions are provided independently from the preferred language (see below). 89 | 90 | ## Multi-instance metadata attributes 91 | 92 | - Values from multiple instances of a metadata attribute within a metadataset are separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. 93 | - Such metadata attributes are indicated in the column header by having their ID followed by empty squared brackets "[]", e.g. `ATTR[]`. 94 | - For coded multi-instance metadata attributes, if option `labels=both` (see *[here](#optional-parameters)*) then each individual value is to be prefixed with its ID and the term ": ", e.g. `A: Value A;B: Value B`. 95 | 96 | ## Non-coded multi-lingual metadata attributes 97 | 98 | - Non-coded metadata attributes allow for multi-lingual values. Those values are separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. 99 | - Such metadata attributes are indicated in the column header by having their ID followed by the list of possible 2-letter ISO language codes separated by the sub-field separator and encapsulated in squared brackets "[]", e.g. `ATTR[en;fr]`. 100 | - Each individual language value is to be prefixed with its 2-letter ISO language code and a colon character ":", e.g. `en:Value;fr:Valeur`. Thus, in contrast to the ID prefix for coded values when using the HTTP Accept header parameter `labels=both` (see *[here](#optional-parameters)*), the language prefix `xx:` doesn't have an extra space character. 
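The value conventions of the two sections above can be sketched as follows. This is a simplified illustration rather than a reference implementation: it assumes the field has already been unquoted by a CSV reader, that no value contains the sub-field separator itself, and the function names are hypothetical.

```python
def split_instances(field_value, sep):
    """Split a multi-instance field such as '123;456' into its instances."""
    return field_value.split(sep) if field_value else []

def parse_coded_with_labels(field_value, sep):
    """Parse 'A: Value A;B: Value B' (labels=both) into (id, name) pairs."""
    pairs = []
    for item in split_instances(field_value, sep):
        # The ID prefix is separated from the name by ": " (colon and space).
        code, _, name = item.partition(": ")
        pairs.append((code, name))
    return pairs

def parse_multilingual(field_value, sep):
    """Parse 'en:Value;fr:Valeur' into a {language code: text} mapping."""
    values = {}
    for item in split_instances(field_value, sep):
        # The language prefix is separated by ":" only, without a space.
        lang, _, text = item.partition(":")
        values[lang] = text
    return values
```

Splitting on the first `": "` or `":"` only keeps any later occurrence of these characters inside the name or text intact. Values that may themselves contain the separator need the quoted form described in the next section, which this sketch does not cover.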
101 | 102 | ## Non-coded multi-lingual multi-instance metadata attributes 103 | 104 | - When non-coded multi-lingual metadata attributes have multiple instances within a metadataset, then all individual values are included and separated by the special sub-field separation character, e.g. `;`, defined in the squared bracket term of the header field of the first column, e.g. `MDSTRUCTURE[;]`. 105 | - Such metadata attributes are indicated in the column header by having their ID followed by squared brackets "[]" as well as by the list of possible language codes separated by the sub-field separator and encapsulated in additional squared brackets "[]", e.g. `ATTR[][en;fr;de]`. 106 | - Each individual language value is to be prefixed with its 2-letter ISO language code and a colon character ":", e.g. `en:Value1`. 107 | - Not each value needs all language versions. In order to allow knowing to which value the different language items belong, each multi-lingual value set is to be encapsulated in double quotes, e.g. `"en:Value1;fr:Valeur1";"en:Value2;de:Wert2"`. However, note that fields with double quotes must themselves be encapsulated in double quotes and that the inner double quotes need to be doubled, thus the complete example is `"""en:Value1;fr:Valeur1"";""en:Value2;de:Wert2"""`. 108 | 109 | ## Non-coded XHTML-valued components 110 | 111 | - Some non-coded metadata attributes allow for XHTML values. 112 | - Each XHTML value is to be encapsulated in double quotes, e.g. `"This is some ""metadata html""
"`. Remember that the inner double quotes need to be doubled. 113 | - The CSV format allows fields to contain line break characters if those fields are enclosed in double quotes. Thus XHTML values can also contain line breaks, although HTML viewers will ignore them. 114 | 115 | # Optional parameters 116 | 117 | The following optional parameter can be added to the HTTP Accept header. It needs to be separated by the character combination `"; "`. 118 | - labels (id|name|both; default=id): This parameter applies to all Nameable SDMX Artefacts contained in the header and the body of the message: 119 | - If the parameter value is `id` then only the id of the Artefacts is displayed. 120 | - If the parameter value is `both` then the concatenated id and localised name of the Artefacts (see the section on [localised names](#localisation) on how the message deals with languages) separated by `": "` are displayed. Note that the character combination `": "` could also be part of the Artefact name and could therefore occur several times within the concatenated string. 121 | - If the parameter value is `name` then the id/value and the name of the artefacts are displayed in separate columns (see *[here](#columns)*), the ID/value column always directly preceding its related localised name column. 122 | 123 | # Examples 124 | 125 | Note: All examples assume the minimal HTTP Accept header: `application/vnd.sdmx.metadata+csv; version=2.1.0` 126 | 127 | #### 1) Ordinary case 128 | 129 | MDSTRUCTURE,MDSTRUCTURE_ID,METADATASET_ID,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_1.CHILD,ATTRIBUTE_2 130 | metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),dataflow,OECD:DF(1.0.0),A STRING VALUE,"An XHTML text with ""quotes""
",123 131 | 132 | Note: 133 | The following default parameter settings are automatically applied: 134 | - labels=id 135 | 136 | #### 2) Metadata attribute with multiple instances and multi-lingual values 137 | 138 | MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_1.ATTRIBUTE_1_2[][en;fr],ATTRIBUTE_2[],ATTRIBUTE_3[] 139 | metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),dataflow,OECD:DF(1.0.0),CODE_ID,"""en:""""An XHTML text
"""";fr:""""Un texte XHTML
"""""";""en:""""Another XHTML text
"""";fr:""""Un autre texte XHTML
""""""","""Text with """"quotes"""""";""Another text""",123;456 140 | 141 | #### 3) Localisation: HTTP Accept header: `application/vnd.sdmx.metadata+csv; version=1.0.0; labels=both`, HTTP Accept-Language header: `fr-FR, en;q=0.7`, metadata attribute with multiple instances, metadata attributes with multi-lingual values 142 | 143 | MDSTRUCTURE[|];MDSTRUCTURE_ID;METADATASET_ID;TARGET_TYPES;TARGET_IDS;ATTRIBUTE_1: Attribut d'exemple 1;ATTRIBUTE_1.ATTRIBUTE_1_2[][en|fr]: Attribut d'exemple 12;ATTRIBUTE_2[]: Attribut d'exemple 2 144 | metadataflow;OECD:MDF(1.0.0): Metadataflow d'exemple;OECD:MDS(1.0.0): Metadataset d'exemple;dataflow;OECD:DF(1.0.0): Dataflow d'exemple;CODE_ID: Nom du code;"""en:""""An XHTML text
""""|fr:""""Un texte XHTML
""""""|""en:""""Another XHTML text
""""|fr:""""Un autre texte XHTML
""""""";123,45|6,789 145 | 146 | Note that in this example the client prefers French (fr) language with the France (FR) locale, but will also accept any type of English. Therefore, in the message the French language with the France locale is applied, transforming also the field separator from comma (,) to semicolon (;), and the decimal separator from dot (.) to comma (,). 147 | 148 | #### 4) Localisation: HTTP Accept header: `application/vnd.sdmx.metadata+csv; version=1.0.0; labels=name`, HTTP Accept-Language header: `en-US`, metadata attribute with multiple instances, metadata attributes with multi-lingual values, different targets and metadatasets 149 | 150 | MDSTRUCTURE[;],MDSTRUCTURE_ID,MDSTRUCTURE_NAME,METADATASET_ID,METADATASET_NAME,TARGET_TYPES,TARGET_IDS,TARGET_NAMES,ATTRIBUTE_1,Attribute 1,ATTRIBUTE_1.ATTRIBUTE_1_2[][en;fr],Attribute 12,ATTRIBUTE_2[],Attribute 2 151 | metadataflow,OECD:MDF(1.0.0),Metadataflow name,OECD:MDS(1.0.0),Metadataset name,dataflow;dataflow,OECD:DF(1.0.0);OECD:DF(1.1.0),Dataflow name 1;Dataflow name 2,CODE_ID,Code name,"""en:""""An XHTML text
"""";fr:""""Un texte XHTML
"""""";""en:""""Another XHTML text
"""";fr:""""Un autre texte XHTML
""""""",123.45;6.789 152 | metadataflow,OECD:MDF(1.0.0),Metadataflow name,OECD:MDS(1.1.0),Metadataset new name,codelist,OECD:CL(1.0.0),Codelist name,CODE_ID,Code name,"""en:""""Text 1
"""";fr:""""Texte 1
"""""";""en:""""Text 2
"""";fr:""""Texte 2
""""""",0 153 | 154 | #### 5) Varying metadataflows 155 | 156 | MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_2[][en;fr;de] 157 | metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),dataflow,OECD:DF(1.0.0),CODE_ID,"""en:Value1;fr:Valeur1"";""en:Value2;de:Wert2""" 158 | metadataflow,OECD:MDF(1.1.0),OECD:MDS(1.1.0),dataflow,OECD:DF(1.1.0),CODE_ID,"""en:Value1;fr:Valeur1"";""en:Value2;de:Wert2""" 159 | 160 | #### 6) Non-versioned metadataset for a non-versioned[^1] data provision agreement 161 | 162 | MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_2[en;fr] 163 | metadataprovision,OECD:MDP,OECD:MDS,dataflow,OECD:DF(1.0.0),CODE_ID,"en:Value1;fr:Valeur1" 164 | 165 | #### 7) Non-coded metadata attribute values with line-breaks 166 | 167 | MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1[] 168 | metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),dataflow,OECD:DF(1.0.0),"""This text with a line 169 | break"";""This is some other text""" 170 | 171 | #### 8) Metadataflows with partial languages 172 | 173 | MDSTRUCTURE[;],MDSTRUCTURE_ID,METADATASET_ID,IS_PARTIAL_LANGUAGE,TARGET_TYPES,TARGET_IDS,ATTRIBUTE_1,ATTRIBUTE_2[][en] 174 | metadataflow,OECD:MDF(1.0.0),OECD:MDS(1.0.0),1,dataflow,OECD:DF(1.0.0),CODE_ID,"""en:Value1"";""en:Value2""" 175 | metadataflow,OECD:MDF(1.1.0),OECD:MDS(1.1.0),1,dataflow,OECD:DF(1.1.0),CODE_ID,"""en:Value1"";""en:Value2""" 176 | 177 | ------------------------ 178 | 179 | [^1]: Note that since SDMX 3.0.0 the syntax *AGENCY:ARTEFACT_ID(VERSION)* allows omitting the version for non-versioned artefacts. In this case using *AGENCY:ARTEFACT_ID* is sufficient, e.g. 
`OECD:MDP` 180 | -------------------------------------------------------------------------------- /data-message/docs/sdmx-csv-field-guide.md: -------------------------------------------------------------------------------- 1 | # Introduction 2 | 3 | SDMX-CSV Data Message is an SDMX data exchange format based on the [RFC 4180](https://tools.ietf.org/html/rfc4180). CSV is a widely used standardised and simple format to exchange data supported by many tools. 4 | 5 | SDMX-CSV integrates with other specifications, i.e.: 6 | - The SDMX API RESTful specification (e.g. content negotiation with mime-type to get SDMX-CSV representations, specific formats for responses, language selection through HTTP content negotiation) 7 | - The [RFC 4180](https://tools.ietf.org/html/rfc4180) specification 8 | 9 | ## RFC 4180: A common format for CSV files 10 | In order to benefit from best practices, SDMX-CSV is based on the rules defined in the [RFC 4180](https://tools.ietf.org/html/rfc4180), which defines a common format and MIME Type for CSV files. It is advised to read the (very short) RFC for a full list of requirements but, in a nutshell, the RFC defines rules such as: 11 | - How the CSV file should be structured (the RFC specifies that all records must have an identical structure (determined column number), like when using an SDMX "flat" representation for data); 12 | - When double-quotes should be used and how to escape them when needed; 13 | - How spaces should be handled: Spaces are considered part of a field and should not be ignored; 14 | - Which mime type should be used; 15 | - What is the default character set, etc. 16 | 17 | The SDMX-CSV format is flexible enough in its representation to support the needs of different target audiences: 18 | - It is designed and optimised for the purpose of general public data dissemination of statistical data, and for usage in common statistical software. 
19 | - It allows using the messages to create pivot tables in spreadsheets applications. 20 | 21 | # Design principles for SDMX-CSV 2.1 Data Messages (aligned with SDMX 3.1) 22 | 23 | - In order to ensure the identifiability of the data contained in the message, the header row containing the column headers is mandatory and its content is well-defined. 24 | - After the mandatory header row, each row contains the information related to one specific observation or to one or more attributes attached to partial keys. For `Delete` actions a row can also concern several observations if dimensions are wildcarded. 25 | - In [RFC 4180](https://tools.ietf.org/html/rfc4180), csv stands for "comma-separated values". However, while SDMX-CSV uses indeed the "comma" (%x2C) as the default field separator, it adopts the wider interpretation of csv as "character-separated values". It is recommended for implementers to provide SDMX-CSV messages according to the locale of the user (e.g. as indicated in the http Accept-Language header). It means that e.g. the semi-colon ‘;’ (as used typically in specific regions or countries) is acceptable as separator. See also the related example below. Note that the separator used in a message can be determined by retrieving the character that follows the fixed first column header term *STRUCTURE* (which may be extended by a squared bracket term). 26 | 27 | ## Columns 28 | 29 | - The first column is always used for the structure type: dataflow, data structure definition or data provision agreement. 30 | - The next one or two columns are always used for the structure's identification. 31 | - The next column is always used for the action to be performed. 32 | - The next up to two columns are used for the series and/or observation key. 
33 | - Each Data Structure Definition (DSD) component (dimensions, measures, attributes including those defined through a referenced Metadata Structure Definition) included in the message is represented in one or two columns. SDMX web services should return the columns in the order of components as defined in (each of) the underlying Data Structure Definition(s), grouped by type of component, thus in case of data defined by different data structures: first the dimensions of the first data structure, then the remaining dimensions of the second data structure and so forth, then the measures of the first data structure, then the remaining measures of the second data structure and so forth, then the attributes of the first data structure, then the remaining attributes of the second data structure and so forth. However, any order of these columns is valid for data uploads to SDMX-consuming systems. 34 | - Only all those dimension columns have to be present, that are required to uniquely identify the concerned attributes and/or measures. 35 | - Attributes can but do not need to be included even if they have a mandatory status. 36 | - Measures can but do not have to be included. 37 | - When an SDMX RESTful web service implements streaming, then it might not know, while generating the csv header row, which measures and attributes actually have values. Therefore, it can happen that all values presented in an attribute or measure column are left empty. 38 | - Implementers have the possibility to add any other custom columns as required, e.g. updated, prepared, etc. 39 | 40 | ## Column headers (first row) 41 | 42 | - The header field of the first column always contains the term `STRUCTURE`. 43 | - This field must be extended with a sub-field delimiter encapsulated in squared brackets "[]", e.g. `STRUCTURE[;]`, in case the message contains multi-valued or multi-language measure or attribute values. 
44 | - The header field of the second column always contains the term `STRUCTURE_ID`. 45 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the artefact identification column containing the term `STRUCTURE_NAME`. 46 | - The header field of the next column should contain the term `ACTION`. For convenience, if this column is not present, a default action ("Information") is assumed for the whole message. 47 | - The next up to two columns contain, if option `key=series|obs|both` (see *[here](#optional-parameters)*), in this order the terms `SERIES_KEY` and/or `OBS_KEY`. 48 | - The other columns for components contain: 49 | - Default: The ID of the component reported in that column, e.g. `DIM1`. 50 | - If option `labels=both` (see *[here](#optional-parameters)*): The ID and the localised name of the component reported in that column separated by the term ": ", e.g. `DIM1: Dimension 1`. 51 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the component identification column containing the localised name of the component reported in the previous column. 52 | - Any other custom column contains a custom but unique term, e.g. `UPDATED`. 53 | 54 | ## Column content (all rows after header) 55 | 56 | - The first column contains: `dataflow`, `datastructure` or `dataprovision`, depending on type of artefact for which the data contained in the row are defined: dataflow, data structure definition or data provision agreement. 57 | - The second column contains: 58 | - Default: The artefact identification information for the data in the row in the form *AGENCY:ARTEFACT_ID(VERSION)*(1), e.g. `ESTAT:NA_MAIN(1.6.0)`. 59 | - If option `labels=both` (see *[here](#optional-parameters)*): The artefact identification information and its localised name separated by the term ": ", e.g. `ESTAT:NA_MAIN(1.6.0): National Accounts Main Aggregates`. 
60 | - If option `labels=name` (see *[here](#optional-parameters)*): An additional column is added right after the artefact identification column with the artefact's localised name, e.g. `National Accounts Main Aggregates`. 61 | - The next column contains one character representing one of the current 4 action types: 62 | - "I": Information - Deprecated. When used to update an SDMX storage system, the *Merge* action is assumed. 63 | - "A": Append - Deprecated. When used to update an SDMX storage system, the *Merge* action is assumed. 64 | - "M": Merge - Data or data-related reference metadata is to be merged, through either update or insertion depending on already existing information. This operation does not allow deleting any component values. Updating individual values in multi-valued measure, attribute or data-related reference metadata values is not supported either. The complete multi-valued value is to be provided. Only non-dimensional components (measure, attribute or data-related reference metadata values) can be **omitted** (\This is some ""metadata html""
"`. Remember that the inner quotes need to be doubled. 175 | - The CSV format allows fields to contain line breaks if those fields are enclosed in double quotes. Thus XHTML values can also contain line breaks. 176 | 177 | # Optional parameters 178 | 179 | Optional parameters can be added to the HTTP Accept header. They need to be separated by the character combination `"; "`. 180 | - labels (id|name|both; default=id): This parameter applies to all Nameable SDMX Artefacts contained in the header and the body of the message: 181 | - If the parameter value is `id` then only the id/value of the artefacts is displayed. 182 | - If the parameter value is `name` then the id/value and the name of the artefacts are displayed in separate columns (see *[here](#columns)*), the ID/value column always directly preceding its related localised name column. 183 | - If the parameter value is `both` then the concatenated id/value and localised name of the artefacts (see the section on [localised names](#localised-names) on how the message deals with languages) separated by `": "` are displayed. Note that the character combination `": "` could also be part of the artefact name and could therefore occur several times within the concatenated string. 184 | - timeFormat (original|normalized; default=original): 185 | - If the parameter value is `original` then the time dimension (*TIME_PERIOD*) values are displayed in the SDMX *TIME_PERIOD* format as originally recorded. 186 | - If the parameter value is `normalized` then the time dimension (*TIME_PERIOD*) values are converted to the most granular [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) representation taking into account the highest frequency of the data in the message and the moment in time when the lower-frequency values were collected (which, e.g. at the ECB, is typically either at the beginning, middle or end of the reporting period). This eases comparisons and business analysis of multi-frequency values, e.g.
in pivot tables. As an example, if annual and daily data are available in the message and the annual data were collected at the end of the reporting period, the formatted value for the annual period 2014 becomes 2014-12-31. 187 | - keys (none|obs|series|both; default=none): Request the addition of column(s) for keys. 188 | - If the value is `none` (the default), no related column will be added. 189 | - If the value is `obs`, a new column OBS_KEY will be added after the ACTION column. The column will contain the combination of IDs/values for all the dimensions, ordered by their order in the data structure definition and separated by a dot character (.), e.g. M.USD.EUR.SP00.2020-01 190 | - If the value is `series`, a new column SERIES_KEY will be added after the ACTION column. The column will contain the combination of IDs/values for all the dimensions except the one(s) attached to the observation, ordered by their order in the data structure definition and separated by a dot character (.), e.g. M.USD.EUR.SP00 191 | - If the value is `both`, both a SERIES_KEY and an OBS_KEY column must be added after the ACTION column, starting with the SERIES_KEY column. 
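The `keys` and `labels=both` rules above can be illustrated with a minimal Python sketch. It is not part of the specification: the function names `build_keys` and `split_label` are hypothetical, and the sketch assumes that the caller already knows the dimension order of the data structure definition and which dimension(s) are attached to the observation. It also assumes that the id/value part of a `labels=both` field never contains the character combination `": "` itself, so that the first occurrence marks the boundary.

```python
from typing import Mapping, Sequence, Tuple


def build_keys(
    row: Mapping[str, str],
    dimension_order: Sequence[str],
    obs_dimensions: Sequence[str],
) -> Tuple[str, str]:
    """Return (SERIES_KEY, OBS_KEY) for one row of dimension values.

    dimension_order: dimension IDs in data structure definition order
                     (assumed to be supplied by the caller).
    obs_dimensions:  dimension(s) attached to the observation, which are
                     left out of the series key.
    """
    # OBS_KEY: all dimension values, dot-separated, in DSD order.
    obs_key = ".".join(row[d] for d in dimension_order)
    # SERIES_KEY: the same, minus the observation-level dimension(s).
    series_key = ".".join(
        row[d] for d in dimension_order if d not in obs_dimensions
    )
    return series_key, obs_key


def split_label(field: str) -> Tuple[str, str]:
    """Split a labels=both field into (id, name) at the first ': '.

    The name may itself contain ': ', so only the first occurrence is used.
    If the field has no ': ', the name is returned as an empty string.
    """
    identifier, _, name = field.partition(": ")
    return identifier, name


if __name__ == "__main__":
    row = {"DIM_1": "A", "DIM_2": "B", "DIM_3": "2014-01"}
    series_key, obs_key = build_keys(
        row, ["DIM_1", "DIM_2", "DIM_3"], ["DIM_3"]
    )
    print(series_key)
    print(obs_key)
    print(split_label("ESTAT:NA_MAIN(1.6.0): National Accounts: Main Aggregates"))
```

Splitting on the first `": "` rather than the last keeps artefact names that contain the separator intact, which matches the note on `labels=both` above.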
192 | 193 | # Examples 194 | 195 | Note: All examples assume the minimal HTTP Accept header: `application/vnd.sdmx.data+csv; version=2.1.0` 196 | 197 | #### 1) Ordinary case 198 | 199 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_2,ATTR_3,ATTR_1,UPDATED 200 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A,B,2014-01,12.4,Y,"Normal, special and other values",N,2021-01-22T13:15:41Z 201 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A,B,2014-02,10.8,Y,"Normal, special and other values",Y,2021-01-22T13:15:41Z 202 | 203 | Notes: 204 | - The following default parameter settings are automatically applied: 205 | - labels=id 206 | - timeFormat=original 207 | - *UPDATED* is a custom column 208 | 209 | #### 2) Components in any order, missing component(s), component with multiple values 210 | 211 | STRUCTURE[;],STRUCTURE_ID,ACTION,OBS_VALUE1,OBS_VALUE2,ATTR_3,ATTR_1[],DIM_2,DIM_1,DIM_3 212 | dataflow,ESTAT:NA_MAIN(1.6.0),M,12.4,12.5,"Normal, special and other values",X;Y,B,A,2014-01 213 | dataflow,ESTAT:NA_MAIN(1.6.0),M,10.8,10.9,"Normal, special and other values",X;Z,B,A,2014-02 214 | 215 | #### 3) Components in any order and missing component, HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0; key=series` 216 | 217 | STRUCTURE[;],STRUCTURE_ID,ACTION,SERIES_KEY,OBS_VALUE1,OBS_VALUE2,ATTR_3,ATTR_1,DIM_2,DIM_1,DIM_3 218 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A.B,12.4,12.5,"Normal, special and other values",N,B,A,2014-01 219 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A.B,10.8,10.9,"Normal, special and other values",Y,B,A,2014-02 220 | 221 | #### 4) Localisation: HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0; labels=both; key=both`, HTTP Accept-Language header: `fr-FR, en;q=0.7` 222 | 223 | STRUCTURE[|];STRUCTURE_ID;ACTION;SERIES_KEY;OBS_KEY;DIM_1: Dimension 1;DIM_2: Dimension 2;DIM_3: Dimension 3;OBS_VALUE: Observation value;ATTR_2: Attribut 2;ATTR_3: Attribut 3;ATTR_1: Attribut 1 224 | dataflow;ESTAT:NA_MAIN(1.6.0): Principaux agrégats des comptes 
nationaux;M;A.B;A.B.2014-01;A: Value A;B: Value B;2014-01: 2014-01;12,4;Y: Oui;Normal, special and other values;N: Non 225 | dataflow;ESTAT:NA_MAIN(1.6.0): Principaux agrégats des comptes nationaux;M;A.B;A.B.2014-02;A: Value A;B: Value B;2014-02: 2014-02;10,8;Y: Oui;Normal, special and other values;Y: Oui 226 | 227 | Note that in this example the client prefers French (fr) language with the France (FR) locale, but will also accept any type of English. Therefore, in the message the French language with the France locale is applied, transforming also the field separator from comma (,) to semicolon (;), and the decimal separator from dot (.) to comma (,). 228 | 229 | #### 5) HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0; labels=both; timeFormat=normalized` 230 | 231 | STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_1: Dimension 1,DIM_2: Dimension 2,DIM_3: Dimension 3,OBS_VALUE: Observation value,ATTR_2: Attribute 2,ATTR_3: Attribute 3,ATTR_1: Attribute 1 232 | dataflow,ESTAT:NA_MAIN(1.6.0): National Accounts Main Aggregates,M,A: Value A,B: Value B,2014-01-01,12.4,Y: Yes,"Normal, special and other values",N: No 233 | dataflow,ESTAT:NA_MAIN(1.6.0): National Accounts Main Aggregates,M,A: Value A,B: Value B,2014-02-01,10.8,Y: Yes,"Normal, special and other values",Y: Yes 234 | 235 | #### 6) HTTP Accept header: `application/vnd.sdmx.data+csv; version=1.0.0; labels=name` 236 | 237 | STRUCTURE,STRUCTURE_ID,STRUCTURE_NAME,ACTION,DIM_1,Dimension 1,DIM_2,Dimension 2,DIM_3,Dimension 3,OBS_VALUE,Observation value,ATTR_1,Attribute 1,ATTR_2,Attribute 2,ATTR_3,Attribute 3 238 | dataflow,ESTAT:NA_MAIN(1.6.0),National Accounts Main Aggregates,M,A,Value A,B,Value B,2014-01,2014-01,12.4,,Y,Yes,"Normal, special and other values",,N,No 239 | dataflow,ESTAT:NA_MAIN(1.6.0),National Accounts Main Aggregates,M,A,Value A,B,Value B,2014-02,2014-02,10.8,,Y,Yes,"Normal, special and other values",,Y,Yes 240 | 241 | #### 7) Multi-valued components 242 | 243 | 
STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1[],ATTR_2[],ATTR_3[] 244 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A,B,2014-01,12.4,Value X;Value Y,"M, N & O;P & Q",A;B;C 245 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A,B,2014-02,10.8,Value X;Value Y,"M, N & O;P & Q",A;C 246 | 247 | #### 8) Non-coded multi-lingual components, varying dataflows based on the same underlying data structure 248 | 249 | STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1[en;fr] 250 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A,B,2014-01,12.4,en:Any Value;fr:N'importe quelle Valeur 251 | dataflow,ESTAT:NA_MAIN(1.7.0),M,A,B,2014-02,10.8,"en:Value ""X"";fr:Valeur ""X""" 252 | 253 | #### 9-A) Varying structural artefacts based on same underlying data structure 254 | 255 | STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1[en;fr] 256 | dataflow,ESTAT:DF_NA_MAIN(1.6.0),M,A,B,2014-01,12.4,en:Any Value;fr:N'importe quelle Valeur 257 | datastructure,ESTAT:DSD_NA_MAIN(1.7.0),M,A,B,2014-02,10.8,"en:Value ""X"";fr:Valeur ""X""" 258 | dataprovision,ESTAT:DPA_NA_MAIN(1.8.0),M,A,B,2014-03,11.2,"en:Value ""Y"";fr:Valeur ""Y""" 259 | 260 | #### 9-B) Varying structural artefacts based on different underlying data structures 261 | 262 | STRUCTURE[;],STRUCTURE_ID,ACTION,DIM_A1B1,DIM_A2,DIM_A3C2,DIM_B2,DIM_C1,DIM_C3,MEAS_A1B1C1,MEAS_C2,ATTR_A1,ATTR_B1 263 | dataflow,ESTAT:DF_A(1.6.0),M,DIMVAL_A1B1,DIMVAL_A2,DIMVAL_A3C2,,,,"MEASVAL_A1B1C1",,"ATTRVAL_A1", 264 | datastructure,ESTAT:DSD_B(1.7.0),M,DIMVAL_A1B1,,,DIMVAL_B2,,,"MEASVAL_A1B1C1",,,"ATTRVAL_B1" 265 | dataprovision,ESTAT:DPA_C(1.8.0),M,,,DIMVAL_A3C2,,DIMVAL_C1,DIMVAL_C3,"MEAS_A1B1C1","MEAS_C2",, 266 | 267 | #### 10) Varying actions 268 | 269 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1 270 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A,B,2014-01,12.4,X 271 | dataflow,ESTAT:NA_MAIN(1.6.0),R,A,B,2014-02,10.8,Y 272 | 273 | #### 11) Data for a non-versioned(1) data structure definition 274 | 275 | 
STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1 276 | datastructure,AGENCY:DF_ID,M,A,B,2014-01,12.4,N 277 | datastructure,AGENCY:DF_ID,M,A,B,2014-02,10.8,Y 278 | 279 | #### 12) Attributes attached to partial keys for a data provision agreement 280 | 281 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_2,DIM_3,ATTR_1 282 | dataprovision,AGENCY:DPA_ID(1.0.0),M,B,2014-01,N 283 | dataprovision,AGENCY:DPA_ID(1.0.0),M,B,2014-02,Y 284 | 285 | #### 13) Mixing rows for attributes attached to partial keys with rows for observations 286 | 287 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,MEAS_1,ATTR_1,ATTR_2 288 | dataflow,AGENCY:DF_ID(1.0.0),M,A,B,2014-01,12.4,N, 289 | dataflow,AGENCY:DF_ID(1.0.0),M,,B,,,,Y 290 | 291 | #### 14) Nested metadata attributes attached to partial keys 292 | 293 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_2,COLLECTION.METHOD[en;fr],CONTACT[],CONTACT[].NAME[] 294 | dataflow,AGENCY:DF_ID(1.0.0),M,A,en:AAA;fr:BBB,Contact 1;Contact 2,"""Contact 1 Name 1;Contact 1 Name 2"";""Contact 1 Name 1;Contact 2 Name 2""" 295 | dataflow,AGENCY:DF_ID(1.0.0),M,B,en:CCC;fr:DDD,Contact 1;Contact 2;Contact 3,"""Contact 1 Name 1;Contact 1 Name 2"";;""Contact 3 Name 1;Contact 3 Name 2""" 296 | 297 | #### 15) Non-coded XHTML-formatted values with line-breaks 298 | 299 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_1 300 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A,B,2014-01,12.4,"This is some ""xhtml"" with a line 301 | break
" 302 | dataflow,ESTAT:NA_MAIN(1.6.0),M,A,B,2014-02,10.8,"This is some other ""xhtml""
" 303 | 304 | #### 16) Deleting specific measure and attribute values: all non-empty values (e.g. marked with "-") are deleted 305 | 306 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_2,ATTR_3,ATTR_1 307 | dataflow,ESTAT:NA_MAIN(1.6.0),D,A,B,2014-01,-,,, 308 | dataflow,ESTAT:NA_MAIN(1.6.0),D,A,B,2014-02,,,-, 309 | 310 | #### 17) Deleting specific measure and attribute values with wildcarded dimensions: all non-empty values (e.g. marked with "-") are deleted for all dimension combinations where: 311 | - row 2: DIM2=A 312 | - row 3: DIM2=B 313 | 314 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3,OBS_VALUE,ATTR_2,ATTR_3,ATTR_1 315 | dataflow,ESTAT:NA_MAIN(1.6.0),D,,A,,-,,, 316 | dataflow,ESTAT:NA_MAIN(1.6.0),D,,B,,,,-, 317 | 318 | #### 18) Deleting whole observations with wildcarded dimensions: all observations are deleted for all dimension combinations where: 319 | - row 2: DIM2=A 320 | - row 3: DIM2=B and DIM3=C 321 | 322 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_2,DIM_3 323 | dataflow,ESTAT:NA_MAIN(1.6.0),D,A,, 324 | dataflow,ESTAT:NA_MAIN(1.6.0),D,B,C, 325 | 326 | #### 19) Deleting all data for a data structure definition: 327 | 328 | STRUCTURE,STRUCTURE_ID,ACTION 329 | datastructure,ESTAT:DSD_NA_MAIN(1.6.0),D 330 | or 331 | 332 | STRUCTURE,STRUCTURE_ID,ACTION,DIM_1,DIM_2,DIM_3 333 | datastructure,ESTAT:DSD_NA_MAIN(1.6.0),D,,, 334 | 335 | ------------------------ 336 | 337 | **(1)** Note that since SDMX 3.0.0 the syntax *AGENCY:ARTEFACT_ID(VERSION)* allows omitting the version for non-versioned artefacts. In this case using *AGENCY:ARTEFACT_ID* is sufficient, e.g. `AGENCY:DF_ID` 338 | --------------------------------------------------------------------------------