├── README.md
├── ldif2csv
├── csv2ldif
└── LICENSE.txt

/README.md:
--------------------------------------------------------------------------------
ldif-csv-conv — LDIF to CSV and vice versa
==========================================

These two scripts convert datasets between the LDIF (LDAP Data Interchange
Format) and CSV (Comma-Separated Values) formats, in both directions, without
losing information.
(Comments in the LDIF are not preserved; the ordering of multi-value attributes
is preserved.)

## LDIF to CSV

    ./ldif2csv data.ldif > data.csv

or

    ./ldif2csv < data.ldif > data.csv

## CSV to LDIF

    ./csv2ldif data.csv > data.ldif

or

    ./csv2ldif < data.csv > data.ldif

## Notes (format, etc.)

* The CSV files use `;` as the cell separator, enclose all cells in `"` (with
  `"`s in the input represented as `""`) and separate the values of a
  multi-value attribute with `|`:

      "dn";"email";"objectclass"
      "uid=johndoe,dc=example,dc=com";"johndoe@example.com";"inetOrgPerson|top"

  The `""` escaping allows you to use both `;` and `"` in your data. If you
  need to use `|`, you'll have to edit the `JOINER` definition in both script
  files.
  `ldif2csv` examines your data and prints (to stderr) the printable characters
  that do not occur in it, any of which could be used as a joiner or separator.

* Individual attribute values¹ longer than 499 bytes (think `jpegphoto`s) are
  split off into individual files to keep the CSV small.
  They are referenced as `FILE=dn=…,attr=…,id=…` in the respective CSV cell. If
  this format is kept intact, they will be reintegrated into the LDIF by
  `csv2ldif`.
  The files are created in the current working directory, and relative
  `FILE=…` paths are resolved against it.
  You are free to insert or modify these `FILE=…` references by hand: if they
  point to a valid file, they will be followed; if not, the converter will
  crash and burn. Enjoy!
  (¹ Note that this means you can have a mixture of "direct" and "referenced"
  values in the same cell.)

## Known issues

* Conversion from CSV to LDIF is very slow when the input contains a lot of
  binary data (e.g. `jpegphoto` attributes).
* *Error handling? What's that?*
  If something goes wrong, you'll get your usual Python exception stack trace.

--------------------------------------------------------------------------------
/ldif2csv:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# encoding: utf-8 (as per PEP 263)

import sys
import os
import fileinput
import base64
import string
import csv

SEPARATOR = ';'
JOINER = '|'
QUOTECHAR = '"'

"""
Final structure:
dict(
    uid=doej,ou=people,dc=example: {
        cn: ['John Doe',],
        email: ['doej@fim.uni-passau.de', 'johndoe@gmail.com'],
    },
)
"""

class filepeek(object):
    """Iterate over the input lines (trailing newline stripped) with one line of lookahead."""

    def __init__(self):
        self.fi = fileinput.input()
        try:
            self.next_elem = next(self.fi).rstrip('\n')
        except StopIteration:
            self.next_elem = None

    def next(self):
        """Return the current line and advance; None at end of input."""
        elem = self.next_elem
        try:
            self.next_elem = next(self.fi).rstrip('\n')
        except StopIteration:
            self.next_elem = None
        return elem

    def peek(self):
        """Return the upcoming line without consuming it; None at end of input."""
        return self.next_elem

def main():
    data = {}
    attrs = set()
    charset = set()
    current_dn = ''
    fp = filepeek()
    while True:
        line = fp.next()
        if line is None:
            break

        if not line:
            # end of dn block
            current_dn = ''
            continue

        if line.lstrip().startswith('#'):
            # skip comment line
            continue

        if line.startswith('version: '):
            # skip version line
            continue

        key, value = line.split(': ', 1)
        is_b64 = key.endswith(':')
        key = key.rstrip(':')

        full_value = value
        # folded LDIF line: continuation lines start with a single space
        # (peek() returns None at end of input)
        while fp.peek() is not None and fp.peek().startswith(' '):
            full_value += fp.next()[1:]

        # from here on, handle every value as raw bytes
        if is_b64:
            raw_value = base64.b64decode(full_value)
        else:
            raw_value = full_value.encode('utf8')

        attrs.add(key)

        if key == 'dn':
            current_dn = raw_value.decode('utf8')
            data[current_dn] = {}

        if not current_dn:
            raise Exception('Non-dn attribute "%s" while not inside a dn block!' % (key))

        if key not in data[current_dn]:
            data[current_dn][key] = []

        # short values that are valid UTF-8 go into the CSV cell directly
        text_value = None
        if len(raw_value) < 500:
            try:
                text_value = raw_value.decode('utf8')
            except UnicodeDecodeError:
                pass  # binary data (e.g. a small jpegphoto): written to a file below

        if text_value is not None:
            store_value = text_value

        else:
            # too large (or binary) for CSV output, write to file instead

            # find unused filename
            fileid = 0
            while True:
                filename = 'dn=%s,attr=%s,id=%d' % (current_dn, key, fileid)
                if not os.path.exists(filename):
                    break
                fileid += 1

            with open(filename, 'wb') as f:
                f.write(raw_value)

            store_value = 'FILE=' + filename

        data[current_dn][key] += [store_value]
        charset.update(set(store_value))

    usable_chars = set(string.printable) - set(string.whitespace)
    available_chars = usable_chars - charset
    sys.stderr.write('The following characters DO NOT appear in the dataset (i.e. they can be freely used as separators etc):\n')
    sys.stderr.write('%s\n' % (' '.join(sorted(available_chars))))

    attrs = sorted(attrs)

    csvwriter = csv.writer(sys.stdout, delimiter=SEPARATOR, quotechar=QUOTECHAR, quoting=csv.QUOTE_ALL)

    csvwriter.writerow(attrs)

    for entry in data:
        row = []
        for attr in attrs:
            content = ''
            if attr in data[entry]:
                # use the module-level JOINER, so that editing it takes effect
                content = JOINER.join(data[entry][attr])
            row.append(content)
        csvwriter.writerow(row)


if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/csv2ldif:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
# encoding: utf-8 (as per PEP 263)

import sys
import fileinput
import csv
import base64
from textwrap import TextWrapper

LINELENGTH = 76

SEPARATOR = ';'
JOINER = '|'
QUOTECHAR = '"'

"""
Final structure:
dn: uid=doejohnn,ou=members,ou=people,dc=zimvm,dc=fsinfo,dc=fim,dc=uni-passa
 u,dc=de
cn: John Doe
displayname: JohnD

dn: ...
"""

def needs_base64(raw):
    """Return True if `raw` (bytes) is not an RFC 2849 SAFE-STRING.

    Such values must be written base64-encoded ("attr:: ..."): non-ASCII
    data, NUL/LF/CR anywhere, a leading space, ':' or '<', and (as the RFC
    recommends) a trailing space.
    """
    if not raw:
        return False
    if raw[0] in b' :<' or raw.endswith(b' '):
        return True
    return any(byte > 127 or byte in (0, 10, 13) for byte in raw)

def encode_value(value):
    """Turn one CSV value into (ldif_text, is_base64), following FILE= references."""
    if value.startswith('FILE='):
        filename = value[len('FILE='):]
        with open(filename, 'rb') as f:
            raw = f.read()
    else:
        raw = value.encode('utf8')

    if needs_base64(raw):
        return base64.b64encode(raw).decode('ascii'), True
    return raw.decode('ascii'), False

def main():
    wrapper = TextWrapper(width=LINELENGTH, expand_tabs=False,
            replace_whitespace=False, drop_whitespace=False, initial_indent='',
            subsequent_indent=' ', break_long_words=True,
            break_on_hyphens=False)

    def write_line(attr, value):
        """Write one LDIF attribute line, folded to LINELENGTH columns."""
        text, is_base64 = encode_value(value)
        line = '%s%s %s\n' % (attr, '::' if is_base64 else ':', text)
        # continuation lines get a single leading space (subsequent_indent)
        sys.stdout.write('\n'.join(wrapper.wrap(line)))

    csvreader = csv.reader(fileinput.input(mode='r'), delimiter=SEPARATOR, quotechar=QUOTECHAR, quoting=csv.QUOTE_ALL)
    first = True
    for row in csvreader:
        if not row:
            # tolerate blank lines in the CSV
            continue

        if first:
            attrs = row
            dn_index = attrs.index('dn')
            first = False

        else:
            # every LDIF entry starts with its dn
            write_line('dn', row[dn_index])

            for idx, attr in enumerate(attrs):
                if attr == 'dn':
                    # skip dn attribute (we already included it in the beginning)
                    continue

                for value in row[idx].split(JOINER):
                    if value == '':
                        # skip undefined attributes
                        continue
                    write_line(attr, value)

            sys.stdout.write('\n')

    return 0


if __name__ == '__main__':
    sys.exit(main())

--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

--------------------------------------------------------------------------------
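The CSV layout described in the README notes (`;` between cells, every cell quoted, embedded `"` doubled, the values of a multi-value attribute joined with `|`) can be reproduced with nothing but Python's standard `csv` module. The following is a minimal sketch, not code taken from the scripts. `entry_to_row` and `row_to_entry` are hypothetical helper names, and, like the scripts, the sketch assumes that no value contains the `|` joiner. It writes one entry with a multi-value `objectclass` and a `description` containing both `;` and `"`, then reads the row back.

```python
import csv
import io

# Same values as the SEPARATOR/JOINER/QUOTECHAR constants in both scripts.
SEPARATOR = ';'
JOINER = '|'
QUOTECHAR = '"'


def entry_to_row(entry, attrs):
    """Flatten one entry (attribute -> list of values) into a CSV row.

    Hypothetical helper: all values of an attribute share one cell,
    joined with JOINER; attributes the entry lacks become empty cells.
    """
    return [JOINER.join(entry.get(attr, [])) for attr in attrs]


def row_to_entry(attrs, row):
    """Hypothetical inverse of entry_to_row: empty cells mean 'attribute absent'."""
    return {attr: cell.split(JOINER)
            for attr, cell in zip(attrs, row) if cell}


entry = {
    'dn': ['uid=johndoe,dc=example,dc=com'],
    'description': ['Says "hi"; often'],      # contains both ; and "
    'email': ['johndoe@example.com'],
    'objectclass': ['inetOrgPerson', 'top'],  # multi-valued, order kept
}
attrs = sorted(entry)  # header row, sorted as ldif2csv sorts attribute names

buf = io.StringIO()
# lineterminator='\n' only keeps this printout tidy; the scripts use the
# csv module's default line terminator ('\r\n').
writer = csv.writer(buf, delimiter=SEPARATOR, quotechar=QUOTECHAR,
                    quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writerow(attrs)
writer.writerow(entry_to_row(entry, attrs))
text = buf.getvalue()
print(text, end='')

# Read the CSV back and rebuild the entry.
reader = csv.reader(io.StringIO(text), delimiter=SEPARATOR,
                    quotechar=QUOTECHAR)
header = next(reader)
decoded = row_to_entry(header, next(reader))
print(decoded == entry)
```

Running it prints the header row, then the data row (with `""` escaping and `inetOrgPerson|top` in a single cell), then `True`. Because the joiner itself is never escaped, a value containing `|` would come back split into two values, which is why the README asks you to change `JOINER` rather than use `|` in your data.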