├── .gitignore
├── README.md
├── doc
│   ├── codeq.org
│   └── intro.md
├── epl-v10.html
├── examples
│   ├── clojure-and-contrib
│   └── src
│       └── datomic
│           └── codeq
│               └── examples
│                   └── clojure_and_contrib.clj
├── project.clj
├── src
│   └── datomic
│       └── codeq
│           ├── analyzer.clj
│           ├── analyzers
│           │   └── clj.clj
│           ├── core.clj
│           └── util.clj
└── test
    └── datomic
        └── codeq
            └── core_test.clj

/.gitignore:
--------------------------------------------------------------------------------
1 | /target
2 | /lib
3 | /classes
4 | /checkouts
5 | /tmp
6 | pom.xml
7 | *.jar
8 | *.class
9 | .lein-deps-sum
10 | .lein-failures
11 | .lein-plugins
12 | *.tar
13 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # codeq
2 |
3 | **codeq** ('co-deck') is a Clojure+Datomic application that does code-aware imports of your git repos into a [Datomic](http://datomic.com) db.
4 |
5 | ## Usage
6 |
7 | Clone the **codeq** repo. Then (in it) run:
8 |
9 |     lein uberjar
10 |
11 | Get [Datomic Free](http://www.datomic.com/get-datomic.html).
12 |
13 | Unzip it, then start the Datomic Free transactor. Follow the instructions for [running the transactor with the free storage protocol](http://docs.datomic.com/getting-started.html).
14 |
15 |     cd theGitRepoYouWantToImport
16 |
17 |     java -server -Xmx1g -jar whereverYouPutCodeq/target/codeq-0.1.0-SNAPSHOT-standalone.jar datomic:free://localhost:4334/git
18 |
19 | This will create a db called `git` (you can call it whatever you like) and import the commits from the local view of the repo. You should see output like:
20 |
21 |     Importing repo: git@github.com:clojure/clojure.git as: clojure
22 |     Adding repo git@github.com:clojure/clojure.git
23 |     Importing commit: e54a1ff1ac0d02560e80aad460e77ac353efad49
24 |     Importing commit: 894a0c81075b8f4b64b7f890ab0c8522a7a9986a
25 |     ...
26 |     Importing commit: c1884eaca8ffb7aff2c3d393a9d5fa3306cf3f33
27 |     Importing commit: 01b4cb7156f0b378e70020d0abe293bffe35b031
28 |     Importing commit: 6bbfd943766e11e52a3fe21b177d55536892d132
29 |     Import complete!
30 |
31 |     Analyzing...
32 |     Running analyzer: :clj on [.clj]
33 |     analyzing file: 17592186045504
34 |     analyzing file: 17592186045496
35 |     Analysis complete!
36 |
37 | The import is not too peppy, since it shells out to `git` relentlessly, but it imports e.g. Clojure's entire commit history in about 10 minutes, plus analysis.
38 |
39 | You can import more than one repo into the same db. You can re-import later, after more commits have landed, and the new commits will be added incrementally.
40 |
41 | Once the import finishes (or while it is still running), you can connect to the same db URI with a peer. Or, just start the [Datomic REST service](http://docs.datomic.com/rest.html) and poke around:
42 |
43 |     cd whereverYouPutDatomicFree
44 |     bin/rest -p 8080 free datomic:free://localhost:4334/
45 |
46 | Browse to [localhost:8080/data/](http://localhost:8080/data/). You should see the `free` storage and the `git` db within it.
47 |
48 | The [schema diagram](https://github.com/downloads/Datomic/codeq/codeq.pdf) will help you get oriented.
49 |
50 | ## More info
51 |
52 | See the [intro blog post](http://blog.datomic.com/2012/10/codeq.html) and the [wiki](https://github.com/Datomic/codeq/wiki).
53 |
54 | ## License
55 |
56 | Copyright © 2012 Metadata Partners, LLC and Contributors. All rights reserved.
57 |
58 | Distributed under the Eclipse Public License, the same as Clojure.
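## Example: querying from a peer

The Usage section mentions connecting to the same db URI with a peer. The following is a minimal sketch of what that might look like, not part of codeq itself. It assumes a transactor is running, an import into `datomic:free://localhost:4334/git` has completed, and the Datomic peer library is on the classpath (for instance, a REPL started inside the codeq checkout). The namespace `codeq-peer-example` and the function `imported-repos` are hypothetical names chosen for illustration; the `:repo/uri` attribute comes from codeq's schema.

```clojure
(ns codeq-peer-example ; hypothetical namespace, for illustration only
  (:require [datomic.api :as d]))

(defn imported-repos
  "Returns the URIs of every repo imported into the db at db-uri.
  Assumes the transactor is reachable and the db already exists."
  [db-uri]
  (let [conn (d/connect db-uri)
        db   (d/db conn)]
    ;; :repo/uri is the unique attribute codeq asserts for each imported repo
    (map first
         (d/q '[:find ?uri
                :where [?e :repo/uri ?uri]]
              db))))

;; Usage, assuming the db name used in the import command under Usage:
;; (imported-repos "datomic:free://localhost:4334/git")
```

The same pattern extends to the other attributes shown in the schema diagram; only the `:where` clauses change.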
59 |
--------------------------------------------------------------------------------
/doc/codeq.org:
--------------------------------------------------------------------------------
1 | Codeq
2 | * Objective
3 | ** Lots of interesting information exists in the source history of a project
4 | ** Get it into a db so it's queryable
5 | *** must include time (use Datomic)
6 | ** Move from source file orientation to language-level definition orientation
7 | * Premise
8 | ** Presumptions
9 | *** Using a VCS
10 | **** with single total order available
11 | *** Using a lang with globally unique namespaced definitions
12 | **** e.g. Clojure, Java etc
13 | ** turn VCS commits into Datomic transactions
14 | *** with 1:1 monotonicity - must import in order
15 | *** Commit properties become tx attrs
16 | *** Analyze affected sources
17 | **** grab docs and other metadata
18 | **** track definitions (e.g. in Clojure, defns)
19 | ***** def is namespaced name associated with source (data or code)
20 | ****** different source is different def for same name
21 | **** other analysis-derived info
22 | ***** call sites (use of other fns)
23 | ***** use of lang constructs (EH, mutation etc)?
24 | * VCS
25 | ** need monotonic log
26 | ** ability to list affected files
27 | ** ability to get file contents
28 | *** a la carte blob/file vs pulling revision?
29 | ** Git, de facto standard, has good temporal aspect, blobs
30 | *** but not more so than Mercurial etc, so don't close doors
31 | * Datomic
32 | ** code entities
33 | *** named by namespaced names
34 | **** name might not be enough
35 | ***** e.g. if overloads have location-distinct defs
36 | ***** what type is name then?
37 | ****** or name is not unique
38 | ******* that's what overloading means
39 | *** point to definition
40 | **** name -> def
41 | **** name -> overloads -> defs
42 | ** definition
43 | *** identified by source equality
44 | **** preferably code-data or AST, vs strings
45 | **** for Clojure, must be metadata-sensitive compare
46 | *** occurs at location in file
47 | **** over time, same def might move around within/across files
48 | *** source-derived metadata
49 | **** docs
50 | **** arglist(s)
51 | **** langs with overloads might consider separate defs
52 | ***** e.g. may have separate docs
53 | **** also have calls per arity/sig
54 | ***** could handle locations separately, even though nested
55 | *** non fn/method defs
56 | **** e.g. classes, types etc
57 | *** def is totality of source associated with name
58 | **** can't be if not file-contiguous
59 | ***** else name+sig?
60 | *** analysis-derived info
61 | **** callsites
62 | ***** vars called
63 | ****** or methods?
64 | ******* if statically determinable
65 | * Issues
66 | ** Multi-language support?
67 | *** multiple langs good, same schema not so good
68 | **** don't try to normalize across langs
69 | ** Clojure lang support
70 | *** multimethods
71 | **** impls have separate locations, calls etc
72 | **** but same names
73 | *** protocols
74 | **** independent fn defs
75 | **** defs in deftype/record
76 | *** deftype/record
77 |
--------------------------------------------------------------------------------
/doc/intro.md:
--------------------------------------------------------------------------------
1 | # Introduction to codeq
2 |
3 |
4 |
--------------------------------------------------------------------------------
/epl-v10.html:
--------------------------------------------------------------------------------
1 | 2 | 3 | 4 | 5 | 6 | 7 | Eclipse Public License - Version 1.0 8 | 25 | 26 | 27 | 28 | 29 | 30 |

Eclipse Public License - v 1.0

31 | 32 |

THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE 33 | PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR 34 | DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS 35 | AGREEMENT.

36 | 37 |

1. DEFINITIONS

38 | 39 |

"Contribution" means:

40 | 41 |

a) in the case of the initial Contributor, the initial 42 | code and documentation distributed under this Agreement, and

43 |

b) in the case of each subsequent Contributor:

44 |

i) changes to the Program, and

45 |

ii) additions to the Program;

46 |

where such changes and/or additions to the Program 47 | originate from and are distributed by that particular Contributor. A 48 | Contribution 'originates' from a Contributor if it was added to the 49 | Program by such Contributor itself or anyone acting on such 50 | Contributor's behalf. Contributions do not include additions to the 51 | Program which: (i) are separate modules of software distributed in 52 | conjunction with the Program under their own license agreement, and (ii) 53 | are not derivative works of the Program.

54 | 55 |

"Contributor" means any person or entity that distributes 56 | the Program.

57 | 58 |

"Licensed Patents" mean patent claims licensable by a 59 | Contributor which are necessarily infringed by the use or sale of its 60 | Contribution alone or when combined with the Program.

61 | 62 |

"Program" means the Contributions distributed in accordance 63 | with this Agreement.

64 | 65 |

"Recipient" means anyone who receives the Program under 66 | this Agreement, including all Contributors.

67 | 68 |

2. GRANT OF RIGHTS

69 | 70 |

a) Subject to the terms of this Agreement, each 71 | Contributor hereby grants Recipient a non-exclusive, worldwide, 72 | royalty-free copyright license to reproduce, prepare derivative works 73 | of, publicly display, publicly perform, distribute and sublicense the 74 | Contribution of such Contributor, if any, and such derivative works, in 75 | source code and object code form.

76 | 77 |

b) Subject to the terms of this Agreement, each 78 | Contributor hereby grants Recipient a non-exclusive, worldwide, 79 | royalty-free patent license under Licensed Patents to make, use, sell, 80 | offer to sell, import and otherwise transfer the Contribution of such 81 | Contributor, if any, in source code and object code form. This patent 82 | license shall apply to the combination of the Contribution and the 83 | Program if, at the time the Contribution is added by the Contributor, 84 | such addition of the Contribution causes such combination to be covered 85 | by the Licensed Patents. The patent license shall not apply to any other 86 | combinations which include the Contribution. No hardware per se is 87 | licensed hereunder.

88 | 89 |

c) Recipient understands that although each Contributor 90 | grants the licenses to its Contributions set forth herein, no assurances 91 | are provided by any Contributor that the Program does not infringe the 92 | patent or other intellectual property rights of any other entity. Each 93 | Contributor disclaims any liability to Recipient for claims brought by 94 | any other entity based on infringement of intellectual property rights 95 | or otherwise. As a condition to exercising the rights and licenses 96 | granted hereunder, each Recipient hereby assumes sole responsibility to 97 | secure any other intellectual property rights needed, if any. For 98 | example, if a third party patent license is required to allow Recipient 99 | to distribute the Program, it is Recipient's responsibility to acquire 100 | that license before distributing the Program.

101 | 102 |

d) Each Contributor represents that to its knowledge it 103 | has sufficient copyright rights in its Contribution, if any, to grant 104 | the copyright license set forth in this Agreement.

105 | 106 |

3. REQUIREMENTS

107 | 108 |

A Contributor may choose to distribute the Program in object code 109 | form under its own license agreement, provided that:

110 | 111 |

a) it complies with the terms and conditions of this 112 | Agreement; and

113 | 114 |

b) its license agreement:

115 | 116 |

i) effectively disclaims on behalf of all Contributors 117 | all warranties and conditions, express and implied, including warranties 118 | or conditions of title and non-infringement, and implied warranties or 119 | conditions of merchantability and fitness for a particular purpose;

120 | 121 |

ii) effectively excludes on behalf of all Contributors 122 | all liability for damages, including direct, indirect, special, 123 | incidental and consequential damages, such as lost profits;

124 | 125 |

iii) states that any provisions which differ from this 126 | Agreement are offered by that Contributor alone and not by any other 127 | party; and

128 | 129 |

iv) states that source code for the Program is available 130 | from such Contributor, and informs licensees how to obtain it in a 131 | reasonable manner on or through a medium customarily used for software 132 | exchange.

133 | 134 |

When the Program is made available in source code form:

135 | 136 |

a) it must be made available under this Agreement; and

137 | 138 |

b) a copy of this Agreement must be included with each 139 | copy of the Program.

140 | 141 |

Contributors may not remove or alter any copyright notices contained 142 | within the Program.

143 | 144 |

Each Contributor must identify itself as the originator of its 145 | Contribution, if any, in a manner that reasonably allows subsequent 146 | Recipients to identify the originator of the Contribution.

147 | 148 |

4. COMMERCIAL DISTRIBUTION

149 | 150 |

Commercial distributors of software may accept certain 151 | responsibilities with respect to end users, business partners and the 152 | like. While this license is intended to facilitate the commercial use of 153 | the Program, the Contributor who includes the Program in a commercial 154 | product offering should do so in a manner which does not create 155 | potential liability for other Contributors. Therefore, if a Contributor 156 | includes the Program in a commercial product offering, such Contributor 157 | ("Commercial Contributor") hereby agrees to defend and 158 | indemnify every other Contributor ("Indemnified Contributor") 159 | against any losses, damages and costs (collectively "Losses") 160 | arising from claims, lawsuits and other legal actions brought by a third 161 | party against the Indemnified Contributor to the extent caused by the 162 | acts or omissions of such Commercial Contributor in connection with its 163 | distribution of the Program in a commercial product offering. The 164 | obligations in this section do not apply to any claims or Losses 165 | relating to any actual or alleged intellectual property infringement. In 166 | order to qualify, an Indemnified Contributor must: a) promptly notify 167 | the Commercial Contributor in writing of such claim, and b) allow the 168 | Commercial Contributor to control, and cooperate with the Commercial 169 | Contributor in, the defense and any related settlement negotiations. The 170 | Indemnified Contributor may participate in any such claim at its own 171 | expense.

172 | 173 |

For example, a Contributor might include the Program in a commercial 174 | product offering, Product X. That Contributor is then a Commercial 175 | Contributor. If that Commercial Contributor then makes performance 176 | claims, or offers warranties related to Product X, those performance 177 | claims and warranties are such Commercial Contributor's responsibility 178 | alone. Under this section, the Commercial Contributor would have to 179 | defend claims against the other Contributors related to those 180 | performance claims and warranties, and if a court requires any other 181 | Contributor to pay any damages as a result, the Commercial Contributor 182 | must pay those damages.

183 | 184 |

5. NO WARRANTY

185 | 186 |

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS 187 | PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 188 | OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, 189 | ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY 190 | OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely 191 | responsible for determining the appropriateness of using and 192 | distributing the Program and assumes all risks associated with its 193 | exercise of rights under this Agreement , including but not limited to 194 | the risks and costs of program errors, compliance with applicable laws, 195 | damage to or loss of data, programs or equipment, and unavailability or 196 | interruption of operations.

197 | 198 |

6. DISCLAIMER OF LIABILITY

199 | 200 |

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT 201 | NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, 202 | INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING 203 | WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF 204 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 205 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR 206 | DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED 207 | HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

208 | 209 |

7. GENERAL

210 | 211 |

If any provision of this Agreement is invalid or unenforceable under 212 | applicable law, it shall not affect the validity or enforceability of 213 | the remainder of the terms of this Agreement, and without further action 214 | by the parties hereto, such provision shall be reformed to the minimum 215 | extent necessary to make such provision valid and enforceable.

216 | 217 |

If Recipient institutes patent litigation against any entity 218 | (including a cross-claim or counterclaim in a lawsuit) alleging that the 219 | Program itself (excluding combinations of the Program with other 220 | software or hardware) infringes such Recipient's patent(s), then such 221 | Recipient's rights granted under Section 2(b) shall terminate as of the 222 | date such litigation is filed.

223 | 224 |

All Recipient's rights under this Agreement shall terminate if it 225 | fails to comply with any of the material terms or conditions of this 226 | Agreement and does not cure such failure in a reasonable period of time 227 | after becoming aware of such noncompliance. If all Recipient's rights 228 | under this Agreement terminate, Recipient agrees to cease use and 229 | distribution of the Program as soon as reasonably practicable. However, 230 | Recipient's obligations under this Agreement and any licenses granted by 231 | Recipient relating to the Program shall continue and survive.

232 | 233 |

Everyone is permitted to copy and distribute copies of this 234 | Agreement, but in order to avoid inconsistency the Agreement is 235 | copyrighted and may only be modified in the following manner. The 236 | Agreement Steward reserves the right to publish new versions (including 237 | revisions) of this Agreement from time to time. No one other than the 238 | Agreement Steward has the right to modify this Agreement. The Eclipse 239 | Foundation is the initial Agreement Steward. The Eclipse Foundation may 240 | assign the responsibility to serve as the Agreement Steward to a 241 | suitable separate entity. Each new version of the Agreement will be 242 | given a distinguishing version number. The Program (including 243 | Contributions) may always be distributed subject to the version of the 244 | Agreement under which it was received. In addition, after a new version 245 | of the Agreement is published, Contributor may elect to distribute the 246 | Program (including its Contributions) under the new version. Except as 247 | expressly stated in Sections 2(a) and 2(b) above, Recipient receives no 248 | rights or licenses to the intellectual property of any Contributor under 249 | this Agreement, whether expressly, by implication, estoppel or 250 | otherwise. All rights in the Program not expressly granted under this 251 | Agreement are reserved.

252 | 253 |

This Agreement is governed by the laws of the State of New York and 254 | the intellectual property laws of the United States of America. No party 255 | to this Agreement will bring a legal action under this Agreement more 256 | than one year after the cause of action arose. Each party waives its 257 | rights to a jury trial in any resulting litigation.

258 | 259 | 260 | 261 | 262 |
--------------------------------------------------------------------------------
/examples/clojure-and-contrib:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | cd `dirname $0`/..
4 | CODEQ_VERSION="0.1.0-SNAPSHOT"
5 | CODEQ_ROOT=`pwd`
6 |
7 | DATOMIC_VERSION="0.8.3784"
8 | DATOMIC_FILE="datomic-free-$DATOMIC_VERSION"
9 | DATOMIC_URL=http://downloads.datomic.com/$DATOMIC_VERSION/$DATOMIC_FILE.zip
10 |
11 | BACKUP_FILE="clojure-and-contrib"
12 | BACKUP_URL=http://codeq.s3.amazonaws.com/examples/$BACKUP_FILE.zip
13 |
14 | DB_URI="datomic:free://localhost:4334/clojure-and-contrib"
15 |
16 | RET=0
17 |
18 | WORKING_DIR=tmp/examples/clojure-and-contrib
19 | mkdir -p $WORKING_DIR
20 | cd $WORKING_DIR
21 | WORKING_DIR=`pwd`
22 |
23 | if [ ! -d "$BACKUP_FILE" ]; then
24 |     wget $BACKUP_URL
25 |     unzip $BACKUP_FILE.zip
26 | fi
27 |
28 | if [ ! -d "$DATOMIC_FILE" ]; then
29 |     wget $DATOMIC_URL
30 |     unzip $DATOMIC_FILE.zip
31 | fi
32 |
33 | #### Restore
34 |
35 | cd $DATOMIC_FILE
36 |
37 | # Start with a fresh database
38 | rm -rf data log
39 |
40 | bin/transactor config/samples/free-transactor-template.properties &
41 | TRANSACTOR_PID=$!
42 |
43 | (( RET += $? ))
44 |
45 | bin/datomic restore-db file:$WORKING_DIR/$BACKUP_FILE $DB_URI
46 |
47 | (( RET += $? ))
48 |
49 | pkill -P $TRANSACTOR_PID
50 |
51 | #### Verify
52 |
53 | bin/transactor config/samples/free-transactor-template.properties &
54 | TRANSACTOR_PID=$!
55 |
56 | (( RET += $? ))
57 |
58 | cd $CODEQ_ROOT
59 |
60 | sleep 5
61 |
62 | lein run -m datomic.codeq.examples.clojure-and-contrib $DB_URI
63 |
64 | (( RET += $? ))
65 |
66 | cd $CODEQ_ROOT
67 | lein clean
68 | lein uberjar
69 |
70 | git clone git@github.com:clojure/clojure.git $WORKING_DIR/clojure
71 | cd $WORKING_DIR/clojure
72 |
73 | java -server -Xmx1g -jar $CODEQ_ROOT/target/codeq-$CODEQ_VERSION-standalone.jar $DB_URI
74 |
75 | (( RET += $? ))
76 |
77 | pkill -P $TRANSACTOR_PID
78 |
79 | exit $RET
80 |
--------------------------------------------------------------------------------
/examples/src/datomic/codeq/examples/clojure_and_contrib.clj:
--------------------------------------------------------------------------------
1 | (ns datomic.codeq.examples.clojure-and-contrib
2 |   (:require [datomic.api :as d :refer [q]]
3 |             [clojure.pprint :refer [pprint]]))
4 |
5 | (defn -main [& [database-uri]]
6 |   (assert database-uri)
7 |   (println "Running clojure-and-contrib examples with database" database-uri)
8 |   (try
9 |     (let [conn (d/connect database-uri)
10 |           db (-> conn d/db (d/as-of 435691))
11 |           repos (map first
12 |                      (q '[:find ?repo
13 |                           :where [?e :repo/uri ?repo]]
14 |                         db))
15 |           namespaces (map first
16 |                           (q '[:find ?ns
17 |                                :where
18 |                                [?e :clj/ns ?n]
19 |                                [?n :code/name ?ns]]
20 |                              db))
21 |           definitions (reduce (fn [agg [o d]]
22 |                                 (update-in agg [o] (fnil conj []) d))
23 |                               {}
24 |                               (q '[:find ?op ?def
25 |                                    :where
26 |                                    [?e :clj/def ?d]
27 |                                    [?e :clj/defop ?op]
28 |                                    [?d :code/name ?def]]
29 |                                  db))]
30 |       (println)
31 |       (println "#### Repos:")
32 |       (println)
33 |       (pprint repos)
34 |       (println)
35 |       (println "#### Namespaces:")
36 |       (println)
37 |       (pprint namespaces)
38 |       (println)
39 |       (println "#### Definitions:")
40 |       (println)
41 |       (pprint definitions)
42 |       (assert (= 44 (-> definitions keys count)))
43 |       (assert (= 33 (-> "defne" definitions count))))
44 |     (finally
45 |       ;; (shutdown-agents)
46 |       (System/exit 0))))
47 |
--------------------------------------------------------------------------------
/project.clj:
--------------------------------------------------------------------------------
1 | (defproject datomic/codeq "0.1.0-SNAPSHOT"
2 |   :description "codeq does a code-aware import of your git repo into a Datomic db"
3 |   :url "http://datomic.com"
4 |   :license {:name "Eclipse Public License"
5 |             :url "http://www.eclipse.org/legal/epl-v10.html"}
6 |   :main datomic.codeq.core
7 |   :plugins [[lein-tar "1.1.0"]]
8 |   :dependencies [[com.datomic/datomic-free "0.9.4699"]
9 |                  [commons-codec "1.7"]
10 |                  [org.clojure/clojure "1.5.1"]]
11 |   :source-paths ["src" "examples/src"])
12 |
--------------------------------------------------------------------------------
/src/datomic/codeq/analyzer.clj:
--------------------------------------------------------------------------------
1 | ;; Copyright (c) Metadata Partners, LLC and Contributors. All rights reserved.
2 | ;; The use and distribution terms for this software are covered by the
3 | ;; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
4 | ;; which can be found in the file epl-v10.html at the root of this distribution.
5 | ;; By using this software in any fashion, you are agreeing to be bound by
6 | ;; the terms of this license.
7 | ;; You must not remove this notice, or any other, from this software.
8 |
9 | (ns datomic.codeq.analyzer
10 |   (:import [java.io StringReader]
11 |            [org.apache.commons.codec.digest DigestUtils]))
12 |
13 | (set! *warn-on-reflection* true)
14 |
15 | (defprotocol Analyzer
16 |   (keyname [a] "keyword name for analyzer")
17 |   (revision [a] "long")
18 |   (extensions [a] "[string ...], including '.'")
19 |   (schemas [a] "map of revisions to (incremental) schema data")
20 |   (analyze [a db f src] "f is file entityid, src is string, returns tx-data"))
21 |
22 | (defn sha
23 |   "Returns the hex string of the sha1 of s"
24 |   [^String s]
25 |   (org.apache.commons.codec.digest.DigestUtils/shaHex s))
26 |
27 | (defn ws-minify
28 |   "Consecutive ws becomes a single space, then trim"
29 |   [s]
30 |   (let [r (java.io.StringReader. s)
31 |         sb (StringBuilder.)]
32 |     (loop [c (.read r) skip true]
33 |       (when-not (= c -1)
34 |         (let [ws (Character/isWhitespace c)]
35 |           (when (or (not ws) (not skip))
36 |             (.append sb (if ws " " (char c))))
37 |           (recur (.read r) ws))))
38 |     (-> sb str .trim)))
39 |
40 | (defn loc
41 |   "Returns zero-based [line col endline endcol] given one-based
42 |   \"line col endline endcol\" string"
43 |   [loc-string]
44 |   (mapv dec (read-string (str "[" loc-string "]"))))
45 |
46 | (defn line-offsets
47 |   "Returns a vector of zero-based offsets of lines. Note the offsets
48 |   are where the line would be, the last offset is not necessarily
49 |   within the string. i.e. if the last character is a newline, the last
50 |   index is the length of the string."
51 |   [^String s]
52 |   (let [nl (long \newline)]
53 |     (persistent!
54 |      (loop [ret (transient [0]), i 0]
55 |        (if (= i (.length s))
56 |          ret
57 |          (recur (if (= (.codePointAt s i) nl)
58 |                   (conj! ret (inc i))
59 |                   ret)
60 |                 (inc i)))))))
61 |
62 | (defn segment
63 |   "Given a string and line offsets, returns text from (zero-based)
64 |   line and col to endline/endcol (exclusive)"
65 |   [^String s line-offsets line col endline endcol]
66 |   (subs s
67 |         (+ (nth line-offsets line) col)
68 |         (+ (nth line-offsets endline) endcol)))
69 |
--------------------------------------------------------------------------------
/src/datomic/codeq/analyzers/clj.clj:
--------------------------------------------------------------------------------
1 | ;; Copyright (c) Metadata Partners, LLC and Contributors. All rights reserved.
2 | ;; The use and distribution terms for this software are covered by the
3 | ;; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
4 | ;; which can be found in the file epl-v10.html at the root of this distribution.
5 | ;; By using this software in any fashion, you are agreeing to be bound by
6 | ;; the terms of this license.
7 | ;; You must not remove this notice, or any other, from this software.
8 |
9 | (ns datomic.codeq.analyzers.clj
10 |   (:require [datomic.api :as d]
11 |             [datomic.codeq.util :refer [index->id-fn tempid?]]
12 |             [datomic.codeq.analyzer :as az]))
13 |
14 | (defn analyze-1
15 |   "returns [tx-data ctx]"
16 |   [db f x loc seg ret {:keys [sha->id codename->id added ns] :as ctx}]
17 |   (if loc
18 |     (let [sha (-> seg az/ws-minify az/sha)
19 |           codeid (sha->id sha)
20 |           newcodeid (and (tempid? codeid) (not (added codeid)))
21 |           ret (cond-> ret newcodeid (conj {:db/id codeid :code/sha sha :code/text seg}))
22 |           added (cond-> added newcodeid (conj codeid))
23 |
24 |           codeqid (or (ffirst (d/q '[:find ?e :in $ ?f ?loc
25 |                                      :where [?e :codeq/file ?f] [?e :codeq/loc ?loc]]
26 |                                    db f loc))
27 |                       (d/tempid :db.part/user))
28 |
29 |           op (first x)
30 |           ns? (= op 'ns)
31 |           defing (and ns
32 |                       (symbol? op)
33 |                       (.startsWith (name op) "def"))
34 |
35 |           naming (let [nsym (second x)]
36 |                    (cond
37 |                     ns? (str nsym)
38 |                     defing (if (namespace nsym)
39 |                              (str nsym)
40 |                              (str (symbol (name ns) (name nsym))))))
41 |
42 |           nameid (when naming (codename->id naming))
43 |
44 |           ret (cond-> ret
45 |                       (tempid? codeqid)
46 |                       (conj {:db/id codeqid
47 |                              :codeq/file f
48 |                              :codeq/loc loc
49 |                              :codeq/code codeid})
50 |
51 |                       ns?
52 |                       (conj [:db/add codeqid :clj/ns nameid])
53 |
54 |                       defing
55 |                       (conj [:db/add codeqid :clj/def nameid]
56 |                             [:db/add codeqid :clj/defop (str op)])
57 |
58 |                       (tempid? nameid)
59 |                       (conj [:db/add nameid :code/name naming]))]
60 |       [ret (assoc ctx :added added)])
61 |     [ret ctx]))
62 |
63 | (defn analyze
64 |   [db f src]
65 |   (with-open [r (clojure.lang.LineNumberingPushbackReader. (java.io.StringReader. src))]
66 |     (let [loffs (az/line-offsets src)
67 |           eof (Object.)
68 |           ctx {:sha->id (index->id-fn db :code/sha)
69 |                :codename->id (index->id-fn db :code/name)
70 |                :added #{}}]
71 |       (loop [ret [], ctx ctx, x (read r false eof)]
72 |         (if (= eof x)
73 |           ret
74 |           (let [{:keys [line column]} (meta x)
75 |                 ctx (if (and (coll? x) (= (first x) 'ns))
76 |                       (assoc ctx :ns (second x))
77 |                       ctx)
78 |                 endline (.getLineNumber r)
79 |                 endcol (.getColumnNumber r)
80 |                 [loc seg] (when (and line column)
81 |                             [(str line " " column " " endline " " endcol)
82 |                              (az/segment src loffs (dec line) (dec column) (dec endline) (dec endcol))])
83 |                 [ret ctx] (analyze-1 db f x loc seg ret ctx)]
84 |             (recur ret ctx (read r false eof))))))))
85 |
86 | (defn schemas []
87 |   {1 [{:db/id #db/id[:db.part/db]
88 |        :db/ident :clj/ns
89 |        :db/valueType :db.type/ref
90 |        :db/cardinality :db.cardinality/one
91 |        :db/doc "codename of ns defined by expression"
92 |        :db.install/_attribute :db.part/db}
93 |       {:db/id #db/id[:db.part/db]
94 |        :db/ident :clj/def
95 |        :db/valueType :db.type/ref
96 |        :db/cardinality :db.cardinality/one
97 |        :db/doc "codename defined by expression"
98 |        :db.install/_attribute :db.part/db}]
99 |    2 [{:db/id #db/id[:db.part/db]
100 |        :db/ident :clj/defop
101 |        :db/valueType :db.type/string
102 |        :db/cardinality :db.cardinality/one
103 |        :db/doc "the def form (defn, defmacro etc) used to create this definition"
104 |        :db.install/_attribute :db.part/db}]})
105 |
106 | (deftype CljAnalyzer []
107 |   az/Analyzer
108 |   (keyname [a] :clj)
109 |   (revision [a] 2)
110 |   (extensions [a] [".clj"])
111 |   (schemas [a] (schemas))
112 |   (analyze [a db f src] (analyze db f src)))
113 |
114 | (defn impl [] (CljAnalyzer.))
--------------------------------------------------------------------------------
/src/datomic/codeq/core.clj:
--------------------------------------------------------------------------------
1 | ;; Copyright (c) Metadata Partners, LLC and Contributors. All rights reserved.
2 | ;; The use and distribution terms for this software are covered by the
3 | ;; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
4 | ;; which can be found in the file epl-v10.html at the root of this distribution.
5 | ;; By using this software in any fashion, you are agreeing to be bound by
6 | ;; the terms of this license.
7 | ;; You must not remove this notice, or any other, from this software.
8 |
9 | (ns datomic.codeq.core
10 |   (:require [datomic.api :as d]
11 |             [clojure.java.io :as io]
12 |             [clojure.set]
13 |             [clojure.string :as string]
14 |             [datomic.codeq.util :refer [index->id-fn tempid?]]
15 |             [datomic.codeq.analyzer :as az]
16 |             [datomic.codeq.analyzers.clj])
17 |   (:import java.util.Date)
18 |   (:gen-class))
19 |
20 | (set! *warn-on-reflection* true)
21 |
22 | (def schema
23 |   [
24 |    ;;tx attrs
25 |    {:db/id #db/id[:db.part/db]
26 |     :db/ident :tx/commit
27 |     :db/valueType :db.type/ref
28 |     :db/cardinality :db.cardinality/one
29 |     :db/doc "Associate tx with this git commit"
30 |     :db.install/_attribute :db.part/db}
31 |
32 |    {:db/id #db/id[:db.part/db]
33 |     :db/ident :tx/file
34 |     :db/valueType :db.type/ref
35 |     :db/cardinality :db.cardinality/one
36 |     :db/doc "Associate tx with this git blob"
37 |     :db.install/_attribute :db.part/db}
38 |
39 |    {:db/id #db/id[:db.part/db]
40 |     :db/ident :tx/analyzer
41 |     :db/valueType :db.type/keyword
42 |     :db/cardinality :db.cardinality/one
43 |     :db/index true
44 |     :db/doc "Associate tx with this analyzer"
45 |     :db.install/_attribute :db.part/db}
46 |
47 |    {:db/id #db/id[:db.part/db]
48 |     :db/ident :tx/analyzerRev
49 |     :db/valueType :db.type/long
50 |     :db/cardinality :db.cardinality/one
51 |     :db/doc "Associate tx with this analyzer revision"
52 |     :db.install/_attribute :db.part/db}
53 |
54 |    {:db/id #db/id[:db.part/db]
55 |     :db/ident :tx/op
56 |     :db/valueType :db.type/keyword
57 |     :db/index true
58 |     :db/cardinality :db.cardinality/one
59 |     :db/doc "Associate tx with this operation - one of :import, :analyze"
60 |     :db.install/_attribute :db.part/db}
61 |
62 |    ;;git stuff
63 |    {:db/id #db/id[:db.part/db]
64 |     :db/ident :git/type
65 |     :db/valueType :db.type/keyword
66 |     :db/cardinality :db.cardinality/one
67 |     :db/index true
68 |     :db/doc "Type enum for git objects - one of :commit, :tree, :blob, :tag"
69 |     :db.install/_attribute :db.part/db}
70 |
71 |    {:db/id #db/id[:db.part/db]
72 |     :db/ident :git/sha
73 |     :db/valueType :db.type/string
74 |     :db/cardinality :db.cardinality/one
75 |     :db/doc "A git sha, should be in repo"
76 |     :db/unique :db.unique/identity
77 |     :db.install/_attribute :db.part/db}
78 |
79 |    {:db/id #db/id[:db.part/db]
80 |     :db/ident :repo/commits
81 |     :db/valueType :db.type/ref
82 |     :db/cardinality :db.cardinality/many
83 |     :db/doc "Associate repo with these git commits"
84 |     :db.install/_attribute :db.part/db}
85 |
86 |    {:db/id #db/id[:db.part/db]
87 |     :db/ident :repo/uri
88 |     :db/valueType :db.type/string
89 |     :db/cardinality :db.cardinality/one
90 |     :db/doc "A git repo uri"
91 |     :db/unique :db.unique/identity
92 |     :db.install/_attribute :db.part/db}
93 |
94 |    {:db/id #db/id[:db.part/db]
95 |     :db/ident :commit/parents
96 |     :db/valueType :db.type/ref
97 |     :db/cardinality :db.cardinality/many
98 |     :db/doc "Parents of a commit"
99 |     :db.install/_attribute :db.part/db}
100 |
101 |    {:db/id #db/id[:db.part/db]
102 |     :db/ident :commit/tree
103 |     :db/valueType :db.type/ref
104 |     :db/cardinality :db.cardinality/one
105 |     :db/doc "Root node of a commit"
106 |     :db.install/_attribute :db.part/db}
107 |
108 |    {:db/id #db/id[:db.part/db]
109 |     :db/ident :commit/message
110 |     :db/valueType :db.type/string
111 |     :db/cardinality :db.cardinality/one
112 |     :db/doc "A commit message"
113 |     :db/fulltext true
114 |     :db.install/_attribute :db.part/db}
115 |
116 |    {:db/id #db/id[:db.part/db]
117 |     :db/ident :commit/author
118 |     :db/valueType :db.type/ref
119 |     :db/cardinality :db.cardinality/one
120 |     :db/doc "Person who authored a commit"
121 |     :db.install/_attribute :db.part/db}
122 |
123 |    {:db/id #db/id[:db.part/db]
124 |     :db/ident :commit/authoredAt
125 |     :db/valueType :db.type/instant
126 |     :db/cardinality :db.cardinality/one
127 |     :db/doc "Timestamp of authorship of commit"
128 |     :db/index true
129
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :commit/committer
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/one
       :db/doc "Person who committed a commit"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :commit/committedAt
       :db/valueType :db.type/instant
       :db/cardinality :db.cardinality/one
       :db/doc "Timestamp of commit"
       :db/index true
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :tree/nodes
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/many
       :db/doc "Nodes of a git tree"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :node/filename
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/one
       :db/doc "filename of a tree node"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :node/paths
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/many
       :db/doc "paths of a tree node"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :node/object
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/one
       :db/doc "Git object (tree/blob) in a tree node"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :git/prior
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/one
       :db/doc "Node containing prior value of a git object"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :email/address
       :db/valueType :db.type/string
       :db/cardinality :db.cardinality/one
       :db/doc "An email address"
       :db/unique :db.unique/identity
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :file/name
       :db/valueType :db.type/string
       :db/cardinality :db.cardinality/one
       :db/doc "A filename"
       :db/fulltext true
       :db/unique :db.unique/identity
       :db.install/_attribute :db.part/db}

      ;;codeq stuff
      {:db/id #db/id[:db.part/db]
       :db/ident :codeq/file
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/one
       :db/doc "Git file containing codeq"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :codeq/loc
       :db/valueType :db.type/string
       :db/cardinality :db.cardinality/one
       :db/doc "Location of codeq in file. A location string in format \"line col endline endcol\", one-based"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :codeq/parent
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/one
       :db/doc "Parent (containing) codeq of codeq (if one)"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :codeq/code
       :db/valueType :db.type/ref
       :db/cardinality :db.cardinality/one
       :db/doc "Code entity of codeq"
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :code/sha
       :db/valueType :db.type/string
       :db/cardinality :db.cardinality/one
       :db/doc "SHA of whitespace-minified code segment text: consecutive ws becomes a single space, then trim. ws-sensitive langs don't minify."
       :db/unique :db.unique/identity
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :code/text
       :db/valueType :db.type/string
       :db/cardinality :db.cardinality/one
       :db/doc "The source code for a code segment"
       ;;:db/fulltext true
       :db.install/_attribute :db.part/db}

      {:db/id #db/id[:db.part/db]
       :db/ident :code/name
       :db/valueType :db.type/string
       :db/cardinality :db.cardinality/one
       :db/doc "A globally-namespaced programming language identifier"
       :db/fulltext true
       :db/unique :db.unique/identity
       :db.install/_attribute :db.part/db}
      ])

(defn ^java.io.Reader exec-stream
  [^String cmd]
  (-> (Runtime/getRuntime)
      (.exec cmd)
      .getInputStream
      io/reader))

(defn ensure-schema [conn]
  (or (-> conn d/db (d/entid :tx/commit))
      @(d/transact conn schema)))

;;example commit - git cat-file -p
;;(in real git output the author/committer lines also carry an <email> field before the timestamp)
;;tree d81cd432f2050c84a3d742caa35ccb8298d51e9d
;;author Rich Hickey 1348842448 -0400
;;committer Rich Hickey 1348842448 -0400

;; or

;;tree ba63180c1d120b469b275aef5da479ab6c3e2afd
;;parent c3bd979cfe65da35253b25cb62aad4271430405c
;;maybe more parents
;;author Rich Hickey 1348869325 -0400
;;committer Rich Hickey 1348869325 -0400
;;then blank line
;;then commit message


;;example tree
;;100644 blob ee508f768d92ba23e66c4badedd46aa216963ee1 .gitignore
;;100644 blob b60ea231eb47eb98395237df17550dee9b38fb72 README.md
;;040000 tree bcfca612efa4ff65b3eb07f6889ebf73afb0e288 doc
;;100644 blob 813c07d8cd27226ddd146ddd1d27fdbde10071eb epl-v10.html
;;100644 blob f8b5a769bcc74ee35b9a8becbbe49d4904ab8abe project.clj
;;040000 tree 6b880666740300ac57361d5aee1a90488ba1305c src
;;040000 tree 407924e4812c72c880b011b5a1e0b9cb4eb68cfa test

;; example git remote origin
;;RichMacPro:codeq rich$ git remote show -n origin
;;* remote origin
;;  Fetch URL: https://github.com/Datomic/codeq.git
;;  Push URL: https://github.com/Datomic/codeq.git
;;  HEAD branch: (not queried)

(defn get-repo-uri
  "returns [uri name]"
  []
  (with-open [s (exec-stream (str "git remote show -n origin"))]
    (let [es (line-seq s)
          ^String line (second es)
          uri (subs line (inc (.lastIndexOf line " ")))
          noff (.lastIndexOf uri "/")
          noff (if (not (pos? noff)) (.lastIndexOf uri ":") noff)
          name (subs uri (inc noff))
          _ (assert (pos? (count name)) "Can't find remote origin")
          ;; strip only the trailing ".git", so a dotted name such as
          ;; "data.json.git" is not cut at its first dot
          name (if (.endsWith name ".git") (subs name 0 (.lastIndexOf name ".")) name)]
      [uri name])))

(defn dir
  "Returns [[sha :type filename] ...]"
  [tree]
  (with-open [s (exec-stream (str "git cat-file -p " tree))]
    (let [es (line-seq s)]
      (mapv #(let [ss (string/split ^String % #"\s")]
               [(nth ss 2)
                (keyword (nth ss 1))
                (subs % (inc (.indexOf ^String % "\t")) (count %))])
            es))))

(defn commit
  [[sha _]]
  (let [trim-email (fn [s] (subs s 1 (dec (count s))))
        dt (fn [ds] (Date. (* 1000 (Integer/parseInt ds))))
        [tree parents author committer msg]
        (with-open [s (exec-stream (str "git cat-file -p " sha))]
          (let [lines (line-seq s)
                slines (mapv #(string/split % #"\s") lines)
                tree (-> slines (nth 0) (nth 1))
                [plines xs] (split-with #(= (nth % 0) "parent") (rest slines))]
            [tree
             (seq (map second plines))
             (vec (reverse (first xs)))
             (vec (reverse (second xs)))
             (->> lines
                  (drop-while #(not= % ""))
                  rest
                  (interpose "\n")
                  (apply str))]))]
    {:sha sha
     :msg msg
     :tree tree
     :parents parents
     :author (trim-email (author 2))
     :authored (dt (author 1))
     :committer (trim-email (committer 2))
     :committed (dt (committer 1))}))



(defn commit-tx-data
  [db repo repo-name {:keys [sha msg tree parents author authored committer committed] :as commit}]
  (let [tempid? map? ;;todo - better pred
        sha->id (index->id-fn db :git/sha)
        email->id (index->id-fn db :email/address)
        filename->id (index->id-fn db :file/name)
        authorid (email->id author)
        committerid (email->id committer)
        cid (d/tempid :db.part/user)
        tx-data (fn f [inpath [sha type filename]]
                  (let [path (str inpath filename)
                        id (sha->id sha)
                        filenameid (filename->id filename)
                        pathid (filename->id path)
                        nodeid (or (and (not (tempid? id))
                                        (not (tempid? filenameid))
                                        (ffirst (d/q '[:find ?e :in $ ?filename ?id
                                                       :where [?e :node/filename ?filename] [?e :node/object ?id]]
                                                     db filenameid id)))
                                   (d/tempid :db.part/user))
                        newpath (or (tempid? pathid) (tempid? nodeid)
                                    (not (ffirst (d/q '[:find ?node :in $ ?path
                                                        :where [?node :node/paths ?path]]
                                                      db pathid))))
                        data (cond-> []
                                     (tempid? filenameid) (conj [:db/add filenameid :file/name filename])
                                     (tempid? pathid) (conj [:db/add pathid :file/name path])
                                     (tempid? nodeid) (conj {:db/id nodeid :node/filename filenameid :node/object id})
                                     newpath (conj [:db/add nodeid :node/paths pathid])
                                     (tempid? id) (conj {:db/id id :git/sha sha :git/type type}))
                        data (if (and newpath (= type :tree))
                               (let [es (dir sha)]
                                 (reduce (fn [data child]
                                           (let [[cid cdata] (f (str path "/") child)
                                                 data (into data cdata)]
                                             (cond-> data
                                                     (tempid? id) (conj [:db/add id :tree/nodes cid]))))
                                         data es))
                               data)]
                    [nodeid data]))
        [treeid treedata] (tx-data nil [tree :tree repo-name])
        tx (into treedata
                 [[:db/add repo :repo/commits cid]
                  {:db/id (d/tempid :db.part/tx)
                   :tx/commit cid
                   :tx/op :import}
                  (cond-> {:db/id cid
                           :git/type :commit
                           :commit/tree treeid
                           :git/sha sha
                           :commit/author authorid
                           :commit/authoredAt authored
                           :commit/committer committerid
                           :commit/committedAt committed}
                          msg (assoc :commit/message msg)
                          parents (assoc :commit/parents
                                         (mapv (fn [p]
                                                 (let [id (sha->id p)]
                                                   (assert (not (tempid? id))
                                                           (str "Parent " p " not previously imported"))
                                                   id))
                                               parents)))])
        tx (cond-> tx
                   (tempid? authorid)
                   (conj [:db/add authorid :email/address author])

                   (and (not= committer author) (tempid? committerid))
                   (conj [:db/add committerid :email/address committer]))]
    tx))

(defn commits
  "Returns log as [[sha msg] ...], in commit order. commit-name may be nil
  or any acceptable commit name arg for git log"
  [commit-name]
  (let [commits (with-open [s (exec-stream (str "git log --pretty=oneline --date-order --reverse " commit-name))]
                  (mapv
                   #(vector (subs % 0 40)
                            (subs % 41 (count %)))
                   (line-seq s)))]
    commits))

(defn unimported-commits
  [db commit-name]
  (let [imported (into {}
                       (d/q '[:find ?sha ?e
                              :where
                              [?tx :tx/op :import]
                              [?tx :tx/commit ?e]
                              [?e :git/sha ?sha]]
                            db))]
    (pmap commit (remove (fn [[sha _]] (imported sha)) (commits commit-name)))))


(defn ensure-db [db-uri]
  (let [newdb? (d/create-database db-uri)
        conn (d/connect db-uri)]
    (ensure-schema conn)
    conn))

(defn import-git
  [conn repo-uri repo-name commits]
  ;;todo - add already existing commits to new repo if it includes them
  (println "Importing repo:" repo-uri "as:" repo-name)
  (let [db (d/db conn)
        repo
        (or (ffirst (d/q '[:find ?e :in $ ?uri :where [?e :repo/uri ?uri]] db repo-uri))
            (let [temp (d/tempid :db.part/user)
                  tx-ret @(d/transact conn [[:db/add temp :repo/uri repo-uri]])
                  repo (d/resolve-tempid (d/db conn) (:tempids tx-ret) temp)]
              (println "Adding repo" repo-uri)
              repo))]
    (doseq [commit commits]
      (let [db (d/db conn)]
        (println "Importing commit:" (:sha commit))
        (d/transact conn (commit-tx-data db repo repo-name commit))))
    (d/request-index conn)
    (println "Import complete!")))

(def analyzers [(datomic.codeq.analyzers.clj/impl)])

(defn run-analyzers
  [conn]
  (println "Analyzing...")
  (doseq [a analyzers]
    (let [aname (az/keyname a)
          exts (az/extensions a)
          srevs (set (map first (d/q '[:find ?rev :in $ ?a :where
                                       [?tx :tx/op :schema]
                                       [?tx :tx/analyzer ?a]
                                       [?tx :tx/analyzerRev ?rev]]
                                     (d/db conn) aname)))]
      (println "Running analyzer:" aname "on" exts)
      ;;install schema(s) if not yet present
      (doseq [[rev aschema] (az/schemas a)]
        (when-not (srevs rev)
          (d/transact conn
                      (conj aschema {:db/id (d/tempid :db.part/tx)
                                     :tx/op :schema
                                     :tx/analyzer aname
                                     :tx/analyzerRev rev}))))
      (let [db (d/db conn)
            arev (az/revision a)
            ;;candidate files
            cfiles (set (map first (d/q '[:find ?f :in $ [?ext ...] :where
                                          [?fn :file/name ?n]
                                          [(.endsWith ^String ?n ?ext)]
                                          [?node :node/filename ?fn]
                                          [?node :node/object ?f]]
                                        db exts)))
            ;;already analyzed files
            afiles (set (map first (d/q '[:find ?f :in $ ?a ?rev :where
                                          [?tx :tx/op :analyze]
                                          [?tx :tx/analyzer ?a]
                                          [?tx :tx/analyzerRev ?rev]
                                          [?tx :tx/file ?f]]
                                        db aname arev)))]
        ;;find files not yet analyzed
        (doseq [f (sort (clojure.set/difference cfiles afiles))]
          ;;analyze them
          (println "analyzing file:" f " - sha: " (:git/sha (d/entity db f)))
          (let [db (d/db conn)
                src (with-open [s (exec-stream (str "git cat-file -p " (:git/sha (d/entity db f))))]
                      (slurp s))
                adata (try
                        (az/analyze a db f src)
                        (catch Exception ex
                          (println (.getMessage ex))
                          []))]
            (d/transact conn
                        (conj adata {:db/id (d/tempid :db.part/tx)
                                     :tx/op :analyze
                                     :tx/file f
                                     :tx/analyzer aname
                                     :tx/analyzerRev arev})))))))
  (println "Analysis complete!"))

(defn main [& [db-uri commit]]
  (if db-uri
    (let [conn (ensure-db db-uri)
          [repo-uri repo-name] (get-repo-uri)]
      ;;(prn repo-uri)
      (import-git conn repo-uri repo-name (unimported-commits (d/db conn) commit))
      (run-analyzers conn))
    (println "Usage: datomic.codeq.core db-uri [commit-name]")))

(defn -main
  [& args]
  (apply main args)
  (shutdown-agents)
  (System/exit 0))


(comment
  (def uri "datomic:mem://git")
  ;;(def uri "datomic:free://localhost:4334/git")
  (datomic.codeq.core/main uri "c3bd979cfe65da35253b25cb62aad4271430405c")
  (datomic.codeq.core/main uri "20f8db11804afc8c5a1752257d5fdfcc2d131d08")
  (datomic.codeq.core/main uri)
  (require '[datomic.api :as d])
  (def conn (d/connect uri))
  (def db (d/db conn))
  (seq (d/datoms db :aevt :file/name))
  (seq (d/datoms db :aevt :commit/message))
  (seq (d/datoms db :aevt :tx/file))
  (count (seq (d/datoms db :aevt :code/sha)))
  (take 20 (seq (d/datoms db :aevt :code/text)))
  (seq (d/datoms db :aevt :code/name))
  (count (seq (d/datoms db :aevt :codeq/code)))
  (d/q '[:find ?e :where [?f :file/name "core.clj"] [?n :node/filename ?f] [?n :node/object ?e]] db)
  (d/q '[:find ?m :where [_ :commit/message ?m] [(.contains ^String ?m "\n")]] db)
  (d/q '[:find ?m :where [_ :code/text ?m] [(.contains ^String ?m "(ns ")]] db)
  (sort (d/q '[:find ?var ?def :where [?cn :code/name ?var] [?cq :clj/def ?cn] [?cq :codeq/code ?def]] db))
  (sort (d/q '[:find ?var ?def :where [?cn :code/name ?var] [?cq :clj/ns ?cn] [?cq :codeq/code ?def]] db))
  (sort (d/q '[:find ?var ?def ?n :where
               [?cn :code/name ?var]
               [?cq :clj/ns ?cn]
               [?cq :codeq/file ?f]
               [?n :node/object ?f]
               [?cq :codeq/code ?def]] db))
  (def x "(doseq [f (clojure.set/difference cfiles afiles)]
          ;;analyze them
          (println \"analyzing file:\" f)
          (let [db (d/db conn)
                s (with-open [s (exec-stream (str \"git cat-file -p \" (:git/sha (d/entity db f))))]
                    (slurp s))
                adata (az/analyze a db s)]
            (d/transact conn
                        (conj adata {:db/id (d/tempid :db.part/tx)
                                     :tx/op :analyze
                                     :codeq/file f
                                     :tx/analyzer aname
                                     :tx/analyzerRev arev}))))")
  )

--------------------------------------------------------------------------------
/src/datomic/codeq/util.clj:
--------------------------------------------------------------------------------
;; Copyright (c) Metadata Partners, LLC and Contributors. All rights reserved.
;; The use and distribution terms for this software are covered by the
;; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
;; which can be found in the file epl-v10.html at the root of this distribution.
;; By using this software in any fashion, you are agreeing to be bound by
;; the terms of this license.
;; You must not remove this notice, or any other, from this software.

(ns datomic.codeq.util
  (:require [datomic.api :as d]))

(set! *warn-on-reflection* true)

(defn index-get-id
  [db attr v]
  (let [d (first (d/index-range db attr v nil))]
    (when (and d (= (:v d) v))
      (:e d))))

(defn index->id-fn
  [db attr]
  (memoize
   (fn [x]
     (or (index-get-id db attr x)
         (d/tempid :db.part/user)))))

(def tempid? map?)

--------------------------------------------------------------------------------
/test/datomic/codeq/core_test.clj:
--------------------------------------------------------------------------------
(ns datomic.codeq.core-test
  (:use clojure.test
        datomic.codeq.core))

(deftest a-test
  (testing "FIXME, I fail."
    (is (= 1 1))))
--------------------------------------------------------------------------------
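
The test file above is only a placeholder. As a rough sketch of something that could be tested without shelling out to `git`, the snippet below reproduces, in isolation, the way `commit` in `core.clj` reads an author or committer line: split on whitespace, reverse the fields, then take the timestamp at index 1 and the bracketed email at index 2. `parse-person-line` and the sample address are hypothetical names introduced for illustration and are not part of codeq. The sketch uses `Long/parseLong` where the original uses `Integer/parseInt`.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper, not part of codeq: mirrors the field handling that
;; datomic.codeq.core/commit applies to "author"/"committer" lines.
(defn parse-person-line
  [^String line]
  (let [;; reversing makes the trailing fields addressable by fixed index,
        ;; however many words the person's name has
        fields (vec (reverse (string/split line #"\s")))
        email  (fields 2)]
    {:email (subs email 1 (dec (count email)))                     ; drop < and >
     :at    (java.util.Date. (* 1000 (Long/parseLong (fields 1))))}))

(:email (parse-person-line "author Jane Q. Doe <jane@example.com> 1348842448 -0400"))
;; => "jane@example.com"
```

Indexing from the end is the design point: the name may contain any number of spaces, while the timezone, timestamp, and email always occupy the last three fields.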