├── .gitignore ├── README.md ├── doc ├── codeq.org └── intro.md ├── epl-v10.html ├── examples ├── clojure-and-contrib └── src │ └── datomic │ └── codeq │ └── examples │ └── clojure_and_contrib.clj ├── project.clj ├── src └── datomic │ └── codeq │ ├── analyzer.clj │ ├── analyzers │ └── clj.clj │ ├── core.clj │ └── util.clj └── test └── datomic └── codeq └── core_test.clj /.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /lib 3 | /classes 4 | /checkouts 5 | /tmp 6 | pom.xml 7 | *.jar 8 | *.class 9 | .lein-deps-sum 10 | .lein-failures 11 | .lein-plugins 12 | *.tar 13 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # codeq 2 | 3 | **codeq** ('co-deck') is a Clojure+Datomic application designed to do code-aware imports of your git repos into a [Datomic](http://datomic.com) db. 4 | 5 | ## Usage 6 | 7 | Clone the **codeq** repo. Then (in it) run: 8 | 9 | lein uberjar 10 | 11 | Get [Datomic Free](http://www.datomic.com/get-datomic.html). 12 | 13 | Unzip it, then start the Datomic Free transactor. Follow the instructions for [running the transactor with the free storage protocol](http://docs.datomic.com/getting-started.html). 14 | 15 | cd theGitRepoYouWantToImport 16 | 17 | java -server -Xmx1g -jar whereverYouPutCodeq/target/codeq-0.1.0-SNAPSHOT-standalone.jar datomic:free://localhost:4334/git 18 | 19 | This will create a db called `git` (you can call it whatever you like) and import the commits from the local view of the repo. You should see output like: 20 | 21 | Importing repo: git@github.com:clojure/clojure.git as: clojure 22 | Adding repo git@github.com:clojure/clojure.git 23 | Importing commit: e54a1ff1ac0d02560e80aad460e77ac353efad49 24 | Importing commit: 894a0c81075b8f4b64b7f890ab0c8522a7a9986a 25 | ... 
26 | Importing commit: c1884eaca8ffb7aff2c3d393a9d5fa3306cf3f33 27 | Importing commit: 01b4cb7156f0b378e70020d0abe293bffe35b031 28 | Importing commit: 6bbfd943766e11e52a3fe21b177d55536892d132 29 | Import complete! 30 | 31 | Analyzing... 32 | Running analyzer: :clj on [.clj] 33 | analyzing file: 17592186045504 34 | analyzing file: 17592186045496 35 | Analysis complete! 36 | 37 | The import is not too peppy, since it shells out to `git` relentlessly, but it imports e.g. Clojure's entire commit history in about 10 minutes, plus analysis. 38 | 39 | You can import more than one repo into the same db. You can re-import later, after more commits have landed, and the new commits will be added incrementally. 40 | 41 | After the import (or while it is running) you can connect to the same db URI with a peer. Or, just start the [Datomic REST service](http://docs.datomic.com/rest.html) and poke around: 42 | 43 | cd whereverYouPutDatomicFree 44 | bin/rest -p 8080 free datomic:free://localhost:4334/ 45 | 46 | Browse to [localhost:8080/data/](http://localhost:8080/data/). You should see the `free` storage and the `git` db within it. 47 | 48 | The [schema diagram](https://github.com/downloads/Datomic/codeq/codeq.pdf) will help you get oriented. 49 | 50 | ## More info 51 | 52 | See the [intro blog post](http://blog.datomic.com/2012/10/codeq.html) and the [wiki](https://github.com/Datomic/codeq/wiki). 53 | 54 | ## License 55 | 56 | Copyright © 2012 Metadata Partners, LLC and Contributors. All rights reserved. 57 | 58 | Distributed under the Eclipse Public License, the same as Clojure. 
59 | -------------------------------------------------------------------------------- /doc/codeq.org: -------------------------------------------------------------------------------- 1 | Codeq 2 | * Objective 3 | ** Lots of interesting information exists in the source history of a project 4 | ** Get it into a db so it's queryable 5 | *** must include time (use Datomic) 6 | ** Move from source file orientation to language-level definition orientation 7 | * Premise 8 | ** Presumptions 9 | *** Using a VCS 10 | **** with single total order available 11 | *** Using a lang with globally unique namespaced definitions 12 | **** e.g. Clojure, Java etc 13 | ** turn VCS commits into Datomic transactions 14 | *** with 1:1 monotonicity - must import in order 15 | *** Commit properties become tx attrs 16 | *** Analyze affected sources 17 | **** grab docs and other metadata 18 | **** track definitions (e.g. in Clojure, defns) 19 | ***** def is namespaced name associated with source (data or code) 20 | ****** different source is different def for same name 21 | **** other analysis-derived info 22 | ***** call sites (use of other fns) 23 | ***** use of lang constructs (EH, mutation etc)? 24 | * VCS 25 | ** need monotonic log 26 | ** ability to list affected files 27 | ** ability to get file contents 28 | *** a la carte blob/file vs pulling revision? 29 | ** Git, de facto standard, has good temporal aspect, blobs 30 | *** but not moreso than Mercurial etc, so don't close doors 31 | * Datomic 32 | ** code entities 33 | *** named by namespaced names 34 | **** name might not be enough 35 | ***** e.g. if overloads have location-distinct defs 36 | ***** what type is name then? 
37 | ****** or name is not unique 38 | ******* that's what overloading means 39 | *** point to definition 40 | **** name -> def 41 | **** name -> overloads -> defs 42 | ** definition 43 | *** identified by source equality 44 | **** preferably code-data or AST, vs strings 45 | **** for Clojure, must be metadata-sensitive compare 46 | *** occurs at location in file 47 | **** over time, same def might move around within/across files 48 | *** source-derived metadata 49 | **** docs 50 | **** arglist(s) 51 | **** langs with overloads might consider separate defs 52 | ***** e.g. may have separate docs 53 | **** also have calls per arity/sig 54 | ***** could handle locations separately, even though nested 55 | *** non fn/method defs 56 | **** e.g. classes, types etc 57 | *** def is totality of source associated with name 58 | **** can't be if not file-contiguous 59 | ***** else name+sig? 60 | *** analysis-derived info 61 | **** callsites 62 | ***** vars called 63 | ****** or methods? 64 | ******* if statically determinable 65 | * Issues 66 | ** Multi-language support? 67 | *** multiple langs good, same schema not so good 68 | **** don't try to normalize across langs 69 | ** Clojure lang support 70 | *** multimethods 71 | **** impls have separate locations, calls etc 72 | **** but same names 73 | *** protocols 74 | **** independent fn defs 75 | **** defs in deftype/record 76 | *** deftype/record 77 | -------------------------------------------------------------------------------- /doc/intro.md: -------------------------------------------------------------------------------- 1 | # Introduction to codeq 2 | 3 | 4 | -------------------------------------------------------------------------------- /epl-v10.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 |
6 | 7 |Eclipse Public License - v 1.0
31 | 32 |THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE 33 | PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR 34 | DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS 35 | AGREEMENT.
36 | 37 |1. DEFINITIONS
38 | 39 |"Contribution" means:
40 | 41 |a) in the case of the initial Contributor, the initial 42 | code and documentation distributed under this Agreement, and
43 |b) in the case of each subsequent Contributor:
44 |i) changes to the Program, and
45 |ii) additions to the Program;
46 |where such changes and/or additions to the Program 47 | originate from and are distributed by that particular Contributor. A 48 | Contribution 'originates' from a Contributor if it was added to the 49 | Program by such Contributor itself or anyone acting on such 50 | Contributor's behalf. Contributions do not include additions to the 51 | Program which: (i) are separate modules of software distributed in 52 | conjunction with the Program under their own license agreement, and (ii) 53 | are not derivative works of the Program.
54 | 55 |"Contributor" means any person or entity that distributes 56 | the Program.
57 | 58 |"Licensed Patents" mean patent claims licensable by a 59 | Contributor which are necessarily infringed by the use or sale of its 60 | Contribution alone or when combined with the Program.
61 | 62 |"Program" means the Contributions distributed in accordance 63 | with this Agreement.
64 | 65 |"Recipient" means anyone who receives the Program under 66 | this Agreement, including all Contributors.
67 | 68 |2. GRANT OF RIGHTS
69 | 70 |a) Subject to the terms of this Agreement, each 71 | Contributor hereby grants Recipient a non-exclusive, worldwide, 72 | royalty-free copyright license to reproduce, prepare derivative works 73 | of, publicly display, publicly perform, distribute and sublicense the 74 | Contribution of such Contributor, if any, and such derivative works, in 75 | source code and object code form.
76 | 77 |b) Subject to the terms of this Agreement, each 78 | Contributor hereby grants Recipient a non-exclusive, worldwide, 79 | royalty-free patent license under Licensed Patents to make, use, sell, 80 | offer to sell, import and otherwise transfer the Contribution of such 81 | Contributor, if any, in source code and object code form. This patent 82 | license shall apply to the combination of the Contribution and the 83 | Program if, at the time the Contribution is added by the Contributor, 84 | such addition of the Contribution causes such combination to be covered 85 | by the Licensed Patents. The patent license shall not apply to any other 86 | combinations which include the Contribution. No hardware per se is 87 | licensed hereunder.
88 | 89 |c) Recipient understands that although each Contributor 90 | grants the licenses to its Contributions set forth herein, no assurances 91 | are provided by any Contributor that the Program does not infringe the 92 | patent or other intellectual property rights of any other entity. Each 93 | Contributor disclaims any liability to Recipient for claims brought by 94 | any other entity based on infringement of intellectual property rights 95 | or otherwise. As a condition to exercising the rights and licenses 96 | granted hereunder, each Recipient hereby assumes sole responsibility to 97 | secure any other intellectual property rights needed, if any. For 98 | example, if a third party patent license is required to allow Recipient 99 | to distribute the Program, it is Recipient's responsibility to acquire 100 | that license before distributing the Program.
101 | 102 |d) Each Contributor represents that to its knowledge it 103 | has sufficient copyright rights in its Contribution, if any, to grant 104 | the copyright license set forth in this Agreement.
105 | 106 |3. REQUIREMENTS
107 | 108 |A Contributor may choose to distribute the Program in object code 109 | form under its own license agreement, provided that:
110 | 111 |a) it complies with the terms and conditions of this 112 | Agreement; and
113 | 114 |b) its license agreement:
115 | 116 |i) effectively disclaims on behalf of all Contributors 117 | all warranties and conditions, express and implied, including warranties 118 | or conditions of title and non-infringement, and implied warranties or 119 | conditions of merchantability and fitness for a particular purpose;
120 | 121 |ii) effectively excludes on behalf of all Contributors 122 | all liability for damages, including direct, indirect, special, 123 | incidental and consequential damages, such as lost profits;
124 | 125 |iii) states that any provisions which differ from this 126 | Agreement are offered by that Contributor alone and not by any other 127 | party; and
128 | 129 |iv) states that source code for the Program is available 130 | from such Contributor, and informs licensees how to obtain it in a 131 | reasonable manner on or through a medium customarily used for software 132 | exchange.
133 | 134 |When the Program is made available in source code form:
135 | 136 |a) it must be made available under this Agreement; and
137 | 138 |b) a copy of this Agreement must be included with each 139 | copy of the Program.
140 | 141 |Contributors may not remove or alter any copyright notices contained 142 | within the Program.
143 | 144 |Each Contributor must identify itself as the originator of its 145 | Contribution, if any, in a manner that reasonably allows subsequent 146 | Recipients to identify the originator of the Contribution.
147 | 148 |4. COMMERCIAL DISTRIBUTION
149 | 150 |Commercial distributors of software may accept certain 151 | responsibilities with respect to end users, business partners and the 152 | like. While this license is intended to facilitate the commercial use of 153 | the Program, the Contributor who includes the Program in a commercial 154 | product offering should do so in a manner which does not create 155 | potential liability for other Contributors. Therefore, if a Contributor 156 | includes the Program in a commercial product offering, such Contributor 157 | ("Commercial Contributor") hereby agrees to defend and 158 | indemnify every other Contributor ("Indemnified Contributor") 159 | against any losses, damages and costs (collectively "Losses") 160 | arising from claims, lawsuits and other legal actions brought by a third 161 | party against the Indemnified Contributor to the extent caused by the 162 | acts or omissions of such Commercial Contributor in connection with its 163 | distribution of the Program in a commercial product offering. The 164 | obligations in this section do not apply to any claims or Losses 165 | relating to any actual or alleged intellectual property infringement. In 166 | order to qualify, an Indemnified Contributor must: a) promptly notify 167 | the Commercial Contributor in writing of such claim, and b) allow the 168 | Commercial Contributor to control, and cooperate with the Commercial 169 | Contributor in, the defense and any related settlement negotiations. The 170 | Indemnified Contributor may participate in any such claim at its own 171 | expense.
172 | 173 |For example, a Contributor might include the Program in a commercial 174 | product offering, Product X. That Contributor is then a Commercial 175 | Contributor. If that Commercial Contributor then makes performance 176 | claims, or offers warranties related to Product X, those performance 177 | claims and warranties are such Commercial Contributor's responsibility 178 | alone. Under this section, the Commercial Contributor would have to 179 | defend claims against the other Contributors related to those 180 | performance claims and warranties, and if a court requires any other 181 | Contributor to pay any damages as a result, the Commercial Contributor 182 | must pay those damages.
183 | 184 |5. NO WARRANTY
185 | 186 |EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS 187 | PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS 188 | OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, 189 | ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY 190 | OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely 191 | responsible for determining the appropriateness of using and 192 | distributing the Program and assumes all risks associated with its 193 | exercise of rights under this Agreement , including but not limited to 194 | the risks and costs of program errors, compliance with applicable laws, 195 | damage to or loss of data, programs or equipment, and unavailability or 196 | interruption of operations.
197 | 198 |6. DISCLAIMER OF LIABILITY
199 | 200 |EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT 201 | NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, 202 | INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING 203 | WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF 204 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 205 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR 206 | DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED 207 | HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
208 | 209 |7. GENERAL
210 | 211 |If any provision of this Agreement is invalid or unenforceable under 212 | applicable law, it shall not affect the validity or enforceability of 213 | the remainder of the terms of this Agreement, and without further action 214 | by the parties hereto, such provision shall be reformed to the minimum 215 | extent necessary to make such provision valid and enforceable.
216 | 217 |If Recipient institutes patent litigation against any entity 218 | (including a cross-claim or counterclaim in a lawsuit) alleging that the 219 | Program itself (excluding combinations of the Program with other 220 | software or hardware) infringes such Recipient's patent(s), then such 221 | Recipient's rights granted under Section 2(b) shall terminate as of the 222 | date such litigation is filed.
223 | 224 |All Recipient's rights under this Agreement shall terminate if it 225 | fails to comply with any of the material terms or conditions of this 226 | Agreement and does not cure such failure in a reasonable period of time 227 | after becoming aware of such noncompliance. If all Recipient's rights 228 | under this Agreement terminate, Recipient agrees to cease use and 229 | distribution of the Program as soon as reasonably practicable. However, 230 | Recipient's obligations under this Agreement and any licenses granted by 231 | Recipient relating to the Program shall continue and survive.
232 | 233 |Everyone is permitted to copy and distribute copies of this 234 | Agreement, but in order to avoid inconsistency the Agreement is 235 | copyrighted and may only be modified in the following manner. The 236 | Agreement Steward reserves the right to publish new versions (including 237 | revisions) of this Agreement from time to time. No one other than the 238 | Agreement Steward has the right to modify this Agreement. The Eclipse 239 | Foundation is the initial Agreement Steward. The Eclipse Foundation may 240 | assign the responsibility to serve as the Agreement Steward to a 241 | suitable separate entity. Each new version of the Agreement will be 242 | given a distinguishing version number. The Program (including 243 | Contributions) may always be distributed subject to the version of the 244 | Agreement under which it was received. In addition, after a new version 245 | of the Agreement is published, Contributor may elect to distribute the 246 | Program (including its Contributions) under the new version. Except as 247 | expressly stated in Sections 2(a) and 2(b) above, Recipient receives no 248 | rights or licenses to the intellectual property of any Contributor under 249 | this Agreement, whether expressly, by implication, estoppel or 250 | otherwise. All rights in the Program not expressly granted under this 251 | Agreement are reserved.
252 | 253 |This Agreement is governed by the laws of the State of New York and 254 | the intellectual property laws of the United States of America. No party 255 | to this Agreement will bring a legal action under this Agreement more 256 | than one year after the cause of action arose. Each party waives its 257 | rights to a jury trial in any resulting litigation.
258 | 259 | 260 | 261 | 262 | -------------------------------------------------------------------------------- /examples/clojure-and-contrib: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | cd `dirname $0`/.. 4 | CODEQ_VERSION="0.1.0-SNAPSHOT" 5 | CODEQ_ROOT=`pwd` 6 | 7 | DATOMIC_VERSION="0.8.3784" 8 | DATOMIC_FILE="datomic-free-$DATOMIC_VERSION" 9 | DATOMIC_URL=http://downloads.datomic.com/$DATOMIC_VERSION/$DATOMIC_FILE.zip 10 | 11 | BACKUP_FILE="clojure-and-contrib" 12 | BACKUP_URL=http://codeq.s3.amazonaws.com/examples/$BACKUP_FILE.zip 13 | 14 | DB_URI="datomic:free://localhost:4334/clojure-and-contrib" 15 | 16 | RET=0 17 | 18 | WORKING_DIR=tmp/examples/clojure-and-contrib 19 | mkdir -p $WORKING_DIR 20 | cd $WORKING_DIR 21 | WORKING_DIR=`pwd` 22 | 23 | if [ ! -d "$BACKUP_FILE" ]; then 24 | wget $BACKUP_URL 25 | unzip $BACKUP_FILE.zip 26 | fi 27 | 28 | if [ ! -d "$DATOMIC_FILE" ]; then 29 | wget $DATOMIC_URL 30 | unzip $DATOMIC_FILE.zip 31 | fi 32 | 33 | #### Restore 34 | 35 | cd $DATOMIC_FILE 36 | 37 | # Start with a fresh database 38 | rm -rf data log 39 | 40 | bin/transactor config/samples/free-transactor-template.properties & 41 | TRANSACTOR_PID=$! 42 | 43 | (( RET += $? )) 44 | 45 | bin/datomic restore-db file:$WORKING_DIR/$BACKUP_FILE $DB_URI 46 | 47 | (( RET += $? )) 48 | 49 | pkill -P $TRANSACTOR_PID 50 | 51 | #### Verify 52 | 53 | bin/transactor config/samples/free-transactor-template.properties & 54 | TRANSACTOR_PID=$! 55 | 56 | (( RET += $? )) 57 | 58 | cd $CODEQ_ROOT 59 | 60 | sleep 5 61 | 62 | lein run -m datomic.codeq.examples.clojure-and-contrib $DB_URI 63 | 64 | (( RET += $? )) 65 | 66 | cd $CODEQ_ROOT 67 | lein clean 68 | lein uberjar 69 | 70 | git clone git@github.com:clojure/clojure.git $WORKING_DIR/clojure 71 | cd $WORKING_DIR/clojure 72 | 73 | java -server -Xmx1g -jar $CODEQ_ROOT/target/codeq-$CODEQ_VERSION-standalone.jar $DB_URI 74 | 75 | (( RET += $? 
)) 76 | 77 | pkill -P $TRANSACTOR_PID 78 | 79 | exit $RET 80 | -------------------------------------------------------------------------------- /examples/src/datomic/codeq/examples/clojure_and_contrib.clj: -------------------------------------------------------------------------------- 1 | (ns datomic.codeq.examples.clojure-and-contrib 2 | (:require [datomic.api :as d :refer [q]] 3 | [clojure.pprint :refer [pprint]])) 4 | 5 | (defn -main [& [database-uri]] 6 | (assert database-uri) 7 | (println "Running clojure-and-contrib examples with database" database-uri) 8 | (try 9 | (let [conn (d/connect database-uri) 10 | db (-> conn d/db (d/as-of 435691)) 11 | repos (map first 12 | (q '[:find ?repo 13 | :where [?e :repo/uri ?repo]] 14 | db)) 15 | namespaces (map first 16 | (q '[:find ?ns 17 | :where 18 | [?e :clj/ns ?n] 19 | [?n :code/name ?ns]] 20 | db)) 21 | definitions (reduce (fn [agg [o d]] 22 | (update-in agg [o] (fnil conj []) d)) 23 | {} 24 | (q '[:find ?op ?def 25 | :where 26 | [?e :clj/def ?d] 27 | [?e :clj/defop ?op] 28 | [?d :code/name ?def]] 29 | db))] 30 | (println) 31 | (println "#### Repos:") 32 | (println) 33 | (pprint repos) 34 | (println) 35 | (println "#### Namespaces:") 36 | (println) 37 | (pprint namespaces) 38 | (println) 39 | (println "#### Definitions:") 40 | (println) 41 | (pprint definitions) 42 | (assert (= 44 (-> definitions keys count))) 43 | (assert (= 33 (-> "defne" definitions count)))) 44 | (finally 45 | ;; (shutdown-agents) 46 | (System/exit 0)))) 47 | -------------------------------------------------------------------------------- /project.clj: -------------------------------------------------------------------------------- 1 | (defproject datomic/codeq "0.1.0-SNAPSHOT" 2 | :description "codeq does a code-aware import of your git repo into a Datomic db" 3 | :url "http://datomic.com" 4 | :license {:name "Eclipse Public License" 5 | :url "http://www.eclipse.org/legal/epl-v10.html"} 6 | :main datomic.codeq.core 7 | :plugins [[lein-tar 
"1.1.0"]] 8 | :dependencies [[com.datomic/datomic-free "0.9.4699"] 9 | [commons-codec "1.7"] 10 | [org.clojure/clojure "1.5.1"]] 11 | :source-paths ["src" "examples/src"]) 12 | -------------------------------------------------------------------------------- /src/datomic/codeq/analyzer.clj: -------------------------------------------------------------------------------- 1 | ;; Copyright (c) Metadata Partners, LLC and Contributors. All rights reserved. 2 | ;; The use and distribution terms for this software are covered by the 3 | ;; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php) 4 | ;; which can be found in the file epl-v10.html at the root of this distribution. 5 | ;; By using this software in any fashion, you are agreeing to be bound by 6 | ;; the terms of this license. 7 | ;; You must not remove this notice, or any other, from this software. 8 | 9 | (ns datomic.codeq.analyzer 10 | (:import [java.io StringReader] 11 | [org.apache.commons.codec.digest DigestUtils])) 12 | 13 | (set! *warn-on-reflection* true) 14 | 15 | (defprotocol Analyzer 16 | (keyname [a] "keyword name for analyzer") 17 | (revision [a] "long") 18 | (extensions [a] "[string ...], including '.'") 19 | (schemas [a] "map of revisions to (incremental) schema data") 20 | (analyze [a db f src] "f is file entityid, src is string, returns tx-data")) 21 | 22 | (defn sha 23 | "Returns the hex string of the sha1 of s" 24 | [^String s] 25 | (org.apache.commons.codec.digest.DigestUtils/shaHex s)) 26 | 27 | (defn ws-minify 28 | "Consecutive ws becomes a single space, then trim" 29 | [s] 30 | (let [r (java.io.StringReader. 
s) 31 | sb (StringBuilder.)] 32 | (loop [c (.read r) skip true] 33 | (when-not (= c -1) 34 | (let [ws (Character/isWhitespace c)] 35 | (when (or (not ws) (not skip)) 36 | (.append sb (if ws " " (char c)))) 37 | (recur (.read r) ws)))) 38 | (-> sb str .trim))) 39 | 40 | (defn loc 41 | "Returns zero-based [line col endline endcol] given one-based 42 | \"line col endline endcol\" string" 43 | [loc-string] 44 | (mapv dec (read-string (str "[" loc-string "]")))) 45 | 46 | (defn line-offsets 47 | "Returns a vector of zero-based offsets of lines. Note the offsets 48 | are where the line would be, the last offset is not necessarily 49 | within the string. i.e. if the last character is a newline, the last 50 | index is the length of the string." 51 | [^String s] 52 | (let [nl (long \newline)] 53 | (persistent! 54 | (loop [ret (transient [0]), i 0] 55 | (if (= i (.length s)) 56 | ret 57 | (recur (if (= (.codePointAt s i) nl) 58 | (conj! ret (inc i)) 59 | ret) 60 | (inc i))))))) 61 | 62 | (defn segment 63 | "Given a string and line offsets, returns text from (zero-based) 64 | line and col to endline/endcol (exclusive)" 65 | [^String s line-offsets line col endline endcol] 66 | (subs s 67 | (+ (nth line-offsets line) col) 68 | (+ (nth line-offsets endline) endcol))) 69 | -------------------------------------------------------------------------------- /src/datomic/codeq/analyzers/clj.clj: -------------------------------------------------------------------------------- 1 | ;; Copyright (c) Metadata Partners, LLC and Contributors. All rights reserved. 2 | ;; The use and distribution terms for this software are covered by the 3 | ;; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php) 4 | ;; which can be found in the file epl-v10.html at the root of this distribution. 5 | ;; By using this software in any fashion, you are agreeing to be bound by 6 | ;; the terms of this license. 7 | ;; You must not remove this notice, or any other, from this software. 
8 | 9 | (ns datomic.codeq.analyzers.clj 10 | (:require [datomic.api :as d] 11 | [datomic.codeq.util :refer [index->id-fn tempid?]] 12 | [datomic.codeq.analyzer :as az])) 13 | 14 | (defn analyze-1 15 | "returns [tx-data ctx]" 16 | [db f x loc seg ret {:keys [sha->id codename->id added ns] :as ctx}] 17 | (if loc 18 | (let [sha (-> seg az/ws-minify az/sha) 19 | codeid (sha->id sha) 20 | newcodeid (and (tempid? codeid) (not (added codeid))) 21 | ret (cond-> ret newcodeid (conj {:db/id codeid :code/sha sha :code/text seg})) 22 | added (cond-> added newcodeid (conj codeid)) 23 | 24 | codeqid (or (ffirst (d/q '[:find ?e :in $ ?f ?loc 25 | :where [?e :codeq/file ?f] [?e :codeq/loc ?loc]] 26 | db f loc)) 27 | (d/tempid :db.part/user)) 28 | 29 | op (first x) 30 | ns? (= op 'ns) 31 | defing (and ns 32 | (symbol? op) 33 | (.startsWith (name op) "def")) 34 | 35 | naming (let [nsym (second x)] 36 | (cond 37 | ns? (str nsym) 38 | defing (if (namespace nsym) 39 | (str nsym) 40 | (str (symbol (name ns) (name nsym)))))) 41 | 42 | nameid (when naming (codename->id naming)) 43 | 44 | ret (cond-> ret 45 | (tempid? codeqid) 46 | (conj {:db/id codeqid 47 | :codeq/file f 48 | :codeq/loc loc 49 | :codeq/code codeid}) 50 | 51 | ns? 52 | (conj [:db/add codeqid :clj/ns nameid]) 53 | 54 | defing 55 | (conj [:db/add codeqid :clj/def nameid] 56 | [:db/add codeqid :clj/defop (str op)]) 57 | 58 | (tempid? nameid) 59 | (conj [:db/add nameid :code/name naming]))] 60 | [ret (assoc ctx :added added)]) 61 | [ret ctx])) 62 | 63 | (defn analyze 64 | [db f src] 65 | (with-open [r (clojure.lang.LineNumberingPushbackReader. (java.io.StringReader. src))] 66 | (let [loffs (az/line-offsets src) 67 | eof (Object.) 68 | ctx {:sha->id (index->id-fn db :code/sha) 69 | :codename->id (index->id-fn db :code/name) 70 | :added #{}}] 71 | (loop [ret [], ctx ctx, x (read r false eof)] 72 | (if (= eof x) 73 | ret 74 | (let [{:keys [line column]} (meta x) 75 | ctx (if (and (coll? 
x) (= (first x) 'ns)) 76 | (assoc ctx :ns (second x)) 77 | ctx) 78 | endline (.getLineNumber r) 79 | endcol (.getColumnNumber r) 80 | [loc seg] (when (and line column) 81 | [(str line " " column " " endline " " endcol) 82 | (az/segment src loffs (dec line) (dec column) (dec endline) (dec endcol))]) 83 | [ret ctx] (analyze-1 db f x loc seg ret ctx)] 84 | (recur ret ctx (read r false eof)))))))) 85 | 86 | (defn schemas [] 87 | {1 [{:db/id #db/id[:db.part/db] 88 | :db/ident :clj/ns 89 | :db/valueType :db.type/ref 90 | :db/cardinality :db.cardinality/one 91 | :db/doc "codename of ns defined by expression" 92 | :db.install/_attribute :db.part/db} 93 | {:db/id #db/id[:db.part/db] 94 | :db/ident :clj/def 95 | :db/valueType :db.type/ref 96 | :db/cardinality :db.cardinality/one 97 | :db/doc "codename defined by expression" 98 | :db.install/_attribute :db.part/db}] 99 | 2 [{:db/id #db/id[:db.part/db] 100 | :db/ident :clj/defop 101 | :db/valueType :db.type/string 102 | :db/cardinality :db.cardinality/one 103 | :db/doc "the def form (defn, defmacro etc) used to create this definition" 104 | :db.install/_attribute :db.part/db}]}) 105 | 106 | (deftype CljAnalyzer [] 107 | az/Analyzer 108 | (keyname [a] :clj) 109 | (revision [a] 2) 110 | (extensions [a] [".clj"]) 111 | (schemas [a] (schemas)) 112 | (analyze [a db f src] (analyze db f src))) 113 | 114 | (defn impl [] (CljAnalyzer.)) -------------------------------------------------------------------------------- /src/datomic/codeq/core.clj: -------------------------------------------------------------------------------- 1 | ;; Copyright (c) Metadata Partners, LLC and Contributors. All rights reserved. 2 | ;; The use and distribution terms for this software are covered by the 3 | ;; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php) 4 | ;; which can be found in the file epl-v10.html at the root of this distribution. 
;;   By using this software in any fashion, you are agreeing to be bound by
;;   the terms of this license.
;;   You must not remove this notice, or any other, from this software.

(ns datomic.codeq.core
  (:require [datomic.api :as d]
            [clojure.java.io :as io]
            [clojure.set]
            [clojure.string :as string]
            [datomic.codeq.util :refer [index->id-fn tempid?]]
            [datomic.codeq.analyzer :as az]
            [datomic.codeq.analyzers.clj])
  (:import java.util.Date)
  (:gen-class))

(set! *warn-on-reflection* true)

(def schema
  [
   ;;tx attrs
   {:db/id #db/id[:db.part/db]
    :db/ident :tx/commit
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Associate tx with this git commit"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :tx/file
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Associate tx with this git blob"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :tx/analyzer
    :db/valueType :db.type/keyword
    :db/cardinality :db.cardinality/one
    :db/index true
    :db/doc "Associate tx with this analyzer"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :tx/analyzerRev
    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one
    :db/doc "Associate tx with this analyzer revision"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :tx/op
    :db/valueType :db.type/keyword
    :db/index true
    :db/cardinality :db.cardinality/one
    :db/doc "Associate tx with this operation - one of :import, :analyze"
    :db.install/_attribute :db.part/db}

   ;;git stuff
   {:db/id #db/id[:db.part/db]
    :db/ident :git/type
    :db/valueType :db.type/keyword
    :db/cardinality :db.cardinality/one
    :db/index true
    :db/doc "Type enum for git objects - one of :commit, :tree, :blob, :tag"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :git/sha
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "A git sha, should be in repo"
    :db/unique :db.unique/identity
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :repo/commits
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/many
    :db/doc "Associate repo with these git commits"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :repo/uri
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "A git repo uri"
    :db/unique :db.unique/identity
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :commit/parents
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/many
    :db/doc "Parents of a commit"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :commit/tree
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Root node of a commit"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :commit/message
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "A commit message"
    :db/fulltext true
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :commit/author
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Person who authored a commit"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :commit/authoredAt
    :db/valueType :db.type/instant
    :db/cardinality :db.cardinality/one
    :db/doc "Timestamp of authorship of commit"
    :db/index true
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :commit/committer
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Person who committed a commit"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :commit/committedAt
    :db/valueType :db.type/instant
    :db/cardinality :db.cardinality/one
    :db/doc "Timestamp of commit"
    :db/index true
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :tree/nodes
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/many
    :db/doc "Nodes of a git tree"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :node/filename
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "filename of a tree node"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :node/paths
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/many
    :db/doc "paths of a tree node"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :node/object
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Git object (tree/blob) in a tree node"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :git/prior
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Node containing prior value of a git object"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :email/address
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "An email address"
    :db/unique :db.unique/identity
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :file/name
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "A filename"
    :db/fulltext true
    :db/unique :db.unique/identity
    :db.install/_attribute :db.part/db}

   ;;codeq stuff
   {:db/id #db/id[:db.part/db]
    :db/ident :codeq/file
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Git file containing codeq"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :codeq/loc
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "Location of codeq in file. A location string in format \"line col endline endcol\", one-based"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :codeq/parent
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Parent (containing) codeq of codeq (if one)"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :codeq/code
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc "Code entity of codeq"
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :code/sha
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "SHA of whitespace-minified code segment text: consecutive ws becomes a single space, then trim. ws-sensitive langs don't minify."
    :db/unique :db.unique/identity
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :code/text
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "The source code for a code segment"
    ;;:db/fulltext true
    :db.install/_attribute :db.part/db}

   {:db/id #db/id[:db.part/db]
    :db/ident :code/name
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "A globally-namespaced programming language identifier"
    :db/fulltext true
    :db/unique :db.unique/identity
    :db.install/_attribute :db.part/db}
   ])

(defn ^java.io.Reader exec-stream
  [^String cmd]
  (-> (Runtime/getRuntime)
      (.exec cmd)
      .getInputStream
      io/reader))

(defn ensure-schema [conn]
  (or (-> conn d/db (d/entid :tx/commit))
      @(d/transact conn schema)))

;;example commit - git cat-file -p
;;tree d81cd432f2050c84a3d742caa35ccb8298d51e9d
;;author Rich Hickey