├── .gitignore
├── README.md
├── go.mod
├── go.sum
└── main.go

/.gitignore:
--------------------------------------------------------------------------------
data/

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Go vs GPU: Fast Spatial Joins

I read [a post yesterday](https://medium.com/swlh/how-to-perform-fast-and-powerful-geospatial-data-analysis-with-gpu-48f16a168b10) that demonstrates using the GPU to process a spatial join on big data.
The author takes a recordset of 9 million parking violations in Philadelphia and spatially joins them to a recordset of 150 neighborhood polygons.
He uses the GPU libraries [rapidsai/cuDF](https://github.com/rapidsai/cudf) to read the data and [rapidsai/cuSpatial](https://github.com/rapidsai/cuspatial) for the point-in-polygon operations.

His results are very good: reading the input data takes 4 seconds and performing the spatial join takes only 13 seconds.

I was curious to see how well a Go program using the CPU would compare.

I used the Go libraries [tidwall/geojson](https://github.com/tidwall/geojson) for storing the neighborhood polygons and [tidwall/rtree](https://github.com/tidwall/rtree) for the spatial index. These are the same foundational libraries I use in [Tile38](https://github.com/tidwall/tile38).

## Results

On my 2019 MacBook (2.4 GHz 8-Core Intel Core i9):

```
Loading neighborhoods... 0.01 secs
Loading violations... 3.90 secs
Joining neighborhoods and violations... 2.75 secs
Writing output... 0.46 secs
Total execution time... 7.12 secs
```

Most of the time (3.90 of the 7.12 seconds) is spent reading the violations CSV file.

The `Joining neighborhoods and violations` step is where the spatial join happens. For each violation, the R-tree returns the neighborhoods whose bounding boxes contain its point. A point-in-polygon test then picks the neighborhood that actually contains it.
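
That join is a filter-and-refine search. A cheap bounding-box check narrows the candidates, and an exact polygon test decides. The sketch below illustrates the two steps without any dependencies, under these assumptions:

- Each polygon is a single ring with no holes.
- The exact test is the classic even-odd ray-casting rule.
- A linear scan over bounding boxes stands in for the R-tree.

This is not how tidwall/geojson or tidwall/rtree are implemented. The `point`, `polygon`, `newPolygon`, `contains`, and `joinPoint` names are hypothetical.

```go
package main

import "fmt"

// point is an (x, y) pair; x is longitude and y is latitude.
type point struct{ X, Y float64 }

// polygon is a hypothetical stand-in for a neighborhood: a name, one outer
// ring (holes are ignored in this sketch) and the ring's bounding box.
type polygon struct {
	name     string
	ring     []point
	min, max point
}

// newPolygon stores ring and computes its bounding box.
func newPolygon(name string, ring []point) polygon {
	p := polygon{name: name, ring: ring, min: ring[0], max: ring[0]}
	for _, v := range ring {
		if v.X < p.min.X {
			p.min.X = v.X
		}
		if v.Y < p.min.Y {
			p.min.Y = v.Y
		}
		if v.X > p.max.X {
			p.max.X = v.X
		}
		if v.Y > p.max.Y {
			p.max.Y = v.Y
		}
	}
	return p
}

// contains applies the even-odd rule: a horizontal ray cast from pt toward
// +X crosses the ring's edges an odd number of times iff pt is inside.
// It works whether or not the ring repeats its first vertex at the end.
func (p polygon) contains(pt point) bool {
	in := false
	for i, j := 0, len(p.ring)-1; i < len(p.ring); j, i = i, i+1 {
		a, b := p.ring[i], p.ring[j]
		if (a.Y > pt.Y) != (b.Y > pt.Y) &&
			pt.X < (b.X-a.X)*(pt.Y-a.Y)/(b.Y-a.Y)+a.X {
			in = !in
		}
	}
	return in
}

// joinPoint returns the name of the first polygon containing pt, or "".
// The bounding-box check is the "filter" step (an R-tree does it without
// scanning every box); contains is the exact "refine" step.
func joinPoint(polys []polygon, pt point) string {
	for _, p := range polys {
		if pt.X < p.min.X || pt.X > p.max.X || pt.Y < p.min.Y || pt.Y > p.max.Y {
			continue
		}
		if p.contains(pt) {
			return p.name
		}
	}
	return ""
}

func main() {
	polys := []polygon{
		newPolygon("West", []point{{0, 0}, {1, 0}, {1, 1}, {0, 1}}),
		newPolygon("East", []point{{1, 0}, {2, 0}, {2, 1}, {1, 1}}),
	}
	fmt.Println(joinPoint(polys, point{0.5, 0.5}))   // West
	fmt.Println(joinPoint(polys, point{1.5, 0.25}))  // East
	fmt.Printf("%q\n", joinPoint(polys, point{3, 3})) // ""
}
```

Most points fail the bounding-box check for most polygons, so the comparatively expensive edge loop in `contains` runs only for a few candidates per point.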

## Downloading the data

All data is downloaded to the `data` directory. Run these commands from the root of the repository (see Running below), because `main.go` reads its inputs from `data/` relative to the working directory.
The `ogr2ogr` command is provided by [GDAL](https://gdal.org).

```sh
# Create the directories for storing data
mkdir -p data/shapes

# Download the parking violations. This will take a while.
curl "https://phl.carto.com/api/v2/sql?filename=parking_violations&format=csv&skipfields=cartodb_id,the_geom,the_geom_webmercator&q=SELECT%20*%20FROM%20parking_violations%20WHERE%20issue_datetime%20%3E=%20%272012-01-01%27%20AND%20issue_datetime%20%3C%20%272017-12-31%27" > data/phl_parking.csv

# Download the Philadelphia neighborhood shapes and unzip them.
wget -P data/shapes https://github.com/azavea/geo-data/raw/master/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.zip

unzip -d data/shapes data/shapes/Neighborhoods_Philadelphia.zip

# Convert the neighborhood shapes into WGS84 GeoJSON
ogr2ogr -t_srs EPSG:4326 data/Neighborhoods_Philadelphia.json data/shapes/Neighborhoods_Philadelphia.shp
```

## Running

```sh
git clone https://github.com/tidwall/fast-spatial-joins
cd fast-spatial-joins
go run main.go
```

The final output is written to `data/output.csv`.

Here's a snapshot of what the output looks like:

```
anon_ticket_number,neighborhood
1777797,Center City East
1777798,Chinatown
1777799,Wister
1777801,Center City East
1777802,Logan Square
1777803,Old City
1777804,Northern Liberties
1777805,Newbold
1777806,Old City
1777808,Society Hill
1777809,Spring Garden
1777810,Rittenhouse
1777811,Logan Square
1777813,Graduate Hospital
1777814,Point Breeze
1777815,Washington Square West
1777816,Rittenhouse
1777817,Wister
1777818,Society Hill
1777820,Logan Square
```

--------------------------------------------------------------------------------
/go.mod:
--------------------------------------------------------------------------------
module github.com/tidwall/sick-spatial-join

go 1.17

require (
	github.com/tidwall/geojson v1.3.4
	github.com/tidwall/rtree v1.3.1
)

require (
	github.com/tidwall/geoindex v1.4.4 // indirect
	github.com/tidwall/gjson v1.12.1 // indirect
	github.com/tidwall/match v1.1.1 // indirect
	github.com/tidwall/pretty v1.2.0 // indirect
	github.com/tidwall/sjson v1.2.4 // indirect
)

--------------------------------------------------------------------------------
/go.sum:
--------------------------------------------------------------------------------
github.com/tidwall/cities v0.1.0 h1:CVNkmMf7NEC9Bvokf5GoSsArHCKRMTgLuubRTHnH0mE=
github.com/tidwall/cities v0.1.0/go.mod h1:lV/HDp2gCcRcHJWqgt6Di54GiDrTZwh1aG2ZUPNbqa4=
github.com/tidwall/geoindex v1.4.4 h1:hdwzy5qNtK75i7nus59Ibr+SwcH4F2v65bw4txrLJ9M=
github.com/tidwall/geoindex v1.4.4/go.mod h1:rvVVNEFfkJVWGUdEfU8QaoOg/9zFX0h9ofWzA60mz1I=
github.com/tidwall/geojson v1.3.4 h1:mHB2yGK7HPgf4vFkLdPeIzguFpqkmCT2yTgGhXbrqBo=
github.com/tidwall/geojson v1.3.4/go.mod h1:1cn3UWfSYCJOq53NZoQ9rirdw89+DM0vw+ZOAVvuReg=
github.com/tidwall/gjson v1.12.1 h1:ikuZsLdhr8Ws0IdROXUS1Gi4v9Z4pGqpX/CvJkxvfpo=
github.com/tidwall/gjson v1.12.1/go.mod h1:/wbyibRr2FHMks5tjHJ5F8dMZh3AcwJEMf5vlfC0lxk=
github.com/tidwall/lotsa v1.0.2 h1:dNVBH5MErdaQ/xd9s769R31/n2dXavsQ0Yf4TMEHHw8=
github.com/tidwall/lotsa v1.0.2/go.mod h1:X6NiU+4yHA3fE3Puvpnn1XMDrFZrE9JO2/w+UMuqgR8=
github.com/tidwall/match v1.1.1 h1:+Ho715JplO36QYgwN9PGYNhgZvoUSc9X2c80KVTi+GA=
github.com/tidwall/match v1.1.1/go.mod h1:eRSPERbgtNPcGhD8UCthc6PmLEQXEWd3PRB5JTxsfmM=
github.com/tidwall/pretty v1.2.0 h1:RWIZEg2iJ8/g6fDDYzMpobmaoGh5OLl4AXtGUGPcqCs=
github.com/tidwall/pretty v1.2.0/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU=
github.com/tidwall/rtree v1.3.1 h1:xu3vJPKJrmGce7YJcFUCoqLrp9DTUEJBnVgdPSXHgHs=
github.com/tidwall/rtree v1.3.1/go.mod h1:S+JSsqPTI8LfWA4xHBo5eXzie8WJLVFeppAutSegl6M=
github.com/tidwall/sjson v1.2.4 h1:cuiLzLnaMeBhRmEv00Lpk3tkYrcxpmbU81tAY4Dw0tc=
github.com/tidwall/sjson v1.2.4/go.mod h1:098SZ494YoMWPmMO6ct4dcFnqxwj9r/gF0Etp19pSNM=

--------------------------------------------------------------------------------
/main.go:
--------------------------------------------------------------------------------
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"runtime"
	"strconv"
	"strings"
	"sync"
	"time"

	"github.com/tidwall/geojson"
	"github.com/tidwall/geojson/geometry"
	"github.com/tidwall/gjson"
	"github.com/tidwall/rtree"
)

// hood is a neighborhood polygon and its display name.
type hood struct {
	name string
	feat *geojson.Feature
}

// violation is one row of the parking violations CSV.
type violation struct {
	point [2]float64 // lon, lat
	row   int
	num   string // anon_ticket_number
	hood  *hood  // set by join; nil if no neighborhood contains the point
}

func main() {
	start := time.Now()

	mark := time.Now()
	fmt.Printf("Loading neighborhoods... ")
	hoods, err := loadHoods()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%.2f secs\n", time.Since(mark).Seconds())

	mark = time.Now()
	fmt.Printf("Loading violations... ")
	violations, err := loadViolations()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%.2f secs\n", time.Since(mark).Seconds())

	mark = time.Now()
	fmt.Printf("Joining neighborhoods and violations... ")
	join(hoods, violations)
	fmt.Printf("%.2f secs\n", time.Since(mark).Seconds())

	mark = time.Now()
	fmt.Printf("Writing output... ")
	if err := writeViolations(violations); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%.2f secs\n", time.Since(mark).Seconds())

	fmt.Printf("Total execution time... %.2f secs\n",
		time.Since(start).Seconds())
}

// loadHoods parses the neighborhood GeoJSON and indexes every feature in an
// R-tree by its bounding rectangle.
func loadHoods() (*rtree.RTree, error) {
	hoods := new(rtree.RTree)
	data, err := os.ReadFile("data/Neighborhoods_Philadelphia.json")
	if err != nil {
		return nil, err
	}
	g, err := geojson.Parse(string(data), nil)
	if err != nil {
		return nil, err
	}
	fc, ok := g.(*geojson.FeatureCollection)
	if !ok {
		return nil, fmt.Errorf("expected a GeoJSON FeatureCollection")
	}
	fc.ForEach(func(f geojson.Object) bool {
		// Index each feature by its own rectangle (f.Rect), not the rectangle
		// of the whole collection (g.Rect); otherwise every search returns
		// every neighborhood and the index does no filtering.
		r := f.Rect()
		feat := f.(*geojson.Feature)
		h := &hood{
			name: gjson.Get(feat.Members(), "properties.LISTNAME").String(),
			feat: feat,
		}
		min, max := [2]float64{r.Min.X, r.Min.Y}, [2]float64{r.Max.X, r.Max.Y}
		hoods.Insert(min, max, h)
		return true
	})
	return hoods, nil
}

// loadViolations reads the violations CSV with a hand-rolled splitter that
// slices fields out of one big string. It assumes no field contains an
// embedded comma or newline; column 0 is the ticket number, column 9 the
// latitude and column 10 the longitude.
func loadViolations() ([]violation, error) {
	data, err := os.ReadFile("data/phl_parking.csv")
	if err != nil {
		return nil, err
	}
	csv := string(data)
	violations := make([]violation, 0, 10_000_000)
	var cols []string
	var row int
	s := strings.IndexByte(csv, '\n') + 1 // skip the header line
	for i := s; i < len(csv); i++ {
		switch csv[i] {
		case ',':
			cols = append(cols, csv[s:i])
			s = i + 1
		case '\n':
			cols = append(cols, csv[s:i])
			// Skip short or malformed rows instead of panicking on cols[10].
			if len(cols) > 10 {
				var v violation
				v.point[0], _ = strconv.ParseFloat(cols[10], 64)
				v.point[1], _ = strconv.ParseFloat(cols[9], 64)
				v.num = cols[0]
				v.row = row
				violations = append(violations, v)
			}
			s = i + 1
			cols = cols[:0]
			row++
		}
	}
	return violations, nil
}

// join assigns each violation the neighborhood that contains its point.
// Indexes are fanned out over a channel to one worker per CPU; each worker
// writes only to violations[i] for the indexes it receives, so no locking
// is needed.
func join(hoods *rtree.RTree, violations []violation) {
	var wg sync.WaitGroup
	ch := make(chan int, 8192)
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range ch {
				rpt := violations[i].point
				gpt := geometry.Point{X: rpt[0], Y: rpt[1]}
				// The R-tree yields neighborhoods whose bounding boxes contain
				// the point; confirm each with an exact point-in-polygon test
				// and stop at the first match.
				hoods.Search(rpt, rpt,
					func(_, _ [2]float64, v interface{}) bool {
						h := v.(*hood)
						if h.feat.IntersectsPoint(gpt) {
							violations[i].hood = h
							return false
						}
						return true
					},
				)
			}
		}()
	}
	for i := range violations {
		ch <- i
	}
	close(ch)
	wg.Wait()
}

// writeViolations writes one "ticket,neighborhood" line for every violation
// that was matched to a neighborhood; unmatched violations are skipped.
func writeViolations(violations []violation) error {
	f, err := os.Create("data/output.csv")
	if err != nil {
		return err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	_, err = w.WriteString("anon_ticket_number,neighborhood\r\n")
	if err != nil {
		return err
	}
	var buf []byte
	for _, v := range violations {
		if v.hood == nil {
			continue
		}
		buf = append(buf[:0], v.num...)
		buf = append(buf, ',')
		buf = append(buf, v.hood.name...)
		buf = append(buf, '\r', '\n')
		if _, err := w.Write(buf); err != nil {
			return err
		}
	}
	return w.Flush()
}

--------------------------------------------------------------------------------
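
`join` in `main.go` hands violation indexes to the workers through a buffered channel. That balances load dynamically, at the cost of one channel send and receive per violation. A common alternative is to give each worker a fixed, contiguous range of the slice.

The sketch below shows that pattern with a hypothetical `parallelChunks` helper. It is not part of this repository and is not a measured improvement. It assumes the per-item cost is roughly uniform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelChunks is a hypothetical helper: it splits [0, n) into one
// contiguous range per worker and calls fn(i) for every index, in parallel.
// Each index is visited exactly once, so fn may write to element i of a
// shared slice without locking.
func parallelChunks(n, workers int, fn func(i int)) {
	if workers < 1 {
		workers = 1
	}
	chunk := (n + workers - 1) / workers // ceiling division; 0 when n == 0
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for i := start; i < end; i++ {
				fn(i)
			}
		}(start, end)
	}
	wg.Wait()
}

func main() {
	squares := make([]int, 10)
	parallelChunks(len(squares), runtime.NumCPU(), func(i int) {
		squares[i] = i * i
	})
	fmt.Println(squares) // [0 1 4 9 16 25 36 49 64 81]
}
```

With static ranges, a worker whose range happens to contain more expensive lookups finishes last while the others sit idle. The channel version avoids that imbalance, so only a benchmark on this dataset can say which one is faster.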