├── README.md ├── final ├── .gitignore ├── Makefile ├── compile.sh ├── include │ ├── Build.h │ ├── Intermediate.h │ ├── JobScheduler.h │ ├── Joiner.h │ ├── Operations.h │ ├── Optimizer.h │ ├── Parser.h │ ├── Partition.h │ ├── Probe.h │ ├── Queue.h │ ├── Relation.h │ ├── Utils.h │ └── Vector.h ├── run.sh ├── runTestharness.sh ├── runUnitTesting.sh ├── src │ ├── Build.c │ ├── Intermediate.c │ ├── JobScheduler.c │ ├── Joiner.c │ ├── Operations.c │ ├── Optimizer.c │ ├── Parser.c │ ├── Partition.c │ ├── Probe.c │ ├── Queue.c │ ├── Relation.c │ ├── Utils.c │ ├── Vector.c │ ├── harness.cpp │ └── main.c ├── test │ ├── TestHeader.h │ ├── TestJoiner.c │ ├── TestMain.c │ ├── TestOperators.c │ ├── TestOptimizer.c │ ├── TestParser.c │ ├── TestQueue.c │ └── TestRadixHashJoin.c └── workloads │ └── small │ ├── r0 │ ├── r0.sql │ ├── r0.tbl │ ├── r1 │ ├── r1.sql │ ├── r1.tbl │ ├── r10 │ ├── r10.sql │ ├── r10.tbl │ ├── r11 │ ├── r11.sql │ ├── r11.tbl │ ├── r12 │ ├── r12.sql │ ├── r12.tbl │ ├── r13 │ ├── r13.sql │ ├── r13.tbl │ ├── r2 │ ├── r2.sql │ ├── r2.tbl │ ├── r3 │ ├── r3.sql │ ├── r3.tbl │ ├── r4 │ ├── r4.sql │ ├── r4.tbl │ ├── r5 │ ├── r5.sql │ ├── r5.tbl │ ├── r6 │ ├── r6.sql │ ├── r6.tbl │ ├── r7 │ ├── r7.sql │ ├── r7.tbl │ ├── r8 │ ├── r8.sql │ ├── r8.tbl │ ├── r9 │ ├── r9.sql │ ├── r9.tbl │ ├── small.init │ ├── small.result │ ├── small.work │ └── small.work.sql └── img ├── cost.png ├── plot1.png ├── plot2.png └── radix_hash_join.png /README.md: -------------------------------------------------------------------------------- 1 | # Software Development for Database Systems 2 | 3 | ## About 4 | 5 | The continuous advancement of hardware technology has led to the mass production of multi-core CPUs as well as to a decrease in RAM cost in terms of $/GB. In this project we demonstrate an efficient implementation of the join operation in relational databases that exploits CPU parallelism and the large amount of RAM available in modern servers. 
6 | 7 | Our project, written in C, was built in three parts that were then merged into a single codebase in a way that complies with the instructions we were given. It features a hash-based radix partition join inspired by the join algorithms introduced in this [paper](https://15721.courses.cs.cmu.edu/spring2016/papers/balkesen-icde2013.pdf). 8 | 9 | The idea for this project originated from the [SIGMOD Programming Contest 2018](http://sigmod18contest.db.in.tum.de/task.shtml). We therefore follow the task specifications of the contest and use the testing workloads provided to the contestants. 10 | 11 | 12 | ## Implementation 13 | 14 | 15 | * #### Query Execution 16 | 17 | First, during the non-timed pre-processing stage, we collect various statistics about the input relations, such as the min/max value and the approximate number of distinct values. These stats can be useful for future work on query optimization. 18 | 19 | During query execution we use an intermediate result structure to store the tuples produced by each predicate, whether a filter or a join. This avoids scanning a relation from top to bottom multiple times when it appears in more than one predicate. 20 | 21 | 22 | * #### Radix Hash Join 23 | 24 | The main idea of the Radix Hash Join (RHJ) algorithm is to partition the input data of the two join relations into a number of buckets, so that the largest bucket fits into the CPU cache. More precisely, the RHJ algorithm consists of the following three phases: 25 | 26 | * **Partition** 27 | 28 | We partition the data of each relation into a number of buckets by applying the same hash function (HASH_FUN_1) to both relations. In our implementation, HASH_FUN_1 uses the n least-significant bits of the record to determine its bucket. In addition, a histogram and a prefix sum table are calculated for each of the two relations. 
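As a rough illustration of the Partition phase described above, here is a minimal, self-contained sketch in C. It is not the project's actual code: `low_bits` and `radix_partition` are hypothetical names, and `low_bits` only assumes the behaviour stated above (keeping the n least-significant bits of a key). The sketch is serial and ignores the multithreading used in the real implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for HASH_FUN_1: keep the n least-significant bits. */
unsigned low_bits(uint64_t key, unsigned n_bits)
{
    return (unsigned)(key & ((UINT64_C(1) << n_bits) - 1));
}

/*
 * Copies `in` into `out` so that keys of the same bucket are contiguous.
 *   hist[b] = number of keys that fall into bucket b
 *   psum[b] = offset in `out` where bucket b starts (exclusive prefix sum)
 * `hist` and `psum` must each have room for (1 << n_bits) entries.
 * Returns 0 on success, -1 if the scratch allocation fails.
 */
int radix_partition(const uint64_t *in, uint64_t *out, unsigned n,
                    unsigned n_bits, unsigned *hist, unsigned *psum)
{
    assert(n_bits > 0 && n_bits < 32);
    unsigned buckets = 1u << n_bits;

    /* Pass 1: count how many keys land in each bucket. */
    for (unsigned b = 0; b < buckets; b++)
        hist[b] = 0;
    for (unsigned i = 0; i < n; i++)
        hist[low_bits(in[i], n_bits)]++;

    /* Exclusive prefix sum: starting offset of every bucket. */
    unsigned running = 0;
    for (unsigned b = 0; b < buckets; b++) {
        psum[b] = running;
        running += hist[b];
    }

    /* Pass 2: scatter keys, advancing a per-bucket write cursor. */
    unsigned *cursor = malloc(buckets * sizeof *cursor);
    if (cursor == NULL)
        return -1;
    for (unsigned b = 0; b < buckets; b++)
        cursor[b] = psum[b];
    for (unsigned i = 0; i < n; i++) {
        unsigned b = low_bits(in[i], n_bits);
        out[cursor[b]++] = in[i];
    }
    free(cursor);
    return 0;
}
```

For example, partitioning the keys `5, 2, 7, 4, 1, 6` with `n_bits = 2` yields the histogram `1, 2, 2, 1`, the prefix sums `0, 1, 3, 5`, and the reordered keys `4, 5, 1, 2, 6, 7`. Keeping `psum` untouched and scattering through a separate cursor copy means the bucket boundaries remain available to the later phases.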
29 | 30 | * **Build** 31 | 32 | An index is created for each partition (i.e. bucket) of the smaller relation. Each index resembles a hash table built from two arrays (a chain array and a bucket array). These arrays store indices into the corresponding bucket according to the value of a second hash function (HASH_FUN_2). 33 | 34 | * **Probe** 35 | 36 | Partitions of the non-indexed relation, i.e. the bigger one, are scanned and the respective index is probed for matching tuples. 37 | 38 | ![image not found](./img/radix_hash_join.png) 39 | 40 | *The image above illustrates the three phases of the Radix Hash Join algorithm* 41 | 42 | 43 | * #### Multithreading 44 | 45 | We sped up our serial implementation by applying multithreading to various parts of our code, such as filter execution, histogram creation, bucket indexing and probing. We use POSIX Threads for this purpose. You may modify the number of threads [here](./final/src/JobScheduler.c). The following figures depict the speedup we achieved. 46 | 47 | ![image not found](./img/plot2.png) 48 | 49 | The above graph shows the correlation between execution time and number of threads using the [small](./final/workloads/small) dataset. For this test we used a machine from the Linux lab of our department (Intel i5-6500 3.2 GHz, 4 cores, 4 threads | 16 GB RAM). 50 | 51 | ![image not found](./img/plot1.png) 52 | 53 | The above graph shows the correlation between execution time and number of threads using the *public* dataset, which can be downloaded from [here](http://sigmod18contest.db.in.tum.de/public.tar.gz). 54 | 55 | Our machine's specifications are: 56 | * CPU: Ryzen 2400G 3.6 GHz, 4 cores, 8 threads 57 | * RAM: 16 GB DDR4 dual-channel 58 | 59 | ## Usage 60 | 61 | * ``cd final`` 62 | * ``./compile.sh && ./runTestharness.sh`` 63 | 64 | ## Unit Testing 65 | 66 | For unit testing we use the [CUnit](http://cunit.sourceforge.net/index.html) testing framework. 
Tests are grouped into suites, each one responsible for testing a specific category of functions. To run the tests, CUnit must be [installed](http://archive15.fossology.org/projects/fossology/wiki/Installing_CUnit) on your system. 67 | 68 | #### Running the tests 69 | * ``cd final`` 70 | * ``make unittest && ./runUnitTesting.sh`` 71 | 72 | ## Profiling 73 | 74 | This pie graph was generated using profiling data collected by [callgrind's](http://valgrind.org/docs/manual/cl-manual.html#cl-manual.options.separation) function cost mechanism. 75 | 76 | ![image not found](./img/cost.png) 77 | 78 | 1. `constructTuple` : creates a new tuple that will be added to the join result 79 | 80 | 2. `insertAtVector` : inserts the tuple into the result vector 81 | 82 | 3. `joinFunc` : implements Probe (phase 3) 83 | 84 | 4. `checkSumFunc` : calculates checksums after the query's execution has finished 85 | 86 | 5. `partition` : implements Partition (phase 1) 87 | 88 | You may also run a memory check using valgrind by uncommenting the relevant line in the [run.sh](./final/run.sh) script. 89 | 90 | ## Authors 91 | 92 | * George Panagiotopoulos - [giorgospan](https://github.com/giorgospan) 93 | * John Papastamou - [JohnThePriest](https://github.com/JohnThePriest) 94 | 95 | ## References 96 | 97 | * Cagri Balkesen, Jens Teubner, Gustavo Alonso, and M. 
Tamer Özsu 98 | [Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware](https://15721.courses.cs.cmu.edu/spring2016/papers/balkesen-icde2013.pdf) 99 | 100 | -------------------------------------------------------------------------------- /final/.gitignore: -------------------------------------------------------------------------------- 1 | # ignore all .o files 2 | *.o 3 | 4 | # ignore all files inside build directory 5 | build/release/ 6 | 7 | # ignore all files inside dumpFiles directory 8 | dumpFiles/ 9 | 10 | # ignore all .log files 11 | *.log -------------------------------------------------------------------------------- /final/Makefile: -------------------------------------------------------------------------------- 1 | # specify directories 2 | INCDIR := ./include 3 | SRCDIR := ./src 4 | TESTDIR := ./test 5 | BUILDDIR:= ./build/release 6 | 7 | # path to our custom test file 8 | WORKPATH:= ./workloads/small/custom_input_file 9 | 10 | # name of our executable 11 | TARGET := Driver 12 | 13 | # specify source,object,header files & test source/object files 14 | SOURCES := $(wildcard $(SRCDIR)/*.c) 15 | OBJECTS := $(SOURCES:$(SRCDIR)/%.c=./%.o) 16 | INCLUDES:= $(wildcard $(INCDIR)/*.h) 17 | 18 | TESTSRCS:= $(wildcard $(TESTDIR)/*.c) 19 | TESTOBJS:= $(TESTSRCS:$(TESTDIR)/%.c=./%.o) 20 | 21 | CC := gcc 22 | CFLAGS := -O3 -g -I $(INCDIR) # -O3 for aggressive compiler optimization 23 | 24 | ############################################################# 25 | # Build harness and Driver 26 | all: $(BUILDDIR)/harness $(BUILDDIR)/$(TARGET) 27 | 28 | # all: clean $(BUILDDIR)/$(TARGET) 29 | # $(BUILDDIR)/$(TARGET) < $(WORKPATH) 30 | 31 | # valgrind: clean $(BUILDDIR)/$(TARGET) 32 | # valgrind --leak-check=yes $(BUILDDIR)/$(TARGET) < $(WORKPATH) 33 | 34 | # create executable [by linking object files] 35 | $(BUILDDIR)/$(TARGET): $(OBJECTS) 36 | $(CC) $(CFLAGS) -o $@ $(OBJECTS) -lpthread 37 | 38 | # create object files 39 | $(OBJECTS): ./%.o : $(SRCDIR)/%.c 40 
| $(CC) $(CFLAGS) -c $< -o $@ 41 | 42 | ############################################################# 43 | 44 | # create harness executable 45 | $(BUILDDIR)/harness: ./harness.o 46 | g++ -O3 -g -o $@ $^ 47 | 48 | # create harness object file 49 | ./harness.o: $(SRCDIR)/harness.cpp 50 | g++ -O3 -g -c $< -o $@ 51 | 52 | ############################################################# 53 | 54 | # note that we discard the main function of our program. 55 | # that is because we want to use the main function that we 56 | # have in TESTDIR 57 | unittest: $(TESTOBJS) $(filter-out ./main.o, ./$(OBJECTS)) 58 | $(CC) $(CFLAGS) -o $@ $^ -lcunit -lpthread 59 | 60 | $(TESTOBJS): ./%.o : $(TESTDIR)/%.c 61 | $(CC) $(CFLAGS) -c $< -o $@ 62 | 63 | 64 | ############################################################# 65 | 66 | # clean up 67 | .PHONY: clean 68 | clean: 69 | @rm -f harness.o $(OBJECTS) $(TESTOBJS) $(BUILDDIR)/* ./dumpFiles/* unittest ./workloads/small/call* ./workloads/small/massif* 70 | @echo "Cleanup complete!" 
71 | 72 | # count 73 | .PHONY: count 74 | count: 75 | @wc $(SOURCES) $(INCLUDES) $(TESTSRCS) 76 | -------------------------------------------------------------------------------- /final/compile.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd ) 4 | 5 | cd "$DIR" 6 | mkdir -p dumpFiles 7 | mkdir -p build/release 8 | make clean 9 | make -j4 10 | -------------------------------------------------------------------------------- /final/include/Build.h: -------------------------------------------------------------------------------- 1 | #ifndef BUILD_H 2 | #define BUILD_H 3 | 4 | #include "Partition.h"/* for "RadixHashJoinInfo" type*/ 5 | 6 | #define HASH_RANGE_2 301 7 | #define HASH_FUN_2(KEY) ((KEY)%(HASH_RANGE_2)) 8 | 9 | struct buildArg{ 10 | unsigned bucket; 11 | RadixHashJoinInfo *info; 12 | }; 13 | 14 | 15 | void buildFunc(void* arg); 16 | void build(RadixHashJoinInfo *infoLeft,RadixHashJoinInfo *infoRight); 17 | void initializeIndexArray(RadixHashJoinInfo *info); 18 | void buildIndexPerBucket(RadixHashJoinInfo *info); 19 | void traverseChain(unsigned chainPos,unsigned* chainArray,unsigned posToBeStored); 20 | 21 | #endif 22 | -------------------------------------------------------------------------------- /final/include/Intermediate.h: -------------------------------------------------------------------------------- 1 | #ifndef INTERMEDIATE_H 2 | #define INTERMEDIATE_H 3 | 4 | #include "Joiner.h" 5 | #include "Parser.h" 6 | 7 | struct InterMetaData 8 | { 9 | /** 10 | * Array of arrays of vectors: Each vector-array will be an "intermediate" entity 11 | * We need one array of vectors per "intermediate" entity because we'll be computing 12 | * the intermediate result using many jobs, not just one. Those jobs will be served 13 | * by our threads. 14 | */ 15 | struct Vector ***interResults; 16 | 17 | /** 18 | * Array of arrays: One array per interResult. 
19 | * Each array will be of size "queryRelations" 20 | * It is actually a mapping between a relation and where its rowIds are placed inside the tuple 21 | * E.g: mapRels[0][1] = 2 means: 0-th vector will contain tuples of the following format: 22 | * 23 | * , where X,Y are ids of the joined relations 24 | * 0-th rowId inside tuple is from relation X, 25 | * 1-st rowId inside tuple is from relation Y, 26 | * 2-nd rowId inside tuple is from relation 1, 27 | * e.t.c 28 | * 29 | * In general : mapRels[..][relId] = tupleOffset 30 | */ 31 | unsigned **mapRels; 32 | 33 | /* Number of relations participating in the query */ 34 | unsigned queryRelations; 35 | 36 | /* Size of interResults array 37 | * I.e: max number of "intermediate" entities that might be created 38 | */ 39 | unsigned maxNumOfVectors; 40 | }; 41 | 42 | typedef struct ColumnInfo 43 | { 44 | uint64_t *values; 45 | unsigned *rowIds; 46 | struct Vector *tuples; 47 | }ColumnInfo; 48 | 49 | typedef struct Index 50 | { 51 | /* These are the two arrays 52 | used for the indexing of a single bucket */ 53 | unsigned *chainArray; 54 | unsigned *bucketArray; 55 | }Index; 56 | 57 | /** 58 | * @brief Holds info about the relation we are joining 59 | */ 60 | typedef struct RadixHashJoinInfo 61 | { 62 | 63 | unsigned pos; 64 | // Id of the relation [relevant to the parse order] 65 | unsigned relId; 66 | 67 | // Id of the column 68 | unsigned colId; 69 | 70 | // Column values 71 | uint64_t *col; 72 | 73 | // Number of tuples [either in original relation or in intermediate vector] 74 | unsigned numOfTuples; 75 | 76 | // Number of rowIds in each tuple of the vector 77 | // In case the relation is not in the intermediate results,tupleSize won't be used at all. 78 | unsigned tupleSize; 79 | 80 | // Intermediate vector 81 | // In case the relation is not in the intermediate results,vector won't be used at all. 
82 | struct Vector **vector; 83 | 84 | // Vector's address [Useful when we want to destroy the vector after having executed the join] 85 | struct Vector **ptrToVec; 86 | 87 | // Map[relation <---> vector's tuple] 88 | // In case the relation is not in the intermediate results,map won't be used at all. 89 | unsigned *map; 90 | 91 | // Map's address [Useful when we want to destroy the map array after having executed the join] 92 | unsigned **ptrToMap; 93 | 94 | // Number of relations participating in the query 95 | // I.e: size of map array 96 | unsigned queryRelations; 97 | 98 | // 1 if relation is in interResults 99 | // 0 otherwise 100 | unsigned isInInter; 101 | 102 | // Will be set to 1 during build phase, in case this is the small column 103 | // between the two columns we're joining. 104 | unsigned isSmall; 105 | // 1: if it is on join's lhs, 0 otherwise 106 | unsigned isLeft; 107 | 108 | ColumnInfo *unsorted; 109 | ColumnInfo *sorted; 110 | unsigned *hist; 111 | unsigned *pSum; 112 | Index **indexArray; 113 | }RadixHashJoinInfo; 114 | 115 | /* Creators/Initializers */ 116 | void createInterMetaData(struct InterMetaData **inter,struct QueryInfo *q); 117 | void initializeInfo(struct InterMetaData *inter,struct QueryInfo *q,struct SelectInfo *s,struct Joiner *j,RadixHashJoinInfo *arg); 118 | 119 | /* Apply Functions */ 120 | void applyColumnEqualities(struct InterMetaData *inter,struct Joiner* joiner,struct QueryInfo *q); 121 | void applyFilters(struct InterMetaData *inter,struct Joiner* joiner,struct QueryInfo *q); 122 | void applyJoins(struct InterMetaData *inter,struct Joiner* joiner,struct QueryInfo *q); 123 | void applyProperJoin(struct InterMetaData *inter,RadixHashJoinInfo* argLeft,RadixHashJoinInfo* argRight); 124 | void applyCheckSums(struct InterMetaData *inter,struct Joiner* joiner,struct QueryInfo *q); 125 | 126 | /* Check functions */ 127 | unsigned isInInter(struct Vector *vector); 128 | 129 | /* Misc */ 130 | unsigned getVectorPos(struct 
InterMetaData *inter,unsigned relId); 131 | unsigned getFirstAvailablePos(struct InterMetaData* inter); 132 | void createMap(unsigned **mapRels,unsigned size,unsigned *values); 133 | void printCheckSum(uint64_t checkSum,unsigned iSLast); 134 | 135 | /* Destroyer */ 136 | void destroyInterMetaData(struct InterMetaData *inter); 137 | void destroyRadixHashJoinInfo(RadixHashJoinInfo *); 138 | void destroyColumnInfo(ColumnInfo **c); 139 | 140 | #endif 141 | -------------------------------------------------------------------------------- /final/include/JobScheduler.h: -------------------------------------------------------------------------------- 1 | #ifndef JOB_SCHEDULER_H 2 | #define JOB_SCHEDULER_H 3 | 4 | #include 5 | #include 6 | 7 | 8 | /* Mutexes - conditional variables - barriers */ 9 | extern pthread_mutex_t queueMtx; 10 | extern pthread_mutex_t jobsFinishedMtx; 11 | extern pthread_cond_t condNonEmpty; 12 | extern pthread_cond_t condJobsFinished; 13 | extern pthread_barrier_t barrier; 14 | extern struct JobScheduler* js; 15 | extern unsigned jobsFinished; 16 | extern pthread_mutex_t* partitionMtxArray; 17 | 18 | 19 | /* Job queue */ 20 | /* It must be visible from all threads, including the main thread of course */ 21 | extern struct Queue* jobQueue; 22 | 23 | struct Job{ 24 | // Function that the worker thread is going to execute 25 | void (*function)(void*); 26 | // Argument passed to the function 27 | void *argument; 28 | }; 29 | 30 | struct JobScheduler{ 31 | // number of worker threads 32 | unsigned threadNum; 33 | //thread ids 34 | pthread_t *tids; 35 | // histgrams 36 | unsigned **histArray; 37 | // checksums 38 | uint64_t *checkSumArray; 39 | 40 | // job arrays [different kinds of jobs] 41 | struct Job *histJobs; 42 | struct Job *partitionJobs; 43 | struct Job *buildJobs; 44 | struct Job *joinJobs; 45 | struct Job *colEqualityJobs; 46 | struct Job *filterJobs; 47 | struct Job *checkSumJobs; 48 | }; 49 | 50 | void createJobScheduler(struct JobScheduler** 
js); 51 | void createJobArrays(struct JobScheduler* js); 52 | void *threadFunc(void *); 53 | void destroyJobScheduler(struct JobScheduler* js); 54 | 55 | #endif 56 | -------------------------------------------------------------------------------- /final/include/Joiner.h: -------------------------------------------------------------------------------- 1 | #ifndef JOINER_H 2 | #define JOINER_H 3 | 4 | #include 5 | #include "Parser.h" 6 | 7 | 8 | #define HASH_FUN_1(KEY) ((KEY) & ((1< /* for CHAR_BIT */ 5 | #include "stdint.h" 6 | 7 | #include "Parser.h" 8 | #include "Joiner.h" 9 | 10 | #define PRIMELIMIT 49999991 11 | // #define PRIMELIMIT 1500 12 | 13 | 14 | #define BITMASK(b) (1 << ((b) % CHAR_BIT)) 15 | #define BITSLOT(b) ((b) / CHAR_BIT) 16 | #define BITSET(v,b) ((v)[BITSLOT(b)] |= BITMASK(b)) 17 | #define BITCLEAR(v,b) ((v)[BITSLOT(b)] &= ~BITMASK(b)) 18 | #define BITTEST(v,b) ((v)[BITSLOT(b)] & BITMASK(b)) 19 | #define BITNSLOTS(nb) ((nb + CHAR_BIT - 1) / CHAR_BIT) 20 | 21 | struct columnStats 22 | { 23 | uint64_t minValue; 24 | uint64_t maxValue; 25 | unsigned f; 26 | unsigned discreteValues; 27 | char *bitVector; 28 | unsigned bitVectorSize; 29 | char typeOfBitVector; 30 | }; 31 | 32 | void findStats(uint64_t *column, struct columnStats *stat, unsigned columnSize); 33 | void applyColEqualityEstimations(struct QueryInfo *q, struct Joiner *j); 34 | void filterEstimation(struct Joiner *j,struct QueryInfo *q,unsigned colId,struct columnStats *stat,unsigned actualRelId,unsigned relId,Comparison cmp,uint64_t constant); 35 | void applyFilterEstimations(struct QueryInfo *q, struct Joiner *j); 36 | void applyJoinEstimations(struct QueryInfo *q, struct Joiner *j); 37 | void findOptimalJoinOrder(struct QueryInfo *q, struct Joiner *j); 38 | 39 | 40 | /* Printing functions */ 41 | void columnPrint(uint64_t *column, unsigned columnSize); 42 | void printBooleanArray(char *array, unsigned size); 43 | void printColumnStats(struct columnStats *s); 44 | 45 | #endif 46 | 
-------------------------------------------------------------------------------- /final/include/Parser.h: -------------------------------------------------------------------------------- 1 | #ifndef PARSER_H 2 | #define PARSER_H 3 | 4 | #include 5 | #include "Utils.h" 6 | #include "Joiner.h" 7 | 8 | struct Joiner; 9 | 10 | struct SelectInfo 11 | { 12 | unsigned relId; 13 | unsigned colId; 14 | }; 15 | 16 | struct FilterInfo 17 | { 18 | struct SelectInfo filterLhs; 19 | enum Comparison comparison; 20 | uint64_t constant; 21 | }; 22 | 23 | struct PredicateInfo 24 | { 25 | struct SelectInfo left; 26 | struct SelectInfo right; 27 | }; 28 | 29 | struct QueryInfo 30 | { 31 | unsigned *relationIds; 32 | struct PredicateInfo *predicates; 33 | struct FilterInfo *filters; 34 | struct SelectInfo *selections; 35 | unsigned numOfRelationIds; 36 | unsigned numOfPredicates; 37 | unsigned numOfFilters; 38 | unsigned numOfSelections; 39 | // One estimation array per relation 40 | struct columnStats **estimations; 41 | }; 42 | 43 | /** 44 | * @brief Creates a new query structure. 45 | * Subsequently, parseQuery(..) is called. 46 | */ 47 | void createQueryInfo(struct QueryInfo **qInfo,char *rawQuery); 48 | 49 | /** 50 | * @brief Deallocates any space allocated for qInfo members 51 | */ 52 | void destroyQueryInfo(struct QueryInfo *qInfo); 53 | 54 | /** 55 | * @brief Parses relation ids ... 56 | */ 57 | void parseRelationIds(struct QueryInfo *qInfo,char *rawRelations); 58 | 59 | /** 60 | * @brief Parses predicates r1.a=r2.b&r1.b=r3.c... 61 | */ 62 | void parsePredicates(struct QueryInfo *qInfo,char *rawPredicates); 63 | 64 | /** 65 | * @brief Parses selections r1.a r1.b r3.c... 
66 | */ 67 | void parseSelections(struct QueryInfo *qInfo,char *rawSelections); 68 | 69 | /** 70 | * @brief Parses selections [RELATIONS]|[PREDICATES]|[SELECTS] 71 | */ 72 | void parseQuery(struct QueryInfo *qInfo,char *rawQuery); 73 | 74 | /** 75 | * @brief Determines if predicate is filter 76 | * 77 | * @param predicate The predicate 78 | * 79 | * @return True if filter, False otherwise. 80 | */ 81 | int isFilter(char *predicate); 82 | 83 | void createQueryEstimations(struct QueryInfo *qInfo,struct Joiner * joiner); 84 | int isColEquality(struct PredicateInfo *pInfo); 85 | void addFilter(struct FilterInfo *fInfo,char *token); 86 | void addPredicate(struct PredicateInfo *pInfo,char *token); 87 | 88 | /* 89 | * "Getter" functions. 90 | * Despite having access to each struct's members from anywhere in our program, 91 | * we use "getter" functions just to make our code more neat & clean. 92 | */ 93 | unsigned getOriginalRelId(struct QueryInfo *qInfo,struct SelectInfo *sInfo); 94 | unsigned getRelId(struct SelectInfo *sInfo); 95 | unsigned getColId(struct SelectInfo *sInfo); 96 | uint64_t getConstant(struct FilterInfo *fInfo); 97 | Comparison getComparison(struct FilterInfo *fInfo); 98 | unsigned getNumOfRelations(struct QueryInfo *qInfo); 99 | unsigned getNumOfFilters(struct QueryInfo *qInfo); 100 | unsigned getNumOfColEqualities(struct QueryInfo *qInfo); 101 | unsigned getNumOfJoins(struct QueryInfo *qInfo); 102 | 103 | void printTest(struct QueryInfo *qInfo); 104 | #endif 105 | -------------------------------------------------------------------------------- /final/include/Partition.h: -------------------------------------------------------------------------------- 1 | #ifndef PARTITION_H 2 | #define PARTITION_H 3 | 4 | #include 5 | #include "Intermediate.h"/*for RadixHashJoinInfo type*/ 6 | 7 | struct histArg{ 8 | unsigned start; 9 | unsigned end; 10 | uint64_t *values; 11 | unsigned *histogram; 12 | }; 13 | 14 | struct partitionArg{ 15 | unsigned start; 16 | 
unsigned end; 17 | unsigned *pSumCopy; 18 | RadixHashJoinInfo *info; 19 | }; 20 | 21 | void histFunc(void*); 22 | void partitionFunc(void*); 23 | void partition(RadixHashJoinInfo*); 24 | 25 | #endif 26 | -------------------------------------------------------------------------------- /final/include/Probe.h: -------------------------------------------------------------------------------- 1 | #ifndef PROBE_H 2 | #define PROBE_H 3 | #include "Vector.h" 4 | #include "Partition.h" 5 | 6 | 7 | struct joinArg{ 8 | unsigned bucket; 9 | RadixHashJoinInfo *left; 10 | RadixHashJoinInfo *right; 11 | struct Vector *results; 12 | }; 13 | void joinFunc(void *arg); 14 | 15 | 16 | /** 17 | * @brief Checks for equality between the two column values and inserts to the 18 | * results vector a tuple constructed from the two tuples [one from each column] 19 | * 20 | * @param small The small column [has been indexed] 21 | * @param big The big column [non-indexed] 22 | * @param[in] i Row for the big column 23 | * @param[in] start Starting position of the small column's bucket 24 | * @param[in] searchValue The search value 25 | * @param[in] pseudoRow The bucket row [will use it to construct the original row] 26 | * @param results The results vector 27 | * param[in] tupleToInsert We'll fill it using constructTuple and then we'll add it to results vector 28 | */ 29 | void checkEqual(RadixHashJoinInfo *small,RadixHashJoinInfo *big,unsigned i,unsigned start,unsigned searchValue,unsigned pseudoRow,struct Vector *results,unsigned *tupleToInsert); 30 | 31 | void probe(RadixHashJoinInfo *left,RadixHashJoinInfo *right,struct Vector *results); 32 | void constructTuple(RadixHashJoinInfo *small,RadixHashJoinInfo *big,unsigned actualRow,unsigned i,unsigned *tuple); 33 | 34 | #endif 35 | -------------------------------------------------------------------------------- /final/include/Queue.h: -------------------------------------------------------------------------------- 1 | #ifndef QUEUE_H 2 | #define 
QUEUE_H 3 | 4 | struct Queue 5 | { 6 | /* Start and end of queue */ 7 | int front; 8 | int rear; 9 | 10 | /* Fixed size of array */ 11 | int size; 12 | void **array; 13 | }; 14 | 15 | /* Creates the queue */ 16 | void createQueue(struct Queue **q, int size); 17 | 18 | /* Free the allocated memory of the data structure */ 19 | void destroyQueue(struct Queue *q); 20 | 21 | /* enQueue an item returns 1 on success */ 22 | int enQueue(struct Queue *q, void* item); 23 | 24 | /* always extract from front returns item on success */ 25 | void* deQueue(struct Queue *q); 26 | 27 | int isEmpty(struct Queue *q); 28 | 29 | /* display queue and front and rear */ 30 | void display(struct Queue *q); 31 | 32 | 33 | #endif 34 | -------------------------------------------------------------------------------- /final/include/Relation.h: -------------------------------------------------------------------------------- 1 | #ifndef RELATION_H 2 | #define RELATION_H 3 | 4 | #include 5 | 6 | 7 | struct Relation 8 | { 9 | unsigned numOfTuples; 10 | unsigned numOfCols; 11 | uint64_t **columns; 12 | struct columnStats *stats; 13 | }; 14 | 15 | 16 | /** 17 | * @brief Creates a new relation and calls 18 | * loadRelation() to retrieve relation's data. 
19 | */ 20 | void createRelation(struct Relation **rel,char *fileName); 21 | 22 | 23 | /** 24 | * @brief mmap-s relation's data from the given file 25 | */ 26 | void loadRelation(struct Relation *rel,char *fileName); 27 | 28 | /** 29 | * @brief Typical printing function 30 | */ 31 | void printRelation(struct Relation *rel); 32 | 33 | /** 34 | * @brief Dumps relation to the given file 35 | */ 36 | void dumpRelation(struct Relation *rel,char *fileName); 37 | 38 | /** 39 | * @brief Free-s any allocated space 40 | */ 41 | void destroyRelation(struct Relation *rel); 42 | 43 | #endif 44 | -------------------------------------------------------------------------------- /final/include/Utils.h: -------------------------------------------------------------------------------- 1 | #ifndef UTILS_H 2 | #define UTILS_H 3 | 4 | #include <stdint.h> /* uint64_t */ 5 | #define BUFFERSIZE 512 6 | 7 | #define MALLOC_CHECK(M) \ 8 | if(!(M)){ \ 9 | fprintf(stderr,"[ERROR] MALLOC_CHECK: %s : %d\n", __FILE__, __LINE__); \ 10 | exit(EXIT_FAILURE); \ 11 | } 12 | 13 | 14 | typedef enum Comparison { Less='<', Greater='>', Equal='=' } Comparison; 15 | 16 | int compare(uint64_t key,Comparison cmp,uint64_t constant); 17 | 18 | /* Power functions */ 19 | uint64_t power(uint64_t base, uint64_t exponent); 20 | uint64_t linearPower(uint64_t base, uint64_t exponent); 21 | 22 | #endif 23 | -------------------------------------------------------------------------------- /final/include/Vector.h: -------------------------------------------------------------------------------- 1 | #ifndef VECTOR_H 2 | #define VECTOR_H 3 | #include 4 | 5 | #include "Utils.h" /*Comparison type*/ 6 | #include "Intermediate.h" /*RadixHashJoinInfo type*/ 7 | 8 | extern unsigned initSize; 9 | 10 | struct Vector 11 | { 12 | /* Table with tuples */ 13 | unsigned *table; 14 | 15 | /* Size of tuple (i.e. rowIds per tuple) */ 16 | unsigned tupleSize; 17 | 18 | /* Position where the next tuple will be inserted 19 | * This member also acts as a 
counter for the vector elements 20 | * i.e:for the rowIds stored in the vector 21 | */ 22 | unsigned nextPos; 23 | 24 | /* Max capacity[i.e: number of rowIds] of the table */ 25 | /* If needed, we'll double it using realloc(..) */ 26 | unsigned capacity; 27 | }; 28 | 29 | struct checkSumArg{ 30 | struct Vector *vector; 31 | uint64_t *col; 32 | unsigned rowIdPos; 33 | uint64_t *sum; 34 | }; 35 | void checkSumFunc(void *arg); 36 | 37 | struct colEqualityArg{ 38 | struct Vector *new; 39 | struct Vector* old; 40 | uint64_t *leftCol; 41 | uint64_t* rightCol; 42 | unsigned posLeft; 43 | unsigned posRight; 44 | }; 45 | void colEqualityFunc(void *arg); 46 | 47 | 48 | 49 | /* Creators/Destroyer */ 50 | void createVector(struct Vector **vector,unsigned tupleSize); 51 | /** 52 | * @brief Creates a vector fixed size. 53 | * 54 | * @param vector The vector 55 | * @param[in] tupleSize The tuple size 56 | * @param[in] fixedSize The number of tuples that will be stored in it 57 | */ 58 | void createVectorFixedSize(struct Vector **vector,unsigned tupleSize,unsigned fixedSize); 59 | void destroyVector(struct Vector **vector); 60 | 61 | /* Insert functions. 
No reason for delete function.*/ 62 | void insertAtVector(struct Vector *vector,unsigned *tuple); 63 | void insertAtPos(struct Vector *vector,unsigned *tuple,unsigned offset); 64 | 65 | 66 | /* Getter functions */ 67 | unsigned getVectorTuples(struct Vector *vector); 68 | unsigned getTupleSize(struct Vector *vector); 69 | unsigned *getTuple(struct Vector *vector,unsigned i); 70 | 71 | /* Scan functions [Used in the case of an intermediate relation]*/ 72 | void scanColEquality(struct Vector *new,struct Vector* old,uint64_t *leftCol,uint64_t* rightCol,unsigned posLeft,unsigned posRight); 73 | void scanFilter(struct Vector *new,struct Vector* old,uint64_t *col,Comparison cmp,uint64_t constant); 74 | void scanJoin(RadixHashJoinInfo *joinRel); 75 | 76 | int vectorIsFull(struct Vector *vector); 77 | int vectorIsEmpty(struct Vector *vector); 78 | 79 | void printVector(struct Vector *vector); 80 | void printTuple(struct Vector *vector,unsigned pos); 81 | uint64_t checkSum(struct Vector *vector,uint64_t *col,unsigned rowIdPos); 82 | 83 | 84 | #endif 85 | -------------------------------------------------------------------------------- /final/run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd ) 4 | # valgrind --tool=callgrind ${DIR}/build/release/Driver 5 | # valgrind --tool=massif ${DIR}/build/release/Driver 6 | # valgrind --leak-check=full --track-origins=yes --show-leak-kinds=all ${DIR}/build/release/Driver 7 | ${DIR}/build/release/Driver 8 | -------------------------------------------------------------------------------- /final/runTestharness.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd ) 3 | 4 | WORKLOAD_DIR=${1-$DIR/workloads/small} 5 | WORKLOAD_DIR=$(echo $WORKLOAD_DIR | sed 's:/*$::') 6 | 7 | cd $WORKLOAD_DIR 8 | 9 | WORKLOAD=$(basename "$PWD") 
10 | echo execute $WORKLOAD ... 11 | $DIR/build/release/harness *.init *.work *.result ../../run.sh 12 | -------------------------------------------------------------------------------- /final/runUnitTesting.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIR=$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd ) 4 | 5 | WORKLOAD_DIR=${1-$DIR/workloads/small} 6 | WORKLOAD_DIR=$(echo $WORKLOAD_DIR | sed 's:/*$::') 7 | 8 | cd $WORKLOAD_DIR 9 | 10 | WORKLOAD=$(basename "$PWD") 11 | # valgrind --leak-check=full --track-origins=yes --show-leak-kinds=all $DIR/unittest 12 | $DIR/unittest 13 | -------------------------------------------------------------------------------- /final/src/Build.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include <unistd.h> /*sleep()--debugging*/ 5 | 6 | #include "Build.h" 7 | #include "Queue.h" 8 | #include "Partition.h" 9 | #include "JobScheduler.h" 10 | #include "Utils.h" 11 | 12 | void build(RadixHashJoinInfo *infoLeft,RadixHashJoinInfo *infoRight) 13 | { 14 | RadixHashJoinInfo *big,*small; 15 | big = (infoLeft->numOfTuples >= infoRight->numOfTuples) ? infoLeft:infoRight; 16 | small = (infoLeft->numOfTuples < infoRight->numOfTuples) ? 
infoLeft:infoRight; 17 | big->isSmall = 0; 18 | small->isSmall = 1; 19 | 20 | initializeIndexArray(small); 21 | 22 | jobsFinished = 0; 23 | for(unsigned i=0;i<HASH_RANGE_1;++i){ 24 | struct buildArg *arg = js->buildJobs[i].argument; 25 | arg->bucket = i; 26 | arg->info = small; 27 | pthread_mutex_lock(&queueMtx); 28 | enQueue(jobQueue,&js->buildJobs[i]); 29 | pthread_cond_signal(&condNonEmpty); 30 | pthread_mutex_unlock(&queueMtx); 31 | } 32 | 33 | pthread_mutex_lock(&jobsFinishedMtx); 34 | while (jobsFinished!=HASH_RANGE_1) { 35 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 36 | } 37 | jobsFinished = 0; 38 | pthread_cond_signal(&condJobsFinished); 39 | pthread_mutex_unlock(&jobsFinishedMtx); 40 | } 41 | 42 | void initializeIndexArray(RadixHashJoinInfo *info) 43 | { 44 | unsigned i,j; 45 | unsigned bucketSize; 46 | 47 | /* Firstly, we need to malloc space for indexArray */ 48 | /* Remember: One struct Index per bucket */ 49 | info->indexArray = malloc(HASH_RANGE_1*sizeof(struct Index*)); 50 | MALLOC_CHECK(info->indexArray); 51 | 52 | 53 | // For every bucket 54 | for(i=0;i<HASH_RANGE_1;++i) 55 | { 56 | 57 | if(info->hist[i] == 0) 58 | info->indexArray[i] = NULL; 59 | 60 | else 61 | { 62 | // Fetch bucket's size from hist array 63 | // Remember: hist is array with ints 64 | bucketSize = info->hist[i]; 65 | 66 | /* Allocate space for bucket's index */ 67 | info->indexArray[i] = malloc(sizeof(struct Index)); 68 | MALLOC_CHECK(info->indexArray[i]); 69 | 70 | 71 | /* Allocate space for index's fields */ 72 | info->indexArray[i]->chainArray = malloc(bucketSize*sizeof(unsigned)); 73 | MALLOC_CHECK(info->indexArray[i]->chainArray); 74 | info->indexArray[i]->bucketArray = malloc(HASH_RANGE_2*sizeof(unsigned)); 75 | MALLOC_CHECK(info->indexArray[i]->bucketArray); 76 | 77 | /* Initialize chainArray and bucketArray with 0's */ 78 | for(j=0;j<bucketSize;++j) 79 | info->indexArray[i]->chainArray[j] = 0; 80 | 81 | for(j=0;j<HASH_RANGE_2;++j) 82 | info->indexArray[i]->bucketArray[j] = 0; 83 | } 84 | } 85 | } 86 | 87 | void buildFunc(void* arg) 88 | { 89 | struct buildArg *myarg = arg; 90 | unsigned bucketSize; 91 | 
uint64_t hash; 92 | unsigned chainPos; 93 | int j; 94 | int start; 95 | 96 | if(myarg->info->indexArray[myarg->bucket] != NULL) 97 | { 98 | 99 | // Fetch bucket's starting point from pSum array 100 | // Remember: pSum is array with pointers to int 101 | start = myarg->info->pSum[myarg->bucket]; 102 | 103 | // Fetch bucket's size from hist array 104 | // Remember: hist is array with ints 105 | bucketSize = myarg->info->hist[myarg->bucket]; 106 | 107 | // fprintf(stderr,"[%u]bucketSize:%u\n",myarg->bucket,bucketSize); 108 | 109 | /* Scan from the bottom of the bucket to the top */ 110 | for(j=start+bucketSize-1;j>=start;j--) 111 | { 112 | hash = HASH_FUN_2(myarg->info->sorted->values[j]); 113 | // fprintf(stderr,"\nsecondHash(%lu): %lu\n",myarg->info->sorted->values[j],hash); 114 | 115 | if(myarg->info->indexArray[myarg->bucket]->bucketArray[hash] == 0) 116 | { 117 | // fprintf(stderr,"Found empty spot in bucketArray\n"); 118 | myarg->info->indexArray[myarg->bucket]->bucketArray[hash] = (j-start)+1; 119 | // fprintf(stderr,"bucketArray[%lu]: %u\n",hash, myarg->info->indexArray[myarg->bucket]->bucketArray[hash] ); 120 | } 121 | else 122 | { 123 | /* Find the first zero in chainArray 124 | by following the chain and 125 | store "(j-start) + 1" in that place */ 126 | chainPos = myarg->info->indexArray[myarg->bucket]->bucketArray[hash]-1; 127 | traverseChain(chainPos, myarg->info->indexArray[myarg->bucket]->chainArray, j-start + 1); 128 | } 129 | } 130 | } 131 | 132 | pthread_mutex_lock(&jobsFinishedMtx); 133 | // fprintf(stderr, "Thread[%u] working on bucket:%u | jobs already finished:%u\n",(unsigned)pthread_self(),myarg->bucket,jobsFinished); 134 | ++jobsFinished; 135 | pthread_cond_signal(&condJobsFinished); 136 | pthread_mutex_unlock(&jobsFinishedMtx); 137 | 138 | } 139 | 140 | void traverseChain(unsigned chainPos,unsigned* chainArray,unsigned posToBeStored) 141 | { 142 | // fprintf(stderr,"Moving to chainArray[%u](now is equal to 
%u)\n",chainPos,chainArray[chainPos]); 143 | while(1) 144 | { 145 | // We've found an empty spot in chainArray 146 | if(chainArray[chainPos] == 0) 147 | { 148 | chainArray[chainPos] = posToBeStored; 149 | // fprintf(stderr,"Found empty spot on chainArray[%u]\n",chainPos); 150 | break; 151 | } 152 | /* Step further on the chain */ 153 | else 154 | { 155 | chainPos = chainArray[chainPos] - 1; 156 | // fprintf(stderr,"Moving to chainArray[%u](now is equal to %u)\n",chainPos,chainArray[chainPos]); 157 | // sleep(1); 158 | } 159 | } 160 | } 161 | -------------------------------------------------------------------------------- /final/src/Intermediate.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include <string.h> /*strlen()*/ 3 | #include <unistd.h> /*write()*/ 4 | #include <stdlib.h> /*free()*/ 5 | #include 6 | 7 | #include "Intermediate.h" 8 | #include "Parser.h" 9 | #include "Partition.h" 10 | #include "Operations.h" 11 | #include "Utils.h" 12 | #include "Vector.h"/* destroyVector(..)*/ 13 | #include "Queue.h" 14 | #include "JobScheduler.h" 15 | 16 | void createInterMetaData(struct InterMetaData **inter,struct QueryInfo *q) 17 | { 18 | *inter = malloc(sizeof(struct InterMetaData)); 19 | MALLOC_CHECK(*inter); 20 | (*inter)->maxNumOfVectors = getNumOfFilters(q) + getNumOfColEqualities(q) + getNumOfJoins(q); 21 | (*inter)->interResults = malloc((*inter)->maxNumOfVectors*sizeof(struct Vector**)); 22 | MALLOC_CHECK((*inter)->interResults); 23 | 24 | (*inter)->mapRels = malloc((*inter)->maxNumOfVectors*sizeof(unsigned*)); 25 | MALLOC_CHECK((*inter)->mapRels ); 26 | (*inter)->queryRelations = getNumOfRelations(q); 27 | 28 | for(unsigned i=0;i<(*inter)->maxNumOfVectors;++i){ 29 | (*inter)->interResults[i] = malloc(HASH_RANGE_1*sizeof(struct Vector*)); 30 | MALLOC_CHECK((*inter)->interResults[i]); 31 | for(unsigned v=0;v<HASH_RANGE_1;++v) 32 | (*inter)->interResults[i][v] = NULL; 33 | (*inter)->mapRels[i] = NULL; 34 | } 35 | } 36 | 37 | void applyColumnEqualities(struct InterMetaData 
*inter,struct Joiner* joiner,struct QueryInfo *q) 38 | { 39 | for(unsigned i=0;i<q->numOfPredicates;++i) 40 | if(isColEquality(&q->predicates[i])) 41 | { 42 | 43 | unsigned original = getOriginalRelId(q,&q->predicates[i].left); 44 | unsigned relId = getRelId(&q->predicates[i].left); 45 | unsigned leftColId = getColId(&q->predicates[i].left); 46 | unsigned rightColId = getColId(&q->predicates[i].right); 47 | unsigned pos = getVectorPos(inter,relId); 48 | struct Vector **vector = inter->interResults[pos]+0; 49 | unsigned numOfTuples = getRelationTuples(joiner,original); 50 | uint64_t *leftCol = getColumn(joiner,original,leftColId); 51 | uint64_t *rightCol = getColumn(joiner,original,rightColId); 52 | if(isInInter(vector[0])) 53 | { 54 | // fprintf(stderr, "Column Equality Inter\n"); 55 | // printf("~~~Again~~~\n"); 56 | // printf("Pos:%u\n",pos); 57 | // printf("%u.%u=%u.%u [r%u.tbl]\n",relId,leftColId,relId,rightColId,original); 58 | colEqualityInter(leftCol,rightCol,0,0,vector); 59 | } 60 | else 61 | { 62 | // fprintf(stderr, "Column Equality\n"); 63 | // printf("%u.%u=%u.%u [r%u.tbl]\n",relId,leftColId,relId,rightColId,original); 64 | unsigned *values = malloc(inter->queryRelations*sizeof(unsigned)); 65 | MALLOC_CHECK(values); 66 | for(unsigned i=0;i<inter->queryRelations;++i) 67 | values[i] = (i==relId) ? 
0 : -1; 68 | createMap(&inter->mapRels[pos],inter->queryRelations,values); 69 | free(values); 70 | colEquality(leftCol,rightCol,numOfTuples,vector); 71 | } 72 | // printf("\n\n"); 73 | } 74 | } 75 | 76 | void applyFilters(struct InterMetaData *inter,struct Joiner* joiner,struct QueryInfo *q) 77 | { 78 | for(unsigned i=0;i<q->numOfFilters;++i) 79 | { 80 | unsigned original = getOriginalRelId(q,&q->filters[i].filterLhs); 81 | unsigned relId = getRelId(&q->filters[i].filterLhs); 82 | unsigned colId = getColId(&q->filters[i].filterLhs); 83 | uint64_t constant = getConstant(&q->filters[i]); 84 | Comparison cmp = getComparison(&q->filters[i]); 85 | unsigned pos = getVectorPos(inter,relId); 86 | struct Vector **vector = inter->interResults[pos]; 87 | unsigned numOfTuples = getRelationTuples(joiner,original); 88 | uint64_t *col = getColumn(joiner,original,colId); 89 | if(isInInter(vector[0])) 90 | { 91 | // fprintf(stderr,"%u.%u%c%lu [r%u.tbl]\n\n",relId,colId,cmp,constant,original); 92 | filterInter(col,cmp,constant,vector); 93 | } 94 | else 95 | { 96 | // Create map array [-1 in every place except for the relId-th place, which is 0] 97 | unsigned *values = malloc(inter->queryRelations*sizeof(unsigned)); 98 | MALLOC_CHECK(values); 99 | for(unsigned i=0;i<inter->queryRelations;++i) 100 | values[i] = (i==relId) ? 
0 : -1; 101 | createMap(&inter->mapRels[pos],inter->queryRelations,values); 102 | free(values); 103 | 104 | // Add filter jobs to the queue 105 | jobsFinished = 0; 106 | unsigned chunkSize = numOfTuples / HASH_RANGE_1; 107 | unsigned lastEnd = 0; 108 | unsigned i; 109 | for(i=0;i<HASH_RANGE_1-1;++i){ 110 | struct filterArg *arg = js->filterJobs[i].argument; 111 | arg->col = col; 112 | arg->constant = constant; 113 | arg->cmp = cmp; 114 | arg->start = i*chunkSize; 115 | arg->end = arg->start + chunkSize; 116 | arg->vector = vector+i; 117 | lastEnd = arg->end; 118 | pthread_mutex_lock(&queueMtx); 119 | enQueue(jobQueue,&js->filterJobs[i]); 120 | pthread_cond_signal(&condNonEmpty); 121 | pthread_mutex_unlock(&queueMtx); 122 | } 123 | struct filterArg *arg = js->filterJobs[i].argument; 124 | arg->col = col; 125 | arg->constant = constant; 126 | arg->cmp = cmp; 127 | arg->start = lastEnd; 128 | arg->end = numOfTuples; 129 | arg->vector = vector+i; 130 | pthread_mutex_lock(&queueMtx); 131 | enQueue(jobQueue,&js->filterJobs[i]); 132 | pthread_cond_signal(&condNonEmpty); 133 | pthread_mutex_unlock(&queueMtx); 134 | 135 | // Wait for all filter jobs to finish 136 | pthread_mutex_lock(&jobsFinishedMtx); 137 | while (jobsFinished!=HASH_RANGE_1) { 138 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 139 | } 140 | jobsFinished = 0; 141 | pthread_cond_signal(&condJobsFinished); 142 | pthread_mutex_unlock(&jobsFinishedMtx); 143 | } 144 | } 145 | } 146 | 147 | void applyJoins(struct InterMetaData *inter,struct Joiner* joiner,struct QueryInfo *q) 148 | { 149 | for(unsigned i=0;i<q->numOfPredicates;++i) 150 | if(!isColEquality(&q->predicates[i])) 151 | { 152 | RadixHashJoinInfo argLeft,argRight; 153 | initializeInfo(inter,q,&q->predicates[i].left,joiner,&argLeft); 154 | // printf(" = "); 155 | initializeInfo(inter,q,&q->predicates[i].right,joiner,&argRight); 156 | // printf("\n"); 157 | applyProperJoin(inter,&argLeft,&argRight); 158 | // printf("\n"); 159 | } 160 | } 161 | 162 | void applyProperJoin(struct InterMetaData 
*inter,RadixHashJoinInfo* argLeft,RadixHashJoinInfo* argRight) 163 | { 164 | switch(argLeft->isInInter){ 165 | case 0: 166 | switch(argRight->isInInter){ 167 | case 0: 168 | // fprintf(stderr, "joinNonInterNonInter()\n"); 169 | joinNonInterNonInter(inter,argLeft,argRight); 170 | break; 171 | case 1: 172 | // fprintf(stderr, "joinNonInterInter()\n"); 173 | joinNonInterInter(inter,argLeft,argRight); 174 | break; 175 | } 176 | break; 177 | case 1: 178 | switch(argRight->isInInter){ 179 | case 0: 180 | // fprintf(stderr, "joinInterNonInter()\n"); 181 | joinInterNonInter(inter,argLeft,argRight); 182 | break; 183 | case 1: 184 | // fprintf(stderr, "joinInterInter()\n"); 185 | joinInterInter(inter,argLeft,argRight); 186 | break; 187 | } 188 | } 189 | } 190 | 191 | unsigned getVectorPos(struct InterMetaData *inter,unsigned relId) 192 | { 193 | for(unsigned i=0;i<inter->maxNumOfVectors;++i) 194 | if(inter->mapRels[i]) 195 | if(inter->mapRels[i][relId]!=-1) 196 | return i; 197 | return getFirstAvailablePos(inter); 198 | } 199 | 200 | unsigned getFirstAvailablePos(struct InterMetaData* inter) 201 | { 202 | for(unsigned i=0;i<inter->maxNumOfVectors;++i) 203 | if(!inter->mapRels[i]) 204 | return i; 205 | } 206 | 207 | void createMap(unsigned **mapRels,unsigned size,unsigned *values) 208 | { 209 | // -1 unsigned = 4294967295 [Hopefully we won't have to deal with so many relations] 210 | *mapRels = malloc(size*sizeof(unsigned)); 211 | MALLOC_CHECK(*mapRels); 212 | for(unsigned j=0;j<size;++j) 213 | (*mapRels)[j] = values[j]; 214 | } 215 | 216 | void applyCheckSums(struct InterMetaData *inter,struct Joiner* joiner,struct QueryInfo *q) 217 | { 221 | for(unsigned i=0;i<q->numOfSelections;++i) 222 | { 223 | unsigned original = getOriginalRelId(q,&q->selections[i]); 224 | unsigned relId = getRelId(&q->selections[i]); 225 | unsigned colId = getColId(&q->selections[i]); 226 | struct Vector **vector = inter->interResults[getVectorPos(inter,relId)]; 227 | unsigned *rowIdMap = inter->mapRels[getVectorPos(inter,relId)]; 228 | uint64_t *col = getColumn(joiner,original,colId); 229 | unsigned isLast = i == q->numOfSelections-1; 230 | 231 | // In case the given relation did not participate in 
any of the predicates/filters 232 | if(!isInInter(vector[0])) 233 | printCheckSum(0,isLast); 234 | else 235 | { 236 | // Add checkSum jobs to the queue 237 | for(unsigned i=0;i<HASH_RANGE_1;++i){ 238 | struct checkSumArg *arg = js->checkSumJobs[i].argument; 239 | arg->vector = vector[i]; 240 | arg->col = col; 241 | arg->rowIdPos = rowIdMap[relId]; 242 | arg->sum = js->checkSumArray+i; 243 | pthread_mutex_lock(&queueMtx); 244 | enQueue(jobQueue,&js->checkSumJobs[i]); 245 | pthread_cond_signal(&condNonEmpty); 246 | pthread_mutex_unlock(&queueMtx); 247 | } 248 | // Wait for all checkSum jobs to finish 249 | pthread_mutex_lock(&jobsFinishedMtx); 250 | while (jobsFinished!=HASH_RANGE_1) { 251 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 252 | } 253 | jobsFinished = 0; 254 | pthread_cond_signal(&condJobsFinished); 255 | pthread_mutex_unlock(&jobsFinishedMtx); 256 | 257 | // Gather and sum the partial sums 258 | uint64_t allCheckSums=0; 259 | for(unsigned i=0;i<HASH_RANGE_1;++i) 260 | allCheckSums += js->checkSumArray[i]; 261 | printCheckSum(allCheckSums,isLast); 262 | } 263 | } 264 | } 265 | 266 | void printCheckSum(uint64_t checkSum,unsigned isLast) 267 | { 268 | char string[100]; 269 | if(checkSum) 270 | sprintf(string,"%lu",checkSum); 271 | else 272 | sprintf(string,"NULL"); 273 | 274 | if(isLast) 275 | { 276 | // fprintf(stderr,"%s\n",string); 277 | printf("%s\n",string); 278 | fflush(stdout); 279 | } 280 | else 281 | { 282 | // fprintf(stderr,"%s ",string); 283 | printf("%s ",string); 284 | } 285 | } 286 | 287 | 288 | void initializeInfo(struct InterMetaData *inter,struct QueryInfo *q,struct SelectInfo *s,struct Joiner *j,RadixHashJoinInfo *arg) 289 | { 290 | arg->relId = getRelId(s); 291 | arg->colId = getColId(s); 292 | arg->col = getColumn(j,getOriginalRelId(q,s),arg->colId); 293 | arg->vector = inter->interResults[getVectorPos(inter,arg->relId)]; 294 | arg->map = inter->mapRels[getVectorPos(inter,arg->relId)]; 295 | arg->queryRelations = inter->queryRelations; 296 | arg->ptrToVec = inter->interResults[getVectorPos(inter,arg->relId)]+0; 297 | 
arg->ptrToMap = &inter->mapRels[getVectorPos(inter,arg->relId)]; 298 | arg->pos = getVectorPos(inter,arg->relId); 299 | 300 | if(isInInter(arg->vector[0])) 301 | { 302 | arg->isInInter = 1; 303 | arg->numOfTuples = 0; 304 | for(unsigned i=0;i<HASH_RANGE_1;++i) 305 | if(arg->vector[i]) 306 | arg->numOfTuples += getVectorTuples(arg->vector[i]); 307 | arg->tupleSize = getTupleSize(arg->vector[0]); 308 | } 309 | else 310 | { 311 | arg->isInInter = 0; 312 | arg->numOfTuples = getRelationTuples(j,getOriginalRelId(q,s)); 313 | arg->tupleSize = 1; 314 | } 315 | } 316 | 317 | 318 | void destroyInterMetaData(struct InterMetaData *inter) 319 | { 320 | for(unsigned i=0;i<inter->maxNumOfVectors;++i) 321 | { 322 | for(unsigned v=0;v<HASH_RANGE_1;++v) 323 | destroyVector(inter->interResults[i]+v); 324 | free(inter->interResults[i]); 325 | free(inter->mapRels[i]); 326 | } 327 | free(inter->interResults); 328 | free(inter->mapRels); 329 | free(inter); 330 | } 331 | 332 | 333 | void destroyRadixHashJoinInfo(RadixHashJoinInfo *info) 334 | { 335 | destroyColumnInfo(&info->unsorted); 336 | destroyColumnInfo(&info->sorted); 337 | free(info->hist); 338 | free(info->pSum); 339 | 340 | /* For every bucket of the relation */ 341 | for(unsigned i=0;i<HASH_RANGE_1;++i) 342 | 343 | if(info->indexArray) 344 | /* If this bucket has an index */ 345 | if(info->indexArray[i] != NULL) 346 | { 347 | /* Free index fields */ 348 | free(info->indexArray[i]->chainArray); 349 | free(info->indexArray[i]->bucketArray); 350 | 351 | /* Free index struct itself */ 352 | free(info->indexArray[i]); 353 | } 354 | free(info->indexArray); 355 | } 356 | 357 | void destroyColumnInfo(ColumnInfo **c) 358 | { 359 | if(*c) 360 | { 361 | free((*c)->values); 362 | free((*c)->rowIds); 363 | destroyVector(&(*c)->tuples); 364 | } 365 | free(*c); 366 | *c = NULL; 367 | } 368 | -------------------------------------------------------------------------------- /final/src/JobScheduler.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include <stdlib.h> /*malloc(),free()*/ 3 | #include <string.h> /* strerror() */ 4 | #include <unistd.h> 
/*sleep()--debugging*/ 5 | #include <time.h> /*time()--debugging*/ 6 | #include 7 | 8 | #include "Utils.h" 9 | #include "JobScheduler.h" 10 | #include "Joiner.h" 11 | #include "Partition.h" 12 | #include "Build.h" 13 | #include "Probe.h" 14 | #include "Operations.h" 15 | #include "Queue.h" 16 | 17 | #define THREAD_NUM 4 18 | 19 | pthread_mutex_t queueMtx = PTHREAD_MUTEX_INITIALIZER; 20 | pthread_mutex_t jobsFinishedMtx = PTHREAD_MUTEX_INITIALIZER; 21 | pthread_cond_t condNonEmpty = PTHREAD_COND_INITIALIZER; 22 | pthread_cond_t condJobsFinished = PTHREAD_COND_INITIALIZER; 23 | struct Queue* jobQueue = NULL; 24 | struct JobScheduler* js = NULL; 25 | pthread_mutex_t* partitionMtxArray = NULL; 26 | unsigned jobsFinished = 0; 27 | pthread_barrier_t barrier; 28 | 29 | void createJobScheduler(struct JobScheduler** js){ 30 | 31 | *js = malloc(sizeof(struct JobScheduler)); 32 | MALLOC_CHECK(*js); 33 | (*js)->threadNum = THREAD_NUM; 34 | (*js)->tids = malloc((*js)->threadNum*sizeof(pthread_t)); 35 | MALLOC_CHECK((*js)->tids); 36 | 37 | // Create Queue with 1000 jobs max size 38 | createQueue(&jobQueue,1000); 39 | 40 | // Create worker threads 41 | int err; 42 | for(unsigned i=0;i<(*js)->threadNum;++i){ 43 | if(err = pthread_create((*js)->tids+i,NULL,threadFunc,(void*)0)){ 44 | exit(EXIT_FAILURE); 45 | } 46 | } 47 | 48 | /* Initialize partition mutex array */ 49 | partitionMtxArray = malloc(HASH_RANGE_1*sizeof(pthread_mutex_t)); 50 | MALLOC_CHECK(partitionMtxArray); 51 | for(unsigned i=0;i<HASH_RANGE_1;++i) 52 | pthread_mutex_init(&partitionMtxArray[i],NULL); 53 | 55 | pthread_barrier_init(&barrier,NULL,(*js)->threadNum+1); 56 | /* Create job arrays */ 57 | createJobArrays(*js); 58 | } 59 | 60 | void createJobArrays(struct JobScheduler* js){ 61 | 62 | /* Create array with checkSums */ 63 | js->checkSumArray = malloc(HASH_RANGE_1*sizeof(uint64_t)); 64 | MALLOC_CHECK(js->checkSumArray); 65 | 66 | /* Create array with histograms */ 67 | js->histArray = malloc(js->threadNum*sizeof(unsigned*)); 68 | MALLOC_CHECK(js->histArray); 69 | for(unsigned i=0;i<js->threadNum;++i) 70 | { 71 | js->histArray[i] = 
malloc(HASH_RANGE_1*sizeof(unsigned)); 72 | MALLOC_CHECK(js->histArray[i]); 73 | } 74 | 75 | /* Create array with histogram jobs */ 76 | js->histJobs = malloc(js->threadNum*sizeof(struct Job)); 77 | MALLOC_CHECK(js->histJobs); 78 | for(unsigned i=0;i<js->threadNum;++i){ 79 | js->histJobs[i].argument = malloc(sizeof(struct histArg)); 80 | MALLOC_CHECK(js->histJobs[i].argument); 81 | ((struct histArg *)js->histJobs[i].argument)->histogram = js->histArray[i]; 82 | js->histJobs[i].function = histFunc; 83 | } 84 | 85 | /* Create array with partition jobs */ 86 | js->partitionJobs = malloc(js->threadNum*sizeof(struct Job)); 87 | MALLOC_CHECK(js->partitionJobs); 88 | for(unsigned i=0;i<js->threadNum;++i){ 89 | js->partitionJobs[i].argument = malloc(sizeof(struct partitionArg)); 90 | MALLOC_CHECK(js->partitionJobs[i].argument); 91 | js->partitionJobs[i].function = partitionFunc; 92 | } 93 | 94 | /* Create array with build jobs */ 95 | js->buildJobs = malloc(HASH_RANGE_1*sizeof(struct Job)); 96 | MALLOC_CHECK(js->buildJobs); 97 | for(unsigned i=0;i<HASH_RANGE_1;++i){ 98 | js->buildJobs[i].argument = malloc(sizeof(struct buildArg)); 99 | MALLOC_CHECK(js->buildJobs[i].argument); 100 | js->buildJobs[i].function = buildFunc; 101 | } 102 | 103 | /* Create array with join jobs */ 104 | js->joinJobs = malloc(HASH_RANGE_1*sizeof(struct Job)); 105 | MALLOC_CHECK(js->joinJobs); 106 | for(unsigned i=0;i<HASH_RANGE_1;++i){ 107 | js->joinJobs[i].argument = malloc(sizeof(struct joinArg)); 108 | MALLOC_CHECK(js->joinJobs[i].argument); 109 | js->joinJobs[i].function = joinFunc; 110 | } 111 | 112 | /* Create array with columnEquality jobs */ 113 | js->colEqualityJobs = malloc(HASH_RANGE_1*sizeof(struct Job)); 114 | MALLOC_CHECK(js->colEqualityJobs); 115 | for(unsigned i=0;i<HASH_RANGE_1;++i){ 116 | js->colEqualityJobs[i].argument = malloc(sizeof(struct colEqualityArg)); 117 | MALLOC_CHECK(js->colEqualityJobs[i].argument); 118 | js->colEqualityJobs[i].function = colEqualityFunc; 119 | } 120 | 121 | /* Create array with filter jobs */ 122 | js->filterJobs = 
malloc(HASH_RANGE_1*sizeof(struct Job)); 123 | MALLOC_CHECK(js->filterJobs); 124 | for(unsigned i=0;i<HASH_RANGE_1;++i){ 125 | js->filterJobs[i].argument = malloc(sizeof(struct filterArg)); 126 | MALLOC_CHECK(js->filterJobs[i].argument); 127 | js->filterJobs[i].function = filterFunc; 128 | } 129 | 130 | /* Create array with checksum jobs */ 131 | js->checkSumJobs = malloc(HASH_RANGE_1*sizeof(struct Job)); 132 | MALLOC_CHECK(js->checkSumJobs); 133 | for(unsigned i=0;i<HASH_RANGE_1;++i){ 134 | js->checkSumJobs[i].argument = malloc(sizeof(struct checkSumArg)); 135 | MALLOC_CHECK(js->checkSumJobs[i].argument); 136 | js->checkSumJobs[i].function = checkSumFunc; 137 | } 138 | } 139 | 140 | void *threadFunc(void * arg){ 141 | 142 | // fprintf(stderr, "thread[%u] entering the threadFunc\n",(unsigned)pthread_self()); 143 | int err; 144 | while(1){ 145 | // Acquire mutex and check if there is a job in the queue 146 | pthread_mutex_lock(&queueMtx); 147 | while (isEmpty(jobQueue)) { 148 | // fprintf(stderr, "Going to sleep\n"); 149 | pthread_cond_wait(&condNonEmpty,&queueMtx); 150 | // fprintf(stderr, "thread[%u] woke up\n",(unsigned)pthread_self()); 151 | } 152 | 153 | // Finally found a job to work on 154 | // deQueue the job and release the mutex [also signal the cond var] 155 | struct Job* job = deQueue(jobQueue); 156 | pthread_cond_signal(&condNonEmpty); 157 | pthread_mutex_unlock(&queueMtx); 158 | 159 | // Special kind of job indicating the end of our program 160 | if(job==NULL) 161 | { 162 | // fprintf(stderr, "thread[%u] exiting...\n",(unsigned)pthread_self()); 163 | pthread_exit((void*)0); 164 | } 165 | 166 | // Execute the function of your job & destroy the argument afterwards[if it has been malloc'd] 167 | (*(job->function))(job->argument); 168 | } 169 | } 170 | 171 | void destroyJobScheduler(struct JobScheduler* js){ 172 | 173 | // Send "termination" jobs 174 | for(unsigned i=0;i<js->threadNum;++i){ 175 | pthread_mutex_lock(&queueMtx); 176 | enQueue(jobQueue,NULL); 177 | pthread_cond_signal(&condNonEmpty); 178 | 
pthread_mutex_unlock(&queueMtx); 179 | } 180 | // Broadcast to make sure every worker gets its "termination" job 181 | pthread_mutex_lock(&queueMtx); 182 | pthread_cond_broadcast(&condNonEmpty); 183 | pthread_mutex_unlock(&queueMtx); 184 | 185 | // Join worker threads 186 | int err; 187 | for(unsigned i=0;i<js->threadNum;++i){ 188 | if(err = pthread_join(js->tids[i],NULL)){ 189 | fprintf(stderr, "pthread_join: %s\n",strerror(err)); 190 | exit(EXIT_FAILURE); 191 | } 192 | } 193 | 194 | // Destroy thread id table 195 | free(js->tids); 196 | 197 | // Destroy Queue 198 | destroyQueue(jobQueue); 199 | 200 | // Destroy mutexes and cond vars 201 | if (err = pthread_mutex_destroy(&queueMtx)) { 202 | fprintf(stderr, "pthread_mutex_destroy[queueMtx]: %s\n",strerror(err)); 203 | exit(EXIT_FAILURE); 204 | } 205 | 206 | if (err = pthread_mutex_destroy(&jobsFinishedMtx)) { 207 | fprintf(stderr, "pthread_mutex_destroy[jobsFinishedMtx]: %s\n",strerror(err)); 208 | exit(EXIT_FAILURE); 209 | } 210 | 211 | if(err = pthread_cond_destroy(&condNonEmpty)){ 212 | fprintf(stderr, "pthread_cond_destroy: %s\n",strerror(err)); 213 | exit(EXIT_FAILURE); 214 | } 215 | 216 | if(err = pthread_cond_destroy(&condJobsFinished)){ 217 | fprintf(stderr, "pthread_cond_destroy: %s\n",strerror(err)); 218 | exit(EXIT_FAILURE); 219 | } 220 | 221 | // Destroy partition mutex array 222 | for(unsigned i=0;i<HASH_RANGE_1;++i) 223 | pthread_mutex_destroy(&partitionMtxArray[i]); 224 | free(partitionMtxArray); 233 | free(js->checkSumArray); 234 | 235 | for(unsigned i=0;i<js->threadNum;++i) 236 | free(js->histArray[i]); 237 | free(js->histArray); 238 | 239 | for(unsigned i=0;i<js->threadNum;++i) 240 | free(js->histJobs[i].argument); 241 | free(js->histJobs); 242 | 243 | for(unsigned i=0;i<js->threadNum;++i) 244 | free(js->partitionJobs[i].argument); 245 | free(js->partitionJobs); 246 | 247 | for(unsigned i=0;i<HASH_RANGE_1;++i) 248 | free(js->buildJobs[i].argument); 249 | free(js->buildJobs); 250 | 251 | for(unsigned i=0;i<HASH_RANGE_1;++i) 252 | free(js->joinJobs[i].argument); 253 | free(js->joinJobs); 254 | 255 | for(unsigned i=0;i<HASH_RANGE_1;++i) 256 | free(js->colEqualityJobs[i].argument); 257 | free(js->colEqualityJobs); 258 | 259 | 
for(unsigned i=0;i<HASH_RANGE_1;++i) 260 | free(js->filterJobs[i].argument); 261 | free(js->filterJobs); 262 | 263 | 264 | for(unsigned i=0;i<HASH_RANGE_1;++i) 265 | free(js->checkSumJobs[i].argument); 266 | free(js->checkSumJobs); 267 | 268 | // Destroy JobScheduler 269 | free(js); 270 | } 271 | -------------------------------------------------------------------------------- /final/src/Joiner.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #include "Joiner.h" 7 | #include "Relation.h" 8 | #include "Intermediate.h" 9 | #include "Vector.h" 10 | #include "Utils.h" 11 | 12 | unsigned RADIX_BITS; 13 | unsigned HASH_RANGE_1; 14 | 15 | void createJoiner(struct Joiner **joiner) 16 | { 17 | *joiner = malloc(sizeof(struct Joiner)); 18 | MALLOC_CHECK(*joiner); 19 | (*joiner)->numOfRelations = 0; 20 | (*joiner)->relations = NULL; 21 | } 22 | 23 | void setup(struct Joiner *joiner) 24 | { 25 | /* Contains all file names : "r0\nr1\nr2\n....r20\n" */ 26 | char *buffer = malloc(BUFFERSIZE*sizeof(char)); 27 | MALLOC_CHECK(buffer); 28 | char *allNames = buffer; 29 | 30 | /* We assume that file name will be at most 18[do not forget '\n' and '\0'] characters long */ 31 | char fileName[20]; 32 | 33 | /* Get number of relations and store file names to allNames */ 34 | allNames[0] = '\0'; 35 | while (fgets(fileName, sizeof(fileName), stdin) != NULL) 36 | { 37 | if(!strcmp(fileName,"Done\n")) 38 | break; 39 | ++joiner->numOfRelations; 40 | strcat(allNames,fileName); 41 | } 42 | 43 | /* Allocate space to store relations */ 44 | joiner->relations = malloc(joiner->numOfRelations*sizeof(struct Relation*)); 45 | MALLOC_CHECK(joiner->relations); 46 | 47 | /* Add relation corresponding to the fileName scanned from allNames */ 48 | int offset; 49 | while(sscanf(allNames,"%s%n",fileName,&offset)>0) 50 | { 51 | addRelation(joiner,fileName); 52 | allNames+=offset; 53 | } 54 | setRadixBits(joiner); 55 | setVectorInitSize(joiner); 56 | free(buffer); 57 | } 58 | 59 | void 
addRelation(struct Joiner *joiner,char *fileName) 60 | { 61 | 62 | /* Indicates the number of relations added so far */ 63 | static unsigned i=0; 64 | 65 | /* Create a new relation */ 66 | struct Relation *rel; 67 | createRelation(&rel,fileName); 68 | 69 | /* Add it to joiner's "relations" array */ 70 | joiner->relations[i++] = rel; 71 | // dumpRelation(rel,fileName); 72 | } 73 | 74 | void join(struct Joiner *joiner,struct QueryInfo *q) 75 | { 76 | struct InterMetaData *inter; 77 | createInterMetaData(&inter,q); 78 | 79 | // fprintf(stderr,"=========================================================\n"); 80 | // fprintf(stderr,"Column Equalities\n"); 81 | // fprintf(stderr,"=========================================================\n"); 82 | applyColumnEqualities(inter,joiner,q); 83 | 84 | // fprintf(stderr,"=========================================================\n"); 85 | // fprintf(stderr,"Filters\n"); 86 | // fprintf(stderr,"=========================================================\n"); 87 | applyFilters(inter,joiner,q); 88 | 89 | // fprintf(stderr,"=========================================================\n"); 90 | // fprintf(stderr,"Joins\n"); 91 | // fprintf(stderr,"=========================================================\n"); 92 | applyJoins(inter,joiner,q); 93 | 94 | 95 | // fprintf(stderr,"=========================================================\n"); 96 | // fprintf(stderr,"CheckSums\n"); 97 | // fprintf(stderr,"=========================================================\n"); 98 | applyCheckSums(inter,joiner,q); 99 | 100 | 101 | // fprintf(stderr,"=========================================================\n"); 102 | // fprintf(stderr,"Destruction\n"); 103 | // fprintf(stderr,"=========================================================\n"); 104 | destroyInterMetaData(inter); 105 | } 106 | 107 | uint64_t *getColumn(struct Joiner *joiner,unsigned relId,unsigned colId) 108 | { 109 | return joiner->relations[relId]->columns[colId]; 110 | } 111 | 112 | unsigned 
getRelationTuples(struct Joiner *joiner,unsigned relId) 113 | { 114 | return joiner->relations[relId]->numOfTuples; 115 | } 116 | 117 | void setVectorInitSize(struct Joiner *joiner) 118 | { 119 | /** 120 | * small : 1000 121 | * medium : 1000 122 | * large : 5000 123 | * public : 500000 124 | */ 125 | 126 | unsigned sum = 0; 127 | unsigned avgNumOfTuples = 0; 128 | for(unsigned i=0;i<joiner->numOfRelations;++i) 129 | sum+=joiner->relations[i]->numOfTuples; 130 | avgNumOfTuples = sum/joiner->numOfRelations; 131 | 132 | if(avgNumOfTuples<500000) 133 | initSize = 1000; 134 | else if(avgNumOfTuples<1200000) 135 | initSize = 1000; 136 | else if(avgNumOfTuples<2000000) 137 | initSize = 5000; 138 | else 139 | initSize = 500000; 140 | } 141 | 142 | void setRadixBits(struct Joiner *joiner) 143 | { 144 | unsigned sum = 0; 145 | unsigned avgNumOfTuples = 0; 146 | for(unsigned i=0;i<joiner->numOfRelations;++i) 147 | sum+=joiner->relations[i]->numOfTuples; 148 | avgNumOfTuples = sum/joiner->numOfRelations; 149 | 150 | /** 151 | * small : 4,16 152 | * medium : 5,32 153 | * large : 5,32 154 | * public : 8,256 155 | */ 156 | 157 | if (avgNumOfTuples<500000) { 158 | RADIX_BITS = 4; 159 | HASH_RANGE_1 = 16; 160 | } else if (avgNumOfTuples<2000000) { 161 | RADIX_BITS = 5; 162 | HASH_RANGE_1 = 32; 163 | } else { 164 | RADIX_BITS = 8; 165 | HASH_RANGE_1 = 256; 166 | } 167 | } 168 | 169 | void destroyJoiner(struct Joiner *joiner) 170 | { 171 | for (unsigned i=0;i<joiner->numOfRelations;++i) 172 | destroyRelation(joiner->relations[i]); 173 | free(joiner->relations); 174 | free(joiner); 175 | } 176 | -------------------------------------------------------------------------------- /final/src/Operations.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include <stdlib.h> /*free()*/ 3 | #include 4 | #include "Operations.h" 5 | 6 | #include "Joiner.h" 7 | #include "Vector.h" 8 | #include "Intermediate.h" 9 | #include "Utils.h" 10 | #include "Partition.h" 11 | #include "Build.h" 
12 | #include "Probe.h" 13 | #include "JobScheduler.h" 14 | #include "Queue.h" 15 | 16 | void colEquality(uint64_t *leftCol,uint64_t *rightCol,unsigned numOfTuples,struct Vector **vector) 17 | { 18 | 19 | /* Create the vectors that will hold the results */ 20 | /* 1 stands for: "1 rowId per tuple" */ 21 | for(unsigned i=0;icolEqualityJobs[i].argument; 44 | arg->new = results[i]; 45 | arg->old = vector[i]; 46 | arg->leftCol = leftCol; 47 | arg->rightCol = rightCol; 48 | arg->posLeft = posLeft; 49 | arg->posRight = posRight; 50 | pthread_mutex_lock(&queueMtx); 51 | enQueue(jobQueue,&js->colEqualityJobs[i]); 52 | pthread_cond_signal(&condNonEmpty); 53 | pthread_mutex_unlock(&queueMtx); 54 | } 55 | pthread_mutex_lock(&jobsFinishedMtx); 56 | while (jobsFinished!=HASH_RANGE_1) { 57 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 58 | } 59 | jobsFinished = 0; 60 | pthread_cond_signal(&condJobsFinished); 61 | pthread_mutex_unlock(&jobsFinishedMtx); 62 | for(unsigned i=0;ivector,1); 74 | for(unsigned i=myarg->start;iend;++i) 75 | if(compare(myarg->col[i],myarg->cmp,myarg->constant)) 76 | insertAtVector(*myarg->vector,&i); 77 | pthread_mutex_lock(&jobsFinishedMtx); 78 | ++jobsFinished; 79 | pthread_cond_signal(&condJobsFinished); 80 | pthread_mutex_unlock(&jobsFinishedMtx); 81 | } 82 | 83 | void filterInter(uint64_t *col,Comparison cmp,uint64_t constant,struct Vector **vector) 84 | { 85 | /* Hold the old vector */ 86 | struct Vector *old = *vector; 87 | 88 | /* Create a new vector */ 89 | createVector(vector,1); 90 | 91 | /* Fill the new one appropriately by scanning the old vector */ 92 | scanFilter(*vector,old,col,cmp,constant); 93 | 94 | /* Destroy the old */ 95 | destroyVector(&old); 96 | } 97 | 98 | void joinNonInterNonInter(struct InterMetaData *inter,RadixHashJoinInfo* left,RadixHashJoinInfo* right) 99 | { 100 | // Partition the two columns 101 | partition(left); 102 | partition(right); 103 | 104 | // Build index (for the smallest one) 105 | 
build(left,right); 106 | left->isLeft = 1; 107 | right->isLeft = 0; 108 | 109 | // Probe 110 | struct Vector **results; 111 | results = malloc(HASH_RANGE_1*sizeof(struct Vector*)); 112 | MALLOC_CHECK(results); 113 | for(unsigned i=0;i < HASH_RANGE_1;++i) 114 | createVector(results+i,left->tupleSize+right->tupleSize); 115 | 116 | for(unsigned i=0;i < HASH_RANGE_1;++i){ 117 | struct joinArg *arg = js->joinJobs[i].argument; 118 | arg->bucket = i; 119 | arg->left = left; 120 | arg->right = right; 121 | arg->results = results[i]; 122 | pthread_mutex_lock(&queueMtx); 123 | enQueue(jobQueue,&js->joinJobs[i]); 124 | pthread_cond_signal(&condNonEmpty); 125 | pthread_mutex_unlock(&queueMtx); 126 | } 127 | pthread_mutex_lock(&jobsFinishedMtx); 128 | while (jobsFinished!=HASH_RANGE_1) { 129 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 130 | } 131 | jobsFinished = 0; 132 | pthread_cond_signal(&condJobsFinished); 133 | pthread_mutex_unlock(&jobsFinishedMtx); 134 | 135 | // Update mapRels and interResults // 136 | // Construct new mapping 137 | unsigned *newMap = malloc(inter->queryRelations*sizeof(unsigned)); 138 | MALLOC_CHECK(newMap); 139 | for(unsigned i=0;i < inter->queryRelations;++i) 140 | newMap[i] = -1; 141 | 142 | newMap[left->relId] = 0; 143 | newMap[right->relId] = 1; 144 | 145 | // Free the old map arrays | Destroy the old vectors 146 | free(*left->ptrToMap); 147 | *left->ptrToMap = NULL; 148 | free(*right->ptrToMap); 149 | *right->ptrToMap = NULL; 150 | 151 | for(unsigned i=0;i < HASH_RANGE_1;++i) 152 | { 153 | destroyVector(left->ptrToVec+i); 154 | destroyVector(right->ptrToVec+i); 155 | } 156 | 157 | // Attach the new ones to first available position 158 | unsigned pos = getFirstAvailablePos(inter); 159 | inter->mapRels[pos] = newMap; 160 | for(unsigned i=0;i < HASH_RANGE_1;++i) 161 | inter->interResults[pos][i] = results[i]; 162 | 163 | free(results); 164 | destroyRadixHashJoinInfo(left); 165 | destroyRadixHashJoinInfo(right); 166 | } 167 | 168 | void joinNonInterInter(struct InterMetaData *inter,RadixHashJoinInfo* left,RadixHashJoinInfo* right) 169 | { 170 | // Partition the two columns 171 | partition(left); 172 | partition(right); 173 | 174 | // Build
index (for the smallest one) 175 | build(left,right); 176 | left->isLeft = 1; 177 | right->isLeft = 0; 178 | 179 | // Probe 180 | jobsFinished=0; 181 | struct Vector **results; 182 | results = malloc(HASH_RANGE_1*sizeof(struct Vector*)); 183 | MALLOC_CHECK(results); 184 | for(unsigned i=0;i < HASH_RANGE_1;++i) 185 | createVector(results+i,left->tupleSize+right->tupleSize); 186 | 187 | for(unsigned i=0;i < HASH_RANGE_1;++i){ 188 | struct joinArg *arg = js->joinJobs[i].argument; 189 | arg->bucket = i; 190 | arg->left = left; 191 | arg->right = right; 192 | arg->results = results[i]; 193 | pthread_mutex_lock(&queueMtx); 194 | enQueue(jobQueue,&js->joinJobs[i]); 195 | pthread_cond_signal(&condNonEmpty); 196 | pthread_mutex_unlock(&queueMtx); 197 | } 198 | pthread_mutex_lock(&jobsFinishedMtx); 199 | while (jobsFinished!=HASH_RANGE_1) { 200 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 201 | } 202 | jobsFinished = 0; 203 | pthread_cond_signal(&condJobsFinished); 204 | pthread_mutex_unlock(&jobsFinishedMtx); 205 | 206 | // Update mapRels and interResults // 207 | // Construct new mapping 208 | unsigned *newMap = malloc(inter->queryRelations*sizeof(unsigned)); 209 | MALLOC_CHECK(newMap); 210 | 211 | newMap[left->relId] = 0; 212 | for(unsigned i=0;i < inter->queryRelations;++i) 213 | if(i!=left->relId) 214 | newMap[i] = (right->map[i]!=-1) ?
1+right->map[i] : -1; 215 | 216 | // Free the old map arrays | Destroy the old vectors 217 | free(*left->ptrToMap); 218 | *left->ptrToMap = NULL; 219 | free(*right->ptrToMap); 220 | *right->ptrToMap = NULL; 221 | for(unsigned i=0;i < HASH_RANGE_1;++i) 222 | { 223 | destroyVector(left->ptrToVec+i); 224 | destroyVector(right->ptrToVec+i); 225 | } 226 | 227 | // Attach the new ones to first available position 228 | unsigned pos = getFirstAvailablePos(inter); 229 | inter->mapRels[pos] = newMap; 230 | for(unsigned i=0;i < HASH_RANGE_1;++i) 231 | inter->interResults[pos][i] = results[i]; 232 | 233 | free(results); 234 | destroyRadixHashJoinInfo(left); 235 | destroyRadixHashJoinInfo(right); 236 | } 237 | 238 | void joinInterNonInter(struct InterMetaData *inter,RadixHashJoinInfo* left,RadixHashJoinInfo* right) 239 | { 240 | 241 | // Partition the two columns 242 | partition(left); 243 | partition(right); 244 | 245 | // Build index (for the smallest one) 246 | build(left,right); 247 | left->isLeft = 1; 248 | right->isLeft = 0; 249 | 250 | // Probe 251 | struct Vector **results; 252 | jobsFinished = 0; 253 | results = malloc(HASH_RANGE_1*sizeof(struct Vector*)); 254 | MALLOC_CHECK(results); 255 | for(unsigned i=0;i < HASH_RANGE_1;++i) 256 | createVector(results+i,left->tupleSize+right->tupleSize); 257 | 258 | for(unsigned i=0;i < HASH_RANGE_1;++i){ 259 | struct joinArg *arg = js->joinJobs[i].argument; 260 | arg->bucket = i; 261 | arg->left = left; 262 | arg->right = right; 263 | arg->results = results[i]; 264 | pthread_mutex_lock(&queueMtx); 265 | enQueue(jobQueue,&js->joinJobs[i]); 266 | pthread_cond_signal(&condNonEmpty); 267 | pthread_mutex_unlock(&queueMtx); 268 | } 269 | pthread_mutex_lock(&jobsFinishedMtx); 270 | while (jobsFinished!=HASH_RANGE_1) { 271 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 272 | } 273 | jobsFinished = 0; 274 | pthread_cond_signal(&condJobsFinished); 275 | pthread_mutex_unlock(&jobsFinishedMtx); 276 | 277 | // Update mapRels and interResults // 278 | // Construct new mapping 279 | unsigned *newMap = malloc(inter->queryRelations*sizeof(unsigned)); 280 | MALLOC_CHECK(newMap); 281 | 282 | for(unsigned i=0;i < inter->queryRelations;++i) 283 |
newMap[i] = left->map[i]; 284 | newMap[right->relId] = left->tupleSize; 285 | 286 | // Free the old map arrays | Destroy the old vectors 287 | free(*left->ptrToMap); 288 | *left->ptrToMap = NULL; 289 | free(*right->ptrToMap); 290 | *right->ptrToMap = NULL; 291 | for(unsigned i=0;i < HASH_RANGE_1;++i) 292 | { 293 | destroyVector(left->ptrToVec+i); 294 | destroyVector(right->ptrToVec+i); 295 | } 296 | 297 | // Attach the new ones to first available position 298 | unsigned pos = getFirstAvailablePos(inter); 299 | inter->mapRels[pos] = newMap; 300 | for(unsigned i=0;i < HASH_RANGE_1;++i) 301 | inter->interResults[pos][i] = results[i]; 302 | 303 | free(results); 304 | destroyRadixHashJoinInfo(left); 305 | destroyRadixHashJoinInfo(right); 306 | } 307 | 308 | void joinInterInter(struct InterMetaData *inter,RadixHashJoinInfo* left,RadixHashJoinInfo* right) 309 | { 310 | if(left->vector == right->vector) 311 | { 312 | unsigned posLeft = left->map[left->relId]; 313 | unsigned posRight = right->map[right->relId]; 314 | colEqualityInter(left->col,right->col,posLeft,posRight,left->ptrToVec); 315 | return; 316 | } 317 | // Partition the two columns 318 | partition(left); 319 | partition(right); 320 | 321 | // Build index (for the smallest one) 322 | build(left,right); 323 | left->isLeft = 1; 324 | right->isLeft = 0; 325 | 326 | // Probe 327 | struct Vector **results; 328 | jobsFinished=0; 329 | results = malloc(HASH_RANGE_1*sizeof(struct Vector*)); 330 | MALLOC_CHECK(results); 331 | for(unsigned i=0;i < HASH_RANGE_1;++i) 332 | createVector(results+i,left->tupleSize+right->tupleSize); 333 | 334 | for(unsigned i=0;i < HASH_RANGE_1;++i){ 335 | struct joinArg *arg = js->joinJobs[i].argument; 336 | arg->bucket = i; 337 | arg->left = left; 338 | arg->right = right; 339 | arg->results = results[i]; 340 | pthread_mutex_lock(&queueMtx); 341 | enQueue(jobQueue,&js->joinJobs[i]); 342 | pthread_cond_signal(&condNonEmpty); 343 | pthread_mutex_unlock(&queueMtx); 344 | } 345 | pthread_mutex_lock(&jobsFinishedMtx); 346 | while (jobsFinished!=HASH_RANGE_1) { 347 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 348 | } 349 | jobsFinished = 0; 350 |
pthread_cond_signal(&condJobsFinished); 351 | pthread_mutex_unlock(&jobsFinishedMtx); 352 | 353 | // Update mapRels and interResults // 354 | // Construct new mapping 355 | unsigned *newMap = malloc(inter->queryRelations*sizeof(unsigned)); 356 | MALLOC_CHECK(newMap); 357 | for(unsigned i=0;i < inter->queryRelations;++i) 358 | newMap[i] = left->map[i]; 359 | 360 | for(unsigned i=0;i < inter->queryRelations;++i) 361 | if(newMap[i]==-1) 362 | newMap[i] = (right->map[i]!=-1) ? right->map[i]+left->tupleSize : -1; 363 | 364 | // Free the old map arrays | Destroy the old vectors 365 | free(*left->ptrToMap); 366 | *left->ptrToMap = NULL; 367 | free(*right->ptrToMap); 368 | *right->ptrToMap = NULL; 369 | for(unsigned i=0;i < HASH_RANGE_1;++i) 370 | { 371 | destroyVector(left->ptrToVec+i); 372 | destroyVector(right->ptrToVec+i); 373 | } 374 | 375 | // Attach the new ones to first available position 376 | unsigned pos = getFirstAvailablePos(inter); 377 | inter->mapRels[pos] = newMap; 378 | for(unsigned i=0;i < HASH_RANGE_1;++i) 379 | inter->interResults[pos][i] = results[i]; 380 | 381 | free(results); 382 | destroyRadixHashJoinInfo(left); 383 | destroyRadixHashJoinInfo(right); 384 | } 385 | -------------------------------------------------------------------------------- /final/src/Optimizer.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include "Optimizer.h" 4 | #include "Utils.h" 5 | #include "Parser.h" 6 | #include "Joiner.h" 7 | #include "Relation.h" 8 | 9 | void findStats(uint64_t *column, struct columnStats *stat, unsigned columnSize) 10 | { 11 | /* Find MIN and MAX */ 12 | uint64_t min = column[0]; 13 | uint64_t max = column[0]; 14 | for (unsigned i = 1; i < columnSize; ++i) 15 | { 16 | if (column[i] > max) 17 | max = column[i]; 18 | if (column[i] < min) 19 | min = column[i]; 20 | } 21 | 22 | /* Find discrete values */ 23 | unsigned nbits = (max - min + 1 > PRIMELIMIT) ?
PRIMELIMIT : max - min + 1; 24 | stat->bitVector = calloc(BITNSLOTS(nbits),CHAR_BIT); 25 | MALLOC_CHECK(stat->bitVector); 26 | stat->discreteValues = 0; 27 | 28 | 29 | /* Find the way to fill the boolean array depending on its size relative to PRIMELIMIT */ 30 | if (nbits != PRIMELIMIT) 31 | { 32 | stat->typeOfBitVector = 0; 33 | for (unsigned i = 0; i < columnSize; ++i) 34 | { 35 | if(BITTEST(stat->bitVector,column[i]-min) == 0) 36 | (stat->discreteValues)++; 37 | BITSET(stat->bitVector,column[i]-min); 38 | } 39 | } 40 | else 41 | { 42 | stat->typeOfBitVector = 1; 43 | for (unsigned i = 0; i < columnSize; ++i) 44 | { 45 | if(BITTEST(stat->bitVector,(column[i]-min) % PRIMELIMIT) == 0) 46 | (stat->discreteValues)++; 47 | BITSET(stat->bitVector,(column[i]-min) % PRIMELIMIT); 48 | } 49 | } 50 | /* Assign the remaining values to the stats*/ 51 | stat->minValue = min; 52 | stat->maxValue = max; 53 | stat->f = columnSize; 54 | stat->bitVectorSize = nbits; 55 | } 56 | 57 | void applyColEqualityEstimations(struct QueryInfo *q, struct Joiner *j) 58 | { 59 | for(unsigned i = 0 ; i < q->numOfPredicates; ++i) 60 | { 61 | unsigned leftRelId = getRelId(&q->predicates[i].left); 62 | unsigned rightRelId = getRelId(&q->predicates[i].right); 63 | unsigned leftColId = getColId(&q->predicates[i].left); 64 | unsigned rightColId = getColId(&q->predicates[i].right); 65 | unsigned actualId = getOriginalRelId(q, &q->predicates[i].left); 66 | struct columnStats *stat1 = &q->estimations[leftRelId][leftColId]; 67 | struct columnStats *stat2 = &q->estimations[rightRelId][rightColId]; 68 | struct columnStats *temp; 69 | 70 | // Same relation - different columns 71 | if(isColEquality(&q->predicates[i]) && (leftColId != rightColId)) 72 | { 73 | // fprintf(stderr,"%u.%u=%u.%u & ",leftRelId,leftColId,rightRelId,rightColId); 74 | // Find estimations for the two columns 75 | uint64_t newMin; 76 | uint64_t newMax; 77 | unsigned newF; 78 | unsigned newD; 79 | 80 | // fprintf(stderr, "stat1: %lu ~~> 
%lu\n",stat1->minValue,stat1->maxValue); 81 | // fprintf(stderr, "stat2: %lu ~~> %lu\n",stat2->minValue,stat2->maxValue); 82 | 83 | newMin = (stat1->minValue > stat2->minValue) ? stat1->minValue : stat2->minValue; 84 | newMax = (stat1->maxValue < stat2->maxValue) ? stat1->maxValue : stat2->maxValue; 85 | newF = stat1->f / (newMax - newMin + 1); 86 | newD = stat1->discreteValues * (1-power(1-(newF/stat1->f),stat1->f/stat1->discreteValues)); 87 | 88 | /* Update the statistics of every other column except for the ones taking part in the equality */ 89 | /* Note: Number of columns of every relation is stored in joiner, thus we need joiner to access it */ 90 | for (unsigned c = 0; c < (*(j->relations[actualId])).numOfCols; ++c) 91 | { 92 | /* leftRelId and rightRelId are the same, so we could also use rightRelId */ 93 | temp = &q->estimations[leftRelId][c]; 94 | if ((c!=leftColId) && (c!=rightColId)) 95 | { 96 | /* In case stat1->f or temp->discreteValues is zero */ 97 | stat1->f = (stat1->f == 0) ? 1 : stat1->f; 98 | temp->discreteValues = (temp->discreteValues == 0) ? 
1 : temp->discreteValues; 99 | 100 | temp->discreteValues = temp->discreteValues * (1-power(1-(newF/stat1->f),temp->f/temp->discreteValues)); 101 | } 102 | } 103 | 104 | /* Update the statistics for the two columns taking part in the equality */ 105 | stat1->minValue = stat2->minValue = newMin; 106 | stat1->maxValue = stat2->maxValue = newMax; 107 | stat1->f = stat2->f = newF; 108 | stat1->discreteValues = stat2->discreteValues = newD; 109 | // printColumnStats(&q->estimations[leftRelId][1]); 110 | } 111 | // Same relation - same column 112 | else if(isColEquality(&q->predicates[i])) 113 | { 114 | stat1->f = (stat1->f * stat1->f) / (stat1->maxValue - stat1->minValue + 1); 115 | for (unsigned c = 0; c < (*(j->relations[actualId])).numOfCols; ++c) 116 | { 117 | /* leftRelId and rightRelId are the same, so we could also use rightRelId */ 118 | /* nothing changes in this case,apart from "f" */ 119 | temp = &q->estimations[leftRelId][c]; 120 | if (c!=rightColId) 121 | temp->f = stat1->f; 122 | } 123 | } 124 | } 125 | } 126 | 127 | void applyFilterEstimations(struct QueryInfo *q, struct Joiner *j) 128 | { 129 | 130 | for(unsigned i = 0 ; i < q->numOfFilters; ++i) 131 | { 132 | unsigned relId = getRelId(&q->filters[i].filterLhs); 133 | unsigned colId = getColId(&q->filters[i].filterLhs); 134 | Comparison cmp = getComparison(&q->filters[i]); 135 | uint64_t constant = getConstant(&q->filters[i]); 136 | unsigned actualRelId = getOriginalRelId(q, &q->filters[i].filterLhs); 137 | struct columnStats *stat = &q->estimations[relId][colId]; 138 | 139 | // fprintf(stderr, "%s\n", "::::::::::::::::::::::::::::::::::::::::::::::::::"); 140 | // fprintf(stderr,"%u.%u%c%ld",relId,colId,cmp,constant); 141 | // fprintf(stderr, "\n%s\n", "::::::::::::::::::::::::::::::::::::::::::::::::::"); 142 | 143 | // printColumnStats(&q->estimations[relId][colId]); 144 | 145 | // fprintf(stderr, "\n\n"); 146 | filterEstimation(j,q,colId,stat,actualRelId,relId,cmp,constant); 147 | 148 | // 
printColumnStats(&q->estimations[relId][colId]); 149 | } 150 | } 151 | 152 | void applyJoinEstimations(struct QueryInfo *q, struct Joiner *j) 153 | { 154 | 155 | /* Find stats for columns in predicates */ 156 | for(unsigned i=0;i < q->numOfPredicates;++i) 157 | { 158 | unsigned leftRelId = getRelId(&q->predicates[i].left); 159 | unsigned rightRelId = getRelId(&q->predicates[i].right); 160 | unsigned leftColId = getColId(&q->predicates[i].left); 161 | unsigned rightColId = getColId(&q->predicates[i].right); 162 | unsigned actualRelIdLeft = getOriginalRelId(q, &q->predicates[i].left); 163 | unsigned actualRelIdRight = getOriginalRelId(q, &q->predicates[i].right); 164 | struct columnStats *statLeft = &q->estimations[leftRelId][leftColId]; 165 | struct columnStats *statRight = &q->estimations[rightRelId][rightColId]; 166 | struct columnStats *temp; 167 | 168 | // Join between different relations 169 | if(!isColEquality(&q->predicates[i])) 170 | { 171 | // fprintf(stderr,"%u.%u=%u.%u &\n",leftRelId,leftColId,rightRelId,rightColId); 172 | 173 | /* We'll need them for updating every other column of the relations */ 174 | unsigned oldDiscreteLeft = statLeft->discreteValues; 175 | unsigned oldDiscreteRight = statRight->discreteValues; 176 | 177 | uint64_t max,min; 178 | min = (statLeft->minValue > statRight->minValue) ? statLeft->minValue : statRight->minValue; 179 | max = (statLeft->maxValue < statRight->maxValue) ?
statLeft->maxValue : statRight->maxValue; 180 | 181 | // printColumnStats(statLeft); 182 | // printColumnStats(statRight); 183 | 184 | /* Implicit filter for min value */ 185 | filterEstimation(j,q,leftColId,statLeft,actualRelIdLeft,leftRelId,'>',min); 186 | filterEstimation(j,q,rightColId,statRight,actualRelIdRight,rightRelId,'>',min); 187 | 188 | /* Implicit filter for max value */ 189 | filterEstimation(j,q,leftColId,statLeft,actualRelIdLeft,leftRelId,'<',max); 190 | filterEstimation(j,q,rightColId,statRight,actualRelIdRight,rightRelId,'<',max); 191 | 192 | statLeft->f = statRight->f = (statLeft->f * statRight->f)/(max-min+1); 193 | statLeft->minValue = statRight->minValue = min; 194 | statLeft->maxValue = statRight->maxValue = max; 195 | statLeft->discreteValues = statRight->discreteValues = (statLeft->discreteValues*statRight->discreteValues)/(max-min+1); 196 | 197 | /* Update the statistics of every other column of the left relation */ 198 | for (unsigned c = 0; c < (*(j->relations[actualRelIdLeft])).numOfCols; ++c) 199 | { 200 | temp = &q->estimations[leftRelId][c]; 201 | if (c != leftColId) 202 | { 203 | oldDiscreteLeft = (oldDiscreteLeft == 0) ? 1 : oldDiscreteLeft; 204 | temp->discreteValues = (temp->discreteValues == 0) ? 1 : temp->discreteValues; 205 | 206 | temp->f = statLeft->f; 207 | temp->discreteValues = 208 | temp->discreteValues * (1-power(1-(statLeft->discreteValues/oldDiscreteLeft),(temp->f/temp->discreteValues))); 209 | } 210 | } 211 | /* Update the statistics of every other column of the right relation */ 212 | for (unsigned c = 0; c < (*(j->relations[actualRelIdRight])).numOfCols; ++c) 213 | { 214 | temp = &q->estimations[rightRelId][c]; 215 | if (c != rightColId) 216 | { 217 | oldDiscreteRight = (oldDiscreteRight == 0) ? 1 : oldDiscreteRight; 218 | temp->discreteValues = (temp->discreteValues == 0) ? 
1 : temp->discreteValues; 219 | 220 | temp->f = statLeft->f; 221 | temp->discreteValues = 222 | temp->discreteValues * (1-power(1-(statRight->discreteValues/oldDiscreteRight),(temp->f/temp->discreteValues))); 223 | } 224 | } 225 | // printColumnStats(statLeft); 226 | // printColumnStats(statRight); 227 | } 228 | } 229 | // fprintf(stderr, "\n"); 230 | } 231 | 232 | void filterEstimation(struct Joiner *j,struct QueryInfo *q,unsigned colId,struct columnStats *stat,unsigned actualRelId,unsigned relId,Comparison cmp,uint64_t constant) 233 | { 234 | struct columnStats *temp; 235 | uint64_t fTemp = stat->f; 236 | char isInArray = 0; 237 | 238 | if (cmp == '=') 239 | { 240 | 241 | /* Find if constant is in the discrete values */ 242 | /* If constant is not in the range of values of the column we won't find it in the bitVector */ 243 | if ( (constant < stat->minValue) || (constant > stat->maxValue) ) 244 | isInArray = 0; 245 | /* If the value in the bitVector is 1 then the constant is in the bitVector */ 246 | else 247 | { 248 | /* Make sure we use the correct way to find if it is in the bitVector */ 249 | if (stat->typeOfBitVector == 0) 250 | { 251 | // fprintf(stderr, "%u\n",BITTEST(stat->bitVector,522 - stat->minValue)); 252 | if (BITTEST(stat->bitVector,constant - stat->minValue) != 0) 253 | { 254 | isInArray = 1; 255 | } 256 | } 257 | else if (stat->typeOfBitVector == 1) 258 | { 259 | if(BITTEST(stat->bitVector,(constant - stat->minValue) % PRIMELIMIT) != 0) 260 | isInArray = 1; 261 | } 262 | } 263 | 264 | /* Change the statistics of the column */ 265 | stat->minValue = constant; 266 | stat->maxValue = constant; 267 | if (isInArray == 0) 268 | { 269 | stat->f = 0; 270 | stat->discreteValues = 0; 271 | } 272 | else 273 | { 274 | stat->f = stat->f / stat->discreteValues; 275 | stat->discreteValues = 1; 276 | } 277 | } 278 | else 279 | { 280 | 281 | /* cmp is '>' or '<' */ 282 | /* lowerLimit */ 283 | uint64_t k1 = (cmp == '>') ? 
constant + 1 : stat->minValue ; 284 | /* upperLimit */ 285 | uint64_t k2 = (cmp == '<') ? constant - 1 : stat->maxValue; 286 | 287 | if (k1 < stat->minValue) 288 | k1 = stat->minValue; 289 | 290 | if (k2 > stat->maxValue) 291 | k2 = stat->maxValue; 292 | 293 | /* If factor is in (0,1],round it up to 1 */ 294 | double factor = (double)(k2 - k1) / (stat->maxValue - stat->minValue); 295 | if((factor <= 1) && (factor > 0)) 296 | factor = 1; 297 | 298 | // fprintf(stderr, "k1:%lu | k2:%lu\n\n",k1,k2); 299 | // fprintf(stderr, "minPrev:%lu | maxPrev:%lu\n",stat->minValue,stat->maxValue); 300 | // fprintf(stderr, "Factor:%.3lf\n",factor); 301 | 302 | /* Change the statistics of the column */ 303 | if(stat->maxValue - stat->minValue > 0) 304 | { 305 | stat->discreteValues = factor * stat->discreteValues; 306 | stat->f = factor * stat->f; 307 | } 308 | else 309 | /* In case of: R.ACONSTANT */ 310 | stat->discreteValues = stat->f = 0 ; 311 | 312 | stat->minValue = k1; 313 | stat->maxValue = k2; 314 | } 315 | 316 | /* Update the statistics of every other column 317 | The formulas are the same for every filter */ 318 | for (unsigned c = 0; c < (*(j->relations[actualRelId])).numOfCols; ++c) 319 | { 320 | temp = &q->estimations[relId][c]; 321 | /* Make sure we update every other column , except the one that we just applied the filter estimations */ 322 | if (c != colId) 323 | { 324 | /* In case fTemp or temp->discreteValues is zero */ 325 | fTemp = (fTemp == 0) ? 1 : fTemp; 326 | temp->discreteValues = (temp->discreteValues == 0) ? 
1 : temp->discreteValues; 327 | 328 | temp->discreteValues = temp->discreteValues * (1 - 329 | power(1 - (1 - stat->f / fTemp), 330 | temp->f / temp->discreteValues)); 331 | temp->f = stat->f; 332 | } 333 | } 334 | } 335 | 336 | 337 | void findOptimalJoinOrder(struct QueryInfo *q, struct Joiner *j) 338 | { 339 | // Holds the number of each join predicate and its cost 340 | // unsigned *costArray = malloc(2*q->numOfPredicates*sizeof(unsigned)); 341 | // MALLOC_CHECK(costArray); 342 | // for(unsigned i=0;inumOfPredicates;++i) 343 | // { 344 | // if(!isColEquality(&q->predicates[i])) 345 | // { 346 | // unsigned leftRelId = getRelId(&q->predicates[i].left); 347 | // unsigned rightRelId = getRelId(&q->predicates[i].right); 348 | // unsigned leftColId = getColId(&q->predicates[i].left); 349 | // unsigned rightColId = getColId(&q->predicates[i].right); 350 | // unsigned actualRelIdLeft = getOriginalRelId(q, &q->predicates[i].left); 351 | // unsigned actualRelIdRight = getOriginalRelId(q, &q->predicates[i].right); 352 | // struct columnStats *statLeft = &q->estimations[leftRelId][leftColId]; 353 | // struct columnStats *statRight = &q->estimations[rightRelId][rightColId]; 354 | // costArray[i] = i; 355 | // costArray[i+1] = statLeft->f*statRight->f; 356 | // fprintf(stderr, "Cost[%u]:%u\n",i,costArray[i+1]); 357 | // } 358 | // } 359 | // // Sort in ascending order 360 | // quickSort(costArray,0,2*q->numOfPredicates-1); 361 | // fprintf(stderr, "\n\n"); 362 | } 363 | 364 | 365 | void printColumnStats(struct columnStats *s) 366 | { 367 | fprintf(stderr, "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"); 368 | fprintf(stderr, "minValue: %lu\n", s->minValue); 369 | fprintf(stderr, "maxValue: %lu\n", s->maxValue); 370 | fprintf(stderr, "f: %u\n", s->f); 371 | fprintf(stderr, "discreteValues: %u\n", s->discreteValues); 372 | // fprintf(stderr, "Type of boolean array is: %d\n", s->typeOfBitVector); 373 | fprintf(stderr, 
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n"); 374 | } 375 | -------------------------------------------------------------------------------- /final/src/Parser.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include /*strtok(),strcpy()*/ 4 | 5 | #include "Parser.h" 6 | #include "Utils.h" 7 | #include "Relation.h" 8 | #include "Optimizer.h" 9 | 10 | 11 | void createQueryEstimations(struct QueryInfo *qInfo,struct Joiner *j) 12 | { 13 | qInfo->estimations = malloc(qInfo->numOfRelationIds*sizeof(struct columnStats*)); 14 | MALLOC_CHECK(qInfo->estimations); 15 | for(unsigned i=0;inumOfRelationIds;++i) 16 | { 17 | unsigned relId = qInfo->relationIds[i]; 18 | unsigned size = j->relations[relId]->numOfCols*sizeof(struct columnStats); 19 | 20 | // Allocate space to store the estimations 21 | qInfo->estimations[i] = malloc(size); 22 | MALLOC_CHECK(qInfo->estimations[i]); 23 | // Fetch the stats, calculated when the relations were being loaded in memory 24 | memcpy(qInfo->estimations[i],j->relations[relId]->stats,size); 25 | } 26 | } 27 | 28 | void createQueryInfo(struct QueryInfo **qInfo,char *rawQuery) 29 | { 30 | *qInfo = malloc(sizeof(struct QueryInfo)); 31 | MALLOC_CHECK(*qInfo); 32 | (*qInfo)->estimations = NULL; 33 | parseQuery(*qInfo,rawQuery); 34 | } 35 | 36 | void destroyQueryInfo(struct QueryInfo *qInfo) 37 | { 38 | free(qInfo->relationIds); 39 | free(qInfo->predicates); 40 | free(qInfo->filters); 41 | free(qInfo->selections); 42 | if(qInfo->estimations)/* Useful in unit testing */ 43 | for(unsigned i=0;inumOfRelationIds;++i) 44 | free(qInfo->estimations[i]); 45 | free(qInfo->estimations); 46 | free(qInfo); 47 | } 48 | 49 | void parseQuery(struct QueryInfo *qInfo,char *rawQuery) 50 | { 51 | char rawRelations[BUFFERSIZE]; 52 | char rawPredicates[BUFFERSIZE]; 53 | char rawSelections[BUFFERSIZE]; 54 | 55 | /* Split query into three parts */ 56 | if( 
(sscanf(rawQuery,"%[^|]|%[^|]|%[^|]",rawRelations,rawPredicates,rawSelections)) != 3 ) 57 | { 58 | fprintf(stderr,"Query \"%s\" does not consist of three parts\nExiting...\n\n",rawQuery); 59 | exit(EXIT_FAILURE); 60 | } 61 | 62 | /* Parse each part */ 63 | parseRelationIds(qInfo,rawRelations); 64 | parsePredicates(qInfo,rawPredicates); 65 | parseSelections(qInfo,rawSelections); 66 | } 67 | 68 | void parseRelationIds(struct QueryInfo *qInfo,char *rawRelations) 69 | { 70 | char* temp = rawRelations; 71 | unsigned i; 72 | int offset; 73 | 74 | /* Get number of relationIds */ 75 | qInfo->numOfRelationIds = 0; 76 | while(sscanf(temp,"%*d%n",&offset)>=0) 77 | { 78 | ++qInfo->numOfRelationIds; 79 | temp+=offset; 80 | } 81 | if(!qInfo->numOfRelationIds) 82 | { 83 | fprintf(stderr,"Zero join relations were found in the query\nExiting...\n\n"); 84 | exit(EXIT_FAILURE); 85 | } 86 | 87 | /* Allocate memory for relationIds */ 88 | qInfo->relationIds = malloc(qInfo->numOfRelationIds*sizeof(unsigned)); 89 | MALLOC_CHECK(qInfo->relationIds); 90 | 91 | /* Store relationIds */ 92 | temp = rawRelations; 93 | i=0; 94 | while(sscanf(temp,"%u%n",&qInfo->relationIds[i],&offset)>0) 95 | { 96 | ++i; 97 | temp+=offset; 98 | } 99 | } 100 | 101 | void parseSelections(struct QueryInfo *qInfo,char *rawSelections) 102 | { 103 | 104 | char* temp = rawSelections; 105 | unsigned relId,colId,i; 106 | int offset; 107 | 108 | /* Get number of selections */ 109 | qInfo->numOfSelections = 0; 110 | while(sscanf(temp,"%*u.%*u%n",&offset)>=0) 111 | { 112 | ++qInfo->numOfSelections; 113 | temp+=offset; 114 | } 115 | if(!qInfo->numOfSelections) 116 | { 117 | fprintf(stderr,"Zero selections were found in the query\nExiting...\n\n"); 118 | exit(EXIT_FAILURE); 119 | } 120 | 121 | /* Allocate memory for selections */ 122 | qInfo->selections = malloc(qInfo->numOfSelections*sizeof(struct SelectInfo)); 123 | MALLOC_CHECK(qInfo->selections); 124 | 125 | /* Store selections */ 126 | temp = rawSelections; 127 | i=0; 
128 | while(sscanf(temp,"%u.%u%n",&relId,&colId,&offset)>0) 129 | { 130 | qInfo->selections[i].relId = relId; 131 | qInfo->selections[i].colId = colId; 132 | ++i; 133 | temp+=offset; 134 | } 135 | } 136 | 137 | void parsePredicates(struct QueryInfo *qInfo,char *rawPredicates) 138 | { 139 | unsigned i,j; 140 | char *token; 141 | char *temp = malloc((strlen(rawPredicates)+1)*sizeof(char)); 142 | MALLOC_CHECK(temp); 143 | strcpy(temp,rawPredicates); 144 | 145 | /* Get number of predicates and filters */ 146 | qInfo->numOfFilters = 0; 147 | qInfo->numOfPredicates = 0; 148 | token = strtok(temp,"&"); 149 | if(isFilter(token)) 150 | ++qInfo->numOfFilters; 151 | else 152 | ++qInfo->numOfPredicates; 153 | while(token=strtok(NULL,"&")) 154 | if(isFilter(token)) 155 | ++qInfo->numOfFilters; 156 | else 157 | ++qInfo->numOfPredicates; 158 | 159 | if(!(qInfo->numOfPredicates+qInfo->numOfFilters)) 160 | { 161 | fprintf(stderr,"Zero predicates were found in the query\nExiting...\n\n"); 162 | exit(EXIT_FAILURE); 163 | } 164 | 165 | /* Allocate memory for predicates and filters */ 166 | qInfo->predicates = malloc(qInfo->numOfPredicates*sizeof(struct PredicateInfo)); 167 | MALLOC_CHECK(qInfo->predicates); 168 | qInfo->filters = malloc(qInfo->numOfFilters*sizeof(struct FilterInfo)); 169 | MALLOC_CHECK(qInfo->filters); 170 | 171 | /* Store predicates & filters */ 172 | strcpy(temp,rawPredicates); 173 | token = strtok(temp,"&"); 174 | i=j=0; 175 | if(isFilter(token)) 176 | {addFilter(&qInfo->filters[i],token);++i;} 177 | else 178 | {addPredicate(&qInfo->predicates[j],token);++j;} 179 | 180 | 181 | while(token=strtok(NULL,"&")) 182 | if(isFilter(token)) 183 | {addFilter(&qInfo->filters[i],token);++i;} 184 | else 185 | {addPredicate(&qInfo->predicates[j],token);++j;} 186 | 187 | free(temp); 188 | } 189 | 190 | void addFilter(struct FilterInfo *fInfo,char *token) 191 | { 192 | unsigned relId; 193 | unsigned colId; 194 | char cmp; 195 | uint64_t constant; 196 | 
sscanf(token,"%u.%u%c%lu",&relId,&colId,&cmp,&constant); 197 | // printf("\"%u.%u%c%lu\"\n",relId,colId,cmp,constant); 198 | fInfo->filterLhs.relId = relId; 199 | fInfo->filterLhs.colId = colId; 200 | fInfo->comparison = cmp; 201 | fInfo->constant = constant; 202 | } 203 | 204 | void addPredicate(struct PredicateInfo *pInfo,char *token) 205 | { 206 | unsigned relId1; 207 | unsigned colId1; 208 | unsigned relId2; 209 | unsigned colId2; 210 | sscanf(token,"%u.%u=%u.%u",&relId1,&colId1,&relId2,&colId2); 211 | // printf("\"%u.%u=%u.%u\"\n",relId1,colId1,relId2,colId2); 212 | pInfo->left.relId = relId1; 213 | pInfo->left.colId = colId1; 214 | pInfo->right.relId = relId2; 215 | pInfo->right.colId = colId2; 216 | } 217 | 218 | int isFilter(char *predicate) 219 | { 220 | char constant[20] = ""; /* stays empty if sscanf() matches nothing */ 221 | sscanf(predicate,"%*u.%*u%*[=<>]%19s",constant); /* %19s keeps the read within the buffer */ 222 | 223 | if(!strstr(constant,".")) 224 | return 1; 225 | else 226 | return 0; 227 | } 228 | 229 | int isColEquality(struct PredicateInfo *pInfo) 230 | {return (pInfo->left.relId == pInfo->right.relId); } 231 | 232 | unsigned getRelId(struct SelectInfo *sInfo) 233 | {return sInfo->relId;} 234 | 235 | unsigned getOriginalRelId(struct QueryInfo *qInfo,struct SelectInfo *sInfo) 236 | {return qInfo->relationIds[sInfo->relId];} 237 | 238 | unsigned getColId(struct SelectInfo *sInfo) 239 | {return sInfo->colId;} 240 | 241 | uint64_t getConstant(struct FilterInfo *fInfo) 242 | {return fInfo->constant;} 243 | 244 | Comparison getComparison(struct FilterInfo *fInfo) 245 | {return fInfo->comparison;} 246 | 247 | unsigned getNumOfFilters(struct QueryInfo *qInfo) 248 | {return qInfo->numOfFilters;} 249 | 250 | unsigned getNumOfRelations(struct QueryInfo *qInfo) 251 | {return qInfo->numOfRelationIds;} 252 | 253 | unsigned getNumOfColEqualities(struct QueryInfo *qInfo) 254 | { 255 | unsigned sum=0; 256 | for(unsigned i=0;i < qInfo->numOfPredicates;++i) 257 | if(isColEquality(&qInfo->predicates[i])) 258 | ++sum; 259 | return sum; 260 | } 261 | 262 | unsigned
getNumOfJoins(struct QueryInfo *qInfo) 263 | { 264 | unsigned sum=0; 265 | for(unsigned i=0;i < qInfo->numOfPredicates;++i) 266 | if(!isColEquality(&qInfo->predicates[i])) 267 | ++sum; 268 | return sum; 269 | } 270 | 271 | /**************************** For Testing... ***************************************/ 272 | void printTest(struct QueryInfo *qInfo) 273 | { 274 | for(unsigned j=0;j < qInfo->numOfRelationIds;++j) 275 | { 276 | fprintf(stderr,"%u ",qInfo->relationIds[j]); 277 | } 278 | fprintf(stderr,"|"); 279 | for(unsigned j=0;j < qInfo->numOfPredicates;++j) 280 | { 281 | unsigned leftRelId = getRelId(&qInfo->predicates[j].left); 282 | unsigned rightRelId = getRelId(&qInfo->predicates[j].right); 283 | unsigned leftColId = getColId(&qInfo->predicates[j].left); 284 | unsigned rightColId = getColId(&qInfo->predicates[j].right); 285 | 286 | if(isColEquality(&qInfo->predicates[j])) 287 | fprintf(stderr,"[%u.%u=%u.%u] & ",leftRelId,leftColId,rightRelId,rightColId); 288 | else 289 | fprintf(stderr,"%u.%u=%u.%u & ",leftRelId,leftColId,rightRelId,rightColId); 290 | } 291 | for(unsigned j=0;j < qInfo->numOfFilters;++j) 292 | { 293 | unsigned relId = getRelId(&qInfo->filters[j].filterLhs); 294 | unsigned colId = getColId(&qInfo->filters[j].filterLhs); 295 | Comparison cmp = getComparison(&qInfo->filters[j]); 296 | uint64_t constant = getConstant(&qInfo->filters[j]); 297 | 298 | fprintf(stderr,"%u.%u%c%lu & ",relId,colId,cmp,constant); 299 | } 300 | fprintf(stderr,"|"); 301 | for(unsigned j=0;j < qInfo->numOfSelections;++j) 302 | { 303 | unsigned relId = getRelId(&qInfo->selections[j]); 304 | unsigned colId = getColId(&qInfo->selections[j]); 305 | fprintf(stderr,"%u.%u ",relId,colId); 306 | } 307 | fprintf(stderr, "\n"); 308 | } 309 | -------------------------------------------------------------------------------- /final/src/Partition.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include <string.h> /* strerror() */ 5 | #include <errno.h> /* errno */ 6 | #include

#include "Partition.h"
#include "Intermediate.h"
#include "JobScheduler.h"
#include "Queue.h"
#include "Utils.h"
#include "Vector.h"

#define PARALLEL_PARTITION 0
#define PARALLEL_HISTOGRAM 0

void histFunc(void* arg)
{
    // double elapsed;
    // struct timespec start, finish;
    // clock_gettime(CLOCK_MONOTONIC, &start);

    struct histArg *myarg = arg;
    for(unsigned i=0;i<HASH_RANGE_1;++i){
        myarg->histogram[i] = 0;
    }
    for(unsigned i=myarg->start;i<myarg->end;++i){
        myarg->histogram[HASH_FUN_1(myarg->values[i])] += 1;
    }
    // clock_gettime(CLOCK_MONOTONIC, &finish);
    // elapsed = (finish.tv_sec - start.tv_sec);
    // elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
    // fprintf(stderr, "Duration[%u]:%.5lf seconds\n",(unsigned)pthread_self(),elapsed);

    pthread_barrier_wait(&barrier);
}

void partitionFunc(void* arg)
{
    // double elapsed;
    // struct timespec start, finish;
    // clock_gettime(CLOCK_MONOTONIC, &start);

    struct partitionArg *myarg = arg;
    for(unsigned i=myarg->start;i<myarg->end;++i)
    {
        /* Get the hashValue from unsorted column */
        unsigned h;
        if(myarg->info->isInInter)
            h = HASH_FUN_1(myarg->info->unsorted->values[i]);
        else
            h = HASH_FUN_1(myarg->info->col[i]);

        /* Go to pSumCopy and find where we need to place it in sorted column */
        unsigned offset;
        pthread_mutex_lock(&partitionMtxArray[h]);
        offset = myarg->pSumCopy[h];


        /**
         * Increment the value to know where to put the next element with the same hashValue
         */
        myarg->pSumCopy[h]++;
        if(myarg->info->isInInter)
        {
            myarg->info->sorted->values[offset] = myarg->info->unsorted->values[i];
            myarg->info->sorted->rowIds[offset] = myarg->info->unsorted->rowIds[i];
            insertAtPos(myarg->info->sorted->tuples,&myarg->info->unsorted->tuples->table[i*myarg->info->tupleSize],offset);
        }
        else
        {
            myarg->info->sorted->values[offset] = myarg->info->col[i];
            myarg->info->sorted->rowIds[offset] = i;
            insertAtPos(myarg->info->sorted->tuples,&i,offset);
        }
        pthread_mutex_unlock(&partitionMtxArray[h]);
    }
    // clock_gettime(CLOCK_MONOTONIC, &finish);
    // elapsed = (finish.tv_sec - start.tv_sec);
    // elapsed += (finish.tv_nsec - start.tv_nsec) / 1000000000.0;
    // fprintf(stderr, "Duration[%u]:%.5lf seconds\n",(unsigned)pthread_self(),elapsed);
    pthread_barrier_wait(&barrier);
}

void partition(RadixHashJoinInfo *info)
{
    info->unsorted = malloc(sizeof(ColumnInfo));
    MALLOC_CHECK(info->unsorted);

    info->unsorted->values = malloc(info->numOfTuples*sizeof(uint64_t));
    MALLOC_CHECK(info->unsorted->values);

    info->unsorted->rowIds = malloc(info->numOfTuples*sizeof(unsigned));
    MALLOC_CHECK(info->unsorted->rowIds);

    info->sorted = malloc(sizeof(ColumnInfo));
    MALLOC_CHECK(info->sorted);

    info->sorted->values = malloc(info->numOfTuples*sizeof(uint64_t));
    MALLOC_CHECK(info->sorted->values);

    info->sorted->rowIds = malloc(info->numOfTuples*sizeof(unsigned));
    MALLOC_CHECK(info->sorted->rowIds);

    info->indexArray = NULL;

    createVectorFixedSize(&info->sorted->tuples,info->tupleSize,info->numOfTuples);

    createVector(&info->unsorted->tuples,info->tupleSize);

    // Get values-rowIds from intermediate vector
    if(info->isInInter)
        scanJoin(info);

    // Create our main histogram
    info->hist = malloc(HASH_RANGE_1*sizeof(unsigned));
    MALLOC_CHECK(info->hist);

#if PARALLEL_HISTOGRAM
    // Enqueue histogram jobs
    unsigned chunkSize = info->numOfTuples / js->threadNum;
    unsigned lastEnd = 0;
    unsigned i;
    for(i=0;i<js->threadNum-1;++i)
    {
        struct histArg *arg = js->histJobs[i].argument;
        arg->start = i*chunkSize;
        arg->end = arg->start + chunkSize;
        arg->values = (info->isInInter) ? info->unsorted->values : info->col;
        pthread_mutex_lock(&queueMtx);
        enQueue(jobQueue,&js->histJobs[i]);
        pthread_cond_signal(&condNonEmpty);
        pthread_mutex_unlock(&queueMtx);
        lastEnd = arg->end;
    }
    // Remaining values, in case numOfTuples could not be divided exactly between the threads
    struct histArg *arg = js->histJobs[i].argument;
    arg->start = lastEnd;
    arg->end = info->numOfTuples;
    arg->values = (info->isInInter) ? info->unsorted->values : info->col;
    pthread_mutex_lock(&queueMtx);
    enQueue(jobQueue,&js->histJobs[i]);
    pthread_cond_signal(&condNonEmpty);
    pthread_mutex_unlock(&queueMtx);

    // Wait until all threads are done
    // fprintf(stderr, "%s\n","Before barrier" );
    pthread_barrier_wait(&barrier);
    // fprintf(stderr, "%s\n","After barrier" );


    // Merge the partial histograms
    for(unsigned i=0;i<HASH_RANGE_1;++i)
        info->hist[i] = 0;

    for(unsigned t=0;t<js->threadNum;++t)
        for(unsigned h=0;h<HASH_RANGE_1;++h)
            info->hist[h]+=js->histArray[t][h];
#else

    for(unsigned i=0;i<HASH_RANGE_1;++i)
        info->hist[i] = 0;

    for(unsigned i=0;i<info->numOfTuples;++i)
        if(info->isInInter)
            info->hist[HASH_FUN_1(info->unsorted->values[i])] += 1;
        else
            info->hist[HASH_FUN_1(info->col[i])] += 1;
#endif

    // Calculate Prefix Sum
    unsigned sum = 0;
    info->pSum = malloc(HASH_RANGE_1*sizeof(unsigned));
    MALLOC_CHECK(info->pSum);

    for(unsigned i=0;i<HASH_RANGE_1;++i)
        info->pSum[i] = 0;

    for(unsigned i=0;i<HASH_RANGE_1;++i)
    {
        info->pSum[i] = sum;
        sum += info->hist[i];
    }

    unsigned *pSumCopy = malloc(HASH_RANGE_1*sizeof(unsigned));
    MALLOC_CHECK(pSumCopy);
    for(unsigned i=0;i<HASH_RANGE_1;++i)
        pSumCopy[i] = info->pSum[i];

#if PARALLEL_PARTITION
    lastEnd = 0;
    for(i=0;i<js->threadNum-1;++i)
    {
        struct partitionArg *arg = js->partitionJobs[i].argument;
        arg->start = i*chunkSize;
        arg->end = arg->start + chunkSize;
        arg->pSumCopy = pSumCopy;
        arg->info = info;
        pthread_mutex_lock(&queueMtx);
        enQueue(jobQueue,&js->partitionJobs[i]);
        pthread_cond_signal(&condNonEmpty);
        pthread_mutex_unlock(&queueMtx);
        lastEnd = arg->end;
    }
    // Remaining values, in case numOfTuples could not be divided exactly between the threads
    struct partitionArg *pArg = js->partitionJobs[i].argument;
    pArg->start = lastEnd;
    pArg->end = info->numOfTuples;
    pArg->pSumCopy = pSumCopy;
    pArg->info = info;
    pthread_mutex_lock(&queueMtx);
    enQueue(jobQueue,&js->partitionJobs[i]);
    pthread_cond_signal(&condNonEmpty);
    pthread_mutex_unlock(&queueMtx);

    // Wait until all threads are done
    // fprintf(stderr, "%s\n","Before barrier" );
    pthread_barrier_wait(&barrier);
    // fprintf(stderr, "%s\n","After barrier" );
#else

    for(unsigned i=0;i<info->numOfTuples;++i)
    {
        /* Get the hashValue from unsorted column */
        unsigned h;
        if(info->isInInter)
            h = HASH_FUN_1(info->unsorted->values[i]);
        else
            h = HASH_FUN_1(info->col[i]);

        /* Go to pSumCopy and find where we need to place it in sorted column */
        unsigned offset;
        offset = pSumCopy[h];

        /**
         * Increment the value to know where to put the next element with the same hashValue
         * With this we lose the original representation of pSum
         * If we want to have access we must create a copy before entering this for loop
         */
        pSumCopy[h]++;
        if(info->isInInter)
        {
            info->sorted->values[offset] = info->unsorted->values[i];
            info->sorted->rowIds[offset] = info->unsorted->rowIds[i];
            insertAtPos(info->sorted->tuples,&info->unsorted->tuples->table[i*info->tupleSize],offset);
        }
        else
        {
            info->sorted->values[offset] = info->col[i];
            info->sorted->rowIds[offset] = i;
            insertAtPos(info->sorted->tuples,&i,offset);
        }
    }
#endif
    free(pSumCopy);
}

--------------------------------------------------------------------------------
/final/src/Probe.c:
--------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h> /*malloc() and free()*/
#include <string.h> /* strerror() */
#include <errno.h> /* errno */
#include <pthread.h>

#include "Probe.h"
#include "Vector.h"
#include "Build.h"
#include "Utils.h"
#include "JobScheduler.h"

void joinFunc(void *arg){

    struct joinArg *myarg = arg;
    RadixHashJoinInfo *left,*right;
    left = myarg->left;
    right = myarg->right;
    struct Vector* results = myarg->results;
    unsigned searchBucket = myarg->bucket;

    unsigned i;
    unsigned searchValue;
    uint64_t hash;
    unsigned start;
    int k;
    struct Index *searchIndex;

    RadixHashJoinInfo *big,*small;
    big = left->isSmall ? right:left;
    small = left->isSmall ? left:right;

    /* The range that current thread is responsible for */
    unsigned tStart = big->pSum[searchBucket];
    unsigned tEnd = tStart + big->hist[searchBucket];

    unsigned *tupleToInsert;
    unsigned tupleSize = small->tupleSize+big->tupleSize;
    tupleToInsert = malloc(tupleSize*sizeof(unsigned));
    MALLOC_CHECK(tupleToInsert);

    /* For every tuple(i.e:record) in the big relation */
    for(i=tStart;i<tEnd;++i)
    {
        searchValue = big->sorted->values[i];

        /**
         * Find out in which bucket of the small relation
         * we should search.
         *
         * No need for that, as each thread is assigned
         * a specific bucket to work on.
         */
        // searchBucket = HASH_FUN_1(searchValue);

        /* Bucket is empty, there is nothing to search here */
        if(small->hist[searchBucket] == 0)
            continue;

        /* Fetch starting point of the bucket */
        start = small->pSum[searchBucket];

        /* Fetch the index of this bucket */
        searchIndex = small->indexArray[searchBucket];

        /* Find out where to look for in the bucketArray of the index */
        hash = HASH_FUN_2(searchValue);

        /* Bucket is not empty, but there is no value equal to the searchValue */
        if(searchIndex->bucketArray[hash]==0)
            continue;

        /* Warning: In bucketArray and chainArray we've stored the rowIds relative to the bucket [i.e: 0 ~> bucketSize-1] */

        k = searchIndex->bucketArray[hash] - 1;
        checkEqual(small,big,i,start,searchValue,k,results,tupleToInsert);

        while(1)
        {
            // We've reached the end of the chain
            if(searchIndex->chainArray[k] == 0)
                break;

            /* Step further on the chain */
            else
            {
                k = searchIndex->chainArray[k] - 1;
                checkEqual(small,big,i,start,searchValue,k,results,tupleToInsert);
            }
        }

    }
    free(tupleToInsert);
    pthread_mutex_lock(&jobsFinishedMtx);
    ++jobsFinished;
    pthread_cond_signal(&condJobsFinished);
    pthread_mutex_unlock(&jobsFinishedMtx);
}

void checkEqual(RadixHashJoinInfo *small,RadixHashJoinInfo *big,unsigned i,unsigned start,unsigned searchValue,unsigned pseudoRow,struct Vector *results,unsigned *tupleToInsert)
{
    uint32_t actualRow;
    /* We calculate the rowId relative to the sorted array [i.e: 0 ~> numOfTuples] */
    actualRow = start + pseudoRow;

    /*
     * Check equality and insert tuple into results vector
     * The new tuple is constructed by combining the tuple of each relation.
     */
    if(small->sorted->values[actualRow] == searchValue)
    {
        constructTuple(small,big,actualRow,i,tupleToInsert);
        insertAtVector(results,tupleToInsert);

    }
}

void constructTuple(RadixHashJoinInfo *small,RadixHashJoinInfo *big,unsigned actualRow,unsigned i,unsigned *target)
{
    /**
     * We need to construct the tuple that we're going to add to the results vector
     *
     * First, we add left column's tuple [it is important to add left column's tuple first]
     * Then, we add right column's tuple.
     *
     * Notice that we need to access a different row depending on
     * whether the column is small or big [i.e: indexed or not]
     */
    unsigned *t;
    unsigned k=0;
    RadixHashJoinInfo *left = small->isLeft ? small : big;
    RadixHashJoinInfo *right = small->isLeft ? big : small;

    // Add values from left column's tuple
    if(left->isSmall)
        t = &left->sorted->tuples->table[actualRow*left->sorted->tuples->tupleSize];
    else
        t = &left->sorted->tuples->table[i*left->sorted->tuples->tupleSize];

    for(unsigned i=0;i<left->tupleSize;++i)
        target[k++] = t[i];

    // Add values from right column's tuple
    if(right->isSmall)
        t = &right->sorted->tuples->table[actualRow*right->sorted->tuples->tupleSize];
    else
        t = &right->sorted->tuples->table[i*right->sorted->tuples->tupleSize];

    for(unsigned i=0;i<right->tupleSize;++i)
        target[k++] = t[i];
}

--------------------------------------------------------------------------------
/final/src/Queue.c:
--------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>

#include "Queue.h"
#include "Utils.h"

void createQueue(struct Queue **q, int size)
{
    /* Dynamic */
    *q = malloc(sizeof(struct Queue));
    MALLOC_CHECK(*q);
    /* The queue stores pointers, so size each slot by the element type, not by int */
    (*q)->array = malloc(sizeof(*(*q)->array) * size);
    MALLOC_CHECK((*q)->array);

    /* Data for ds */
    (*q)->size = size;
    (*q)->front = -1;
    (*q)->rear = -1;

}

void destroyQueue(struct Queue *q)
{
    free(q->array);
    free(q);
}

int enQueue(struct Queue *q, void* item)
{
    // fprintf(stderr,"item %p\n", item);
    if ( ((q->rear == q->size - 1) && (q->front == 0)) || (q->rear == (q->front - 1) % (q->size - 1)) )
    {
        // display(q);
        fprintf(stderr, "%s\n", "Circular Queue is full");
        return 0;
    }
    /* This is the first item inserted */
    else if (q->front == -1)
    {
        q->rear = 0;
        q->front = 0;
        q->array[0] = item;
    }
    /* When the rear has reached the end but there is still space in the array
       we just move rear to the beginning of the queue */
    else if ( (q->rear == q->size - 1) && (q->front != 0))
    {
        q->rear = 0;
        q->array[q->rear] = item;
    }
    /* just insert in the next position */
    else
        q->array[++(q->rear)] = item;

    /* The item was stored successfully */
    return 1;
}

int isEmpty(struct Queue *q){
    return (q->front == -1);
}

void* deQueue(struct Queue *q)
{
    if (q->front == -1)
    {
        fprintf(stderr, "%s\n", "Circular Queue is empty");
        return 0;
    }

    void* value = q->array[q->front];

    /* We had a one-item queue if this is true */
    if (q->front == q->rear)
        q->front = q->rear = -1;
    /* we have reached the end, so we start extracting from the start again */
    else if (q->front == q->size - 1)
        q->front = 0;
    /* we just go to the next element */
    else
        ++(q->front);

    return value;
}


void display(struct Queue *q)
{
    /* Debug output goes to stderr, like the rest of this function */
    fprintf(stderr,"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n");
    fprintf(stderr,"front is :%d \n", q->front);
    fprintf(stderr,"rear is :%d \n", q->rear);
    for (int i = 0; i < q->size; i++)
        fprintf(stderr,"%p| ", q->array[i]);
    fprintf(stderr,"\n");
    fprintf(stderr,"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n");
}

--------------------------------------------------------------------------------
/final/src/Relation.c:
--------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include "Relation.h"
#include "Utils.h"
#include "Optimizer.h"


void createRelation(struct Relation **rel,char *fileName)
{
    *rel = malloc(sizeof(struct Relation));
    MALLOC_CHECK(*rel);
    (*rel)->columns = NULL;
    loadRelation(*rel,fileName);

    // Allocate space for stats
    (*rel)->stats = malloc((*rel)->numOfCols*sizeof(struct columnStats));
    MALLOC_CHECK((*rel)->stats);

    // Calculate stats for every column of the relation
    for (unsigned i = 0; i < (*rel)->numOfCols; ++i)
    {
        findStats((*rel)->columns[i], &(*rel)->stats[i], (*rel)->numOfTuples);
        // fprintf(stderr, "Relation[%s]\n",fileName);
        // printColumnStats(&(*rel)->stats[i]);
    }
    // fprintf(stderr, "\n\n\n");
}

void loadRelation(struct Relation *rel,char *fileName)
{
    int fd;

    /* Open relation file */
    if( (fd = open(fileName, O_RDONLY)) == -1){
        perror("open failed[loadRelation]");
        exit(EXIT_FAILURE);
    }

    /* Find its size (in bytes) */
    struct stat sb;
    if (fstat(fd,&sb)==-1){
        perror("fstat failed[loadRelation]");
        exit(EXIT_FAILURE);
    }

    /* size_t avoids truncating the size of files larger than 4 GiB */
    size_t fileSize = sb.st_size;
    if (fileSize<16){
        fprintf(stderr,"Relation file \"%s\" does not contain a valid header\n",fileName);
        exit(EXIT_FAILURE);
    }

    /* Map file to memory */
    char *addr;
    if( (addr = mmap(NULL,fileSize,PROT_READ,MAP_PRIVATE,fd,0u)) == MAP_FAILED ){

        perror("mmap failed[loadRelation]");
        exit(EXIT_FAILURE);
    }

    /* Fetch numOfTuples & numOfCols */
    rel->numOfTuples = *((uint64_t*) addr);
    addr+=sizeof((uint64_t)rel->numOfTuples);
    rel->numOfCols = *((uint64_t*) addr);
    addr+=sizeof((uint64_t)rel->numOfCols);

    rel->columns = malloc(rel->numOfCols*sizeof(uint64_t*));
    MALLOC_CHECK(rel->columns);

    /* Map every relation's column to rel->columns array */
    for (unsigned i=0;i<rel->numOfCols;++i)
    {
        rel->columns[i] = (uint64_t*) addr;
        addr+=rel->numOfTuples*sizeof(uint64_t);
    }

    /* Closing the file does not affect mmap according to man page */
    close(fd);
}

void dumpRelation(struct Relation *rel,char *fileName)
{
    /* Create path; snprintf truncates instead of overflowing the buffer on long file names */
    char path[100];
    snprintf(path,sizeof(path),"../../dumpFiles/%s.dump",fileName);

    FILE* fp;
    if( (fp=fopen(path,"w"))==NULL)
    {
        perror("fopen failed[dumpRelation]");
        exit(EXIT_FAILURE);
    }

    for (unsigned i=0;i<rel->numOfTuples;++i)
    {
        for(unsigned j=0;j<rel->numOfCols;++j)
            fprintf(fp,"%lu|", rel->columns[j][i]);
        fprintf(fp,"\n");
    }
    fclose(fp);
}

void printRelation(struct Relation *rel)
{
    for (unsigned i=0;i<rel->numOfTuples;++i)
    {
        for(unsigned j=0;j<rel->numOfCols;++j)
            printf("%lu|", rel->columns[j][i]);
        printf("\n");
    }
}

void destroyRelation(struct Relation *rel)
{
    /**
     * It is recommended to call munmap(...) but the process is going
     * to terminate anyway afterwards.
     */
    for(unsigned i=0;i<rel->numOfCols;++i)
        free(rel->stats[i].bitVector);
    free(rel->stats);
    free(rel->columns);
    free(rel);
}

--------------------------------------------------------------------------------
/final/src/Utils.c:
--------------------------------------------------------------------------------
#include <stdint.h> /* uint64_t */

#include "Utils.h"

int compare(uint64_t key,Comparison cmp,uint64_t constant)
{
    switch(cmp)
    {
        case '=':
            return key==constant;
        case '<':
            return key<constant;
        case '>':
            return key>constant;
    }
    /* Unknown comparison operator: treat the filter as not satisfied */
    return 0;
}

uint64_t power(uint64_t base, uint64_t exponent)
{
    if (exponent == 0)
        return 1;
    else if (exponent % 2 == 0)
    {
        uint64_t temp = power(base, exponent / 2);
        return temp * temp;
    }
    else
        return base * power(base, exponent - 1);
}

uint64_t linearPower(uint64_t base, uint64_t exponent)
{
    uint64_t res = 1;
    for (uint64_t i = 0; i < exponent; ++i)
        res *= base;
    return res;
}

--------------------------------------------------------------------------------
/final/src/Vector.c:
--------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /*strlen()*/
#include <pthread.h>

#include "Vector.h"
#include "Utils.h"
#include "JobScheduler.h"

unsigned initSize;

void createVector(struct Vector **vector,unsigned tupleSize)
{
    // *vector = malloc(sizeof(struct Vector));
    /* posix_memalign returns the error code directly and does not set errno */
    int err;
    if( (err = posix_memalign((void **)vector,128,sizeof(struct Vector))) != 0 )
    {
        fprintf(stderr,"posix_memalign failed: %s\nExiting...\n",strerror(err));
        exit(EXIT_FAILURE);
    }
    MALLOC_CHECK(*vector);
    (*vector)->table = NULL;
    (*vector)->tupleSize = tupleSize;
    (*vector)->nextPos = 0;
    (*vector)->capacity = 0;
}

void createVectorFixedSize(struct Vector **vector,unsigned tupleSize,unsigned
fixedSize)
{
    *vector = malloc(sizeof(struct Vector));
    MALLOC_CHECK(*vector);
    (*vector)->capacity = tupleSize*fixedSize;
    (*vector)->tupleSize = tupleSize;
    (*vector)->table = malloc((*vector)->capacity *sizeof(unsigned));
    MALLOC_CHECK((*vector)->table);
    (*vector)->nextPos = 0;
}

void insertAtVector(struct Vector *vector,unsigned *tuple)
{
    /* If vector is empty, create table */
    if(vectorIsEmpty(vector))
    {
        /* initSize is initialized according to avgNumOfTuples of the given relations */
        /* check Joiner.c */
        vector->capacity = initSize*vector->tupleSize;
        vector->table = malloc(vector->capacity*sizeof(unsigned));
        MALLOC_CHECK(vector->table);
    }

    /* Else if vector is full, realloc more space */
    else if(vectorIsFull(vector))
    {
        vector->capacity*=2;
        unsigned *new = realloc(vector->table,vector->capacity*sizeof(unsigned));
        if(!new)
        {
            fprintf(stderr,"realloc failed[insertAtVector]\nExiting...\n");
            exit(EXIT_FAILURE);
        }
        vector->table = new;
        // fprintf(stderr, "realloc() has been called\n");
    }

    /* Insert tuple */
    unsigned pos = vector->nextPos;
    for(unsigned i=0;i<vector->tupleSize;++i)
        vector->table[pos+i] = tuple[i];

    vector->nextPos+=vector->tupleSize;
}

/* Caution: This function will be called only for fixed size vectors */
/* Useful in case we want to insert a tuple at a specific offset inside the vector */
void insertAtPos(struct Vector *vector,unsigned *tuple,unsigned offset)
{
    unsigned pos = offset*vector->tupleSize;
    for(unsigned i=0;i<vector->tupleSize;++i)
        vector->table[pos+i] = tuple[i];
    vector->nextPos += vector->tupleSize;
}

void printVector(struct Vector *vector)
{
    if(vector->nextPos==0)
    {
        // fprintf(stderr,"Vector is empty\n");
        return;
    }
    /* nextPos holds the number of the rowIds */
    /* which is the same as the first available position in vector */
    unsigned k=0;
    for(unsigned i=0;i<vector->nextPos;++i)
        if(i%vector->tupleSize==0)
        {
            if(k++==10) // Stop after printing the first 10 tuples
                break;
            printTuple(vector,i);
        }
}

void printTuple(struct Vector *vector,unsigned pos)
{
    fprintf(stderr,"(");
    for(unsigned i=0;i<vector->tupleSize;++i)
        if(i==vector->tupleSize-1)
            fprintf(stderr,"%u",vector->table[pos+i]+1);
        else
            fprintf(stderr,"%u,",vector->table[pos+i]+1);
    fprintf(stderr,")\n");
}

void scanColEquality(struct Vector *new,struct Vector* old,uint64_t *leftCol,uint64_t* rightCol,unsigned posLeft,unsigned posRight)
{
    /* Note: Each tuple in old vector contains one rowId */
    /* Except for the case of join between relations from the same intermediate entity[i.e: vector] */
    for(unsigned i=0;i<old->nextPos;i+=old->tupleSize)
        if(leftCol[old->table[i+posLeft]] == rightCol[old->table[i+posRight]])
            insertAtVector(new,&old->table[i]);
}

void scanFilter(struct Vector *new,struct Vector* old,uint64_t *col,Comparison cmp,uint64_t constant)
{
    /* Note: Each tuple in old vector contains one rowId */
    /* Insert every tuple of the old vector to the new one, if it satisfies the filter */
    for(unsigned i=0;i<old->nextPos;++i)
        if(compare(col[old->table[i]],cmp,constant))
            insertAtVector(new,&old->table[i]);
}

// Fill info with values and tuples
void scanJoin(RadixHashJoinInfo *joinRel)
{
    struct Vector *old;
    struct Vector *new = joinRel->unsorted->tuples;
    unsigned sizeOfVector;

    // Position of this relation's rowId inside tuple
    unsigned tupleOffset = joinRel->map[joinRel->relId];

    uint64_t *origValues = joinRel->col;
    uint64_t *colValues = joinRel->unsorted->values;
    unsigned *rowIds =
joinRel->unsorted->rowIds;

    unsigned k=0;
    // We scan the old vectors [i.e: the intermediate result]
    for(unsigned v=0;v<HASH_RANGE_1;++v)
    {
        if(old = joinRel->vector[v])
            for(unsigned i=0;i<old->nextPos;i+=old->tupleSize)
            {
                unsigned origRowId = old->table[i+tupleOffset];
                // Add value
                colValues[k] = origValues[origRowId];
                // Add tuple
                insertAtVector(new,&old->table[i]);
                // Add rowId (increment separately: reading and modifying k in one expression is undefined behavior)
                rowIds[k] = k;
                ++k;
            }
    }
}

void destroyVector(struct Vector **vector)
{
    if(*vector==NULL)
    {
        // fprintf(stderr,"Vector is empty\n");
        return;
    }
    free((*vector)->table);
    free(*vector);
    *vector = NULL;
}

int vectorIsFull(struct Vector *vector)
{
    return vector->nextPos == vector->capacity;
}

int vectorIsEmpty(struct Vector *vector)
{
    return vector->table == NULL;
}

unsigned getVectorTuples(struct Vector *vector)
{
    return vector->nextPos/vector->tupleSize;
}

/* Returns a pointer to the i-th tuple */
unsigned *getTuple(struct Vector *vector,unsigned i)
{
    if(i>vector->nextPos/vector->tupleSize)
    {
        fprintf(stderr, "Trying to access a tuple that does not exist\nExiting...\n\n");
        exit(EXIT_FAILURE);
    }
    return &vector->table[i*vector->tupleSize];
}

unsigned getTupleSize(struct Vector *vector)
{return vector->tupleSize;}


uint64_t checkSum(struct Vector *vector,uint64_t *col,unsigned rowIdPos)
{
    uint64_t sum=0;
    for(unsigned i=0;i<vector->nextPos;i+=vector->tupleSize)
        sum+=col[vector->table[i+rowIdPos]];
    return sum;
}

void checkSumFunc(void *arg){
    struct checkSumArg *myarg = arg;
    *myarg->sum=0;
    for(unsigned i=0;i<myarg->vector->nextPos;i+=myarg->vector->tupleSize)
        *myarg->sum+=myarg->col[myarg->vector->table[i+myarg->rowIdPos]];
    pthread_mutex_lock(&jobsFinishedMtx);
    ++jobsFinished;
    pthread_cond_signal(&condJobsFinished);
    pthread_mutex_unlock(&jobsFinishedMtx);
}


void colEqualityFunc(void *arg)
{
    struct colEqualityArg *myarg = arg;
    for(unsigned i=0;i<myarg->old->nextPos;i+=myarg->old->tupleSize)
        if(myarg->leftCol[myarg->old->table[i+myarg->posLeft]] == myarg->rightCol[myarg->old->table[i+myarg->posRight]])
            insertAtVector(myarg->new,&myarg->old->table[i]);
    pthread_mutex_lock(&jobsFinishedMtx);
    ++jobsFinished;
    pthread_cond_signal(&condJobsFinished);
    pthread_mutex_unlock(&jobsFinishedMtx);
}

--------------------------------------------------------------------------------
/final/src/harness.cpp:
--------------------------------------------------------------------------------
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <algorithm>
#include <cerrno>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <utility>
#include <vector>
//---------------------------------------------------------------------------
using namespace std;
//---------------------------------------------------------------------------
const unsigned long MAX_FAILED_QUERIES = 100;
//---------------------------------------------------------------------------
static void usage() {
  cerr << "Usage: harness <init-file> <workload-file> <result-file> <test-executable>" << endl;
}
//---------------------------------------------------------------------------
static int set_nonblocking(int fd)
// Set a file descriptor to be non-blocking
{
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0) return flags;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

//---------------------------------------------------------------------------
static ssize_t read_bytes(int fd, void *buffer, size_t num_bytes)
// Read a given number of bytes from the specified file descriptor
{
  char *p = (char *)buffer;
  char *end = p + num_bytes;
  while (p != end) {
    ssize_t res = read(fd, p, end - p);
    if (res < 0) {
      if (errno == EINTR) continue;
      return res;
    }
    p += res;
  }

  return num_bytes;
}
//---------------------------------------------------------------------------
static ssize_t write_bytes(int fd, const void *buffer, size_t num_bytes)
// Write a given number of bytes to the specified file descriptor
{
  const char *p = (const char *)buffer;
  const char *end = p + num_bytes;
  while (p != end) {
    ssize_t res = write(fd, p, end - p);
    if (res < 0) {
      if (errno == EINTR) continue;
      return res;
    }
    p += res;
  }

  return num_bytes;
}

//---------------------------------------------------------------------------
int main(int argc, char *argv[]) {
  // Check for the correct number of arguments
  if (argc != 5) {
    usage();
    exit(EXIT_FAILURE);
  }

  vector<string> input_batches;
  vector<vector<string> > result_batches;

  // Load the workload and result files and parse them into batches
  {
    ifstream work_file(argv[2]);
    if (!work_file) {
      cerr << "Cannot open workload file" << endl;
      exit(EXIT_FAILURE);
    }

    ifstream result_file(argv[3]);
    if (!result_file) {
      cerr << "Cannot open result file" << endl;
      exit(EXIT_FAILURE);
    }

    string input_chunk;
    input_chunk.reserve(100000);

    vector<string> result_chunk;
    result_chunk.reserve(150);

    string line;
    while (getline(work_file, line)) {
      input_chunk += line;
      input_chunk += '\n';

      if (line.length() > 0 && (line[0] != 'F')) {
        // Add result
        string result;
        getline(result_file, result);
        result_chunk.emplace_back(move(result));
      } else {
        // End of batch
        // Copy input and results
        input_batches.push_back(input_chunk);
        result_batches.push_back(result_chunk);
        input_chunk="";
        result_chunk.clear();
      }
    }
  }

  // Create pipes for child communication
  int stdin_pipe[2];
  int stdout_pipe[2];
  if (pipe(stdin_pipe) == -1 || pipe(stdout_pipe) == -1) {
    perror("pipe");
    exit(EXIT_FAILURE);
  }

  // Start the test executable
  pid_t pid = fork();
  if (pid == -1) {
    perror("fork");
    exit(EXIT_FAILURE);
  } else if (pid == 0) {
    dup2(stdin_pipe[0], STDIN_FILENO);
    close(stdin_pipe[0]);
    close(stdin_pipe[1]);
    dup2(stdout_pipe[1], STDOUT_FILENO);
    close(stdout_pipe[0]);
    close(stdout_pipe[1]);
    execlp(argv[4], argv[4], (char *)NULL);
    perror("execlp");
    exit(EXIT_FAILURE);
  }
  close(stdin_pipe[0]);
  close(stdout_pipe[1]);

  // Open the file and feed the initial relations
  int init_file = open(argv[1], O_RDONLY);
  if (init_file == -1) {
    cerr << "Cannot open init file" << endl;
    exit(EXIT_FAILURE);
  }

  while (1) {
    char buffer[4096];
    ssize_t bytes = read(init_file, buffer, sizeof(buffer));
    if (bytes < 0) {
      if (errno == EINTR) continue;
      perror("read");
      exit(EXIT_FAILURE);
    }
    if (bytes == 0) break;
    ssize_t written = write_bytes(stdin_pipe[1], buffer, bytes);
    if (written < 0) {
      perror("write");
      exit(EXIT_FAILURE);
    }
  }

  close(init_file);

  // Signal the end of the initial phase
  ssize_t status_bytes = write_bytes(stdin_pipe[1], "Done\n", 5);
  if (status_bytes < 0) {
    perror("write");
    exit(EXIT_FAILURE);
  }


#if 1
  // Wait for 1 second
  this_thread::sleep_for(1s);

#else
  // Wait for the ready signal
  char status_buffer[6];
  status_bytes = read_bytes(stdout_pipe[0], status_buffer, sizeof(status_buffer));
  if
(status_bytes < 0) { 186 | perror("read"); 187 | exit(EXIT_FAILURE); 188 | } 189 | 190 | if (status_bytes != sizeof(status_buffer) || (status_buffer[0] != 'R' && status_buffer[0] != 'r') || 191 | status_buffer[5] != '\n') { 192 | cerr << "Test program did not return ready status" << endl; 193 | exit(EXIT_FAILURE); 194 | } 195 | #endif 196 | 197 | // Use select with non-blocking files to read and write from the child process, avoiding deadlocks 198 | if (set_nonblocking(stdout_pipe[0]) == -1) { 199 | perror("fcntl"); 200 | exit(EXIT_FAILURE); 201 | } 202 | 203 | if (set_nonblocking(stdin_pipe[1]) == -1) { 204 | perror("fcntl"); 205 | exit(EXIT_FAILURE); 206 | } 207 | 208 | // Start the stopwatch 209 | struct timeval start; 210 | gettimeofday(&start, NULL); 211 | 212 | unsigned long query_no = 0; 213 | unsigned long failure_cnt = 0; 214 | 215 | // Loop over all batches 216 | for (unsigned long batch = 0; batch != input_batches.size() && failure_cnt < MAX_FAILED_QUERIES; ++batch) { 217 | string output; // raw output is collected here 218 | output.reserve(1000000); 219 | 220 | size_t input_ofs = 0; // byte position in the input batch 221 | size_t output_read = 0; // number of lines read from the child output 222 | 223 | while (input_ofs != input_batches[batch].length() || output_read < result_batches[batch].size()) { 224 | fd_set read_fd, write_fd; 225 | FD_ZERO(&read_fd); 226 | FD_ZERO(&write_fd); 227 | 228 | if (input_ofs != input_batches[batch].length()) FD_SET(stdin_pipe[1], &write_fd); 229 | 230 | if (output_read != result_batches[batch].size()) FD_SET(stdout_pipe[0], &read_fd); 231 | 232 | int retval = select(max(stdin_pipe[1], stdout_pipe[0]) + 1, &read_fd, &write_fd, NULL, NULL); 233 | if (retval == -1) { 234 | perror("select"); 235 | exit(EXIT_FAILURE); 236 | } 237 | 238 | // Read output from the test program 239 | if (FD_ISSET(stdout_pipe[0], &read_fd)) { 240 | char buffer[4096]; 241 | int bytes = read(stdout_pipe[0], buffer, sizeof(buffer)); 242 | if (bytes 
< 0) { 243 | if (errno == EINTR) continue; 244 | perror("read"); 245 | exit(1); 246 | } 247 | // Count how many lines were returned 248 | for (size_t j = 0; j != size_t(bytes); ++j) { 249 | if (buffer[j] == '\n') ++output_read; 250 | } 251 | output.append(buffer, bytes); 252 | } 253 | 254 | // Feed another chunk of data from this batch to the test program 255 | if (FD_ISSET(stdin_pipe[1], &write_fd)) { 256 | int bytes = 257 | write(stdin_pipe[1], input_batches[batch].data() + input_ofs, input_batches[batch].length() - input_ofs); 258 | if (bytes < 0) { 259 | if (errno == EINTR) continue; 260 | perror("write"); 261 | exit(EXIT_FAILURE); 262 | } 263 | input_ofs += bytes; 264 | } 265 | } 266 | 267 | // Parse and compare the batch result 268 | stringstream result(output); 269 | 270 | for (unsigned i = 0; i != result_batches[batch].size() && failure_cnt < MAX_FAILED_QUERIES; ++i) { 271 | string val; 272 | 273 | // result >> val; 274 | getline(result, val); 275 | if (!result) { 276 | cerr << "Incomplete batch output for batch " << batch << endl; 277 | exit(EXIT_FAILURE); 278 | } 279 | 280 | bool matched = val == result_batches[batch][i]; 281 | if (!matched) { 282 | // cerr << "Result mismatch for query " << query_no << ", expected: " <<"\""< 2 | #include <string.h> /*strcmp()*/ 3 | #include 4 | #include "Joiner.h" 5 | #include "Parser.h" 6 | #include "JobScheduler.h" 7 | #include "Optimizer.h" 8 | 9 | 10 | int main(int argc, char const *argv[]) 11 | { 12 | // Create a new Joiner 13 | // Joiner holds meta data for every relation 14 | struct Joiner* joiner; 15 | createJoiner(&joiner); 16 | 17 | // Read join relations and load them to memory 18 | setup(joiner); 19 | 20 | // Create Job Scheduler[worker threads,JobQueue, e.t.c] 21 | createJobScheduler(&js); 22 | 23 | // Read query line 24 | // Parse query by splitting it into parts 25 | // Find estimations using the statistics of each column 26 | // Find the optimal join order 27 | // Execute query and write checksum to stdout 28 | 
// Destroy query 29 | struct QueryInfo *q; 30 | char buffer[BUFFERSIZE]; 31 | while (fgets(buffer, sizeof(buffer), stdin) != NULL) 32 | { 33 | if(!strcmp(buffer,"F\n"))continue; 34 | 35 | createQueryInfo(&q,buffer); 36 | createQueryEstimations(q,joiner); 37 | 38 | applyColEqualityEstimations(q,joiner); 39 | applyFilterEstimations(q,joiner); 40 | applyJoinEstimations(q,joiner); 41 | // findOptimalJoinOrder(q,joiner); 42 | join(joiner,q); 43 | destroyQueryInfo(q); 44 | } 45 | 46 | // Cleanup memory 47 | destroyJobScheduler(js); 48 | destroyJoiner(joiner); 49 | return 0; 50 | } 51 | -------------------------------------------------------------------------------- /final/test/TestHeader.h: -------------------------------------------------------------------------------- 1 | #ifndef TEST_HEADER_H 2 | #define TEST_HEADER_H 3 | 4 | /* Testing for Joiner & Relation functions */ 5 | void testCreateRelation(void); 6 | void testAddRelation(void); 7 | 8 | /* Testing for Queue functions */ 9 | void testCreateQueue(void); 10 | void testEnqueue(void); 11 | void testDeQueue(void); 12 | 13 | /* Testing Optimizer functions */ 14 | void testFindStats(void); 15 | 16 | /* Testing for Parser functions */ 17 | void testCreateQueryInfo(void); 18 | void testAddFilter(void); 19 | void testAddPredicate(void); 20 | void testGetOriginalRelid(void); 21 | 22 | /* Testing for Operators' functions */ 23 | void testColEquality(void); 24 | void testFilterFunc(void); 25 | void testFitlerInter(void); 26 | 27 | /* Testing for RadixHashJoin functions */ 28 | void testPartition(void); 29 | void testPartitionFunc(void); 30 | void testBuildFunc(void); 31 | void testJoinFunc(void); 32 | 33 | #endif 34 | -------------------------------------------------------------------------------- /final/test/TestJoiner.c: -------------------------------------------------------------------------------- 1 | /** 2 | * Unit testing on joiner and relation functions [Joiner.c | Relation.c] 3 | */ 4 | #include 5 | #include 
/*time()*/ 6 | #include <stdlib.h> /*rand()*/ 7 | #include "CUnit/Basic.h" 8 | #include "Utils.h" 9 | #include "Relation.h" 10 | #include "Joiner.h" 11 | 12 | void testCreateRelation(void) 13 | { 14 | char sameFile[] = "r0"; 15 | struct Relation *R1,*R2; 16 | 17 | // NOTE: Remember to change the file path in Relation.h !!! 18 | createRelation(&R1,sameFile); 19 | createRelation(&R2,sameFile); 20 | 21 | CU_ASSERT_EQUAL(R1->numOfTuples,R2->numOfTuples); 22 | CU_ASSERT_EQUAL(R1->numOfCols,R2->numOfCols); 23 | 24 | for(unsigned c=0;cnumOfCols;++c) 25 | for(unsigned r=0;rnumOfTuples;++r) 26 | CU_ASSERT_EQUAL(R1->columns[c][r],R2->columns[c][r]); 27 | 28 | destroyRelation(R1); 29 | destroyRelation(R2); 30 | } 31 | 32 | /* Also tests getRelationTuples(..) */ 33 | void testAddRelation(void) 34 | { 35 | struct Joiner *J; 36 | createJoiner(&J); 37 | const int NUM_OF_RELS = 13; 38 | J->numOfRelations = NUM_OF_RELS; 39 | J->relations = malloc(J->numOfRelations*sizeof(struct Relation*)); 40 | MALLOC_CHECK(J->relations); 41 | 42 | for(unsigned i=0;i 2 | 3 | #include "CUnit/Basic.h" 4 | #include "JobScheduler.h" 5 | #include "Vector.h" 6 | #include "Joiner.h" 7 | #include "TestHeader.h" 8 | 9 | int init_suite(void) 10 | {return 0;} 11 | 12 | int cleanup_suite(void) 13 | {return 0;} 14 | 15 | int main(int argc, char const *argv[]) 16 | { 17 | /* Initialize the CUnit test registry */ 18 | if (CUE_SUCCESS != CU_initialize_registry()) 19 | return CU_get_error(); 20 | 21 | /* Create 6 Suites */ 22 | CU_pSuite suites[6]; 23 | for(unsigned i=0;i<6;++i){ 24 | 25 | char suitName[100]; 26 | sprintf(suitName,"Suite%u",i); 27 | suites[i] = CU_add_suite(suitName,init_suite,cleanup_suite); 28 | if (NULL == suites[i]) { 29 | CU_cleanup_registry(); 30 | return CU_get_error(); 31 | } 32 | } 33 | 34 | /* Set global variables */ 35 | RADIX_BITS = 4; 36 | HASH_RANGE_1 = 16; 37 | // Vector's initial size 38 | initSize = 20; 39 | 40 | /* Create job scheduler */ 41 | createJobScheduler(&js); 42 | 43 | /* Add tests 
to each suite */ 44 | if ((NULL == CU_add_test(suites[0],"Test of Relation.c::createRelation()",testCreateRelation))|| 45 | (NULL == CU_add_test(suites[0],"Test of Joiner.c::addRelation()",testAddRelation))) 46 | { 47 | CU_cleanup_registry(); 48 | return CU_get_error(); 49 | } 50 | 51 | if ((NULL == CU_add_test(suites[1],"Test of Parser.c::createQueryInfo()",testCreateQueryInfo))|| 52 | (NULL == CU_add_test(suites[1],"Test of Parser.c::addFilter()",testAddFilter))|| 53 | (NULL == CU_add_test(suites[1],"Test of Parser.c::getOriginalRelid()",testGetOriginalRelid))) 54 | { 55 | CU_cleanup_registry(); 56 | return CU_get_error(); 57 | } 58 | 59 | if ((NULL == CU_add_test(suites[2],"Test of Queue.c::createQueue()",testCreateQueue))|| 60 | (NULL == CU_add_test(suites[2],"Test of Queue.c::enQueue()",testEnqueue))|| 61 | (NULL == CU_add_test(suites[2],"Test of Queue.c::deQueue()",testDeQueue))) 62 | { 63 | CU_cleanup_registry(); 64 | return CU_get_error(); 65 | } 66 | 67 | if (NULL == CU_add_test(suites[3],"Test of Optimizer.c::findStats()",testFindStats)) 68 | { 69 | CU_cleanup_registry(); 70 | return CU_get_error(); 71 | } 72 | 73 | if ((NULL == CU_add_test(suites[4],"Test of Operations.c::colEquality()",testColEquality))|| 74 | (NULL == CU_add_test(suites[4],"Test of Operations.c::filterFunc()",testFilterFunc))|| 75 | (NULL == CU_add_test(suites[4],"Test of Operations.c::fitlerInter()",testFitlerInter))) 76 | { 77 | CU_cleanup_registry(); 78 | return CU_get_error(); 79 | } 80 | if ((NULL == CU_add_test(suites[5],"Test of Partition.c::partition()",testPartition))|| 81 | (NULL == CU_add_test(suites[5],"Test of Partition.c::partitionFunc()",testPartitionFunc))|| 82 | (NULL == CU_add_test(suites[5],"Test of Build.c::buildFunc()",testBuildFunc)) || 83 | (NULL == CU_add_test(suites[5],"Test of Probe.c::joinFunc()",testJoinFunc))) 84 | { 85 | CU_cleanup_registry(); 86 | return CU_get_error(); 87 | } 88 | 89 | /** 90 | * Maximum output of run details. 
91 | * 92 | * The basic interface is also non-interactive, 93 | * with results output to stdout. 94 | */ 95 | CU_basic_set_mode(CU_BRM_VERBOSE); 96 | CU_basic_run_tests(); 97 | /** 98 | * Cleanup CU_TestRegistry. 99 | * Return any possible error. 100 | */ 101 | CU_cleanup_registry(); 102 | destroyJobScheduler(js); 103 | return CU_get_error(); 104 | } 105 | -------------------------------------------------------------------------------- /final/test/TestOperators.c: -------------------------------------------------------------------------------- 1 | /** 2 | * Unit testing on operator's functions [Operations.c] 3 | * 4 | * Reminder: operation functions create result vectors and pass them back to the 5 | * caller via the struct Vector** argument. [Destruction is needed to avoid mem leaks] 6 | * Also, note that each "col" array represents a column from the relation. 7 | */ 8 | #include 9 | #include 10 | #include 11 | #include "CUnit/Basic.h" 12 | #include "Joiner.h" 13 | #include "JobScheduler.h" 14 | #include "Operations.h" 15 | 16 | void testColEquality(void) 17 | { 18 | RADIX_BITS = 4; 19 | HASH_RANGE_1 = 16; 20 | initSize = 20; 21 | uint64_t col1[] = {1u,2u,3u,4,5u,6u,7u,8u,9u,10u}; 22 | uint64_t col2[] = {10u,20u,30u,40u,50u,60u,70u,80u,90u,100u}; 23 | unsigned numOfRows = 10; 24 | 25 | struct Vector **results = malloc(HASH_RANGE_1*sizeof(struct Vector*)); 26 | MALLOC_CHECK(results); 27 | 28 | colEquality(col1,col1,numOfRows,results); 29 | CU_ASSERT_EQUAL(10,getVectorTuples(results[0])); 30 | for(unsigned i=0;i',5u,&vector); 85 | CU_ASSERT_EQUAL(2,getVectorTuples(vector)); 86 | 87 | destroyVector(&vector); 88 | } 89 | -------------------------------------------------------------------------------- /final/test/TestOptimizer.c: -------------------------------------------------------------------------------- 1 | /** 2 | * Unit testing on Optimizer functions [Optimizer.c] 3 | */ 4 | #include 5 | #include 6 | #include "CUnit/Basic.h" 7 | #include "Optimizer.h" 8 | 9 | 
10 | void testFindStats(void) 11 | { 12 | 13 | struct columnStats stat; 14 | unsigned colSize = 16; 15 | uint64_t col[] = {0u,1u,2u,3u,4u,5u,6u,7u,8u,9u,10u, 16 | 11u,12u,13u,14u,15u}; 17 | 18 | findStats(col,&stat,colSize); 19 | CU_ASSERT_EQUAL(stat.maxValue,15u); 20 | CU_ASSERT_EQUAL(stat.minValue,0u); 21 | CU_ASSERT_EQUAL(stat.f,16); 22 | CU_ASSERT_EQUAL(stat.discreteValues,16); 23 | 24 | free(stat.bitVector); 25 | } 26 | -------------------------------------------------------------------------------- /final/test/TestParser.c: -------------------------------------------------------------------------------- 1 | /** 2 | * Unit testing on parser's functions [Parser.c] 3 | */ 4 | #include 5 | #include <time.h> /*time()*/ 6 | #include <stdlib.h> /*rand()*/ 7 | #include "CUnit/Basic.h" 8 | #include "Parser.h" 9 | 10 | /** 11 | * This function not only tests createQueryInfo(..), but also : 12 | * parseQuery(..), parseRelationIds(..), parseSelections(..) and parsePredicates(..) 13 | * because these functions call one another. 14 | */ 15 | void testCreateQueryInfo(void) 16 | { 17 | // Query with 4 relIds, 3 filters and 1 column equality predicate. 18 | char rawQuery[] = "12 1 6 12|3.0=3.2&0.2=1.0&1.2>9999&1.1=4444&1.0=2.1&0.1=3.2&3.0<33199|2.1 0.1 0.2"; 19 | struct QueryInfo *q; 20 | createQueryInfo(&q,rawQuery); 21 | 22 | CU_ASSERT_EQUAL(4,getNumOfRelations(q)); 23 | CU_ASSERT_EQUAL(3,getNumOfFilters(q)); 24 | CU_ASSERT_EQUAL(1,getNumOfColEqualities(q)); 25 | destroyQueryInfo(q); 26 | } 27 | 28 | /** 29 | * This function acts as a test for getComparison() too. 
30 | */ 31 | void testAddFilter(void) 32 | { 33 | struct FilterInfo f; 34 | char string1[] = "4.0<3.1"; 35 | addFilter(&f,string1); 36 | CU_ASSERT_EQUAL('<',getComparison(&f)); 37 | 38 | char string2[] = "4.0>3.1"; 39 | addFilter(&f,string2); 40 | CU_ASSERT_EQUAL('>',getComparison(&f)); 41 | 42 | char string3[] = "4.0=3.1"; 43 | addFilter(&f,string3); 44 | CU_ASSERT_EQUAL('=',getComparison(&f)); 45 | } 46 | 47 | /** 48 | * This function also acts as a test for getColId() and getRelId() 49 | */ 50 | void testAddPredicate(void) 51 | { 52 | struct PredicateInfo p; 53 | char string[] = "3.1=5.2"; 54 | addPredicate(&p,string); 55 | CU_ASSERT_EQUAL(getColId(&p.left),3); 56 | CU_ASSERT_EQUAL(getColId(&p.right),5); 57 | CU_ASSERT_EQUAL(getRelId(&p.left),1); 58 | CU_ASSERT_EQUAL(getRelId(&p.right),2); 59 | } 60 | 61 | 62 | void testGetOriginalRelid(void) 63 | { 64 | char rawQuery[] = "12 1 6 12|3.0=1.2&0.2=1.0&1.0=2.1&0.1=3.2&3.0<33199|2.1 0.1 0.2"; 65 | struct QueryInfo *q; 66 | createQueryInfo(&q,rawQuery); 67 | struct PredicateInfo p = q->predicates[0]; 68 | CU_ASSERT_EQUAL(12,getOriginalRelId(q,&p.left)); 69 | CU_ASSERT_EQUAL(1,getOriginalRelId(q,&p.right)); 70 | destroyQueryInfo(q); 71 | } 72 | -------------------------------------------------------------------------------- /final/test/TestQueue.c: -------------------------------------------------------------------------------- 1 | /** 2 | * Unit testing on queue functions [Queue.c] 3 | */ 4 | #include 5 | #include "CUnit/Basic.h" 6 | #include "Queue.h" 7 | #include "Utils.h" 8 | 9 | /* Also tests isEmpty(..) 
function */ 10 | void testCreateQueue(void) 11 | { 12 | struct Queue *q; 13 | createQueue(&q,20); 14 | CU_ASSERT_EQUAL(q->front,q->rear); 15 | CU_ASSERT_EQUAL(isEmpty(q),1); 16 | destroyQueue(q); 17 | } 18 | 19 | 20 | void testEnqueue(void) 21 | { 22 | struct Queue *q; 23 | createQueue(&q,10); 24 | enQueue(q,(void*)1); 25 | enQueue(q,(void*)4); 26 | enQueue(q,(void*)10); 27 | enQueue(q,(void*)2); 28 | CU_ASSERT_EQUAL(q->rear,3); 29 | CU_ASSERT_EQUAL(q->front,0); 30 | CU_ASSERT_EQUAL(isEmpty(q),0); 31 | destroyQueue(q); 32 | } 33 | 34 | 35 | void testDeQueue(void) 36 | { 37 | struct Queue *q; 38 | createQueue(&q,10); 39 | enQueue(q,(void*)1); 40 | enQueue(q,(void*)4); 41 | 42 | deQueue(q); 43 | CU_ASSERT_EQUAL(q->rear,1); 44 | CU_ASSERT_EQUAL(q->front,1); 45 | 46 | deQueue(q); 47 | CU_ASSERT_EQUAL(q->rear,-1); 48 | CU_ASSERT_EQUAL(q->front,-1); 49 | 50 | 51 | CU_ASSERT_EQUAL(isEmpty(q),1); 52 | destroyQueue(q); 53 | } 54 | -------------------------------------------------------------------------------- /final/test/TestRadixHashJoin.c: -------------------------------------------------------------------------------- 1 | /** 2 | * Unit testing on Radix Hash Join functions [Partition.c | Build.c | Probe.c] 3 | * 4 | */ 5 | #include 6 | #include 7 | #include 8 | #include "CUnit/Basic.h" 9 | #include "Intermediate.h" 10 | #include "JobScheduler.h" 11 | #include "Operations.h" 12 | #include "Partition.h" 13 | #include "Build.h" 14 | #include "Queue.h" 15 | #include "Probe.h" 16 | 17 | 18 | /** 19 | * @brief Tests if histogram and prefix sum are correct. 
20 | */ 21 | void testPartition(void) 22 | { 23 | RADIX_BITS = 4; 24 | HASH_RANGE_1 = 16; 25 | initSize = 20; 26 | 27 | RadixHashJoinInfo info; 28 | uint64_t col[] = {0u,1u,2u,3u,4u,5u,6u,7u,8u,9u,10u, 29 | 11u,12u,13u,14u,15u}; 30 | info.isInInter = 0; 31 | info.col = col; 32 | info.tupleSize = 1; 33 | info.numOfTuples = 16; 34 | partition(&info); 35 | 36 | /* 37 | * HASH_FUN_1(col[i]) will be different for each col element. 38 | * Thus, we expect one column element per bucket. 39 | * 40 | * pSum[i] will be equal to i. [1 element per bucket] 41 | */ 42 | for(unsigned i=0;ivalues[i]); 71 | 72 | destroyRadixHashJoinInfo(&info); 73 | } 74 | 75 | void testBuildFunc(void) 76 | { 77 | RADIX_BITS = 4; 78 | HASH_RANGE_1 = 16; 79 | initSize = 20; 80 | 81 | RadixHashJoinInfo infoLeft,infoRight; 82 | uint64_t col1[] = {0u,1u,2u,3u,4u,5u,6u,7u,8u,9u,10u, 83 | 11u,12u,13u,14u,15u}; 84 | uint64_t col2[] = {22u,0u,2u,0u,6u}; 85 | 86 | infoLeft.isInInter = 0; 87 | infoLeft.col = col1; 88 | infoLeft.numOfTuples = 16; 89 | infoLeft.tupleSize = 1; 90 | 91 | infoRight.isInInter = 0; 92 | infoRight.col = col2; 93 | infoRight.numOfTuples = 5; 94 | infoRight.tupleSize = 1; 95 | partition(&infoLeft); 96 | partition(&infoRight); 97 | 98 | build(&infoLeft,&infoRight); 99 | 100 | destroyRadixHashJoinInfo(&infoLeft); 101 | destroyRadixHashJoinInfo(&infoRight); 102 | } 103 | 104 | void testJoinFunc(void) 105 | { 106 | RADIX_BITS = 4; 107 | HASH_RANGE_1 = 16; 108 | initSize = 20; 109 | 110 | RadixHashJoinInfo infoLeft,infoRight; 111 | uint64_t col1[] = {0u,1u,2u,3u,4u,5u,6u,7u,8u,9u,10u, 112 | 11u,12u,13u,14u,15u}; 113 | uint64_t col2[] = {22u,0u,2u,0u,6u}; 114 | 115 | infoLeft.isInInter = 0; 116 | infoLeft.col = col1; 117 | infoLeft.numOfTuples = 16; 118 | infoLeft.tupleSize = 1; 119 | 120 | infoRight.isInInter = 0; 121 | infoRight.col = col2; 122 | infoRight.numOfTuples = 5; 123 | infoRight.tupleSize = 1; 124 | 125 | // Partitioning 126 | partition(&infoLeft); 127 | partition(&infoRight); 128 
| 129 | // Building [bucket chaining] 130 | build(&infoLeft,&infoRight); 131 | infoLeft.isLeft = 1; 132 | infoRight.isLeft = 0; 133 | 134 | struct Vector **results; 135 | results = malloc(HASH_RANGE_1*sizeof(struct Vector*)); 136 | MALLOC_CHECK(results); 137 | 138 | // Split the join jobs 139 | jobsFinished = 0; 140 | for(unsigned i=0;ijoinJobs[i].argument; 144 | arg->bucket = i; 145 | arg->left = &infoLeft; 146 | arg->right = &infoRight; 147 | arg->results = results[i]; 148 | pthread_mutex_lock(&queueMtx); 149 | enQueue(jobQueue,&js->joinJobs[i]); 150 | pthread_cond_signal(&condNonEmpty); 151 | pthread_mutex_unlock(&queueMtx); 152 | } 153 | 154 | // Wait for all jobs to be completed 155 | pthread_mutex_lock(&jobsFinishedMtx); 156 | while (jobsFinished!=HASH_RANGE_1) { 157 | pthread_cond_wait(&condJobsFinished,&jobsFinishedMtx); 158 | } 159 | jobsFinished = 0; 160 | pthread_cond_signal(&condJobsFinished); 161 | pthread_mutex_unlock(&jobsFinishedMtx); 162 | 163 | // Add partial results 164 | unsigned ntuples = 0; 165 | for(unsigned i=0;i tuples */ 169 | CU_ASSERT_EQUAL(4,ntuples); 170 | 171 | 172 | for(unsigned i=0;i3499|1.2 0.1 2 | 5 0|0.2=1.0&0.3=9881|1.1 0.2 1.0 3 | 9 0 2|0.1=1.0&1.0=2.2&0.0>12472|1.0 0.3 0.4 4 | 9 0|0.1=1.0&0.1>1150|0.3 1.0 5 | 6 1 12|0.1=1.0&1.0=2.2&0.0<62236|1.0 6 | 11 0 5|0.2=1.0&1.0=2.2&0.1=5784|2.3 0.1 0.1 7 | 4 1 2 11|0.1=1.0&1.0=2.1&1.0=3.1&0.1>2493|3.2 2.2 2.1 8 | 10 0 13 1|0.2=1.0&1.0=2.2&0.1=3.0&0.1=209|0.2 2.5 2.2 9 | 6 1 11 5|0.1=1.0&1.0=2.1&1.0=3.1&0.0>44809|2.0 10 | 3 1|0.1=1.0&0.2<3071|0.2 0.2 11 | F 12 | 3 1 12|0.1=1.0&1.0=2.1&0.0>26374|2.0 0.1 2.1 13 | 7 0|0.1=1.0&0.4<9936|0.4 0.0 1.0 14 | 2 1 9|0.1=1.0&1.0=2.2&0.1=10731|1.2 2.3 15 | 5 1|0.1=1.0&0.2=4531|1.2 16 | 3 0 13 13|0.2=1.0&1.0=2.1&2.1=3.2&0.2<74|1.2 2.5 3.5 17 | 9 1|0.2=1.0&0.1=1574|0.1 0.3 0.0 18 | 0 5|0.0=1.2&1.3=9855|1.1 0.1 19 | 11 0 2|0.2=1.0&1.0=2.2&0.1<5283|0.0 0.2 2.3 20 | 8 0 7|0.2=1.0&1.0=2.1&0.3>10502|1.1 1.2 2.5 21 | 9 1 
11|0.2=1.0&1.0=2.1&1.0=0.2&0.3>3991|1.0 22 | 4 1|0.1=1.0&0.1<5730|1.1 0.1 0.1 23 | 3 1 5 7|0.1=1.0&1.0=2.1&1.0=3.2&0.2=4273|2.2 3.2 24 | F 25 | 9 1 12|0.2=1.0&1.0=2.1&2.2=1.0&0.2<2685|2.0 26 | 1 12 2|0.0=1.2&0.0=2.1&1.1=0.0&1.0>25064|0.2 1.3 27 | 2 0|0.2=1.0&0.2<787|0.0 28 | 1 6|0.0=1.1&1.1>10707|1.0 1.1 0.2 29 | 13 0 3|0.1=1.0&1.0=2.2&0.4=10571|2.3 0.0 30 | 12 1 6 12|0.2=1.0&1.0=2.1&0.1=3.2&3.0<33199|2.1 0.1 0.2 31 | 11 0 10 8|0.2=1.0&1.0=2.2&1.0=3.2&0.0<9872|3.3 2.2 32 | 11 0 2|0.2=1.0&1.0=2.2&0.1<4217|1.0 33 | 10 0|0.2=1.0&0.2>1791|1.0 1.2 0.2 34 | 7 1 3|0.2=1.0&1.0=2.1&0.3<8722|1.0 35 | 4 1 9|0.1=1.0&1.0=2.2&0.1>345|0.0 1.2 36 | 11 1 12 10|0.1=1.0&1.0=2.1&1.0=3.1&0.2=598|3.2 37 | 7 0 9|0.1=1.0&1.0=0.1&1.0=2.1&0.1>3791|1.2 1.2 38 | F 39 | 8 0 11|0.2=1.0&1.0=2.2&0.3=9477|0.2 40 | 0 13 7 10|0.0=1.2&0.0=2.1&0.0=3.2&1.2>295|3.2 0.0 41 | 7 1 3|0.2=1.0&1.0=2.1&1.0=0.2&0.2>6082|2.3 2.1 42 | 0 7 10 5|0.0=1.1&0.0=2.2&0.0=3.2&1.3=8728|2.0 3.1 43 | 1 4 9 8|0.0=1.1&0.0=2.2&0.0=3.1&1.1>2936|1.0 1.0 3.0 44 | 4 1|0.1=1.0&0.1<9795|1.2 0.1 45 | 11 1|0.1=1.0&0.1<1688|0.1 46 | 5 0|0.2=1.0&0.0<1171|1.0 0.3 47 | 4 1 6|0.1=1.0&1.0=2.1&0.0<13500|2.1 0.1 0.0 48 | 13 13|0.1=1.2&1.6=8220|1.5 49 | F 50 | 11 0 8|0.2=1.0&1.0=2.2&0.2>4041|1.0 1.1 1.0 51 | 8 0 10|0.2=1.0&1.0=2.2&0.3<9473|0.3 2.0 52 | 5 1 8|0.1=1.0&1.0=2.1&0.1<3560|1.2 53 | 13 0 2|0.2=1.0&1.0=0.1&1.0=2.2&0.1>4477|2.0 2.3 1.2 54 | 8 0 13 13|0.2=1.0&1.0=2.2&2.1=3.2&0.1>7860|3.3 2.1 3.6 55 | F 56 | -------------------------------------------------------------------------------- /final/workloads/small/small.work.sql: -------------------------------------------------------------------------------- 1 | SELECT SUM("1".c2), SUM("0".c1) FROM r3 "0", r0 "1", r1 "2" WHERE "0".c2="1".c0 and "0".c1="2".c0 and "0".c2>3499; 2 | SELECT SUM("1".c1), SUM("0".c2), SUM("1".c0) FROM r5 "0", r0 "1" WHERE "0".c2="1".c0 and "0".c3=9881; 3 | SELECT SUM("1".c0), SUM("0".c3), SUM("0".c4) FROM r9 "0", r0 "1", r2 "2" WHERE "0".c1="1".c0 and "1".c0="2".c2 
and "0".c0>12472; 4 | SELECT SUM("0".c3), SUM("1".c0) FROM r9 "0", r0 "1" WHERE "0".c1="1".c0 and "0".c1>1150; 5 | SELECT SUM("1".c0) FROM r6 "0", r1 "1", r12 "2" WHERE "0".c1="1".c0 and "1".c0="2".c2 and "0".c0<62236; 6 | SELECT SUM("2".c3), SUM("0".c1), SUM("0".c1) FROM r11 "0", r0 "1", r5 "2" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "0".c1=5784; 7 | SELECT SUM("3".c2), SUM("2".c2), SUM("2".c1) FROM r4 "0", r1 "1", r2 "2", r11 "3" WHERE "0".c1="1".c0 and "1".c0="2".c1 and "1".c0="3".c1 and "0".c1>2493; 8 | SELECT SUM("0".c2), SUM("2".c5), SUM("2".c2) FROM r10 "0", r0 "1", r13 "2", r1 "3" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "0".c1="3".c0 and "0".c1=209; 9 | SELECT SUM("2".c0) FROM r6 "0", r1 "1", r11 "2", r5 "3" WHERE "0".c1="1".c0 and "1".c0="2".c1 and "1".c0="3".c1 and "0".c0>44809; 10 | SELECT SUM("0".c2), SUM("0".c2) FROM r3 "0", r1 "1" WHERE "0".c1="1".c0 and "0".c2<3071; 11 | 12 | SELECT SUM("2".c0), SUM("0".c1), SUM("2".c1) FROM r3 "0", r1 "1", r12 "2" WHERE "0".c1="1".c0 and "1".c0="2".c1 and "0".c0>26374; 13 | SELECT SUM("0".c4), SUM("0".c0), SUM("1".c0) FROM r7 "0", r0 "1" WHERE "0".c1="1".c0 and "0".c4<9936; 14 | SELECT SUM("1".c2), SUM("2".c3) FROM r2 "0", r1 "1", r9 "2" WHERE "0".c1="1".c0 and "1".c0="2".c2 and "0".c1=10731; 15 | SELECT SUM("1".c2) FROM r5 "0", r1 "1" WHERE "0".c1="1".c0 and "0".c2=4531; 16 | SELECT SUM("1".c2), SUM("2".c5), SUM("3".c5) FROM r3 "0", r0 "1", r13 "2", r13 "3" WHERE "0".c2="1".c0 and "1".c0="2".c1 and "2".c1="3".c2 and "0".c2<74; 17 | SELECT SUM("0".c1), SUM("0".c3), SUM("0".c0) FROM r9 "0", r1 "1" WHERE "0".c2="1".c0 and "0".c1=1574; 18 | SELECT SUM("1".c1), SUM("0".c1) FROM r0 "0", r5 "1" WHERE "0".c0="1".c2 and "1".c3=9855; 19 | SELECT SUM("0".c0), SUM("0".c2), SUM("2".c3) FROM r11 "0", r0 "1", r2 "2" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "0".c1<5283; 20 | SELECT SUM("1".c1), SUM("1".c2), SUM("2".c5) FROM r8 "0", r0 "1", r7 "2" WHERE "0".c2="1".c0 and "1".c0="2".c1 and "0".c3>10502; 21 | SELECT SUM("1".c0) 
FROM r9 "0", r1 "1", r11 "2" WHERE "0".c2="1".c0 and "1".c0="2".c1 and "1".c0="0".c2 and "0".c3>3991; 22 | SELECT SUM("1".c1), SUM("0".c1), SUM("0".c1) FROM r4 "0", r1 "1" WHERE "0".c1="1".c0 and "0".c1<5730; 23 | SELECT SUM("2".c2), SUM("3".c2) FROM r3 "0", r1 "1", r5 "2", r7 "3" WHERE "0".c1="1".c0 and "1".c0="2".c1 and "1".c0="3".c2 and "0".c2=4273; 24 | 25 | SELECT SUM("2".c0) FROM r9 "0", r1 "1", r12 "2" WHERE "0".c2="1".c0 and "1".c0="2".c1 and "2".c2="1".c0 and "0".c2<2685; 26 | SELECT SUM("0".c2), SUM("1".c3) FROM r1 "0", r12 "1", r2 "2" WHERE "0".c0="1".c2 and "0".c0="2".c1 and "1".c1="0".c0 and "1".c0>25064; 27 | SELECT SUM("0".c0) FROM r2 "0", r0 "1" WHERE "0".c2="1".c0 and "0".c2<787; 28 | SELECT SUM("1".c0), SUM("1".c1), SUM("0".c2) FROM r1 "0", r6 "1" WHERE "0".c0="1".c1 and "1".c1>10707; 29 | SELECT SUM("2".c3), SUM("0".c0) FROM r13 "0", r0 "1", r3 "2" WHERE "0".c1="1".c0 and "1".c0="2".c2 and "0".c4=10571; 30 | SELECT SUM("2".c1), SUM("0".c1), SUM("0".c2) FROM r12 "0", r1 "1", r6 "2", r12 "3" WHERE "0".c2="1".c0 and "1".c0="2".c1 and "0".c1="3".c2 and "3".c0<33199; 31 | SELECT SUM("3".c3), SUM("2".c2) FROM r11 "0", r0 "1", r10 "2", r8 "3" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "1".c0="3".c2 and "0".c0<9872; 32 | SELECT SUM("1".c0) FROM r11 "0", r0 "1", r2 "2" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "0".c1<4217; 33 | SELECT SUM("1".c0), SUM("1".c2), SUM("0".c2) FROM r10 "0", r0 "1" WHERE "0".c2="1".c0 and "0".c2>1791; 34 | SELECT SUM("1".c0) FROM r7 "0", r1 "1", r3 "2" WHERE "0".c2="1".c0 and "1".c0="2".c1 and "0".c3<8722; 35 | SELECT SUM("0".c0), SUM("1".c2) FROM r4 "0", r1 "1", r9 "2" WHERE "0".c1="1".c0 and "1".c0="2".c2 and "0".c1>345; 36 | SELECT SUM("3".c2) FROM r11 "0", r1 "1", r12 "2", r10 "3" WHERE "0".c1="1".c0 and "1".c0="2".c1 and "1".c0="3".c1 and "0".c2=598; 37 | SELECT SUM("1".c2), SUM("1".c2) FROM r7 "0", r0 "1", r9 "2" WHERE "0".c1="1".c0 and "1".c0="0".c1 and "1".c0="2".c1 and "0".c1>3791; 38 | 39 | SELECT SUM("0".c2) FROM r8 
"0", r0 "1", r11 "2" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "0".c3=9477; 40 | SELECT SUM("3".c2), SUM("0".c0) FROM r0 "0", r13 "1", r7 "2", r10 "3" WHERE "0".c0="1".c2 and "0".c0="2".c1 and "0".c0="3".c2 and "1".c2>295; 41 | SELECT SUM("2".c3), SUM("2".c1) FROM r7 "0", r1 "1", r3 "2" WHERE "0".c2="1".c0 and "1".c0="2".c1 and "1".c0="0".c2 and "0".c2>6082; 42 | SELECT SUM("2".c0), SUM("3".c1) FROM r0 "0", r7 "1", r10 "2", r5 "3" WHERE "0".c0="1".c1 and "0".c0="2".c2 and "0".c0="3".c2 and "1".c3=8728; 43 | SELECT SUM("1".c0), SUM("1".c0), SUM("3".c0) FROM r1 "0", r4 "1", r9 "2", r8 "3" WHERE "0".c0="1".c1 and "0".c0="2".c2 and "0".c0="3".c1 and "1".c1>2936; 44 | SELECT SUM("1".c2), SUM("0".c1) FROM r4 "0", r1 "1" WHERE "0".c1="1".c0 and "0".c1<9795; 45 | SELECT SUM("0".c1) FROM r11 "0", r1 "1" WHERE "0".c1="1".c0 and "0".c1<1688; 46 | SELECT SUM("1".c0), SUM("0".c3) FROM r5 "0", r0 "1" WHERE "0".c2="1".c0 and "0".c0<1171; 47 | SELECT SUM("2".c1), SUM("0".c1), SUM("0".c0) FROM r4 "0", r1 "1", r6 "2" WHERE "0".c1="1".c0 and "1".c0="2".c1 and "0".c0<13500; 48 | SELECT SUM("1".c5) FROM r13 "0", r13 "1" WHERE "0".c1="1".c2 and "1".c6=8220; 49 | 50 | SELECT SUM("1".c0), SUM("1".c1), SUM("1".c0) FROM r11 "0", r0 "1", r8 "2" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "0".c2>4041; 51 | SELECT SUM("0".c3), SUM("2".c0) FROM r8 "0", r0 "1", r10 "2" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "0".c3<9473; 52 | SELECT SUM("1".c2) FROM r5 "0", r1 "1", r8 "2" WHERE "0".c1="1".c0 and "1".c0="2".c1 and "0".c1<3560; 53 | SELECT SUM("2".c0), SUM("2".c3), SUM("1".c2) FROM r13 "0", r0 "1", r2 "2" WHERE "0".c2="1".c0 and "1".c0="0".c1 and "1".c0="2".c2 and "0".c1>4477; 54 | SELECT SUM("3".c3), SUM("2".c1), SUM("3".c6) FROM r8 "0", r0 "1", r13 "2", r13 "3" WHERE "0".c2="1".c0 and "1".c0="2".c2 and "2".c1="3".c2 and "0".c1>7860; 55 | 56 | -------------------------------------------------------------------------------- /img/cost.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/giorgospan/Radix-Hash-Join/1e81f4d8e87ea542e4bf25f4292f925f7f7eb29f/img/cost.png -------------------------------------------------------------------------------- /img/plot1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giorgospan/Radix-Hash-Join/1e81f4d8e87ea542e4bf25f4292f925f7f7eb29f/img/plot1.png -------------------------------------------------------------------------------- /img/plot2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giorgospan/Radix-Hash-Join/1e81f4d8e87ea542e4bf25f4292f925f7f7eb29f/img/plot2.png -------------------------------------------------------------------------------- /img/radix_hash_join.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/giorgospan/Radix-Hash-Join/1e81f4d8e87ea542e4bf25f4292f925f7f7eb29f/img/radix_hash_join.png --------------------------------------------------------------------------------