This is a list of papers I would like to implement, or would like to have an
implementation of. This list is likely to change as my interests change,
including deletions. Do not expect this list to remain static.

These are in no particular order:

SONIK: Efficient In-situ All Item Rank Generation using Bit Operations
 - https://arxiv.org/abs/1605.06992

CAMP: A Cost Adaptive Multi-Queue Eviction Policy for Key-Value Stores
 - http://dblab.usc.edu/users/papers/CAMPTR.pdf

SimString: A fast and simple algorithm for approximate string matching/retrieval
 - http://www.chokkan.org/software/simstring/

Simpira: cryptographic permutations designed to be fast on modern 64-bit
processors, yet provide a comfortable security margin against all
currently-known attacks.
 - http://mouha.be/simpira/

Autoscaling Bloom Filter: Controlling Trade-off Between True and False Positives
 - https://arxiv.org/abs/1705.03934

Adaptive Cuckoo-Filters
 - https://arxiv.org/abs/1704.06818

Continuous Top-k Queries over Real-Time Web Streams
 - https://arxiv.org/abs/1610.06500

A practical index for approximate dictionary matching with few mismatches
 - https://arxiv.org/abs/1501.04948

Robust benchmarking in noisy environments
 - https://arxiv.org/abs/1608.04295

Fast intersection of sorted lists with SSE:
 - https://highlyscalable.wordpress.com/2012/06/05/fast-intersection-sorted-lists-sse/
 - Also, https://arxiv.org/abs/1401.6399

PAD: Performance Anomaly Detection in Multi-Server Distributed Systems
 - https://www.microsoft.com/en-us/research/wp-content/uploads/2014/06/PAD-Performance-Anomaly-Detection-in-Multi-Server-Distributed-Systems.pdf

Detecting Abnormal Machine Characteristics in Cloud Infrastructures
 - https://ti.arc.nasa.gov/publications/4268/download/

PerfAugur: Robust Diagnostics for Performance Anomalies in Cloud Services
 - https://www.microsoft.com/en-us/research/publication/perfaugur-robust-diagnostics-for-performance-anomalies-in-cloud-services/

Statistical Techniques for Online Anomaly Detection in Data Centers
 - http://www.hpl.hp.com/techreports/2011/HPL-2011-8.pdf

Fast table-driven base64 encoding/decoding:
 - https://github.com/powturbo/TurboBase64/blob/master/turbob64d.c

Assembly versions of hash functions / cryptographic algorithms:
 - t1ha (Go version: https://github.com/dgryski/go-t1ha )
 - rc5 / rc6 (Go version: https://github.com/dgryski/go-rc5 / https://github.com/dgryski/go-rc6 )

In-memory data layout for Netflix's Hollow:
 - http://hollow.how/advanced-topics/#in-memory-data-layout

Omnisearch Index Formats
 - https://blog.twitter.com/2016/omnisearch-index-formats

NORX8 and NORX16: Authenticated Encryption for Low-End Systems
 - https://eprint.iacr.org/2015/1154

LightMAC: A MAC Mode for Lightweight Block Ciphers:
 - https://eprint.iacr.org/2016/190.pdf

Fast Deterministic Selection (adaptive QuickSelect)
 - https://arxiv.org/abs/1606.00484

A Bloom filter based semi-index on q-grams
 - https://arxiv.org/abs/1507.02989

Faster Population Counts using AVX2 Instructions
 - https://arxiv.org/abs/1611.07612

Quasi-Succinct Indices (compressed inverted indexes):
 - http://vigna.di.unimi.it/ftp/papers/QuasiSuccinctIndices.pdf

Efficient Summing over Sliding Windows (stream statistics)
 - http://arxiv.org/pdf/1604.02450v1.pdf

A Novel Technique for Long-Term Anomaly Detection in the Cloud
 - https://www.usenix.org/system/files/conference/hotcloud14/hotcloud14-vallis.pdf
 - Twitter's anomaly detection algorithm
 - related, http://www.ebaytechblog.com/2015/08/19/statistical-anomaly-detection/
 - related, http://nerds.airbnb.com/anomaly-detection/

TinySet - An Access Efficient Self Adjusting Bloom Filter Construction
 - http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2015/CS/CS-2015-03.pdf

Detecting Change in Data Streams:
 - https://cs.uwaterloo.ca/~shai/vldb04.pdf

Hierarchical Delta Debugging:
 - https://blog.acolyer.org/2015/11/17/hierarchical-delta-debugging/
 - (to go with https://github.com/dgryski/go-ddmin )

FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space
 - http://cs.fit.edu/~pkc/papers/tdm04.pdf
 - many implementations to use as base, for example https://github.com/slaypni/fastdtw/blob/master/fastdtw.py

Mining frequent items in the time fading model
 - http://arxiv.org/pdf/1601.03892v1.pdf

Hierarchical Agglomerative Clustering:
 - http://nlp.stanford.edu/IR-book/html/htmledition/hierarchical-agglomerative-clustering-1.html
 - needed for https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/rebucket-icse2012.pdf
 - preliminary implementation of rebucket: https://github.com/dgryski/go-rebucket

Balanced Allocation: Patience is not a Virtue (FirstDiff load balancing):
 - http://arxiv.org/abs/1602.08298

Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream
 - http://www.cs.ubc.ca/~xujian/paper/quant.pdf

The Eternal Sunshine of the Sketch Data Structure
 - http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.2889&rep=rep1&type=pdf

Copysets and Chainsets: A Better Way to Replicate
 - http://hackingdistributed.com/2014/02/14/chainsets/

A Fast Algorithm for Approximate Quantiles in High Speed Data Streams
 - http://web.cs.ucla.edu/~weiwang/paper/SSDBM07_2.pdf
 - this algorithm has haunted me for ages, I could never get my code working
 - unresponsive authors, details missing from papers, etc
 - there now appear to be more implementations that could be used as a base