├── LICENSE └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2018 Thomas Fan 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Awesome Python Data Science 2 | 3 | A curated list of Python libraries used for data science. 
4 | 5 | ## Contents 6 | 7 | - [Machine Learning Frameworks](#machine-learning-frameworks) 8 | - [Scientific](#scientific) 9 | - [Outlier Detection](#outlier-detection) 10 | - [Deep Learning Frameworks](#deep-learning-frameworks) 11 | - [Deep Learning Tools](#deep-learning-tools) 12 | - [Deep Learning Projects](#deep-learning-projects) 13 | - [Visualization](#visualization) 14 | - [AutoML](#automl) 15 | - [Exploration](#exploration) 16 | - [Feature Extraction](#feature-extraction) 17 | - [Trading](#trading) 18 | - [Misc](#misc) 19 | - [Deployment](#deployment) 20 | - [Profiling](#profiling) 21 | - [Python Tools](#python-tools) 22 | - [Data Gathering](#data-gathering) 23 | 24 | ## Machine Learning Frameworks 25 | 26 | - [scikit-learn](http://scikit-learn.org/stable/) - Machine learning. 27 | - [CatBoost](https://catboost.yandex) - Gradient boosting library with categorical features support. 28 | - [LightGBM](http://lightgbm.readthedocs.io) - Fast, distributed, high performance gradient boosting. 29 | - [Xgboost](https://xgboost.readthedocs.io/en/latest/) - Scalable, Portable and Distributed Gradient Boosting. 30 | - [PyMC](https://github.com/pymc-devs/pymc3) - Probabilistic Programming. 31 | - [statsmodels](https://github.com/statsmodels/statsmodels) - Statistical modeling and econometrics. 32 | - [SymPy](https://github.com/sympy/sympy) - A computer algebra system. 33 | - [NetworkX](https://networkx.github.io/) - Creation, manipulation, and study of the structure, dynamics, and functions of complex networks. 34 | - [dask-ml](https://github.com/dask/dask-ml) - Distributed and parallel machine learning. 35 | - [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - Perform under sampling and over sampling. 36 | - [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear models. 
37 | - [scikit-optimize](https://github.com/scikit-optimize/scikit-optimize) - Sequential model-based optimization with a `scipy.optimize` interface. 38 | - [BayesianOptimization](https://github.com/fmfn/BayesianOptimization) - Global optimization with gaussian processes. 39 | - [gplearn](https://github.com/trevorstephens/gplearn) - Genetic Programming. 40 | - [python-glmnet](https://github.com/civisanalytics/python-glmnet) - glmnet package for fitting generalized linear models. 41 | - [hmmlearn](https://github.com/hmmlearn/hmmlearn) - Hidden Markov Models. 42 | - [vecstack](https://github.com/vecxoz/vecstack) - stacking (machine learning technique). 43 | - [modAL](https://github.com/cosmic-cortex/modAL) - Modular Active Learning framework 44 | - [deap](https://github.com/DEAP/deap) - Evolutionary computation framework. 45 | - [pyro](https://github.com/uber/pyro) - Deep universal probabilistic programming with PyTorch. 46 | - [civisml-extensions](https://github.com/civisanalytics/civisml-extensions) - scikit-learn-compatible estimators from Civis Analytics. 47 | - [hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn) - Hyper-parameter optimization for sklearn. 48 | - [scikit-survival](https://github.com/sebp/scikit-survival) - Survival analysis built on top of scikit-learn. 49 | - [dstoolbox](https://github.com/ottogroup/dstoolbox) - Tools that make working with scikit-learn and pandas easier. 50 | - [modin](https://github.com/modin-project/modin) - Unify the way you interact with your data. 51 | - [pyomo](https://github.com/Pyomo/pyomo) - Python Optimization MOdels. 52 | - [BAMBI](https://github.com/bambinos/bambi) - BAyesian Model-Building Interface. 53 | - [combo](https://github.com/yzhao062/combo) - A Python Toolbox for Machine Learning Model Combination. 54 | - [fastai](https://github.com/fastai/fastai) - The fast.ai deep learning library, lessons, and tutorials. 
55 | - [pycaret](https://github.com/pycaret/pycaret) - Low-code machine learning library in Python. 56 | - [river](https://github.com/online-ml/river) - River is a Python library for online machine learning. 57 | 58 | ## Scientific 59 | 60 | - [NumPy](http://www.numpy.org/) - A fundamental package for scientific computing with Python. 61 | - [SciPy](http://www.scipy.org/) - A Python-based ecosystem of open-source software for mathematics, science, and engineering. 62 | - [Pandas](http://pandas.pydata.org/) - A library providing high-performance, easy-to-use data structures and data analysis tools. 63 | - [Numba](http://numba.pydata.org/) - NumPy aware dynamic Python compiler using LLVM. 64 | - [blaze](https://github.com/blaze/blaze) - NumPy and Pandas for databases. 65 | - [astropy](http://www.astropy.org/) - Astronomy and astrophysics. 66 | - [Biopython](http://biopython.org) - Computational biology and bioinformatics. 67 | - [PyDy](http://www.pydy.org) - Multibody Dynamics. 68 | - [nilearn](https://github.com/nilearn/nilearn) - NeuroImaging. 69 | - [patsy](https://github.com/pydata/patsy) - Describing statistical models using symbolic formulas. 70 | - [numexpr](https://github.com/pydata/numexpr) - Fast numerical array expression evaluator. 71 | - [dask](https://github.com/dask/dask) - Parallel computing with task scheduling. 72 | - [or-tools](https://github.com/google/or-tools) - Google's Operations Research tools. Classical CS algorithms. 73 | - [cvxpy](https://github.com/cvxgrp/cvxpy) - Python-embedded modeling language for convex optimization problems. 74 | 75 | ## Outlier Detection 76 | 77 | - [PyOD](https://github.com/yzhao062/pyod) - Versatile Python library for detecting anomalies in multivariate data. 78 | - [DeepOD](https://github.com/xuhongzuo/DeepOD) - Deep learning-based outlier/anomaly detection 79 | 80 | ## Deep Learning Frameworks 81 | 82 | - [Tensorflow](https://github.com/tensorflow/tensorflow) - DL Framework. 
83 | - [PyTorch](http://pytorch.org) - DL Framework. 84 | - [Keras](https://keras.io) - High-level neural networks API. 85 | - [tensorlayer](https://github.com/tensorlayer/tensorlayer) - A Deep Learning and Reinforcement Learning Library for Researchers and Engineers. 86 | - [mxnet](https://mxnet.incubator.apache.org) - Apache MXNet: A flexible and efficient library for deep learning. 87 | 88 | ## Deep Learning Tools 89 | 90 | - [TorchDrift](https://github.com/torchdrift/torchdrift/) - TorchDrift is a data and concept drift library for PyTorch. 91 | - [Edward](https://github.com/blei-lab/edward) - Probabilistic programming language in TensorFlow. 92 | - [pomegranate](https://github.com/jmschrei/pomegranate) - Probabilistic modelling. 93 | - [skorch](https://github.com/dnouri/skorch) - Scikit-learn PyTorch. 94 | - [DLTK](https://github.com/DLTK/DLTK) - Deep Learning Toolkit for Medical Image Analysis. 95 | - [sonnet](https://github.com/deepmind/sonnet) - TensorFlow-based neural network library. 96 | - [rasa_core](https://github.com/RasaHQ/rasa_core) - Dialogue engine. 97 | - [luminoth](https://github.com/tryolabs/luminoth) - Computer Vision. 98 | - [allennlp](https://github.com/allenai/allennlp) - NLP Research library. 99 | - [spotlight](https://github.com/maciejkula/spotlight) - Pytorch Recommender framework. 100 | - [tensorforce](https://github.com/reinforceio/tensorforce) - TensorFlow library for applied reinforcement learning. 101 | - [tensorboard-pytorch](https://github.com/lanpa/tensorboard-pytorch) - Tensorboard for pytorch. 102 | - [keras-vis](https://github.com/raghakot/keras-vis) - Neural network visualization toolkit for keras. 103 | - [hyperas](https://github.com/maxpumperla/hyperas) - Keras + Hyperopt. 104 | - [spaCy](https://spacy.io) - Natural Language processing. 105 | - [tensorboard_logger](https://github.com/TeamHG-Memex/tensorboard_logger) - Log TensorBoard events without touching TensorFlow. 
106 | - [foolbox](https://github.com/bethgelab/foolbox) - Python toolbox to create adversarial examples that fool neural networks. 107 | - [pytorch/vision](https://github.com/pytorch/vision) - Datasets, Transforms and Models specific to Computer Vision. 108 | - [gluon-nlp](https://github.com/dmlc/gluon-nlp) - NLP made easy. 109 | - [pytorch/ignite](https://github.com/pytorch/ignite) - High-level library to help with training neural networks in PyTorch. 110 | - [Netron](https://github.com/lutzroeder/Netron) - Visualizer for deep learning and machine learning models. 111 | - [gpytorch](https://github.com/cornellius-gp/gpytorch) - A highly efficient and modular implementation of Gaussian Processes in PyTorch. 112 | - [tensorly](https://github.com/tensorly/tensorly) - Tensor Learning in Python. 113 | - [einops](https://github.com/arogozhnikov/einops) - Deep learning operations reinvented. 114 | - [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Neural network graphs and training metrics for PyTorch, Tensorflow, and Keras. 115 | - [segmentation_models.pytorch](https://github.com/qubvel/segmentation_models.pytorch) - Segmentation models with pretrained backbones. 116 | - [pytorch-lightning](https://github.com/williamFalcon/pytorch-lightning) - The lightweight PyTorch wrapper. 117 | - [lightly](https://docs.lightly.ai/index.html) - Lightly is a computer vision framework for self-supervised learning. 118 | 119 | ## Deep Learning Projects 120 | 121 | - [fairseq](https://github.com/pytorch/fairseq) - Sequence-to-Sequence Toolkit. 122 | - [tensorflow-wavenet](https://github.com/ibab/tensorflow-wavenet) - DeepMind's WaveNet. 123 | - [DeepRecommender](https://github.com/NVIDIA/DeepRecommender) - Recommender systems. 124 | - [DrQA](https://github.com/facebookresearch/DrQA) - Reading Wikipedia to Answer Open-Domain Questions. 125 | - [vqa.pytorch](https://github.com/Cadene/vqa.pytorch) - Visual Question Answering in Pytorch. 
126 | - [Half-Life Regression](https://github.com/duolingo/halflife-regression) - Model for spaced repetition practice. 127 | - [learning-to-learn](https://github.com/deepmind/learning-to-learn) - Learning to Learn in Tensorflow. 128 | - [capsule-networks](https://github.com/gram-ai/capsule-networks) - A PyTorch implementation of the NIPS 2017 paper "Dynamic Routing Between Capsules". 129 | - [Mask_RCNN](https://github.com/matterport/Mask_RCNN) - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. 130 | - [lightnet](https://github.com/explosion/lightnet) - Bringing pjreddie's DarkNet out of the shadows. 131 | - [pytorch-openai-transformer-lm](https://github.com/huggingface/pytorch-openai-transformer-lm) - OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI. 132 | - [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) - Fast, modular reference implementation of Semantic Segmentation and Object Detection algorithms in PyTorch. 133 | - [LovaszSoftmax](https://github.com/bermanmaxim/LovaszSoftmax) - Lovász-Softmax loss. 134 | - [ludwig](https://github.com/uber/ludwig) - Ludwig is a toolbox built on top of TensorFlow that lets you train and test deep learning models without writing code. 135 | 136 | ## Visualization 137 | 138 | - [Great Tables](https://github.com/posit-dev/great-tables) - Absolutely Delightful Table-making in Python. 139 | - [PyGWalker](https://docs.kanaries.net/pygwalker) - Turns pandas and polars dataframes into a Tableau-like user interface for visual exploration. 140 | - [diagrams](https://github.com/mingrammer/diagrams) - Diagrams lets you draw the cloud system architecture in Python code. 141 | - [matplotlib](http://matplotlib.org/) - 2D plotting. 142 | - [seaborn](https://seaborn.pydata.org) - Visualization library. 143 | - [bokeh](https://github.com/bokeh/bokeh) - Interactive web plotting. 
144 | - [plotly](https://plot.ly/python/) - Collaborative web plotting. 145 | - [dash](https://github.com/plotly/dash) - Interactive Web plotting. 146 | - [altair](https://github.com/altair-viz/altair) - Declarative statistical visualization. 147 | - [folium](https://github.com/python-visualization/folium) - Leaflet.js Maps. 148 | - [geoplot](https://github.com/ResidentMario/geoplot) - High-level geospatial data visualization. 149 | - [datashader](http://datashader.org) - Graphics pipeline system. 150 | - [mplleaflet](https://github.com/jwass/mplleaflet) - Converts Matplotlib plots into interactive Leaflet web maps. 151 | - [matplotlib-venn](https://github.com/konstantint/matplotlib-venn) - Area-weighted venn-diagrams. 152 | - [pyLDAvis](https://github.com/bmabey/pyLDAvis) - Interactive topic model visualization. 153 | - [cufflinks](https://github.com/santosjorge/cufflinks) - Productivity Tools for Plotly + Pandas. 154 | - [scatterText](https://github.com/JasonKessler/scattertext) - Visualizations of how language differs among document types. 155 | - [plotnine](https://github.com/has2k1/plotnine) - ggplot for python. 156 | - [mizani](https://github.com/has2k1/mizani) - scales package. 157 | - [bqplot](https://github.com/bloomberg/bqplot) - Plotting library for IPython/Jupyter Notebooks. 158 | - [PtitPrince](https://github.com/pog87/PtitPrince) - Raincloud plots. 159 | - [joypy](https://github.com/sbebo/joypy) - Ridgeline plots. 160 | - [dtreeviz](https://github.com/parrt/dtreeviz) - Decision tree visualization and model interpretation. 161 | - [ipyvolume](https://github.com/maartenbreddels/ipyvolume) - 3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL. 162 | 163 | ## AutoML 164 | 165 | - [Nevergrad](https://github.com/facebookresearch/nevergrad) - Gradient-free optimization. 166 | - [featuretools](https://github.com/Featuretools/featuretools) - Automated feature engineering. 
167 | - [auto-sklearn](https://github.com/automl/auto-sklearn) - Automated machine learning. 168 | - [tpot](https://github.com/EpistasisLab/tpot) - Automated machine learning. 169 | - [auto_ml](https://github.com/ClimbsRocks/auto_ml) - Automated machine learning. 170 | - [MLBox](https://github.com/AxeldeRomblay/MLBox) - Automated Machine Learning python library. 171 | - [devol](https://github.com/joeddav/devol) - Automated deep neural network design via genetic programming. 172 | - [skll](https://github.com/EducationalTestingService/skll) - SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments. 173 | - [autokeras](https://github.com/jhfjhfj1/autokeras) - Automated machine learning in Keras. 174 | - [SMAC3](https://github.com/automl/SMAC3) - Sequential Model-based Algorithm Configuration. 175 | 176 | ## Exploration 177 | 178 | - [mlxtend](https://github.com/rasbt/mlxtend) - A library of extension and helper modules for Python's data analysis and machine learning libraries. 179 | - [yellowbrick](https://github.com/DistrictDataLabs/yellowbrick) - Visual analysis and diagnostic tools. 180 | - [pandas-profiling](https://github.com/pandas-profiling/pandas-profiling) - Profiling reports for pandas DataFrame objects. 181 | - [Skater](https://github.com/datascienceinc/Skater) - Model Agnostic Interpretation. 182 | - [Dora](https://github.com/NathanEpstein/Dora) - Exploratory data analysis. 183 | - [sklearn-evaluation](https://github.com/edublancas/sklearn-evaluation) - scikit-learn model evaluation. 184 | - [fitter](http://pythonhosted.org/fitter/) - Simple class to identify the distribution from which a data sample was generated. 185 | - [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization. 186 | - [hypertools](https://github.com/ContextLab/hypertools) - Gaining geometric insights into high-dimensional data. 
187 | - [scikit-plot](https://github.com/reiinakano/scikit-plot) - Plotting functionality to scikit-learn objects. 188 | - [elih](https://github.com/fvinas/elih) - Explain Machine Learning. 189 | - [kmeans_smote](https://github.com/felix-last/kmeans_smote) - Oversampling for imbalanced learning based on k-means and SMOTE. 190 | - [pyUpSet](https://github.com/ImSoErgodic/py-upset) - UpSet suite of visualisation methods. 191 | - [lime](https://github.com/marcotcr/lime) - Explaining the predictions of any machine learning classifier. 192 | - [pandas-summary](https://github.com/mouradmourafiq/pandas-summary) - An extension to pandas dataframes describe function. 193 | - [SauceCat/PDPbox](https://github.com/SauceCat/PDPbox) - Partial dependence plot toolbox. 194 | - [shap](https://github.com/slundberg/shap) - A unified approach to explain the output of any machine learning model. 195 | - [eli5](https://github.com/TeamHG-Memex/eli5) - Debug machine learning classifiers and explain their predictions. 196 | - [rfpimp](https://github.com/parrt/random-forest-importances) - Permutation and drop-column importance for scikit-learn random forests. 197 | - [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines made easy. 198 | - [pycm](https://github.com/sepandhaghighi/pycm) - Multi-class confusion matrix library in Python. 199 | - [great_expectations](https://github.com/great-expectations/great_expectations) - Always know what to expect from your data. 200 | - [alibi](https://github.com/SeldonIO/alibi) - Algorithms for monitoring and explaining machine learning models. 201 | - [InterpretML](https://github.com/interpretml/interpret) - Fit interpretable models. Explain blackbox machine learning. 202 | - [cleanlab](https://github.com/cgnorthcutt/cleanlab) - Finding label errors in datasets and learning with noisy labels. 
203 | - [dtale](https://github.com/man-group/dtale) - Flask/React client for visualizing pandas data structures 204 | - [dabl](https://github.com/dabl/dabl) - Data Analysis Baseline Library 205 | - [XAI](https://github.com/EthicalML/xai) - XAI - An eXplainability toolbox for machine learning 206 | - [explainerdashboard](https://github.com/oegedijk/explainerdashboard) - This package makes it convenient to quickly deploy a dashboard web app that explains the workings of a (scikit-learn compatible) machine learning model. 207 | - [alibi-detect](https://github.com/SeldonIO/alibi-detect) - Open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. 208 | 209 | ## Feature Extraction 210 | 211 | ### General Feature Extraction 212 | 213 | - [sklearn-pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Pandas integration with sklearn. 214 | - [pdpipe](https://github.com/shaypal5/pdpipe) - Easy pipelines for pandas DataFrames. 215 | - [engarde](https://github.com/TomAugspurger/engarde) - Defensive data analysis. 216 | - [datacleaner](https://github.com/rhiever/datacleaner) - Tool that automatically cleans data sets and readies them for analysis. 217 | - [categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding) - sklearn compatible categorical variable encoders. 218 | - [fancyimpute](https://github.com/iskandr/fancyimpute) - Multivariate imputation and matrix completion algorithms. 219 | - [raccoon](https://github.com/rsheftel/raccoon) - DataFrame with fast insert and appends. 220 | - [kmodes](https://github.com/nicodv/kmodes) - k-modes and k-prototypes clustering algorithm. 221 | - [annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors. 222 | - [datacleaner](https://github.com/rhiever/datacleaner) - Automatically cleans data sets and readies them for analysis. 
223 | - [scikit-feature](https://github.com/jundongl/scikit-feature) - Filter methods for feature selection. 224 | - [mifs](https://github.com/danielhomola/mifs) - Parallelized Mutual Information based Feature Selection module. 225 | - [skggm](https://github.com/skggm/skggm) - Scikit-learn compatible estimation of general graphical models. 226 | - [dirty_cat](https://dirty-cat.github.io/stable/index.html) - Encoding methods for dirty categorical variables. 227 | - [Impyute](https://github.com/eltonlaw/impyute) - Data imputations library to preprocess datasets with missing data. 228 | - [eif](https://github.com/sahandha/eif) - Extended Isolation Forest for Anomaly Detection. 229 | - [featexp](https://github.com/abhayspawar/featexp) - Feature exploration for supervised learning. 230 | - [feature_engine](https://github.com/solegalli/feature_engine) - Feature engineering package with sklearn like functionality. 231 | - [stumpy](https://github.com/TDAmeritrade/stumpy) - STUMPY is a powerful and scalable Python library that can be used for a variety of time series data mining tasks. 232 | - [n2](https://github.com/kakao/n2) - Lightweight approximate Nearest Neighbor library which runs faster even with large datasets. 233 | - [compressio](https://github.com/dylan-profiler/compressio) - Compressio provides lossless in-memory compression of pandas DataFrames and Series. 234 | 235 | ### Time Series 236 | 237 | - [Merlion](https://github.com/salesforce/Merlion) - A Machine Learning Library for Time Series 238 | - [Darts](https://github.com/unit8co/darts) - darts is a Python library for easy manipulation and forecasting of time series. 239 | - [Greykite](https://github.com/linkedin/greykite) - Greykite: A flexible, intuitive and fast forecasting library 240 | - [Causality](https://github.com/akelleh/causality) - Causal analysis. 241 | - [traces](https://github.com/datascopeanalytics/traces) - Unevenly-spaced time series analysis. 
242 | - [PyFlux](https://github.com/RJT1990/pyflux) - Time series library for Python. 243 | - [prophet](https://github.com/facebook/prophet) - Tool for producing high quality forecasts. 244 | - [tsfresh](https://github.com/blue-yonder/tsfresh) - Automatic extraction of relevant features from time series. 245 | - [tslearn](https://github.com/rtavenar/tslearn) - Machine learning toolkit dedicated to time-series data. 246 | - [pyts](https://github.com/johannfaouzi/pyts) - A Python package for time series transformation and classification. 247 | - [sktime](https://github.com/alan-turing-institute/sktime) - A scikit-learn compatible Python toolbox for learning with time series data. 248 | - [stumpy](https://github.com/TDAmeritrade/stumpy) - Matrix profiles. 249 | - [luminaire](https://github.com/zillow/luminaire) - ML driven solutions for monitoring time series data. 250 | - [NeuralProphet](https://github.com/ourownstory/neural_prophet) - A Neural Network based Time-Series model, inspired by Facebook Prophet and AR-Net, built on PyTorch. 251 | 252 | ### Audio 253 | 254 | - [python_speech_features](https://github.com/jameslyons/python_speech_features) - Speech features. 255 | - [speechpy](https://github.com/astorfi/speechpy) - A Library for Speech Processing and Recognition. 256 | - [magenta](https://github.com/tensorflow/magenta) - Music and Art Generation with Machine Intelligence. 257 | - [librosa](https://github.com/librosa/librosa) - Audio and music analysis. 258 | - [pydub](https://github.com/jiaaro/pydub) - Manipulate audio with a simple and easy high level interface. 259 | - [pytorch/audio](https://github.com/pytorch/audio) - simple audio I/O for pytorch. 260 | 261 | ### Images and Video 262 | 263 | - [pillow](https://github.com/python-pillow/Pillow) - PIL fork. 264 | - [scikit-image](http://scikit-image.org/) - Image processing. 265 | - [hmap](https://github.com/rossgoodwin/hmap) - Image histogram remapping. 
266 | - [pyocr](https://github.com/openpaperwork/pyocr) - A wrapper for Tesseract and Cuneiform (Optical Character Recognition). 267 | - [scikit-video](https://github.com/aizvorski/scikit-video) - Video processing. 268 | - [moviepy](http://zulko.github.io/moviepy/) - Video editing. 269 | - [OpenCV](http://opencv.org/) - Open Source Computer Vision Library. 270 | - [SimpleCV](http://simplecv.org/) - Wrapper around OpenCV. 271 | - [label-maker](https://github.com/developmentseed/label-maker) - Data Preparation for Satellite Machine Learning. 272 | - [face_recognition](https://github.com/ageitgey/face_recognition) - Facial recognition. 273 | - [imgaug](https://github.com/aleju/imgaug) - Image augmentation. 274 | - [pyvips](https://github.com/jcupitt/pyvips) - Fast image processing. 275 | - [ImageHash](https://github.com/JohannesBuchner/imagehash) - Image hashing. 276 | - [Augmentor](https://github.com/mdbloice/Augmentor) - Image augmentation library. 277 | - [PyAV](https://github.com/mikeboers/PyAV) - Bindings for FFmpeg. 278 | - [imutils](https://github.com/jrosebr1/imutils) - Convenience functions to make basic image processing operations. 279 | - [albumentations](https://github.com/albu/albumentations) - fast image augmentation library. 280 | 281 | ### Geolocation 282 | 283 | - [geojson](https://github.com/frewsxcv/python-geojson) - Python bindings for GeoJSON. 284 | - [geopy](https://github.com/geopy/geopy) - Python Geocoding Toolbox. 285 | - [OSMnx](https://github.com/gboeing/osmnx) - Street networks. 286 | - [reverse-geocoder](https://github.com/thampiman/reverse-geocoder) - A fast, offline reverse geocoder. 287 | - [pysal](https://github.com/pysal/pysal) - Spatial Analysis Library. 288 | - [geopandas](https://github.com/geopandas/geopandas) - Tools for geographic data. 289 | 290 | ### Text/NLP 291 | 292 | - [wordfreq](https://github.com/rspeer/wordfreq) - Library for looking up the frequencies of words in many languages, based on many sources of data. 
293 | - [BlingFire](https://github.com/Microsoft/BlingFire) - A lightning fast Finite State machine and REgular expression manipulation library. 294 | - [BERT-pytorch](https://github.com/codertimo/BERT-pytorch) - Google AI 2018 BERT pytorch implementation. 295 | - [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT) - PyTorch version of Google AI's BERT model with script to load Google's pre-trained models. 296 | - [gensim](https://github.com/piskvorky/gensim) - Topic Modeling. 297 | - [pattern](https://github.com/clips/pattern) - Web mining module. 298 | - [probablepeople](https://github.com/datamade/probablepeople) - Parsing unstructured western names into name components. 299 | - [Expynent](https://github.com/lk-geimfari/expynent) - Regular expression patterns. 300 | - [mimesis](https://github.com/lk-geimfari/mimesis) - Generate synthetic data. 301 | - [pyenchant](https://github.com/rfk/pyenchant) - Spell checking. 302 | - [parserator](https://github.com/datamade/parserator) - Domain-specific probabilistic parsers. 303 | - [scrubadub](https://github.com/datascopeanalytics/scrubadub) - Clean personally identifiable information from dirty dirty text. 304 | - [usaddress](https://github.com/datamade/usaddress) - Parsing unstructured address strings into address components. 305 | - [python-phonenumbers](https://github.com/daviddrysdale/python-phonenumbers) - Python port of Google's libphonenumber. 306 | - [jellyfish](https://github.com/jamesturk/jellyfish) - Approximate and phonetic matching of strings. 307 | - [pronouncing](https://pronouncing.readthedocs.io/en/latest/) - Simple interface for the CMU Pronouncing Dictionary. 308 | - [langid](https://github.com/saffsd/langid.py) - Stand-alone language identification system. 309 | - [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy) - Fuzzy String Matching. 310 | - [Fuzzy](https://github.com/yougov/Fuzzy) - Soundex, NYSIIS, Double Metaphone. 
311 | - [snowball](https://github.com/snowballstem/snowball) - Snowball compiler and stemming algorithms. 312 | - [leven](https://github.com/semanticize/leven) - Levenshtein edit distance. 313 | - [flashtext](https://github.com/vi3k6i5/flashtext) - Extract Keywords from sentence or Replace keywords in sentences. 314 | - [polyglot](https://github.com/aboSamoor/polyglot) - Multilingual text NLP processing toolkit. 315 | - [sentencepiece](https://github.com/google/sentencepiece) - Unsupervised text tokenizer for Neural Network-based text generation. 316 | - [pyfasttext](https://github.com/vrasneur/pyfasttext) - Binding for fastText. 317 | - [python-wordsegment](https://github.com/grantjenks/python-wordsegment) - English word segmentation. 318 | - [pyahocorasick](https://github.com/WojciechMula/pyahocorasick) - Exact or approximate multi-pattern string search. 319 | - [Wordbatch](https://github.com/anttttti/Wordbatch) - Parallel text feature extraction for machine learning. 320 | - [langdetect](https://github.com/Mimino666/langdetect) - Port of Google's language-detection library. 321 | - [translation](https://github.com/littlecodersh/translation) - Uses web services for text translation. 322 | - [nltk](http://www.nltk.org) - Natural Language Toolkit. 323 | - [unidecode](https://github.com/avian2/unidecode) - ASCII transliterations of Unicode text. 324 | - [pytorch/text](https://github.com/pytorch/text) - Data loaders and abstractions for text and NLP. 325 | - [textdistance](https://github.com/orsinium/textdistance) - Compute distance between sequences. 326 | - [sent2vec](https://github.com/epfml/sent2vec) - General purpose unsupervised sentence representations. 327 | - [pyhunspell](https://github.com/blatinier/pyhunspell) - Python bindings for the Hunspell spellchecker engine. 328 | - [facebook/fastText](https://github.com/facebookresearch/fastText) - Library for fast text representation and classification. 
329 | - [textblob](https://github.com/sloria/textblob) - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more. 330 | - [facebook/InferSent](https://github.com/facebookresearch/InferSent) - Sentence embeddings (InferSent) and training code for NLI. 331 | - [nmslib](https://github.com/nmslib/nmslib) - Non-Metric Space Library. 332 | - [google/sentencepiece](https://github.com/google/sentencepiece) - Unsupervised text tokenizer for Neural Network-based text generation. 333 | - [ftfy](https://github.com/LuminosoInsight/python-ftfy) - Fixes mojibake and other glitches in Unicode text, after the fact. 334 | - [fletcher](https://github.com/xhochy/fletcher) - Pandas ExtensionDType/Array backed by Apache Arrow. 335 | - [textacy](https://github.com/chartbeat-labs/textacy) - NLP, before and after spaCy. 336 | - [hmtl](https://github.com/huggingface/hmtl) - Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP. 337 | - [pytext](https://github.com/facebookresearch/pytext) - A natural language modeling framework based on PyTorch. 338 | - [flair](https://github.com/zalandoresearch/flair) - A very simple framework for state-of-the-art Natural Language Processing. 339 | - [LASER](https://github.com/facebookresearch/LASER) - Language-Agnostic SEntence Representations. 340 | - [transformer-xl](https://github.com/kimiyoung/transformer-xl) - Attentive Language Models Beyond a Fixed-Length Context. 341 | - [textstat](https://github.com/shivam5992/textstat) - Calculate readability statistics of a text object - paragraphs, sentences, articles. 342 | - [nlpaug](https://github.com/makcedward/nlpaug) - Augmenting nlp for your machine learning projects. 343 | - [sumy](https://github.com/miso-belica/sumy) - Automatic summarization of text documents and HTML. 344 | - [textract](https://github.com/deanmalmgren/textract) - Extract text from any document. 
345 | - [newspaper](https://github.com/codelucas/newspaper) - News extraction, article extraction and content curation.
346 |
347 | ### Ranking/Recommender
348 |
349 | - [recommenders](https://github.com/microsoft/recommenders) - Examples and best practices for building recommendation systems.
350 | - [Surprise](https://github.com/NicolasHug/Surprise) - Analyzing recommender systems.
351 | - [trueskill](https://github.com/sublee/trueskill) - TrueSkill rating system.
352 | - [LightFM](https://github.com/lyst/lightfm) - Hybrid recommendation algorithm.
353 | - [implicit](https://github.com/benfred/implicit) - Collaborative Filtering for Implicit Datasets.
354 |
355 | ## Trading
356 |
357 | - [Clairvoyant](https://github.com/anfederico/Clairvoyant) - Identify and monitor social/historical cues.
358 | - [zipline](https://github.com/quantopian/zipline) - Algorithmic Trading Library.
359 | - [qstrader](https://github.com/mhallsmoore/qstrader/) - Advanced Trading Infrastructure.
360 |
361 | ## Misc
362 |
363 | - [mmh3](https://github.com/hajimes/mmh3) - MurmurHash3, a set of fast and robust hash functions.
364 | - [fbpca](https://github.com/facebook/fbpca) - Fast Randomized PCA/SVD.
365 | - [annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors.
366 | - [pipeline](https://github.com/PipelineAI/pipeline) - Standard Runtime For Every Real-Time Machine Learning.
367 | - [crayon](https://github.com/torrvision/crayon) - A language-agnostic interface to TensorBoard.
368 | - [faiss](https://github.com/facebookresearch/faiss) - A library for efficient similarity search and clustering of dense vectors.
369 | - [pyod](https://github.com/yzhao062/pyod) - Comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data.
370 |
371 | ## Deployment
372 |
373 | - [evidently](https://github.com/evidentlyai/evidently) - Evidently helps evaluate machine learning models during validation and monitor them in production.
374 | - [onnx](https://github.com/onnx/onnx) - Open Neural Network Exchange.
375 | - [lore](https://github.com/instacart/lore) - Lore makes machine learning approachable for Software Engineers and maintainable for Machine Learning Researchers.
376 | - [kubeflow](https://github.com/kubeflow/kubeflow) - Machine Learning Toolkit for Kubernetes.
377 | - [airflow](https://github.com/apache/incubator-airflow) - ETL.
378 | - [mlflow](https://github.com/databricks/mlflow) - Open source platform for the complete machine learning lifecycle.
379 | - [sklearn-porter](https://github.com/nok/sklearn-porter) - Transpile trained scikit-learn estimators.
380 | - [sklearn-compiledtrees](https://github.com/ajtulloch/sklearn-compiledtrees) - Compiled Decision Trees for scikit-learn.
381 |
382 | ## Profiling
383 |
384 | - [mem_usage_ui](https://github.com/parikls/mem_usage_ui) - Measuring and graphing memory usage of local processes.
385 | - [viztracer](https://github.com/gaogaotiantian/viztracer) - VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your Python code execution.
386 | - [py-spy](https://github.com/benfred/py-spy) - Sampling profiler for Python programs.
387 | - [memory_profiler](https://pypi.python.org/pypi/memory_profiler) - Monitoring memory usage of a Python program.
388 | - [line_profiler](https://github.com/rkern/line_profiler) - Line-by-line profiling.
389 | - [filprofiler](https://github.com/pythonspeed/filprofiler) - Fil, a memory profiler designed for data processing applications.
390 | - [scalene](https://github.com/emeryberger/scalene) - High-performance CPU and memory profiler for Python.
391 | - [python-flamegraph](https://github.com/evanhempel/python-flamegraph) - Statistical profiler which outputs in a format suitable for FlameGraph.
392 |
393 | ## Python Tools
394 |
395 | - [Typer](https://github.com/tiangolo/typer) - Build CLIs with type hints.
396 | - [hydra](https://hydra.cc) - Framework for elegantly configuring complex applications.
397 | - [neurtu](https://github.com/symerio/neurtu) - A Python package for parametric benchmarks.
398 | - [pyprojroot](https://github.com/chendaniely/pyprojroot) - Finding project directories in Python.
399 | - [datasette](https://datasette.io) - An open source multi-tool for exploring and publishing data.
400 | - [delorean](https://github.com/myusuf3/delorean) - Time Travel Made Easy.
401 | - [pip-tools](https://github.com/nvie/pip-tools) - Keeps dependencies up to date.
402 | - [devpi](http://doc.devpi.net/latest/) - PyPI server and packaging/testing/release tool.
403 | - [Jupyter Notebook](https://jupyter.org) - Notebooks are awesome.
404 | - [click](https://github.com/pallets/click) - CLI package.
405 | - [sacredboard](https://github.com/chovanecm/sacredboard) - Dashboard for sacred.
406 | - [sacred](http://sacred.readthedocs.io/en/latest/) - Reproduce computational experiments.
407 | - [magic-wormhole](https://github.com/warner/magic-wormhole) - Get things from one computer to another, safely.
408 |
409 | ## Data Gathering
410 |
411 | - [gain](https://github.com/gaojiuli/gain) - Web crawling framework based on asyncio.
412 | - [MechanicalSoup](https://github.com/MechanicalSoup/MechanicalSoup) - A Python library for automating interaction with websites.
413 | - [camelot](https://github.com/socialcopsdev/camelot) - Camelot: PDF Table Extraction for Humans.
414 | - [Pandarallel](https://github.com/nalepae/pandarallel) - Parallel pandas.
415 | - [great_expectations](https://github.com/great-expectations/great_expectations) - A framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests.
416 | - [parse](https://github.com/r1chardj0n3s/parse) - Parse strings using a specification based on the Python format() syntax.
417 | - [CleverCSV](https://github.com/alan-turing-institute/CleverCSV) - CleverCSV is a Python package for handling messy CSV files.
418 |
--------------------------------------------------------------------------------