├── LICENSE └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Legal Code 2 | 3 | Attribution-NonCommercial-ShareAlike 3.0 Unported 4 | 5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE 6 | LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN 7 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS 8 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES 9 | REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR 10 | DAMAGES RESULTING FROM ITS USE. 11 | 12 | License 13 | 14 | THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE 15 | COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY 16 | COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS 17 | AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED. 18 | 19 | BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE 20 | TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY 21 | BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS 22 | CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND 23 | CONDITIONS. 24 | 25 | 1. Definitions 26 | 27 | a. "Adaptation" means a work based upon the Work, or upon the Work and 28 | other pre-existing works, such as a translation, adaptation, 29 | derivative work, arrangement of music or other alterations of a 30 | literary or artistic work, or phonogram or performance and includes 31 | cinematographic adaptations or any other form in which the Work may be 32 | recast, transformed, or adapted including in any form recognizably 33 | derived from the original, except that a work that constitutes a 34 | Collection will not be considered an Adaptation for the purpose of 35 | this License. For the avoidance of doubt, where the Work is a musical 36 | work, performance or phonogram, the synchronization of the Work in 37 | timed-relation with a moving image ("synching") will be considered an 38 | Adaptation for the purpose of this License. 39 | b. "Collection" means a collection of literary or artistic works, such as 40 | encyclopedias and anthologies, or performances, phonograms or 41 | broadcasts, or other works or subject matter other than works listed 42 | in Section 1(g) below, which, by reason of the selection and 43 | arrangement of their contents, constitute intellectual creations, in 44 | which the Work is included in its entirety in unmodified form along 45 | with one or more other contributions, each constituting separate and 46 | independent works in themselves, which together are assembled into a 47 | collective whole. A work that constitutes a Collection will not be 48 | considered an Adaptation (as defined above) for the purposes of this 49 | License. 50 | c. "Distribute" means to make available to the public the original and 51 | copies of the Work or Adaptation, as appropriate, through sale or 52 | other transfer of ownership. 53 | d. "License Elements" means the following high-level license attributes 54 | as selected by Licensor and indicated in the title of this License: 55 | Attribution, Noncommercial, ShareAlike. 56 | e. "Licensor" means the individual, individuals, entity or entities that 57 | offer(s) the Work under the terms of this License. 58 | f. "Original Author" means, in the case of a literary or artistic work, 59 | the individual, individuals, entity or entities who created the Work 60 | or if no individual or entity can be identified, the publisher; and in 61 | addition (i) in the case of a performance the actors, singers, 62 | musicians, dancers, and other persons who act, sing, deliver, declaim, 63 | play in, interpret or otherwise perform literary or artistic works or 64 | expressions of folklore; (ii) in the case of a phonogram the producer 65 | being the person or legal entity who first fixes the sounds of a 66 | performance or other sounds; and, (iii) in the case of broadcasts, the 67 | organization that transmits the broadcast. 68 | g. "Work" means the literary and/or artistic work offered under the terms 69 | of this License including without limitation any production in the 70 | literary, scientific and artistic domain, whatever may be the mode or 71 | form of its expression including digital form, such as a book, 72 | pamphlet and other writing; a lecture, address, sermon or other work 73 | of the same nature; a dramatic or dramatico-musical work; a 74 | choreographic work or entertainment in dumb show; a musical 75 | composition with or without words; a cinematographic work to which are 76 | assimilated works expressed by a process analogous to cinematography; 77 | a work of drawing, painting, architecture, sculpture, engraving or 78 | lithography; a photographic work to which are assimilated works 79 | expressed by a process analogous to photography; a work of applied 80 | art; an illustration, map, plan, sketch or three-dimensional work 81 | relative to geography, topography, architecture or science; a 82 | performance; a broadcast; a phonogram; a compilation of data to the 83 | extent it is protected as a copyrightable work; or a work performed by 84 | a variety or circus performer to the extent it is not otherwise 85 | considered a literary or artistic work. 86 | h. "You" means an individual or entity exercising rights under this 87 | License who has not previously violated the terms of this License with 88 | respect to the Work, or who has received express permission from the 89 | Licensor to exercise rights under this License despite a previous 90 | violation. 91 | i. "Publicly Perform" means to perform public recitations of the Work and 92 | to communicate to the public those public recitations, by any means or 93 | process, including by wire or wireless means or public digital 94 | performances; to make available to the public Works in such a way that 95 | members of the public may access these Works from a place and at a 96 | place individually chosen by them; to perform the Work to the public 97 | by any means or process and the communication to the public of the 98 | performances of the Work, including by public digital performance; to 99 | broadcast and rebroadcast the Work by any means including signs, 100 | sounds or images. 101 | j. "Reproduce" means to make copies of the Work by any means including 102 | without limitation by sound or visual recordings and the right of 103 | fixation and reproducing fixations of the Work, including storage of a 104 | protected performance or phonogram in digital form or other electronic 105 | medium. 106 | 107 | 2. Fair Dealing Rights. Nothing in this License is intended to reduce, 108 | limit, or restrict any uses free from copyright or rights arising from 109 | limitations or exceptions that are provided for in connection with the 110 | copyright protection under copyright law or other applicable laws. 111 | 112 | 3. License Grant. Subject to the terms and conditions of this License, 113 | Licensor hereby grants You a worldwide, royalty-free, non-exclusive, 114 | perpetual (for the duration of the applicable copyright) license to 115 | exercise the rights in the Work as stated below: 116 | 117 | a. to Reproduce the Work, to incorporate the Work into one or more 118 | Collections, and to Reproduce the Work as incorporated in the 119 | Collections; 120 | b. to create and Reproduce Adaptations provided that any such Adaptation, 121 | including any translation in any medium, takes reasonable steps to 122 | clearly label, demarcate or otherwise identify that changes were made 123 | to the original Work. For example, a translation could be marked "The 124 | original work was translated from English to Spanish," or a 125 | modification could indicate "The original work has been modified."; 126 | c. to Distribute and Publicly Perform the Work including as incorporated 127 | in Collections; and, 128 | d. to Distribute and Publicly Perform Adaptations. 129 | 130 | The above rights may be exercised in all media and formats whether now 131 | known or hereafter devised. The above rights include the right to make 132 | such modifications as are technically necessary to exercise the rights in 133 | other media and formats. Subject to Section 8(f), all rights not expressly 134 | granted by Licensor are hereby reserved, including but not limited to the 135 | rights described in Section 4(e). 136 | 137 | 4. Restrictions. The license granted in Section 3 above is expressly made 138 | subject to and limited by the following restrictions: 139 | 140 | a. You may Distribute or Publicly Perform the Work only under the terms 141 | of this License. You must include a copy of, or the Uniform Resource 142 | Identifier (URI) for, this License with every copy of the Work You 143 | Distribute or Publicly Perform. You may not offer or impose any terms 144 | on the Work that restrict the terms of this License or the ability of 145 | the recipient of the Work to exercise the rights granted to that 146 | recipient under the terms of the License. You may not sublicense the 147 | Work. You must keep intact all notices that refer to this License and 148 | to the disclaimer of warranties with every copy of the Work You 149 | Distribute or Publicly Perform. When You Distribute or Publicly 150 | Perform the Work, You may not impose any effective technological 151 | measures on the Work that restrict the ability of a recipient of the 152 | Work from You to exercise the rights granted to that recipient under 153 | the terms of the License. This Section 4(a) applies to the Work as 154 | incorporated in a Collection, but this does not require the Collection 155 | apart from the Work itself to be made subject to the terms of this 156 | License. If You create a Collection, upon notice from any Licensor You 157 | must, to the extent practicable, remove from the Collection any credit 158 | as required by Section 4(d), as requested. If You create an 159 | Adaptation, upon notice from any Licensor You must, to the extent 160 | practicable, remove from the Adaptation any credit as required by 161 | Section 4(d), as requested. 162 | b. You may Distribute or Publicly Perform an Adaptation only under: (i) 163 | the terms of this License; (ii) a later version of this License with 164 | the same License Elements as this License; (iii) a Creative Commons 165 | jurisdiction license (either this or a later license version) that 166 | contains the same License Elements as this License (e.g., 167 | Attribution-NonCommercial-ShareAlike 3.0 US) ("Applicable License"). 168 | You must include a copy of, or the URI, for Applicable License with 169 | every copy of each Adaptation You Distribute or Publicly Perform. You 170 | may not offer or impose any terms on the Adaptation that restrict the 171 | terms of the Applicable License or the ability of the recipient of the 172 | Adaptation to exercise the rights granted to that recipient under the 173 | terms of the Applicable License. You must keep intact all notices that 174 | refer to the Applicable License and to the disclaimer of warranties 175 | with every copy of the Work as included in the Adaptation You 176 | Distribute or Publicly Perform. When You Distribute or Publicly 177 | Perform the Adaptation, You may not impose any effective technological 178 | measures on the Adaptation that restrict the ability of a recipient of 179 | the Adaptation from You to exercise the rights granted to that 180 | recipient under the terms of the Applicable License. This Section 4(b) 181 | applies to the Adaptation as incorporated in a Collection, but this 182 | does not require the Collection apart from the Adaptation itself to be 183 | made subject to the terms of the Applicable License. 184 | c. You may not exercise any of the rights granted to You in Section 3 185 | above in any manner that is primarily intended for or directed toward 186 | commercial advantage or private monetary compensation. The exchange of 187 | the Work for other copyrighted works by means of digital file-sharing 188 | or otherwise shall not be considered to be intended for or directed 189 | toward commercial advantage or private monetary compensation, provided 190 | there is no payment of any monetary compensation in con-nection with 191 | the exchange of copyrighted works. 192 | d. If You Distribute, or Publicly Perform the Work or any Adaptations or 193 | Collections, You must, unless a request has been made pursuant to 194 | Section 4(a), keep intact all copyright notices for the Work and 195 | provide, reasonable to the medium or means You are utilizing: (i) the 196 | name of the Original Author (or pseudonym, if applicable) if supplied, 197 | and/or if the Original Author and/or Licensor designate another party 198 | or parties (e.g., a sponsor institute, publishing entity, journal) for 199 | attribution ("Attribution Parties") in Licensor's copyright notice, 200 | terms of service or by other reasonable means, the name of such party 201 | or parties; (ii) the title of the Work if supplied; (iii) to the 202 | extent reasonably practicable, the URI, if any, that Licensor 203 | specifies to be associated with the Work, unless such URI does not 204 | refer to the copyright notice or licensing information for the Work; 205 | and, (iv) consistent with Section 3(b), in the case of an Adaptation, 206 | a credit identifying the use of the Work in the Adaptation (e.g., 207 | "French translation of the Work by Original Author," or "Screenplay 208 | based on original Work by Original Author"). The credit required by 209 | this Section 4(d) may be implemented in any reasonable manner; 210 | provided, however, that in the case of a Adaptation or Collection, at 211 | a minimum such credit will appear, if a credit for all contributing 212 | authors of the Adaptation or Collection appears, then as part of these 213 | credits and in a manner at least as prominent as the credits for the 214 | other contributing authors. For the avoidance of doubt, You may only 215 | use the credit required by this Section for the purpose of attribution 216 | in the manner set out above and, by exercising Your rights under this 217 | License, You may not implicitly or explicitly assert or imply any 218 | connection with, sponsorship or endorsement by the Original Author, 219 | Licensor and/or Attribution Parties, as appropriate, of You or Your 220 | use of the Work, without the separate, express prior written 221 | permission of the Original Author, Licensor and/or Attribution 222 | Parties. 223 | e. For the avoidance of doubt: 224 | 225 | i. Non-waivable Compulsory License Schemes. In those jurisdictions in 226 | which the right to collect royalties through any statutory or 227 | compulsory licensing scheme cannot be waived, the Licensor 228 | reserves the exclusive right to collect such royalties for any 229 | exercise by You of the rights granted under this License; 230 | ii. Waivable Compulsory License Schemes. In those jurisdictions in 231 | which the right to collect royalties through any statutory or 232 | compulsory licensing scheme can be waived, the Licensor reserves 233 | the exclusive right to collect such royalties for any exercise by 234 | You of the rights granted under this License if Your exercise of 235 | such rights is for a purpose or use which is otherwise than 236 | noncommercial as permitted under Section 4(c) and otherwise waives 237 | the right to collect royalties through any statutory or compulsory 238 | licensing scheme; and, 239 | iii. Voluntary License Schemes. The Licensor reserves the right to 240 | collect royalties, whether individually or, in the event that the 241 | Licensor is a member of a collecting society that administers 242 | voluntary licensing schemes, via that society, from any exercise 243 | by You of the rights granted under this License that is for a 244 | purpose or use which is otherwise than noncommercial as permitted 245 | under Section 4(c). 246 | f. Except as otherwise agreed in writing by the Licensor or as may be 247 | otherwise permitted by applicable law, if You Reproduce, Distribute or 248 | Publicly Perform the Work either by itself or as part of any 249 | Adaptations or Collections, You must not distort, mutilate, modify or 250 | take other derogatory action in relation to the Work which would be 251 | prejudicial to the Original Author's honor or reputation. Licensor 252 | agrees that in those jurisdictions (e.g. Japan), in which any exercise 253 | of the right granted in Section 3(b) of this License (the right to 254 | make Adaptations) would be deemed to be a distortion, mutilation, 255 | modification or other derogatory action prejudicial to the Original 256 | Author's honor and reputation, the Licensor will waive or not assert, 257 | as appropriate, this Section, to the fullest extent permitted by the 258 | applicable national law, to enable You to reasonably exercise Your 259 | right under Section 3(b) of this License (right to make Adaptations) 260 | but not otherwise. 261 | 262 | 5. Representations, Warranties and Disclaimer 263 | 264 | UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING AND TO THE 265 | FULLEST EXTENT PERMITTED BY APPLICABLE LAW, LICENSOR OFFERS THE WORK AS-IS 266 | AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE 267 | WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE, INCLUDING, WITHOUT 268 | LIMITATION, WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 269 | PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, 270 | ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT 271 | DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED 272 | WARRANTIES, SO THIS EXCLUSION MAY NOT APPLY TO YOU. 273 | 274 | 6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE 275 | LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR 276 | ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES 277 | ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS 278 | BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 279 | 280 | 7. Termination 281 | 282 | a. This License and the rights granted hereunder will terminate 283 | automatically upon any breach by You of the terms of this License. 284 | Individuals or entities who have received Adaptations or Collections 285 | from You under this License, however, will not have their licenses 286 | terminated provided such individuals or entities remain in full 287 | compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will 288 | survive any termination of this License. 289 | b. Subject to the above terms and conditions, the license granted here is 290 | perpetual (for the duration of the applicable copyright in the Work). 291 | Notwithstanding the above, Licensor reserves the right to release the 292 | Work under different license terms or to stop distributing the Work at 293 | any time; provided, however that any such election will not serve to 294 | withdraw this License (or any other license that has been, or is 295 | required to be, granted under the terms of this License), and this 296 | License will continue in full force and effect unless terminated as 297 | stated above. 298 | 299 | 8. Miscellaneous 300 | 301 | a. Each time You Distribute or Publicly Perform the Work or a Collection, 302 | the Licensor offers to the recipient a license to the Work on the same 303 | terms and conditions as the license granted to You under this License. 304 | b. Each time You Distribute or Publicly Perform an Adaptation, Licensor 305 | offers to the recipient a license to the original Work on the same 306 | terms and conditions as the license granted to You under this License. 307 | c. If any provision of this License is invalid or unenforceable under 308 | applicable law, it shall not affect the validity or enforceability of 309 | the remainder of the terms of this License, and without further action 310 | by the parties to this agreement, such provision shall be reformed to 311 | the minimum extent necessary to make such provision valid and 312 | enforceable. 313 | d. No term or provision of this License shall be deemed waived and no 314 | breach consented to unless such waiver or consent shall be in writing 315 | and signed by the party to be charged with such waiver or consent. 316 | e. This License constitutes the entire agreement between the parties with 317 | respect to the Work licensed here. There are no understandings, 318 | agreements or representations with respect to the Work not specified 319 | here. Licensor shall not be bound by any additional provisions that 320 | may appear in any communication from You. This License may not be 321 | modified without the mutual written agreement of the Licensor and You. 322 | f. The rights granted under, and the subject matter referenced, in this 323 | License were drafted utilizing the terminology of the Berne Convention 324 | for the Protection of Literary and Artistic Works (as amended on 325 | September 28, 1979), the Rome Convention of 1961, the WIPO Copyright 326 | Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996 327 | and the Universal Copyright Convention (as revised on July 24, 1971). 328 | These rights and subject matter take effect in the relevant 329 | jurisdiction in which the License terms are sought to be enforced 330 | according to the corresponding provisions of the implementation of 331 | those treaty provisions in the applicable national law. If the 332 | standard suite of rights granted under applicable copyright law 333 | includes additional rights not granted under this License, such 334 | additional rights are deemed to be included in the License; this 335 | License is not intended to restrict the license of any rights under 336 | applicable law. 337 | 338 | 339 | Creative Commons Notice 340 | 341 | Creative Commons is not a party to this License, and makes no warranty 342 | whatsoever in connection with the Work. Creative Commons will not be 343 | liable to You or any party on any legal theory for any damages 344 | whatsoever, including without limitation any general, special, 345 | incidental or consequential damages arising in connection to this 346 | license. Notwithstanding the foregoing two (2) sentences, if Creative 347 | Commons has expressly identified itself as the Licensor hereunder, it 348 | shall have all rights and obligations of Licensor. 349 | 350 | Except for the limited purpose of indicating to the public that the 351 | Work is licensed under the CCPL, Creative Commons does not authorize 352 | the use by either party of the trademark "Creative Commons" or any 353 | related trademark or logo of Creative Commons without the prior 354 | written consent of Creative Commons. Any permitted use will be in 355 | compliance with Creative Commons' then-current trademark usage 356 | guidelines, as may be published on its website or otherwise made 357 | available upon request from time to time. For the avoidance of doubt, 358 | this trademark restriction does not form part of this License. 359 | 360 | Creative Commons may be contacted at https://creativecommons.org/. 361 | 362 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Awesome Feature Engineering for Machine Learning 2 | 3 | [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) 4 | 5 | A curated list of resources dedicated to Feature Engineering Techniques for Machine Learning 6 | 7 | Maintainers - [Andrei Khobnia](https://github.com/aikho) 8 | 9 | This page is licensed under [Creative Commons Attribution-Noncommercial-ShareAlike 3.0 Unported License](https://creativecommons.org/licenses/by-nc-sa/3.0/) 10 | 11 | Please feel free to create [pull requests](https://github.com/aikho/awesome-feature-engineering/pulls). 12 | 13 | 14 | ## Contents 15 | 16 | - [Numeric Data](#numeric-data) 17 | - [Scaling](#scaling) 18 | - [Ranking](#ranking) 19 | - [Quantization and Binning](#quantization-and-binning) 20 | - [Box-Cox Transformation](#box-cox-transformation) 21 | - [Yeo-Johnson Transformation](#yeo-johnson-transformation) 22 | - [Feature Interactions](#feature-interactions) 23 | - [Clustering Features](#clustering-features) 24 | - [t-SNE Features](#t-sne-features) 25 | - [PCA Features](#pca-features) 26 | - [Textual Data](#textual-data) 27 | - [Bag of Words](#bag-of-words) 28 | - [Phrase Detection Features](#phrase-detection-features) 29 | - [TFIDF](#tfidf) 30 | - [Word Embeddings](#word-embeddings) 31 | - [Subword Embeddings](#subword-embeddings) 32 | - [Pattern Features](#pattern-features) 33 | - [Lexicon Features](#lexicon-features) 34 | - [PoS Features](#pos-features) 35 | - [Image Data](#image-data) 36 | - [Computer Vision Algorithm Features](#computer-vision-algorithm-features) 37 | - [Image Statistics Features](#image-statistics-features) 38 | - [OCR Features](#ocr-features) 39 | - [Deep Learning Features](#deep-learning-features) 40 | - [Categorical Data](#categorical-data) 41 | - [One Hot Encoding](#one-hot-encoding) 42 | - [Count Encoding](#count-encoding) 43 | - [Label Encoding](#label-encoding) 44 | - [Dummy Encoding](#dummy-encoding) 45 | - [Mean Encoding](#mean-encoding) 46 | - [Hashing](#hashing) 47 | - [Time Series Data](#time-series-data) 48 | - [Rolling Window Features](#rolling-window-features) 49 | - [Lag Features](#lag-features) 50 | - [Geospatial Data](#geospatial-data) 51 | 52 | 53 | ## Numeric Data 54 | * [Understanding Feature Engineering (Part 1) -- Continuous Numeric Data](https://towardsdatascience.com/understanding-feature-engineering-part-1-continuous-numeric-data-da4e47099a7b) 55 | ### Scaling 56 | * [sklearn.preprocessing.MinMaxScaler](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) 57 | * [sklearn.preprocessing.StandartScaler](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) 58 | ### Ranking 59 | * [Ranking](https://en.wikipedia.org/wiki/Ranking) 60 | * [scipy.stats.rankdata](https://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.rankdata.html) 61 | ### Quantization and Binning 62 | * [Data Binning](https://en.wikipedia.org/wiki/Data_binning) 63 | * [Bucketing Continuous Variables in pandas](http://benalexkeen.com/bucketing-continuous-variables-in-pandas/) 64 | * [pandas.cat](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.cut.html) 65 | ### Box-Cox Transformation 66 | * [scipy.stats.boxcox](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.boxcox.html) 67 | * `np.log (x + const)` 68 | ### Yeo-Johnson Transformation 69 | * [Yeo-Johnson Transformation](https://gist.github.com/mesgarpour/f24769cd186e2db853957b10ff6b7a95) 70 | ### Feature Interactions 71 | * [Featuretools](https://docs.featuretools.com/) 72 | * [sklearn.preprocessing.PolynomialFeatures](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html) 73 | * Divisions 74 | * Other interactions 75 | ### Clustering Features 76 | * [How to create New Features using Clustering!!](https://towardsdatascience.com/how-to-create-new-features-using-clustering-4ae772387290) 77 | ### t-SNE Features 78 | * [t-SNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) 79 | * [Automatic feature extraction with t-SNE](https://medium.com/jungle-book/automatic-feature-extraction-with-t-sne-62826ce09268) 80 | ### PCA Features 81 | * [Principal component analysis (PCA)](https://en.wikipedia.org/wiki/Principal_component_analysis) 82 | * [sklearn.decomposition.PCA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) 83 | 84 | 85 | ## Textual Data 86 | * [Understanding Feature Engineering (Part 3) -- Traditional Methods for Text Data](https://towardsdatascience.com/understanding-feature-engineering-part-3-traditional-methods-for-text-data-f6f7d70acd41) 87 | ### Bag of Words 88 | * [Bag-of-words model](https://en.wikipedia.org/wiki/Bag-of-words_model) 89 | * [A Gentle Introduction to the Bag-of-Words Model](https://machinelearningmastery.com/gentle-introduction-bag-words-model/) 90 | * [sklearn.feature_extraction.text.CountVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html) 91 | * [sklearn.feature_extraction.DictVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html) 92 | * [sklearn.feature_extraction.FeatureHasher](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html) 93 | ### Phrase Detection Features 94 | * [sklearn_api.phrases – Scikit learn wrapper for phrase (collocation) detection](https://radimrehurek.com/gensim/models/phrases.html) 95 | ### TFIDF 96 | * [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) 97 | * [sklearn.feature_extraction.text.TfidfVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html) 98 | ### Word Embeddings 99 | * [Word embedding](https://en.wikipedia.org/wiki/Word_embedding) 100 | * [GloVe: Global Vectors for Word Representation](https://nlp.stanford.edu/projects/glove/) 101 | * [Gensim: models.word2vec – Word2vec embeddings](https://radimrehurek.com/gensim/models/word2vec.html) 102 | * [fastText](https://fasttext.cc/) 103 | * [Word2Vec and FastText Word Embedding with Gensim](https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a209c1d3e12c) 104 | * [Do Pretrained Embeddings Give You The Extra Edge?](https://www.kaggle.com/sbongo/do-pretrained-embeddings-give-you-the-extra-edge) 105 | ### Subword Embeddings 106 | * [Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)](https://github.com/bheinzerling/bpemb) 107 | ### Pattern Features 108 | * [ClearTK - Feature Extraction Tutorial](https://cleartk.github.io/cleartk/docs/tutorial/feature_extraction.html) 109 | * Regular Expressions 110 | ### Lexicon Features 111 | * [Named Entity Recognition with Bidirectional LSTM-CNNs (arXiv:1511.08308)](https://arxiv.org/abs/1511.08308v4) 112 | ### PoS Features 113 | * [Part-of-Speech_Tagging](https://en.wikipedia.org/wiki/Part-of-speech_tagging) 114 | * [NLTK Categorizing and Tagging Words](https://www.nltk.org/book/ch05.html) 115 | * [How to use PoS features in scikit learn classfiers](https://stackoverflow.com/questions/24002485/python-how-to-use-pos-part-of-speech-features-in-scikit-learn-classfiers-svm) 116 | 117 | ## Image Data 118 | ### Computer Vision Algorithm Features 119 | * [Feature extraction and similar image search with OpenCV for newbies](https://medium.com/machine-learning-world/feature-extraction-and-similar-image-search-with-opencv-for-newbies-3c59796bf774) 120 | * [OpenCV -- Feature Detection and Description](https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_feature2d/py_table_of_contents_feature2d/py_table_of_contents_feature2d.html) 121 | * [SimpleCV.Features package](http://simplecv.readthedocs.io/en/latest/SimpleCV.Features.html) 122 | * [Scikit-image feature module](http://scikit-image.org/docs/stable/api/skimage.feature.html) 123 | ### Image Statistics Features 124 | * [ImageStat Module -- Pillow](http://pillow.readthedocs.io/en/3.1.x/reference/ImageStat.html) 125 | ### OCR Features 126 | * [A Python wrapper for Google Tesseract](https://github.com/madmaze/pytesseract) 127 | ### Deep Learning Features 128 | * [Keras pre-trained models feature extraction](https://keras.io/applications/) 129 | * [Using Keras’ Pre-trained Models for Feature Extraction in Image Clustering](https://medium.com/@franky07724_57962/using-keras-pre-trained-models-for-feature-extraction-in-image-clustering-a142c6cdf5b1) 130 | 131 | 132 | ## Categorical Data 133 | * [Understanding Feature Engineering (Part 2) -- Categorical Data](https://towardsdatascience.com/understanding-feature-engineering-part-2-categorical-data-f54324193e63) 134 | ### One Hot Encoding 135 | * [Why One-Hot Encode Data in Machine Learning?](https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/) 136 | * [How to One Hot Encode Sequence Data in Python](https://machinelearningmastery.com/how-to-one-hot-encode-sequence-data-in-python/) 137 | * [sklearn.preprocessing.OneHotEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html) 138 | * [Keras - to_categorical](https://keras.io/utils/#to_categorical) 139 | ### Count Encoding 140 | * [Feature engineering: Count encoding](https://www.slideshare.net/HJvanVeen/feature-engineering-72376750/11) 141 | ### Label Encoding 142 | * [Label encoding in scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) 143 | * [Feature engineering: Label encoding](https://www.slideshare.net/HJvanVeen/feature-engineering-72376750/9) 144 | ### Dummy Encoding 145 | * [Dummy Coding: The how and why](http://www.statisticssolutions.com/dummy-coding-the-how-and-why/) 146 | * [pandas.get_dummies](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html) 147 | * [One-Hot vs Dummy encoding](https://stats.stackexchange.com/questions/224051/one-hot-vs-dummy-encoding-in-scikit-learn) 148 | ### Mean Encoding 149 | * [Likelihood encoding of categorical features](https://www.kaggle.com/tnarik/likelihood-encoding-of-categorical-features) 150 | * [Python target encoding for categorical features](https://www.kaggle.com/ogrellier/python-target-encoding-for-categorical-features) 151 | * [Adding variance column when mean encoding](https://www.kaggle.com/general/16927#95887) 152 | ### Hashing 153 | * [Feature Hashing on Wikipedia](https://en.wikipedia.org/wiki/Feature_hashing) 154 | * [Feature hashing and Extraction in VowpalWabbit](https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Feature-Hashing-and-Extraction) 155 | * [Feature hashing in scikit-learn](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html) 156 | 157 | 158 | ## Time Series Data 159 | * [Automatic extraction of relevant features from time series](http://tsfresh.readthedocs.io) 160 | * [Basic Feature Engineering With Time Series Data in Python](https://machinelearningmastery.com/basic-feature-engineering-time-series-data-python/) 161 | ### Rolling Window Features 162 | * [pandas.DataFrame.rolling](https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.rolling.html) 163 | ### Lag Features 164 | * [Use pandas to lag your timeseries data in order to examine causal relationships](https://medium.com/@NatalieOlivo/use-pandas-to-lag-your-timeseries-data-in-order-to-examine-causal-relationships-f8186451b3a9) 165 | 166 | 167 | ## Geospatial Data 168 | * [Geospatial Feature Engineering and Visualization](https://www.kaggle.com/camnugent/geospatial-feature-engineering-and-visualization) 169 | * [Intro to Geospatial Data using Python](https://github.com/SocialDataSci/Geospatial_Data_with_Python/blob/master/Intro%20to%20Geospatial%20Data%20with%20Python.ipynb) 170 | 171 | 172 | [Back to Top](#contents) 173 | --------------------------------------------------------------------------------