├── COPYRIGHT
├── Makefile
├── Makefile.win
├── README
├── ffm-predict.cpp
├── ffm-train.cpp
├── ffm.cpp
├── ffm.h
├── timer.cpp
└── timer.h


/COPYRIGHT:
--------------------------------------------------------------------------------

Copyright (c) 2017 The LIBFFM Project.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither name of copyright holders nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.


THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
CXX = g++
CXXFLAGS = -Wall -O3 -std=c++0x -march=native

# comment the following flags if you do not want to use SSE instructions
DFLAG += -DUSESSE

# comment the following flags if you do not want to use OpenMP
DFLAG += -DUSEOMP
CXXFLAGS += -fopenmp

all: ffm-train ffm-predict

ffm-train: ffm-train.cpp ffm.o timer.o
	$(CXX) $(CXXFLAGS) $(DFLAG) -o $@ $^

ffm-predict: ffm-predict.cpp ffm.o timer.o
	$(CXX) $(CXXFLAGS) $(DFLAG) -o $@ $^

ffm.o: ffm.cpp ffm.h timer.o
	$(CXX) $(CXXFLAGS) $(DFLAG) -c -o $@ $<

timer.o: timer.cpp timer.h
	$(CXX) $(CXXFLAGS) $(DFLAG) -c -o $@ $<

clean:
	rm -f ffm-train ffm-predict ffm.o timer.o

--------------------------------------------------------------------------------
/Makefile.win:
--------------------------------------------------------------------------------
CXX = cl.exe
CFLAGS = /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp

TARGET = windows

all: $(TARGET) $(TARGET)\ffm-train.exe $(TARGET)\ffm-predict.exe

$(TARGET)\ffm-predict.exe: ffm.h ffm-predict.cpp ffm.obj timer.obj
	$(CXX) $(CFLAGS) ffm-predict.cpp ffm.obj timer.obj -Fe$(TARGET)\ffm-predict.exe

$(TARGET)\ffm-train.exe: ffm.h ffm-train.cpp ffm.obj timer.obj
	$(CXX) $(CFLAGS) ffm-train.cpp ffm.obj timer.obj -Fe$(TARGET)\ffm-train.exe

ffm.obj: ffm.cpp ffm.h
	$(CXX) $(CFLAGS) -c ffm.cpp

timer.obj: timer.cpp timer.h
	$(CXX) $(CFLAGS) -c timer.cpp

.PHONY: $(TARGET)
$(TARGET):
	-mkdir $(TARGET)

clean:
	-erase /Q *.obj *.exe $(TARGET)\.
	-rd $(TARGET)

--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
Table of Contents
=================

- What is LIBFFM
- Overfitting and Early Stopping
- Installation
- Data Format
- Command Line Usage
- Examples
- OpenMP and SSE
- Building Windows Binaries
- FAQ


What is LIBFFM
==============

LIBFFM is a library for field-aware factorization machine (FFM).

Field-aware factorization machine is an effective model for CTR prediction. It has been used to win top-3 positions
in the following competitions:

    * Criteo: https://www.kaggle.com/c/criteo-display-ad-challenge

    * Avazu: https://www.kaggle.com/c/avazu-ctr-prediction

    * Outbrain: https://www.kaggle.com/c/outbrain-click-prediction

    * RecSys 2015: http://dl.acm.org/citation.cfm?id=2813511&dl=ACM&coll=DL&CFID=941880276&CFTOKEN=60022934

You can find more information about FFM in the following paper / slides:

    * http://www.csie.ntu.edu.tw/~r01922136/slides/ffm.pdf

    * http://www.csie.ntu.edu.tw/~cjlin/papers/ffm.pdf

    * https://arxiv.org/abs/1701.04099


Overfitting and Early Stopping
==============================

FFM is prone to overfitting, and the solution we have so far is early stopping.
See how FFM behaves on a certain data set:

    > ffm-train -p va.ffm -l 0.00002 tr.ffm
    iter   tr_logloss   va_logloss
       1      0.49738      0.48776
       2      0.47383      0.47995
       3      0.46366      0.47480
       4      0.45561      0.47231
       5      0.44810      0.47034
       6      0.44037      0.47003
       7      0.43239      0.46952
       8      0.42362      0.46999
       9      0.41394      0.47088
      10      0.40326      0.47228
      11      0.39156      0.47435
      12      0.37886      0.47683
      13      0.36522      0.47975
      14      0.35079      0.48321
      15      0.33578      0.48703


We see that the best validation loss is achieved at the 7th iteration. If we keep training, overfitting begins. It is
worth noting that increasing the regularization parameter does not help:

    > ffm-train -p va.ffm -l 0.0002 -t 50 -s 12 tr.ffm
    iter   tr_logloss   va_logloss
       1      0.50532      0.49905
       2      0.48782      0.49242
       3      0.48136      0.48748
                 ...
      29      0.42183      0.47014
                 ...
      48      0.37071      0.47333
      49      0.36767      0.47374
      50      0.36472      0.47404


To avoid overfitting, we recommend always providing a validation set with option `-p'. You can use option
`--auto-stop' to stop at the iteration that reaches the best validation loss:

    > ffm-train -p va.ffm -l 0.00002 --auto-stop tr.ffm
    iter   tr_logloss   va_logloss
       1      0.49738      0.48776
       2      0.47383      0.47995
       3      0.46366      0.47480
       4      0.45561      0.47231
       5      0.44810      0.47034
       6      0.44037      0.47003
       7      0.43239      0.46952
       8      0.42362      0.46999
    Auto-stop. Use model at 7th iteration.


Installation
============

Requirement: LIBFFM requires a C++11-compatible compiler. We also use OpenMP to provide multi-threading. If OpenMP is
not available on your platform, please refer to section `OpenMP and SSE'.

- Unix-like systems:
  Type `make' in the command line.

- Windows:
  See `Building Windows Binaries' to compile.



Data Format
===========

The data format of LIBFFM is:
