├── .gitignore ├── Makefile ├── README.md ├── ccdc ├── 2d_array.c ├── 2d_array.h ├── Makefile ├── ccdc.c ├── ccdc.h ├── const.h ├── defines.h ├── gdb.args ├── glmnet5.f ├── input.c ├── input.h ├── misc.c ├── multirobust.c ├── output.h ├── scripts │ ├── cloudCover.pl │ ├── renameMTLfiles.sh │ ├── tileLandsat.sh │ └── tileLandsatParent.sh ├── utilities.c └── utilities.h ├── classification ├── 2d_array.c ├── 2d_array.h ├── Makefile ├── classRF.c ├── classTree.c ├── classification.c ├── classification.h ├── cokus.c ├── get_args.c ├── qsort.c ├── rf.h ├── rfsub.f ├── rfutils.c ├── utilities.c └── utilities.h ├── docker ├── Makefile ├── debian │ ├── Dockerfile │ └── run.py └── ubuntu │ ├── Dockerfile │ └── run.py └── docs ├── CCDC ADD CY5 V1.0.docx ├── ccdc_work_flow.pdf └── flowchart_description.txt /.gitignore: -------------------------------------------------------------------------------- 1 | Y_hat.txt 2 | *.o 3 | *~ 4 | bin/* 5 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | all: 2 | make ccdc 3 | make classification 4 | 5 | ccdc: 6 | cd ccdc && \ 7 | make && \ 8 | make install 9 | 10 | classification: 11 | cd classification && \ 12 | make && \ 13 | make install 14 | 15 | clean: 16 | cd ccdc && make clean 17 | cd classification && make clean 18 | 19 | docker: 20 | cd docker && make 21 | 22 | dockerhub: 23 | cd docker && make publish-docker 24 | 25 | ubuntu-bash: 26 | docker run -i -t --entrypoint=/bin/bash usgseros/ubuntu-c-ccdc -s 27 | 28 | debian-bash: 29 | docker run -i -t --entrypoint=/bin/bash usgseros/debian-c-ccdc -s 30 | 31 | .PHONY: ccdc classification clean docker 32 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## LCMAP Change Detection 2 | 3 | This project contains application source code for 
Change Detection C library 4 | and related scripts. **This code is deprecated, unreviewed, and unverified. All future work is being directed toward USGS-EROS/lcmap-pyccd.** 5 | 6 | 7 | ## Installation 8 | 9 | To install, simply run the top-level ``make`` target: 10 | 11 | ```bash 12 | $ git clone git@github.com:USGS-EROS/lcmap-change-detection-c.git 13 | $ cd lcmap-change-detection-c 14 | $ make 15 | ``` 16 | 17 | The executables and scripts will be installed into ``./bin`` by default. This 18 | can be overridden by setting a ``BIN`` environment variable or using a ``BIN`` 19 | variable when running the target: 20 | 21 | ```bash 22 | $ BIN=/my/path/bin make 23 | ``` 24 | 25 | For additional notes, such as installing dependencies (Ubuntu), overriding 26 | ``Makefile`` variables, etc., see: 27 | 28 | * [Building CCDC](../../wiki/Building-CCDC) 29 | 30 | 31 | ## Usage 32 | 33 | [We're in active development on making this not only work, but be usable. 34 | Ticket #5 has some early usability notes + tasks that we're trying to hit right 35 | away, if you're interested in tracking this.] 36 | 37 | 38 | ## Development 39 | 40 | Development notes for C-CCDC are maintained in the project wiki. For more 41 | details, see: 42 | 43 | * [CCDC Development](../../wiki/CCDC-Development) 44 | * [Using CCDC with Docker](../../wiki/CCDC-%26-Docker) 45 | 46 | 47 | ## Implementation 48 | 49 | ### CCDC - Continuous Change Detection and Classification (Algorithm) 50 | 51 | * NOTE: This algorithm has not been validated and is considered a prototype. 52 | * See [CCDC ADD](https://landsat.usgs.gov/sites/default/files/documents/ccdc_add.pdf) for the 53 | detailed description. 54 | 55 | 56 | ## More Information 57 | 58 | This project is hosted by the US Geological Survey (USGS) Earth Resources 59 | Observation and Science (EROS) Land Change Monitoring, Assessment, and 60 | Projection ([LCMAP](https://github.com/USGS-EROS?utf8=%E2%9C%93&query=lcmap)) 61 | Project. 
For questions regarding this source code, please use the 62 | [Landsat Contact Us](https://landsat.usgs.gov/contactus.php) page and specify 63 | ``USGS LCMAP`` in the "Regarding" section. 64 | -------------------------------------------------------------------------------- /ccdc/2d_array.c: -------------------------------------------------------------------------------- 1 | 2 | #include 3 | #include 4 | 5 | #include "const.h" 6 | #include "2d_array.h" 7 | #include "utilities.h" 8 | #include "defines.h" 9 | 10 | 11 | /* The 2D_ARRAY maintains a 2D array that can be sized at run-time. */ 12 | typedef struct lsrd_2d_array 13 | { 14 | unsigned int signature; /* Signature used to make sure the pointer 15 | math from a row_array_ptr actually gets back to 16 | the expected structure (helps detect errors). */ 17 | int rows; /* Rows in the 2D array */ 18 | int columns; /* Columns in the 2D array */ 19 | int member_size; /* Size of each entry in the array */ 20 | void *data_ptr; /* Pointer to the data storage for the array */ 21 | void **row_array_ptr; /* Pointer to an array of pointers to each row in 22 | the 2D array */ 23 | double memory_block[0]; /* Block of memory for storage of the array. 24 | It is broken into two blocks. The first 'rows * 25 | sizeof(void *)' block stores the pointers to the 26 | first column in each of the rows. The remainder 27 | of the block is for storing the actual data. 28 | Note: the type is double to force the worst case 29 | memory alignment on sparc boxes. */ 30 | } LSRD_2D_ARRAY; 31 | 32 | 33 | /************************************************************************* 34 | NAME: allocate_2d_array 35 | 36 | PURPOSE: Allocate memory for a 2D array. 37 | 38 | RETURNS: A pointer to a 2D array, or NULL if the routine fails. A pointer 39 | to an array of void pointers to the storage for each row of the 40 | array is returned. The returned pointer must be freed by the 41 | free_2d_array routine. 
42 | 43 | HISTORY: 44 | Date Programmer Reason 45 | -------- --------------- ------------------------------------- 46 | 3/15/2013 Song Guo Modified from LDCM IAS library 47 | **************************************************************************/ 48 | void **allocate_2d_array 49 | ( 50 | int rows, /* I: Number of rows for the 2D array */ 51 | int columns, /* I: Number of columns for the 2D array */ 52 | size_t member_size /* I: Size of the 2D array element */ 53 | ) 54 | { 55 | int row; 56 | LSRD_2D_ARRAY *array; 57 | size_t size; 58 | int data_start_index; 59 | 60 | /* Calculate the size needed for the array memory. The size includes the 61 | size of the base structure, an array of pointers to the rows in the 62 | 2D array, an array for the data, and additional space 63 | (2 * sizeof(void*)) to account for different memory alignment rules 64 | on some machine architectures. */ 65 | size = sizeof (*array) + (rows * sizeof (void *)) 66 | + (rows * columns * member_size) + 2 * sizeof (void *); 67 | 68 | /* Allocate the structure */ 69 | array = malloc (size); 70 | if (!array) 71 | { 72 | RETURN_ERROR ("Failure to allocate memory for the array", 73 | "allocate_2d_array", NULL); 74 | } 75 | 76 | /* Initialize the member structures */ 77 | array->signature = SIGNATURE; 78 | array->rows = rows; 79 | array->columns = columns; 80 | array->member_size = member_size; 81 | 82 | /* The array of pointers to rows starts at the beginning of the memory 83 | block */ 84 | array->row_array_ptr = (void **) array->memory_block; 85 | 86 | /* The data starts after the row pointers, with the index adjusted in 87 | case the void pointer and memory block pointers are not the same 88 | size */ 89 | data_start_index = 90 | rows * sizeof (void *) / sizeof (array->memory_block[0]); 91 | if ((rows % 2) == 1) 92 | data_start_index++; 93 | array->data_ptr = &array->memory_block[data_start_index]; 94 | 95 | /* Initialize the row pointers */ 96 | for (row = 0; row < rows; row++) 97 | { 98 | 
array->row_array_ptr[row] = (char *) array->data_ptr 99 | + row * columns * member_size; 100 | } 101 | 102 | return array->row_array_ptr; 103 | } 104 | 105 | 106 | /************************************************************************* 107 | NAME: free_2d_array 108 | 109 | PURPOSE: Free memory for a 2D array allocated by allocate_2d_array 110 | 111 | RETURNS: SUCCESS or FAILURE 112 | 113 | HISTORY: 114 | Date Programmer Reason 115 | -------- --------------- ------------------------------------- 116 | 3/15/2013 Song Guo Modified from LDCM IAS library 117 | **************************************************************************/ 118 | int free_2d_array 119 | ( 120 | void **array_ptr /* I: Pointer returned by the alloc routine */ 121 | ) 122 | { 123 | if (array_ptr != NULL) 124 | { 125 | /* Convert the array_ptr into a pointer to the structure */ 126 | LSRD_2D_ARRAY *array = GET_ARRAY_STRUCTURE_FROM_PTR (array_ptr); 127 | 128 | /* Verify it is a valid 2D array */ 129 | if (array->signature != SIGNATURE) 130 | { 131 | /* Programming error of some sort - exit the program */ 132 | RETURN_ERROR ("Invalid signature on 2D array - memory " 133 | "corruption or programming error?", "free_2d_array", 134 | FAILURE); 135 | } 136 | free (array); 137 | } 138 | 139 | return SUCCESS; 140 | } 141 | -------------------------------------------------------------------------------- /ccdc/2d_array.h: -------------------------------------------------------------------------------- 1 | #ifndef MISC_2D_ARRAY_H 2 | #define MISC_2D_ARRAY_H 3 | 4 | 5 | #include 6 | 7 | 8 | void **allocate_2d_array 9 | ( 10 | int rows, /* I: Number of rows for the 2D array */ 11 | int columns, /* I: Number of columns for the 2D array */ 12 | size_t member_size /* I: Size of the 2D array element */ 13 | ); 14 | 15 | 16 | int get_2d_array_size 17 | ( 18 | void **array_ptr, /* I: Pointer returned by the alloc routine */ 19 | int *rows, /* O: Pointer to number of rows */ 20 | int *columns /* O: Pointer to number of columns 
*/ 21 | ); 22 | 23 | 24 | int free_2d_array 25 | ( 26 | void **array_ptr /* I: Pointer returned by the alloc routine */ 27 | ); 28 | 29 | 30 | #endif 31 | -------------------------------------------------------------------------------- /ccdc/Makefile: -------------------------------------------------------------------------------- 1 | # Configuration 2 | SRC_DIR = . 3 | SCRIPTS = ./scripts 4 | BIN ?= ../bin 5 | XML2INC ?= /usr/include/libxml2/libxml 6 | ESPAINC ?= 7 | GSL_SCI_INC ?= /usr/include/gsl 8 | GSL_SCI_LIB ?= /usr/lib 9 | 10 | # Set up compile options 11 | CC = gcc 12 | FORTRAN = gfortran 13 | RM = rm -f 14 | MV = mv 15 | EXTRA = -Wall -Wextra -g 16 | FFLAGS=-g -fdefault-real-8 17 | 18 | # Define the include files 19 | INC = $(wildcard $(SRC_DIR)/*.h) 20 | INCDIR = -I. -I$(SRC_DIR) -I$(GSL_SCI_INC) -I$(XML2INC) -I$(ESPAINC) -I$(GSL_SCI_INC) 21 | NCFLAGS = $(EXTRA) $(INCDIR) 22 | 23 | # Define the source code and object files 24 | #SRC = input.c 2d_array.c ccdc.c utilities.c misc.c 25 | SRC = $(wildcard $(SRC_DIR)/*.c) 26 | OBJ = $(SRC:.c=.o) 27 | 28 | # Define the object libraries 29 | LIB = -L$(GSL_SCI_LIB) -lz -lpthread -lrt -lgsl -lgslcblas -lgfortran -lm 30 | 31 | # Define the executable 32 | EXE = ccdc 33 | 34 | # Target for the executable 35 | all: $(EXE) 36 | 37 | ccdc: $(OBJ) glmnet5 $(INC) 38 | $(CC) $(NCFLAGS) -o ccdc $(OBJ) glmnet5.o $(LIB) 39 | 40 | glmnet5: $(SRC) glmnet5.f 41 | $(FORTRAN) $(FFLAGS) -c glmnet5.f -o glmnet5.o 42 | 43 | 44 | $(BIN): 45 | mkdir -p $(BIN) 46 | 47 | install: $(BIN) 48 | mv $(EXE) $(BIN) 49 | cp $(SCRIPTS)/* $(BIN) 50 | 51 | clean: 52 | $(RM) $(BIN)/$(EXE) 53 | $(RM) $(BIN)/*.r 54 | $(RM) *.o 55 | 56 | $(OBJ): $(INC) 57 | 58 | .c.o: 59 | $(CC) $(NCFLAGS) $(INCDIR) -c $< 60 | 61 | -------------------------------------------------------------------------------- /ccdc/ccdc.h: -------------------------------------------------------------------------------- 1 | #ifndef CCDC_H 2 | #define CCDC_H 3 | 4 | 5 | #include 6 | 
#include 7 | #include 8 | #include 9 | #include 10 | #include 11 | 12 | #include "input.h" 13 | 14 | int get_args 15 | ( 16 | int argc, /* I: number of cmd-line args */ 17 | char *argv[], /* I: array of cmd-line args */ 18 | int *row, /* O: row number for the pixel */ 19 | int *col, /* O: col number for the pixel */ 20 | char *in_path, /* O: directory location of input data */ 21 | char *out_path, /* O: directory location of output files */ 22 | char *data_type, /* O: data type: tifs, bip, stdin, bip_lines. */ 23 | char *scene_list_file, /* O: optional file name of list of sceneIDs */ 24 | bool *verbose /* O: verbose flag */ 25 | ); 26 | 27 | void get_scenename 28 | ( 29 | const char *filename, /* I: Name of file to split */ 30 | char *directory, /* O: Directory portion of file name */ 31 | char *scene_name, /* O: Scene name portion of the file name */ 32 | char *appendix /* O: Appendix portion of the file name */ 33 | ); 34 | 35 | int create_scene_list 36 | ( 37 | const char *item, /* I: string of file items to be found */ 38 | int *num_scenes, /* I/O: number of scenes */ 39 | char *sceneListFileName /* I: file name containing list of scene IDs */ 40 | ); 41 | 42 | int convert_year_doy_to_jday_from_0000 43 | ( 44 | int year, /* I: year */ 45 | int doy, /* I: day of the year */ 46 | int *jday /* O: julian date since year 0000 */ 47 | ); 48 | 49 | int sort_scene_based_on_year_doy_row 50 | ( 51 | char **scene_list, /* I/O: scene_list, sorted as output */ 52 | int num_scenes, /* I: number of scenes in the scene list */ 53 | int *sdate /* O: year plus date since 0000 */ 54 | ); 55 | 56 | void quick_sort_2d_float 57 | ( 58 | float arr[], 59 | float *brr[], 60 | int left, 61 | int right 62 | ); 63 | 64 | void update_cft 65 | ( 66 | int i_span, 67 | int n_times, 68 | int min_num_c, 69 | int mid_num_c, 70 | int max_num_c, 71 | int num_c, 72 | int *update_number_c 73 | ); 74 | 75 | int median_variogram 76 | ( 77 | float **array, /* I: input array */ 78 | int dim1_len, /* I: 
dimension 1 length in input array */ 79 | int dim2_start, /* I: dimension 2 start index */ 80 | int dim2_end, /* I: dimension 2 end index */ 81 | float *output_array /* O: output array */ 82 | ); 83 | 84 | void split_directory_scenename 85 | ( 86 | const char *filename, /* I: Name of scene with path to split */ 87 | char *directory, /* O: Directory portion of file name */ 88 | char *scene_name /* O: Scene name portion of the file name */ 89 | ); 90 | 91 | void rmse_from_square_root_mean 92 | ( 93 | float **array, /* I: input array */ 94 | float fit_cft, /* I: input fit_cft value */ 95 | int dim1_index, /* I: dimension 1 index in input array */ 96 | int dim2_len, /* I: dimension 2 length */ 97 | float *rmse /* O: output rmse */ 98 | ); 99 | 100 | void partial_square_root_mean 101 | ( 102 | float **array, /* I: input array */ 103 | int dim1_index, /* I: 1st dimension index */ 104 | int start, /* I: number of start elements in 1st dim */ 105 | int end, /* I: number of end elements in 1st dim */ 106 | float **fit_ctf, /* I: */ 107 | float *rmse /* O: output rmse value */ 108 | ); 109 | 110 | void matlab_2d_array_mean 111 | ( 112 | float **array, /* I: input array */ 113 | int dim1_index, /* I: 1st dimension index */ 114 | int dim2_len, /* I: number of input elements in 2nd dim */ 115 | float *output_mean /* O: output mean value */ 116 | ); 117 | 118 | void matlab_2d_float_median 119 | ( 120 | float **array, /* I: input array */ 121 | int dim1_index, /* I: 1st dimension index */ 122 | int dim2_len, /* I: number of input elements in 2nd dim */ 123 | float *output_median /* O: output median value */ 124 | ); 125 | 126 | void matlab_2d_partial_mean 127 | ( 128 | float **array, /* I: input array */ 129 | int dim1_index, /* I: 1st dimension index */ 130 | int start, /* I: number of start elements in 2nd dim */ 131 | int end, /* I: number of end elements in 2nd dim */ 132 | float *output_mean /* O: output mean value */ 133 | ); 134 | 135 | void matlab_float_2d_partial_median 
136 | ( 137 | float **array, /* I: input array */ 138 | int dim1_index, /* I: 1st dimension index */ 139 | int start, /* I: number of start elements in 2nd dim */ 140 | int end, /* I: number of end elements in 2nd dim */ 141 | float *output_median /* O: output median value */ 142 | ); 143 | 144 | void matlab_2d_partial_square_mean 145 | ( 146 | float **array, /* I: input array */ 147 | int dim1_index, /* I: 1st dimension index */ 148 | int start, /* I: number of start elements in 2nd dim */ 149 | int end, /* I: number of end elements in 2nd dim */ 150 | float *output_mean /* O: output mean value */ 151 | ); 152 | 153 | void matlab_2d_array_norm 154 | ( 155 | float **array, /* I: input array */ 156 | int dim1_index, /* I: 1st dimension index */ 157 | int dim2_len, /* I: number of input elements in 2nd dim */ 158 | float *output_norm /* O: output norm value */ 159 | ); 160 | 161 | void get_ids_length 162 | ( 163 | int *id_array, /* I: input array */ 164 | int start, /* I: array start index */ 165 | int end, /* I: array end index */ 166 | int *id_len /* O: number of non-zero values in the array */ 167 | ); 168 | 169 | void matlab_unique 170 | ( 171 | int *clrx, 172 | float **clry, 173 | int nums, 174 | int *new_nums 175 | ); 176 | 177 | int auto_mask 178 | ( 179 | int *clrx, 180 | float **clry, 181 | int start, 182 | int end, 183 | float years, 184 | float t_b1, 185 | float t_b2, 186 | float n_t, 187 | int *bl_ids 188 | ); 189 | 190 | int auto_ts_fit 191 | ( 192 | int *clrx, 193 | float **clry, 194 | int band_index, 195 | int start, 196 | int end, 197 | int df, 198 | float **coefs, 199 | float *rmse, 200 | float **v_dif 201 | ); 202 | 203 | int auto_ts_predict 204 | ( 205 | int *clrx, 206 | float **coefs, 207 | int df, 208 | int band_index, 209 | int start, 210 | int end, 211 | float *pred_y 212 | ); 213 | 214 | extern void elnet_( 215 | 216 | // input: 217 | 218 | int *ka, // ka = algorithm flag 219 | // ka=1 => covariance updating algorithm 220 | // ka=2 => naive 
algorithm 221 | double *parm, // parm = penalty member index (0 <= parm <= 1) 222 | // = 0.0 => ridge 223 | // = 1.0 => lasso 224 | int *no, // no = number of observations 225 | int *ni, // ni = number of predictor variables 226 | double *x, // x[ni][no] = predictor data matrix flat file (overwritten) 227 | double *y, // y[no] = response vector (overwritten) 228 | double *w, // w[no]= observation weights (overwritten) 229 | int *jd, // jd(jd(1)+1) = predictor variable deletion flag 230 | // jd(1) = 0 => use all variables 231 | // jd(1) != 0 => do not use variables jd(2)...jd(jd(1)+1) 232 | double *vp, // vp(ni) = relative penalties for each predictor variable 233 | // vp(j) = 0 => jth variable unpenalized 234 | double cl[][2], // cl(2,ni) = interval constraints on coefficient values (overwritten) 235 | // cl(1,j) = lower bound for jth coefficient value (<= 0.0) 236 | // cl(2,j) = upper bound for jth coefficient value (>= 0.0) 237 | int *ne, // ne = maximum number of variables allowed to enter largest model 238 | // (stopping criterion) 239 | int *nx, // nx = maximum number of variables allowed to enter all models 240 | // along path (memory allocation, nx > ne). 241 | int *nlam, // nlam = (maximum) number of lamda values 242 | double *flmin, // flmin = user control of lamda values (>=0) 243 | // flmin < 1.0 => minimum lamda = flmin*(largest lamda value) 244 | // flmin >= 1.0 => use supplied lamda values (see below) 245 | double *ulam, // ulam(nlam) = user supplied lamda values (ignored if flmin < 1.0) 246 | double *thr, // thr = convergence threshold for each lamda solution. 247 | // iterations stop when the maximum reduction in the criterion value 248 | // as a result of each parameter update over a single pass 249 | // is less than thr times the null criterion value. 
250 | // (suggested value, thr=1.0e-5) 251 | int *isd, // isd = predictor variable standardization flag: 252 | // isd = 0 => regression on original predictor variables 253 | // isd = 1 => regression on standardized predictor variables 254 | // Note: output solutions always reference original 255 | // variable locations and scales. 256 | int *intr, // intr = intercept flag 257 | // intr = 0/1 => don't/do include intercept in model 258 | int *maxit, // maxit = maximum allowed number of passes over the data for all lambda 259 | // values (suggested values, maxit = 100000) 260 | 261 | // output: 262 | 263 | int *lmu, // lmu = actual number of lamda values (solutions) 264 | double *a0, // a0(lmu) = intercept values for each solution 265 | double *ca, // ca(nx,lmu) = compressed coefficient values for each solution 266 | int *ia, // ia(nx) = pointers to compressed coefficients 267 | int *nin, // nin(lmu) = number of compressed coefficients for each solution 268 | double *rsq, // rsq(lmu) = R**2 values for each solution 269 | double *alm, // alm(lmu) = lamda values corresponding to each solution 270 | int *nlp, // nlp = actual number of passes over the data for all lamda values 271 | int *jerr // jerr = error flag: 272 | // jerr = 0 => no error 273 | // jerr > 0 => fatal error - no output returned 274 | // jerr < 7777 => memory allocation error 275 | // jerr = 7777 => all used predictors have zero variance 276 | // jerr = 10000 => maxval(vp) <= 0.0 277 | // jerr < 0 => non fatal error - partial output: 278 | // Solutions for larger lamdas (1:(k-1)) returned. 279 | // jerr = -k => convergence for kth lamda value not reached 280 | // after maxit (see above) iterations. 281 | // jerr = -10000-k => number of non zero coefficients along path 282 | // exceeds nx (see above) at kth lamda value. 
283 | ); 284 | 285 | extern void spelnet_( 286 | 287 | // input: 288 | 289 | int *ka, // ka = algorithm flag 290 | // ka=1 => covariance updating algorithm 291 | // ka=2 => naive algorithm 292 | double *parm, // parm = penalty member index (0 <= parm <= 1) 293 | // = 0.0 => ridge 294 | // = 1.0 => lasso 295 | int *no, // no = number of observations 296 | int *ni, // ni = number of predictor variables 297 | double *x, // x[ni][no] = predictor data matrix flat file (overwritten) 298 | double *y, // y[no] = response vector (overwritten) 299 | double *w, // w[no]= observation weights (overwritten) 300 | int *jd, // jd(jd(1)+1) = predictor variable deletion flag 301 | // jd(1) = 0 => use all variables 302 | // jd(1) != 0 => do not use variables jd(2)...jd(jd(1)+1) 303 | double *vp, // vp(ni) = relative penalties for each predictor variable 304 | // vp(j) = 0 => jth variable unpenalized 305 | double cl[][2], // cl(2,ni) = interval constraints on coefficient values (overwritten) 306 | // cl(1,j) = lower bound for jth coefficient value (<= 0.0) 307 | // cl(2,j) = upper bound for jth coefficient value (>= 0.0) 308 | int *ne, // ne = maximum number of variables allowed to enter largest model 309 | // (stopping criterion) 310 | int *nx, // nx = maximum number of variables allowed to enter all models 311 | // along path (memory allocation, nx > ne). 312 | int *nlam, // nlam = (maximum) number of lamda values 313 | double *flmin, // flmin = user control of lamda values (>=0) 314 | // flmin < 1.0 => minimum lamda = flmin*(largest lamda value) 315 | // flmin >= 1.0 => use supplied lamda values (see below) 316 | double *ulam, // ulam(nlam) = user supplied lamda values (ignored if flmin < 1.0) 317 | double *thr, // thr = convergence threshold for each lamda solution. 318 | // iterations stop when the maximum reduction in the criterion value 319 | // as a result of each parameter update over a single pass 320 | // is less than thr times the null criterion value. 
321 | // (suggested value, thr=1.0e-5) 322 | int *isd, // isd = predictor variable standardization flag: 323 | // isd = 0 => regression on original predictor variables 324 | // isd = 1 => regression on standardized predictor variables 325 | // Note: output solutions always reference original 326 | // variable locations and scales. 327 | int *intr, // intr = intercept flag 328 | // intr = 0/1 => don't/do include intercept in model 329 | int *maxit, // maxit = maximum allowed number of passes over the data for all lambda 330 | // values (suggested values, maxit = 100000) 331 | 332 | // output: 333 | 334 | int *lmu, // lmu = actual number of lamda values (solutions) 335 | double *a0, // a0(lmu) = intercept values for each solution 336 | double *ca, // ca(nx,lmu) = compressed coefficient values for each solution 337 | int *ia, // ia(nx) = pointers to compressed coefficients 338 | int *nin, // nin(lmu) = number of compressed coefficients for each solution 339 | double *rsq, // rsq(lmu) = R**2 values for each solution 340 | double *alm, // alm(lmu) = lamda values corresponding to each solution 341 | int *nlp, // nlp = actual number of passes over the data for all lamda values 342 | int *jerr // jerr = error flag: 343 | // jerr = 0 => no error 344 | // jerr > 0 => fatal error - no output returned 345 | // jerr < 7777 => memory allocation error 346 | // jerr = 7777 => all used predictors have zero variance 347 | // jerr = 10000 => maxval(vp) <= 0.0 348 | // jerr < 0 => non fatal error - partial output: 349 | // Solutions for larger lamdas (1:(k-1)) returned. 350 | // jerr = -k => convergence for kth lamda value not reached 351 | // after maxit (see above) iterations. 352 | // jerr = -10000-k => number of non zero coefficients along path 353 | // exceeds nx (see above) at kth lamda value. 
354 | ); 355 | 356 | /*-------------------------------------------------------------------- 357 | c uncompress coefficient vectors for all solutions: 358 | c 359 | c call solns(ni,nx,lmu,ca,ia,nin,b) 360 | c 361 | c input: 362 | c 363 | c ni,nx = input to elnet 364 | c lmu,ca,ia,nin = output from elnet 365 | c 366 | c output: 367 | c 368 | c b(ni,lmu) = all elnet returned solutions in uncompressed format 369 | ----------------------------------------------------------------------*/ 370 | extern int solns_( 371 | int *ni, // ni = number of predictor variables 372 | int *nx, // nx = maximum number of variables allowed to enter all models 373 | int *lmu, // lmu = actual number of lamda values (solutions) 374 | double *ca, // ca(nx,lmu) = compressed coefficient values for each solution 375 | int *ia, // ia(nx) = pointers to compressed coefficients 376 | int *nin, // nin(lmu) = number of compressed coefficients for each solution 377 | double *b // b(ni,lmu) = uncompressed coefficient values for each solution 378 | ); 379 | 380 | extern int c_glmnet( 381 | int no, // number of observations (no) 382 | int ni, // number of predictor variables (ni) 383 | double *x, // input matrix, x[ni][no] 384 | double *y, // response variable, of dimension (no) 385 | int nlam, // number of lambda values 386 | double *ulam, // lambda values, of dimension (nlam) 387 | double parm, // the alpha variable 388 | 389 | int *lmu, // lmu = actual number of lamda values (solutions) 390 | double cfs[nlam][ni+1] // results = cfs[lmu][ni + 1] 391 | ); 392 | 393 | #endif /* CCDC_H */ 394 | -------------------------------------------------------------------------------- /ccdc/const.h: -------------------------------------------------------------------------------- 1 | #ifndef CONST_H 2 | #define CONST_H 3 | 4 | 5 | #include 6 | 7 | typedef signed short int16; 8 | typedef unsigned char uint8; 9 | typedef signed char int8; 10 | 11 | #ifndef min 12 | #define min(a,b) (((a) < (b)) ? 
(a) : (b)) 13 | #endif 14 | 15 | 16 | #ifndef max 17 | #define max(a,b) (((a) > (b)) ? (a) : (b)) 18 | #endif 19 | 20 | #define TOTAL_BANDS 8 21 | 22 | #define PI 3.14159265358979323846 23 | #define TWO_PI (2.0 * PI) 24 | #define HALF_PI (PI / 2.0) 25 | 26 | 27 | #define DEG (180.0 / PI) 28 | #define RAD (PI / 180.0) 29 | 30 | 31 | #ifndef SUCCESS 32 | #define SUCCESS 0 33 | #endif 34 | 35 | #ifndef ERROR 36 | #define ERROR -1 37 | #endif 38 | 39 | 40 | #ifndef FAILURE 41 | #define FAILURE 1 42 | #endif 43 | 44 | #ifndef TRUE 45 | #define TRUE 1 46 | #endif 47 | 48 | #ifndef FALSE 49 | #define FALSE 0 50 | #endif 51 | 52 | 53 | #define MINSIGMA 1e-5 54 | 55 | #define MAX_STR_LEN 512 56 | #define MAX_SCENE_LIST 3922 57 | 58 | #endif 59 | -------------------------------------------------------------------------------- /ccdc/defines.h: -------------------------------------------------------------------------------- 1 | /* This is an effort to consolidate all defines. Previously, they were */ 2 | /* scattered throughout .c and/or .h files, and not always included */ 3 | /* and/or available everywhere they were needed. Also, some redundancy */ 4 | /* and conflicts existed. */ 5 | 6 | /* from ccdc.c */ 7 | #define NUM_LASSO_BANDS 5 /* Number of bands for Least Absolute Shrinkage */ 8 | /* and Selection Operator LASSO regressions */ 9 | #define TOTAL_IMAGE_BANDS 7 /* Number of image bands, for loops. */ 10 | #define TOTAL_BANDS 8 /* Total image plus mask bands, for loops. */ 11 | #define MIN_NUM_C 4 /* Minimum number of coefficients */ 12 | #define MID_NUM_C 6 /* Mid-point number of coefficients */ 13 | #define MAX_NUM_C 8 /* Maximum number of coefficients */ 14 | #define CONSE 6 /* No. of CONSEquential pixels 4 bldg. 
model*/ 15 | #define N_TIMES 3 /* number of clear observations/coefficients*/ 16 | #define NUM_YEARS 365.25 /* average number of days per year */ 17 | #define NUM_FC 10 /* Values change with number of pixels run */ 18 | #define T_CONST 4.89 /* Threshold for cloud, shadow, and snow detection */ 19 | #define MIN_YEARS 1 /* minimum number of years for model initialization */ 20 | #define T_SN 0.75 /* no change detection for permanent snow pixels */ 21 | #define T_CLR 0.25 /* Fmask fails threshold */ 22 | #define T_CG 15.0863 /* inverse chi-square T_cg (0.99) for noise removal */ 23 | #define T_MAX_CG 35.8882 /* inverse chi-square T_max_cg (1e-6) for 24 | last step noise removal */ 25 | 26 | 27 | /* from 2d_array.c */ 28 | /* Define a unique (i.e. random) value that can be used to verify a pointer 29 | points to an LSRD_2D_ARRAY. This is used to verify that getting an 30 | LSRD_2D_ARRAY pointer from a row pointer succeeded. */ 31 | #define SIGNATURE 0x326589ab 32 | 33 | /* Given an address returned by the allocate routine, get a pointer to the 34 | entire structure. 
*/ 35 | #define GET_ARRAY_STRUCTURE_FROM_PTR(ptr) \ 36 | ((LSRD_2D_ARRAY *)((char *)(ptr) - offsetof(LSRD_2D_ARRAY, memory_block))) 37 | 38 | 39 | /* from input.c */ 40 | //#define TOTAL_IMAGE_BANDS 7 41 | 42 | /* from misc.c */ 43 | /* 12-31-1972 is 720624 in julian day since year 0000 */ 44 | #define JULIAN_DATE_LAST_DAY_1972 720624 45 | #define LANDSAT_START_YEAR 1973 46 | #define LEAP_YEAR_DAYS 366 47 | #define NON_LEAP_YEAR_DAYS 365 48 | #define AVE_DAYS_IN_A_YEAR 365.25 49 | #define ROBUST_COEFFS 5 50 | #define LASSO_COEFFS 8 51 | //#define TOTAL_IMAGE_BANDS 7 52 | 53 | /* from input.h */ 54 | /* possible cfmask values */ 55 | #define CFMASK_CLEAR 0 56 | #define CFMASK_WATER 1 57 | #define CFMASK_SHADOW 2 58 | #define CFMASK_SNOW 3 59 | #define CFMASK_CLOUD 4 60 | #define CFMASK_FILL 255 61 | #define IMAGE_FILL -9999 62 | #define CFMASK_BAND 7 63 | 64 | /* from output.h */ 65 | #define FILL_VALUE 255 66 | #define NUM_COEFFS 8 67 | #define NUM_BANDS 7 68 | -------------------------------------------------------------------------------- /ccdc/gdb.args: -------------------------------------------------------------------------------- 1 | run --row=23 --col=3022 --inDir=/shared/users/bdavis/ARD_out/grid07/C --outDir=/shared/bdavis/grid07/C_out/all --sceneList=/shared/users/bdavis/ARD_out/grid07/all.txt --verbose 2 | 3 | inDir = /shared/users/bdavis/ARD_out/grid07/C 4 | outDir = /shared/bdavis/grid07/C_out/all 5 | sceneList = /shared/users/bdavis/ARD_out/grid07/0001-0064.txt 6 | verbose = 1 7 | 8 | -------------------------------------------------------------------------------- /ccdc/input.h: -------------------------------------------------------------------------------- 1 | #ifndef INPUT_H 2 | #define INPUT_H 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include "const.h" 10 | 11 | /* possible cfmask values */ 12 | #include "defines.h" 13 | 14 | /* Input file type definition */ 15 | typedef enum { 16 | INPUT_TYPE_NULL = -1, 17 | 
INPUT_TYPE_BINARY = 0, 18 | INPUT_TYPE_MAX 19 | } Input_type_t; 20 | 21 | /* Structure for the metadata */ 22 | typedef struct { 23 | int lines; /* number of lines in a scene */ 24 | int samples; /* number of samples in a scene */ 25 | int data_type; /* envi data type */ 26 | int byte_order; /* envi byte order */ 27 | int utm_zone; /* UTM zone; use a negative number if this is a 28 | southern zone */ 29 | int pixel_size; /* pixel size */ 30 | char interleave[MAX_STR_LEN]; /* envi save format */ 31 | int upper_left_x; /* upper left x coordinates */ 32 | int upper_left_y; /* upper left y coordinates */ 33 | } Input_meta_t; 34 | 35 | /* Structure for the 'input' data type */ 36 | typedef struct { 37 | Input_type_t file_type; /* Type of the input image files */ 38 | Input_meta_t meta; /* Input metadata */ 39 | FILE *fp_bin[TOTAL_BANDS][MAX_SCENE_LIST]; 40 | } Input_t; 41 | 42 | /* Prototypes */ 43 | FILE *open_raw_binary 44 | ( 45 | char *infile, /* I: name of the input file to be opened */ 46 | char *access_type /* I: string for the access type for reading the 47 | input file */ 48 | ); 49 | 50 | void close_raw_binary 51 | ( 52 | FILE *fptr /* I: pointer to raw binary file to be closed */ 53 | ); 54 | 55 | int write_raw_binary 56 | ( 57 | FILE *rb_fptr, /* I: pointer to the raw binary file */ 58 | int nlines, /* I: number of lines to write to the file */ 59 | int nsamps, /* I: number of samples to write to the file */ 60 | int size, /* I: number of bytes per pixel (ex. sizeof(uint8)) */ 61 | void *img_array /* I: array of nlines * nsamps * size to be written 62 | to the raw binary file */ 63 | ); 64 | 65 | int read_raw_binary 66 | ( 67 | FILE *rb_fptr, /* I: pointer to the raw binary file */ 68 | int nlines, /* I: number of lines to read from the file */ 69 | int nsamps, /* I: number of samples to read from the file */ 70 | int size, /* I: number of bytes per pixel (ex. 
sizeof(uint8)) */ 71 | void *img_array /* O: array of nlines * nsamps * size to be read from 72 | the raw binary file (sufficient space should 73 | already have been allocated) */ 74 | ); 75 | 76 | int read_envi_header 77 | ( 78 | char *data_type, /* I: input data type */ 79 | char *scene_name, /* I: scene name */ 80 | Input_meta_t *meta /* O: saved header file info */ 81 | ); 82 | 83 | int read_cfmask 84 | ( 85 | int curr_scene_num, /* I: current num. in list of scenes to read */ 86 | char *data_type, /* I: type of files, tifs or single BIP */ 87 | char **scene_list, /* I: current scene name in list of sceneIDs */ 88 | int row, /* I: the row (Y) location within img/grid */ 89 | int col, /* I: the col (X) location within img/grid */ 90 | int num_samples, /* I: number of image samples (X width) */ 91 | FILE ***fp_tifs, /* I/O: file ptr array for tif band file names */ 92 | FILE **fp_bip, /* I/O: file pointer array for BIP file names */ 93 | unsigned char *fmask_buf,/* O: pointer to cfmask band values */ 94 | /* I/O: Worldwide Reference System path and row for */ 95 | /* I/O: the current swath, this group of variables */ 96 | /* I/O: is for filtering out swath overlap, and */ 97 | int *prev_wrs_path, /* I/O: using the first of two scenes in a swath, */ 98 | int *prev_wrs_row, /* I/O: because it is recommended to use the meta */ 99 | int *prev_year, /* I/O: data from the first for things like sun */ 100 | int *prev_jday, /* I/O: angle, etc. However, always removing a */ 101 | unsigned char *prev_fmask_buf,/* I/O: redundant x/y location specified */ 102 | int *valid_scene_count,/* I/O: x/y is not always valid for gridded data, */ 103 | int *swath_overlap_count,/* I/O: it may/may not be in swath overlap area. */ 104 | char **valid_scene_list,/* I/O: 2-D array for list of filtered */ 105 | int *clear_sum, /* I/O: Total number of clear cfmask pixels */ 106 | int *water_sum, /* I/O: counter for cfmask water pixels. 
*/ 107 | int *shadow_sum, /* I/O: counter for cfmask shadow pixels. */ 108 | int *sn_sum, /* I/O: Total number of snow cfmask pixels */ 109 | int *cloud_sum, /* I/O: counter for cfmask cloud pixels. */ 110 | int *fill_sum, /* I/O: counter for cfmask fill pixels. */ 111 | int *all_sum, /* I/O: Total of all cfmask pixels */ 112 | unsigned char *updated_fmask_buf, /* I/O: new entry in valid fmask values */ 113 | int *updated_sdate_array, /* I/O: new buf of valid date values */ 114 | int *sdate, /* I: Original array of julian date values */ 115 | int *valid_num_scenes/* I/O: number of valid scenes after reading cfmask */ 116 | ); 117 | 118 | 119 | int read_stdin 120 | ( 121 | int *updated_sdate_array, /* pointer to date values buffer. */ 122 | int **buf, /* pointer to image bands buffer. */ 123 | unsigned char *updated_cfmask_buf, /* pointer to cfmask pixel buffer.*/ 124 | int num_bands, /* total number of bands. */ 125 | int *clear_sum, /* accumulator for clear pixels. */ 126 | int *water_sum, /* accumulator for water pixels. */ 127 | int *shadow_sum, /* accumulator for shadow pixels. */ 128 | int *snow_sum, /* accumulator for snow pixels. */ 129 | int *cloud_sum, /* accumulator for cloud pixels. */ 130 | int *fill_sum, /* accumulator for fill pixels. */ 131 | int *all_sum, /* accumulator for all pixels. */ 132 | int *valid_num_scenes, /* total scenes read. */ 133 | bool debug /* flag for printing mesgs/info. */ 134 | ); 135 | 136 | 137 | int assign_cfmask_values 138 | ( 139 | unsigned char cfmask_value, /* I: current cfmask pixel value. 
*/ 140 | int *clear_sum, /* O: accumulator for clear pixels */ 141 | int *water_sum, /* O: accumulator for water pixels */ 142 | int *shadow_sum, /* O: accumulator for shadow pixels */ 143 | int *snow_sum, /* O: accumulator for snow pixels */ 144 | int *cloud_sum, /* O: accumulator for cloud pixels */ 145 | int *fill_sum, /* O: accumulator for fill pixels */ 146 | int *all_sum /* O: accumulator for all pixels */ 147 | ); 148 | 149 | 150 | int read_tifs 151 | ( 152 | char *sceneID_name, /* I: current file name in list of sceneIDs */ 153 | FILE ***fp_tifs, /* I/O: file pointer array for band file names */ 154 | int curr_scene_num, /* I: current num. in list of scenes to read */ 155 | int row, /* I: the row (Y) location within img/grid */ 156 | int col, /* I: the col (X) location within img/grid */ 157 | int num_samples, /* I: number of image samples (X width) */ 158 | bool debug, /* I: flag for printing debug messages */ 159 | int **image_buf /* O: pointer to 2-D image band values array */ 160 | ); 161 | 162 | 163 | int read_bip 164 | ( 165 | char *current_scene_name, /* I: current file name in list of sceneIDs */ 166 | FILE **fp_bip, /* I/O: file pointer array for BIP file names */ 167 | int curr_scene_num, /* I: current num. 
in list of scenes to read */ 168 | int row, /* I: the row (Y) location within img/grid */ 169 | int col, /* I: the col (X) location within img/grid */ 170 | int num_samples, /* I: number of image samples (X width) */ 171 | int **image_buf /* O: pointer to 2-D image band values array */ 172 | ); 173 | 174 | 175 | void usage (); 176 | 177 | #endif 178 | -------------------------------------------------------------------------------- /ccdc/multirobust.c: -------------------------------------------------------------------------------- 1 | /* multirobust.c 2 | * 3 | * Copyright (C) 2013 Patrick Alken 4 | * 5 | * This program is free software; you can redistribute it and/or modify 6 | * it under the terms of the GNU General Public License as published by 7 | * the Free Software Foundation; either version 3 of the License, or (at 8 | * your option) any later version. 9 | * 10 | * This program is distributed in the hope that it will be useful, but 11 | * WITHOUT ANY WARRANTY; without even the implied warranty of 12 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 13 | * General Public License for more details. 14 | * 15 | * You should have received a copy of the GNU General Public License 16 | * along with this program; if not, write to the Free Software 17 | * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 18 | * 19 | * This module contains routines related to robust linear least squares. The 20 | * algorithm used closely follows the publications: 21 | * 22 | * [1] DuMouchel, W. and F. O'Brien (1989), "Integrating a robust 23 | * option into a multiple regression computing environment," 24 | * Computer Science and Statistics: Proceedings of the 21st 25 | * Symposium on the Interface, American Statistical Association 26 | * 27 | * [2] Street, J.O., R.J. Carroll, and D. Ruppert (1988), "A note on 28 | * computing robust regression estimates via iteratively 29 | * reweighted least squares," The American Statistician, v. 
42, 30 | * pp. 152-154. 31 | */ 32 | 33 | //#include 34 | #include 35 | #include 36 | #include 37 | #include 38 | #include 39 | #include 40 | #include 41 | #include 42 | #include 43 | 44 | static int robust_test_convergence(const gsl_vector *c_prev, const gsl_vector *c, 45 | const double tol); 46 | static double robust_madsigma(const gsl_vector *x, gsl_multifit_robust_workspace *w); 47 | static double robust_robsigma(const gsl_vector *r, const double s, 48 | const double tune, gsl_multifit_robust_workspace *w); 49 | static double robust_sigma(const double s_ols, const double s_rob, 50 | gsl_multifit_robust_workspace *w); 51 | static int robust_covariance(const double sigma, gsl_matrix *cov, 52 | gsl_multifit_robust_workspace *w); 53 | 54 | /* 55 | gsl_multifit_robust_alloc 56 | Allocate a robust workspace 57 | 58 | Inputs: T - robust weighting algorithm 59 | n - number of observations 60 | p - number of model parameters 61 | 62 | Return: pointer to workspace 63 | */ 64 | 65 | gsl_multifit_robust_workspace * 66 | gsl_multifit_robust_alloc(const gsl_multifit_robust_type *T, 67 | const size_t n, const size_t p) 68 | { 69 | gsl_multifit_robust_workspace *w; 70 | 71 | if (n < p) 72 | { 73 | GSL_ERROR_VAL("observations n must be >= p", GSL_EINVAL, 0); 74 | } 75 | 76 | w = calloc(1, sizeof(gsl_multifit_robust_workspace)); 77 | if (w == 0) 78 | { 79 | GSL_ERROR_VAL("failed to allocate space for multifit_robust struct", 80 | GSL_ENOMEM, 0); 81 | } 82 | 83 | w->n = n; 84 | w->p = p; 85 | w->type = T; 86 | /* bdavis */ 87 | //w->maxiter = 100; /* maximum iterations */ 88 | w->maxiter = 5; /* maximum iterations */ 89 | /* bdavis */ 90 | w->tune = w->type->tuning_default; 91 | 92 | w->multifit_p = gsl_multifit_linear_alloc(n, p); 93 | if (w->multifit_p == 0) 94 | { 95 | GSL_ERROR_VAL("failed to allocate space for multifit_linear struct", 96 | GSL_ENOMEM, 0); 97 | } 98 | 99 | w->r = gsl_vector_alloc(n); 100 | if (w->r == 0) 101 | { 102 | GSL_ERROR_VAL("failed to allocate space 
for residuals", 103 | GSL_ENOMEM, 0); 104 | } 105 | 106 | w->weights = gsl_vector_alloc(n); 107 | if (w->weights == 0) 108 | { 109 | GSL_ERROR_VAL("failed to allocate space for weights", GSL_ENOMEM, 0); 110 | } 111 | 112 | w->c_prev = gsl_vector_alloc(p); 113 | if (w->c_prev == 0) 114 | { 115 | GSL_ERROR_VAL("failed to allocate space for c_prev", GSL_ENOMEM, 0); 116 | } 117 | 118 | w->resfac = gsl_vector_alloc(n); 119 | if (w->resfac == 0) 120 | { 121 | GSL_ERROR_VAL("failed to allocate space for residual factors", 122 | GSL_ENOMEM, 0); 123 | } 124 | 125 | w->psi = gsl_vector_alloc(n); 126 | if (w->psi == 0) 127 | { 128 | GSL_ERROR_VAL("failed to allocate space for psi", GSL_ENOMEM, 0); 129 | } 130 | 131 | w->dpsi = gsl_vector_alloc(n); 132 | if (w->dpsi == 0) 133 | { 134 | GSL_ERROR_VAL("failed to allocate space for dpsi", GSL_ENOMEM, 0); 135 | } 136 | 137 | w->QSI = gsl_matrix_alloc(p, p); 138 | if (w->QSI == 0) 139 | { 140 | GSL_ERROR_VAL("failed to allocate space for QSI", GSL_ENOMEM, 0); 141 | } 142 | 143 | w->D = gsl_vector_alloc(p); 144 | if (w->D == 0) 145 | { 146 | GSL_ERROR_VAL("failed to allocate space for D", GSL_ENOMEM, 0); 147 | } 148 | 149 | w->workn = gsl_vector_alloc(n); 150 | if (w->workn == 0) 151 | { 152 | GSL_ERROR_VAL("failed to allocate space for workn", GSL_ENOMEM, 0); 153 | } 154 | 155 | w->stats.sigma_ols = 0.0; 156 | w->stats.sigma_mad = 0.0; 157 | w->stats.sigma_rob = 0.0; 158 | w->stats.sigma = 0.0; 159 | w->stats.Rsq = 0.0; 160 | w->stats.adj_Rsq = 0.0; 161 | w->stats.rmse = 0.0; 162 | w->stats.sse = 0.0; 163 | w->stats.dof = n - p; 164 | w->stats.weights = w->weights; 165 | w->stats.r = w->r; 166 | 167 | return w; 168 | } /* gsl_multifit_robust_alloc() */ 169 | 170 | /* 171 | gsl_multifit_robust_free() 172 | Free memory associated with robust workspace 173 | */ 174 | 175 | void 176 | gsl_multifit_robust_free(gsl_multifit_robust_workspace *w) 177 | { 178 | if (w->multifit_p) 179 | gsl_multifit_linear_free(w->multifit_p); 180 | 181 | if 
(w->r) 182 | gsl_vector_free(w->r); 183 | 184 | if (w->weights) 185 | gsl_vector_free(w->weights); 186 | 187 | if (w->c_prev) 188 | gsl_vector_free(w->c_prev); 189 | 190 | if (w->resfac) 191 | gsl_vector_free(w->resfac); 192 | 193 | if (w->psi) 194 | gsl_vector_free(w->psi); 195 | 196 | if (w->dpsi) 197 | gsl_vector_free(w->dpsi); 198 | 199 | if (w->QSI) 200 | gsl_matrix_free(w->QSI); 201 | 202 | if (w->D) 203 | gsl_vector_free(w->D); 204 | 205 | if (w->workn) 206 | gsl_vector_free(w->workn); 207 | 208 | free(w); 209 | } /* gsl_multifit_robust_free() */ 210 | 211 | int 212 | gsl_multifit_robust_tune(const double tune, gsl_multifit_robust_workspace *w) 213 | { 214 | w->tune = tune; 215 | return GSL_SUCCESS; 216 | } 217 | 218 | const char * 219 | gsl_multifit_robust_name(const gsl_multifit_robust_workspace *w) 220 | { 221 | return w->type->name; 222 | } 223 | 224 | gsl_multifit_robust_stats 225 | gsl_multifit_robust_statistics(const gsl_multifit_robust_workspace *w) 226 | { 227 | return w->stats; 228 | } 229 | 230 | /* 231 | gsl_multifit_robust() 232 | Perform robust iteratively reweighted linear least squares 233 | fit 234 | 235 | Inputs: X - design matrix of basis functions 236 | y - right hand side vector 237 | c - (output) model coefficients 238 | cov - (output) covariance matrix 239 | w - workspace 240 | */ 241 | 242 | int 243 | gsl_multifit_robust(const gsl_matrix * X, 244 | const gsl_vector * y, 245 | gsl_vector * c, 246 | gsl_matrix * cov, 247 | gsl_multifit_robust_workspace *w) 248 | { 249 | /* check matrix and vector sizes */ 250 | if (X->size1 != y->size) 251 | { 252 | GSL_ERROR 253 | ("number of observations in y does not match rows of matrix X", 254 | GSL_EBADLEN); 255 | } 256 | else if (X->size2 != c->size) 257 | { 258 | GSL_ERROR ("number of parameters c does not match columns of matrix X", 259 | GSL_EBADLEN); 260 | } 261 | else if (cov->size1 != cov->size2) 262 | { 263 | GSL_ERROR ("covariance matrix is not square", GSL_ENOTSQR); 264 | } 265 | else if 
(c->size != cov->size1) 266 | { 267 | GSL_ERROR 268 | ("number of parameters does not match size of covariance matrix", 269 | GSL_EBADLEN); 270 | } 271 | else if (X->size1 != w->n || X->size2 != w->p) 272 | { 273 | GSL_ERROR 274 | ("size of workspace does not match size of observation matrix", 275 | GSL_EBADLEN); 276 | } 277 | else 278 | { 279 | int s; 280 | double chisq; 281 | const double tol = GSL_SQRT_DBL_EPSILON; 282 | int converged = 0; 283 | size_t numit = 0; 284 | const size_t n = y->size; 285 | double sigy = gsl_stats_sd(y->data, y->stride, n); 286 | double sig_lower; 287 | size_t i; 288 | 289 | /* 290 | * if the initial fit is very good, then finding outliers by comparing 291 | * them to the residual standard deviation is difficult. Therefore we 292 | * set a lower bound on the standard deviation estimate that is a small 293 | * fraction of the standard deviation of the data values 294 | */ 295 | sig_lower = 1.0e-6 * sigy; 296 | if (sig_lower == 0.0) 297 | sig_lower = 1.0; 298 | 299 | /* compute initial estimates using ordinary least squares */ 300 | s = gsl_multifit_linear(X, y, c, cov, &chisq, w->multifit_p); 301 | if (s) 302 | return s; 303 | 304 | /* save Q S^{-1} of original matrix */ 305 | gsl_matrix_memcpy(w->QSI, w->multifit_p->QSI); 306 | gsl_vector_memcpy(w->D, w->multifit_p->D); 307 | 308 | /* compute statistical leverage of each data point */ 309 | s = gsl_linalg_SV_leverage(w->multifit_p->A, w->resfac); 310 | if (s) 311 | return s; 312 | 313 | /* correct residuals with factor 1 / sqrt(1 - h) */ 314 | for (i = 0; i < n; ++i) 315 | { 316 | double h = gsl_vector_get(w->resfac, i); 317 | 318 | if (h > 0.9999) 319 | h = 0.9999; 320 | 321 | gsl_vector_set(w->resfac, i, 1.0 / sqrt(1.0 - h)); 322 | } 323 | 324 | /* compute residuals from OLS fit r = y - X c */ 325 | s = gsl_multifit_linear_residuals(X, y, c, w->r); 326 | if (s) 327 | return s; 328 | 329 | /* compute estimate of sigma from ordinary least squares */ 330 | w->stats.sigma_ols = 
gsl_blas_dnrm2(w->r) / sqrt((double) w->stats.dof); 331 | 332 | while (!converged && ++numit <= w->maxiter) 333 | { 334 | double sig; 335 | 336 | /* adjust residuals by statistical leverage (see DuMouchel and O'Brien) */ 337 | s = gsl_vector_mul(w->r, w->resfac); 338 | if (s) 339 | return s; 340 | 341 | /* compute estimate of standard deviation using MAD */ 342 | sig = robust_madsigma(w->r, w); 343 | 344 | /* scale residuals by standard deviation and tuning parameter */ 345 | gsl_vector_scale(w->r, 1.0 / (GSL_MAX(sig, sig_lower) * w->tune)); 346 | 347 | /* compute weights using these residuals */ 348 | s = w->type->wfun(w->r, w->weights); 349 | if (s) 350 | return s; 351 | 352 | gsl_vector_memcpy(w->c_prev, c); 353 | 354 | /* solve weighted least squares with new weights */ 355 | s = gsl_multifit_wlinear(X, w->weights, y, c, cov, &chisq, w->multifit_p); 356 | if (s) 357 | return s; 358 | 359 | /* compute new residuals r = y - X c */ 360 | s = gsl_multifit_linear_residuals(X, y, c, w->r); 361 | if (s) 362 | return s; 363 | 364 | converged = robust_test_convergence(w->c_prev, c, tol); 365 | } 366 | 367 | /* compute final MAD sigma */ 368 | w->stats.sigma_mad = robust_madsigma(w->r, w); 369 | 370 | /* compute robust estimate of sigma */ 371 | w->stats.sigma_rob = robust_robsigma(w->r, w->stats.sigma_mad, w->tune, w); 372 | 373 | /* compute final estimate of sigma */ 374 | w->stats.sigma = robust_sigma(w->stats.sigma_ols, w->stats.sigma_rob, w); 375 | 376 | /* store number of iterations */ 377 | w->stats.numit = numit; 378 | 379 | { 380 | double dof = (double) w->stats.dof; 381 | double rnorm = w->stats.sigma * sqrt(dof); /* see DuMouchel, sec 4.2 */ 382 | double ss_err = rnorm * rnorm; 383 | double ss_tot = gsl_stats_tss(y->data, y->stride, n); 384 | 385 | /* compute R^2 */ 386 | w->stats.Rsq = 1.0 - ss_err / ss_tot; 387 | 388 | /* compute adjusted R^2 */ 389 | w->stats.adj_Rsq = 1.0 - (1.0 - w->stats.Rsq) * (n - 1.0) / dof; 390 | 391 | /* compute rmse */ 392 | 
w->stats.rmse = sqrt(ss_err / dof); 393 | 394 | /* store SSE */ 395 | w->stats.sse = ss_err; 396 | } 397 | 398 | /* calculate covariance matrix = sigma^2 (X^T X)^{-1} */ 399 | s = robust_covariance(w->stats.sigma, cov, w); 400 | if (s) 401 | return s; 402 | 403 | /* raise an error if not converged */ 404 | /* bdavis */ 405 | /* Eliminating this check is to avoid an error when iterations */ 406 | /* exceed 5. A better solution is probably recommended, such as */ 407 | /* reverting to default of 100 if an input specification is not */ 408 | /* enabled and defined. */ 409 | /* bdavis@usgs.gov */ 410 | /* 411 | if (numit > w->maxiter) 412 | { 413 | GSL_ERROR("maximum iterations exceeded", GSL_EMAXITER); 414 | } 415 | */ 416 | /* bdavis */ 417 | 418 | return s; 419 | } 420 | } /* gsl_multifit_robust() */ 421 | 422 | /* Estimation of values for given x */ 423 | int 424 | gsl_multifit_robust_est(const gsl_vector * x, const gsl_vector * c, 425 | const gsl_matrix * cov, double *y, double *y_err) 426 | { 427 | int s = gsl_multifit_linear_est(x, c, cov, y, y_err); 428 | 429 | return s; 430 | } 431 | 432 | /*********************************** 433 | * INTERNAL ROUTINES * 434 | ***********************************/ 435 | 436 | /* 437 | robust_test_convergence() 438 | Test for convergence in robust least squares 439 | 440 | Convergence criteria: 441 | 442 | |c_i^(k) - c_i^(k-1)| <= tol * max(|c_i^(k)|, |c_i^(k-1)|) 443 | 444 | for all i. k refers to iteration number. 
445 | 446 | Inputs: c_prev - coefficients from previous iteration 447 | c - coefficients from current iteration 448 | tol - tolerance 449 | 450 | Return: 1 if converged, 0 if not 451 | */ 452 | 453 | static int 454 | robust_test_convergence(const gsl_vector *c_prev, const gsl_vector *c, 455 | const double tol) 456 | { 457 | size_t p = c->size; 458 | size_t i; 459 | 460 | for (i = 0; i < p; ++i) 461 | { 462 | double ai = gsl_vector_get(c_prev, i); 463 | double bi = gsl_vector_get(c, i); 464 | 465 | if (fabs(bi - ai) > tol * GSL_MAX(fabs(ai), fabs(bi))) 466 | return 0; /* not yet converged */ 467 | } 468 | 469 | /* converged */ 470 | return 1; 471 | } /* robust_test_convergence() */ 472 | 473 | /* 474 | robust_madsigma() 475 | Estimate the standard deviation of the residuals using 476 | the Median-Absolute-Deviation (MAD) of the residuals, 477 | throwing away the smallest p residuals. 478 | 479 | See: Street et al, 1988 480 | 481 | Inputs: r - vector of residuals 482 | w - workspace 483 | */ 484 | 485 | static double 486 | robust_madsigma(const gsl_vector *r, gsl_multifit_robust_workspace *w) 487 | { 488 | gsl_vector_view v; 489 | double sigma; 490 | size_t n = r->size; 491 | const size_t p = w->p; 492 | size_t i; 493 | 494 | /* copy |r| into workn */ 495 | for (i = 0; i < n; ++i) 496 | { 497 | gsl_vector_set(w->workn, i, fabs(gsl_vector_get(r, i))); 498 | } 499 | 500 | gsl_sort_vector(w->workn); 501 | 502 | /* 503 | * ignore the smallest p residuals when computing the median 504 | * (see Street et al 1988) 505 | */ 506 | v = gsl_vector_subvector(w->workn, p - 1, n - p + 1); 507 | sigma = gsl_stats_median_from_sorted_data(v.vector.data, v.vector.stride, v.vector.size) / 0.6745; 508 | 509 | return sigma; 510 | } /* robust_madsigma() */ 511 | 512 | /* 513 | robust_robsigma() 514 | Compute robust estimate of sigma so that 515 | sigma^2 * inv(X' * X) is a reasonable estimate of 516 | the covariance for robust regression. 
Based heavily 517 | on the equations of Street et al, 1988. 518 | 519 | Inputs: r - vector of residuals y - X c 520 | s - sigma estimate using MAD 521 | tune - tuning constant 522 | w - workspace 523 | */ 524 | 525 | static double 526 | robust_robsigma(const gsl_vector *r, const double s, 527 | const double tune, gsl_multifit_robust_workspace *w) 528 | { 529 | double sigma; 530 | size_t i; 531 | const size_t n = w->n; 532 | const size_t p = w->p; 533 | const double st = s * tune; 534 | double a, b, lambda; 535 | 536 | /* compute u = r / sqrt(1 - h) / st */ 537 | gsl_vector_memcpy(w->workn, r); 538 | gsl_vector_mul(w->workn, w->resfac); 539 | gsl_vector_scale(w->workn, 1.0 / st); 540 | 541 | /* compute w(u) and psi'(u) */ 542 | w->type->wfun(w->workn, w->psi); 543 | w->type->psi_deriv(w->workn, w->dpsi); 544 | 545 | /* compute psi(u) = u*w(u) */ 546 | gsl_vector_mul(w->psi, w->workn); 547 | 548 | /* Street et al, Eq (3) */ 549 | a = gsl_stats_mean(w->dpsi->data, w->dpsi->stride, n); 550 | 551 | /* Street et al, Eq (5) */ 552 | b = 0.0; 553 | for (i = 0; i < n; ++i) 554 | { 555 | double psi_i = gsl_vector_get(w->psi, i); 556 | double resfac = gsl_vector_get(w->resfac, i); 557 | double fac = 1.0 / (resfac*resfac); /* 1 - h */ 558 | 559 | b += fac * psi_i * psi_i; 560 | } 561 | b /= (double) (n - p); 562 | 563 | /* Street et al, Eq (5) */ 564 | lambda = 1.0 + ((double)p)/((double)n) * (1.0 - a) / a; 565 | 566 | sigma = lambda * sqrt(b) * st / a; 567 | 568 | return sigma; 569 | } /* robust_robsigma() */ 570 | 571 | /* 572 | robust_sigma() 573 | Compute final estimate of residual standard deviation, using 574 | the OLS and robust sigma estimates. 
575 | 576 | This equation is taken from DuMouchel and O'Brien, sec 4.1: 577 | \hat{\sigma_R} 578 | 579 | Inputs: s_ols - OLS sigma 580 | s_rob - robust sigma 581 | w - workspace 582 | 583 | Return: final estimate of sigma 584 | */ 585 | 586 | static double 587 | robust_sigma(const double s_ols, const double s_rob, 588 | gsl_multifit_robust_workspace *w) 589 | { 590 | double sigma; 591 | const size_t p = w->p; 592 | const size_t n = w->n; 593 | 594 | /* see DuMouchel and O'Brien, sec 4.1 */ 595 | sigma = GSL_MAX(s_rob, 596 | sqrt((s_ols*s_ols*p*p + s_rob*s_rob*n) / 597 | (p*p + n))); 598 | 599 | return sigma; 600 | } /* robust_sigma() */ 601 | 602 | /* 603 | robust_covariance() 604 | Calculate final covariance matrix, defined as: 605 | 606 | sigma^2 * (X^T X)^{-1} 607 | 608 | Inputs: sigma - residual standard deviation 609 | cov - (output) covariance matrix 610 | w - workspace 611 | */ 612 | 613 | static int 614 | robust_covariance(const double sigma, gsl_matrix *cov, 615 | gsl_multifit_robust_workspace *w) 616 | { 617 | int s = 0; 618 | const size_t p = w->p; 619 | const double s2 = sigma * sigma; 620 | size_t i, j; 621 | gsl_matrix *QSI = w->QSI; 622 | gsl_vector *D = w->D; 623 | 624 | /* Form variance-covariance matrix cov = s2 * (Q S^-1) (Q S^-1)^T */ 625 | 626 | for (i = 0; i < p; i++) 627 | { 628 | gsl_vector_view row_i = gsl_matrix_row (QSI, i); 629 | double d_i = gsl_vector_get (D, i); 630 | 631 | for (j = i; j < p; j++) 632 | { 633 | gsl_vector_view row_j = gsl_matrix_row (QSI, j); 634 | double d_j = gsl_vector_get (D, j); 635 | double s; 636 | 637 | gsl_blas_ddot (&row_i.vector, &row_j.vector, &s); 638 | 639 | gsl_matrix_set (cov, i, j, s * s2 / (d_i * d_j)); 640 | gsl_matrix_set (cov, j, i, s * s2 / (d_i * d_j)); 641 | } 642 | } 643 | 644 | return s; 645 | } /* robust_covariance() */ 646 | -------------------------------------------------------------------------------- /ccdc/output.h: 
-------------------------------------------------------------------------------- 1 | #ifndef OUTPUT_H 2 | #define OUTPUT_H 3 | 4 | #include "defines.h" 5 | 6 | typedef struct { 7 | int row; 8 | int col; 9 | } Position_t; 10 | 11 | /* Structure for the 'output' data type */ 12 | typedef struct 13 | { 14 | int t_start; /* time when the series model starts */ 15 | int t_end; /* time when the series model ends */ 16 | int t_break; /* time when the first break (change) is observed */ 17 | float coefs[NUM_BANDS][NUM_COEFFS]; 18 | /* coefficients for each time series model for each 19 | spectral band*/ 20 | float rmse[NUM_BANDS]; 21 | /* RMSE for each time series model for each 22 | spectral band*/ 23 | Position_t pos; /* the location of each time series model */ 24 | float change_prob; /* the probability that a pixel has undergone 25 | change (between 0 and 100) */ 26 | int num_obs; /* the number of "good" observations used for model 27 | estimation */ 28 | int category; /* the quality of the model estimation (what model 29 | is used, what process is used) 30 | 1x: persistent snow 2x: persistent water 31 | 3x: Fmask fails 4x: normal procedure 32 | x1: mean value (1) x4: simple fit (4) 33 | x6: basic fit (6) x8: full fit (8) */ 34 | float magnitude[NUM_BANDS];/* the magnitude of change (difference between model 35 | prediction and observation for each spectral band)*/ 36 | } Output_t; 37 | 38 | #endif 39 | -------------------------------------------------------------------------------- /ccdc/scripts/cloudCover.pl: -------------------------------------------------------------------------------- 1 | #!/usr/bin/perl 2 | 3 | # ###################################################################### 4 | # 5 | # Name: cloudCover.pl 6 | # 7 | # Author: bdavis 8 | # 9 | # Date: 20151029 10 | # 11 | # Description: 12 | # Copied from cc.pl. 
Receives as arguments the output from gdal -hist 13 | # the first 5 values of which are number of clear, water, cloud shadow, 14 | # snow, and cloud, respectively. It returns the percentage of clear 15 | # pixels as determined by (clear + water) / total. cc.pl returns percent 16 | # cloud cover. This is called from bash scripts because floating 17 | # point math is difficult in bash, and rounding errors were causing 18 | # divide by zero when calculating the total pixels as a percentage of 19 | # possible pixels in a grid. 20 | # 21 | # As it turns out, we have come across 1 ESPA scene which was all fill 22 | # values, so zero valid pixels, resulting in divide by zero, hence the 23 | # check for total not equal to zero. 24 | # 25 | # Multiply the result by 100 because tileLandsat.sh is expecting 26 | # percents as integer values. 27 | # 28 | # ###################################################################### 29 | 30 | $clear = $ARGV[0]; 31 | $water = $ARGV[1]; 32 | $cloudShadow = $ARGV[2]; 33 | $snow = $ARGV[3]; 34 | $cloud = $ARGV[4]; 35 | ##print "clear $clear\n"; 36 | ##print "water $water\n"; 37 | ##print "cloudShadow $cloudShadow\n"; 38 | ##print "snow $snow\n"; 39 | ##print "cloud $cloud\n"; 40 | 41 | $total = $clear + $water + $cloudShadow + $snow + $cloud; 42 | #$cc = (($cloudShadow + $cloud) / $total ) * 100; 43 | if ($total != 0) 44 | { 45 | $pctClear = (($clear + $water) / $total ) * 100; 46 | } 47 | else 48 | { 49 | $pctClear = 0; 50 | } 51 | 52 | ## debug 53 | # test of 1 for total gave a value of 4e-08 for pctTotal, 54 | # which still was interpreted as a valid non-zero value 55 | # and did not give a divide by zero error. 
56 | #$pctTotal = $total / (5000 * 5000); 57 | #print "pctTotal $pctTotal\n"; 58 | #print "total $total\n"; 59 | #print "cc $cc\n"; 60 | ## debug 61 | #print "$cc\n"; 62 | print "$pctClear\n"; 63 | 64 | exit; 65 | -------------------------------------------------------------------------------- /ccdc/scripts/renameMTLfiles.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | for sceneID in * 4 | do 5 | 6 | 7 | cd $sceneID 8 | echo $sceneID 9 | 10 | img=`ls -1 | grep -v hdr` 11 | echo img $img 12 | hdr=`ls -1 | grep hdr` 13 | echo hdr $hdr 14 | 15 | new_img=$img"_MTLstack" 16 | new_hdr=`echo $hdr|sed 's/.hdr//'` 17 | new_hdr=$new_hdr"_MTLstack.hdr" 18 | 19 | echo img $img new_img $new_img 20 | echo hdr $hdr new_hdr $new_hdr 21 | mv $img $new_img 22 | mv $hdr $new_hdr 23 | 24 | exit 25 | 26 | cd .. 27 | 28 | done 29 | -------------------------------------------------------------------------------- /ccdc/scripts/tileLandsat.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ######################################################################## 4 | # 5 | # tileLandsat.sh 6 | # 7 | # Parent script to "tile" landsat scenes into files of stacked bands 8 | # in ENVI format, in BIP order, sequenced as required by the CCDC. 9 | # 10 | # Assumes inputs are in a directory and ESPA-packaged .tar.gz files. 11 | # Output is written to sub-directories under a parent directory, 12 | # one per scene ID. 13 | # 14 | # 20151019 bdavis 15 | # Original development, plagiarizing from fireMapping.sh. Input 16 | # scene IDs should be arguments, somehow. 17 | # 18 | ######################################################################## 19 | 20 | 21 | ######################################################################## 22 | # 23 | # Set up environment. This should be valid for any of the SLURM nodes 24 | # in the EROS YetiJr environment. 
25 | # 26 | ######################################################################## 27 | 28 | export PATH=.:/alt/local/run:/alt/local/bin:/usr/local/local.host/bin:/usr/lib64/qt-3.3/bin:/usr/local/local.host/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/home/bdavis/bin 29 | export LD_LIBRARY_PATH=/alt/local/lib:/usr/local/local.host/ssl/lib:/usr/local/local.host/lib64:/usr/local/local.host/lib:/lib64:/usr/lib64:/usr/local/local.host/mysql/lib 30 | export PYTHONPATH=/alt/local/bin/:/alt/local/lib/python2.7/site-packages 31 | 32 | 33 | ######################################################################## 34 | # 35 | # Parse arguments. So far just in and out paths. 36 | # 37 | ######################################################################## 38 | 39 | date 40 | echo "" 41 | 42 | if [ $# -ne 3 ]; then 43 | echo "Usage: tileLandsat.sh input-path output-path exe" 44 | echo "Where input-path is the directory containing ESPA-created .tar.gz files," 45 | echo "and output-path is the parent directory in which to write results in sub-directories." 46 | echo "and exe is the type of executable to prepare for, MatLAB or C." 47 | echo "" 48 | echo "For Example: tileLandsat.sh /data/bdavis/WA/2005-2015/Albers_grid07 /data/bdavis/WA/2005-2015/Tiles_grid07 C" 49 | echo "" 50 | exit 51 | fi 52 | 53 | 54 | #in=/data/bdavis/WA/2005-2015/Albers_grid07 55 | #out=/data/bdavis/WA/2005-2015/Tiles_grid07 56 | 57 | in=$1 58 | out=$2 59 | exe=$3 60 | originalDir=`pwd` 61 | 62 | echo Reading from $in, writing to $out 63 | echo "" 64 | #echo in $in 65 | #echo out $out 66 | #echo originalDir $originalDir 67 | 68 | ######################################################################## 69 | # 70 | # It seems one could just read from /hsm, if it were mounted, but it's 71 | # not. 
72 | # 73 | # Also, the 20% clear requirement is outdated thinking, because WA data 74 | # is ESPA rectangles (Area of Interest), with much more temporal density 75 | # (and fill), but also the chance for any one x/y pixel location to be 76 | # clear/snow/water, even though the total scene/ESPA-AOI is less than 77 | # 20%. Therefore, process without checking for clear threshold and 78 | # skipping, but for the C version only. Retain the 20% restriction for 79 | # the MatLAB version, because that is what is expected. 80 | # 81 | ######################################################################## 82 | 83 | ######################################################################## 84 | # 85 | # ls -1 LE7*.gz|sort -n --key=1.10,1.16 86 | # will give all path/row file names in a gridNN directory, sorted by 87 | # date, if that is required. Zhe says no, his software sorts 88 | # internally. Checking with Song. 89 | # There are 174 46/27 LE7 in grid07, n of them are clear enough. 90 | # 91 | # This list of scenes needs to be parameterized. Piping an ls, 92 | # cat-ing a text file, etc. Limited to the first 24 path 46 row 27 93 | # LE7 files for initial testing purposes. 
94 | # 95 | ######################################################################## 96 | 97 | #for sceneID in LE70460272005010 LE70460272005042 LE70460272005058 LE70460272005074 LE70460272005106 LE70460272005154 LE70460272005170 LE70460272005186 LE70460272005202 LE70460272005218 LE70460272005234 LE70460272005250 LE70460272005266 LE70460272005282 LE70460272005298 LE70460272005314 LE70460272005330 LE70460272005346 LE70460272006045 LE70460272006061 LE70460272006077 LE70460272006109 LE70460272006125 LE70460272006141 98 | 99 | cd $in 100 | 101 | #just zz scenes 102 | #for sceneID in `ls -1 LE7046027*.gz|sort -n --key=1.10,1.16` 103 | #production 104 | for sceneID in `ls -1 *.gz|sort -n --key=1.10,1.16` 105 | #testing 106 | #for sceneID in `ls -1 LT4*.gz|sort -n --key=1.10,1.16` 107 | #for sceneID in `ls -1 LT5*.gz|sort -n --key=1.10,1.16` 108 | #for sceneID in `ls -1 LE7*.gz|sort -n --key=1.10,1.16` 109 | #for sceneID in `ls -1 LC8*.gz|sort -n --key=1.10,1.16` 110 | #for sceneID in LT50450281996227 # test of zero valid pixels 111 | #for sceneID in LT50450281996243 # test of 100 pct cloud cover 112 | #for sceneID in LT50450281995112 # test of valid scene 113 | 114 | do 115 | 116 | #################################################################### 117 | # 118 | # Set up the names to use, and unpackage the scene. 119 | # 120 | #################################################################### 121 | 122 | sceneID=`echo $sceneID|cut -c 1-16` 123 | pkg=`ls -1 $in/*.gz|grep $sceneID` 124 | echo sceneID $sceneID 125 | echo pkg $pkg 126 | echo "" 127 | 128 | # if the directory and hdr already exist, assume this is done, skip. 
129 | if [ -d $out/$sceneID ]; then 130 | 131 | hdr=`ls -1 $out/$sceneID/*.hdr|wc|awk '{print $1}'` 132 | echo hdr 133 | ls $out/$sceneID/*hdr 134 | 135 | if [ $hdr -ne 0 ]; then 136 | echo skipping hdr 137 | ls $out/$sceneID 138 | echo "" 139 | continue 140 | fi 141 | fi 142 | 143 | mkdir -p $out/$sceneID 144 | tar -C $out/$sceneID -zxvf $pkg 145 | cd $out/$sceneID 146 | name=`ls -1 *cfmask.tif|cut -c 1-21` 147 | echo name $name 148 | 149 | 150 | 151 | #################################################################### 152 | # 153 | # At some point, we may need to extract the sensor, to determine 154 | # sensor-specific processing of which bands, etc. 155 | # 156 | #################################################################### 157 | 158 | sensor=`echo $sceneID|cut -c 1-3` 159 | 160 | 161 | #################################################################### 162 | # 163 | # If the scene is not at least 20% clear, skip. Clear is defined 164 | # as clear + water / total. Call a perl tool modified from 165 | # fireMapping.sh which calculated cloud cover, because we need 166 | # clear. FYI, "clear" is a reserved word, hence "cleer". 167 | # Attempting to do floating point math in bash was causing some 168 | # divide by zero errors because of rounding very small values 169 | # (less than 0.01) to integer values. cloudCover.pl also returns 170 | # zero for scenes whose values are all fill, so zero valid pixels. 171 | # (yes, I've found one.) 172 | # 173 | #################################################################### 174 | 175 | ccargs=`gdalinfo -hist *cfmask.tif|tail -2|head -1|awk '{print $1 " " $2 " " $3 " " $4 " " $5}'` 176 | echo ccargs $ccargs 177 | pctClear=`cloudCover.pl $ccargs` 178 | intPctClear=`echo $pctClear | cut -d '.' 
-f 1` 179 | 180 | echo Percent clear pixels in total valid pixels: $intPctClear 181 | echo "" 182 | 183 | if [ $intPctClear -lt 20 ] && [ $exe == "MatLAB" ]; then 184 | echo Percent clear pixels less than 20: $intPctClear 185 | echo "" 186 | #rm -f *.tif *.xml *.txt 187 | cd .. 188 | rm -f -r $sceneID 189 | cd $originalDir 190 | continue 191 | fi 192 | 193 | 194 | #################################################################### 195 | # 196 | # For the Song C version of ccdc, individual envi files for each 197 | # band are required. For the Zhe MatLAB version, stacked band envi 198 | # files in BIP format are required. 199 | # 200 | #################################################################### 201 | 202 | if [ $exe == "C" ]; then 203 | 204 | if [ $sensor == "LC8" ]; then 205 | 206 | gdal_translate -of ENVI $name"_sr_band2.tif" $name"_sr_band2.img" 207 | gdal_translate -of ENVI $name"_sr_band3.tif" $name"_sr_band3.img" 208 | gdal_translate -of ENVI $name"_sr_band4.tif" $name"_sr_band4.img" 209 | gdal_translate -of ENVI $name"_sr_band5.tif" $name"_sr_band5.img" 210 | gdal_translate -of ENVI $name"_sr_band6.tif" $name"_sr_band6.img" 211 | gdal_translate -of ENVI $name"_sr_band7.tif" $name"_sr_band7.img" 212 | gdal_translate -of ENVI $name"_toa_band10.tif" $name"_toa_band10.img" 213 | gdal_translate -of ENVI $name"_cfmask.tif" $name"_cfmask.img" 214 | 215 | mv *.img *sr_band2.hdr $out/. 
216 | 217 | else 218 | 219 | gdal_translate -of ENVI $name"_sr_band1.tif" $name"_sr_band1.img" 220 | gdal_translate -of ENVI $name"_sr_band2.tif" $name"_sr_band2.img" 221 | gdal_translate -of ENVI $name"_sr_band3.tif" $name"_sr_band3.img" 222 | gdal_translate -of ENVI $name"_sr_band4.tif" $name"_sr_band4.img" 223 | gdal_translate -of ENVI $name"_sr_band5.tif" $name"_sr_band5.img" 224 | gdal_translate -of ENVI $name"_sr_band7.tif" $name"_sr_band7.img" 225 | gdal_translate -of ENVI $name"_toa_band6.tif" $name"_toa_band6.img" 226 | gdal_translate -of ENVI $name"_cfmask.tif" $name"_cfmask.img" 227 | 228 | mv *.img *sr_band1.hdr $out/. 229 | 230 | fi 231 | 232 | cd .. 233 | rm -r $sceneID 234 | cd $originalDir 235 | 236 | else 237 | 238 | 239 | ################################################################ 240 | # 241 | # Convert the cfmask from byte to 16bit. All "bands" in the 242 | # merged tif need to be of the same data type. 243 | # 244 | ################################################################ 245 | 246 | cfmask=`ls -1 *_cfmask.tif` 247 | cfmask16=`echo $cfmask | sed 's/cfmask/cfmask16/'` 248 | echo cfmask $cfmask cfmask16 $cfmask16 249 | echo "" 250 | echo "Converting cfmask to 16-bit." 251 | echo "" 252 | gdal_translate -ot UInt16 $cfmask $cfmask16 253 | echo "" 254 | 255 | 256 | ################################################################ 257 | # 258 | # Merge all the bands together, in the specified order, in ENVI 259 | # format. 260 | # 261 | ################################################################ 262 | 263 | echo "Merging bands." 
264 | echo "" 265 | 266 | if [ $sensor == "LC8" ]; then 267 | 268 | gdal_merge.py -o $name"_stack.img" \ 269 | -separate \ 270 | -of ENVI \ 271 | $name"_sr_band2.tif" \ 272 | $name"_sr_band3.tif" \ 273 | $name"_sr_band4.tif" \ 274 | $name"_sr_band5.tif" \ 275 | $name"_sr_band6.tif" \ 276 | $name"_sr_band7.tif" \ 277 | $name"_toa_band10.tif" \ 278 | $name"_cfmask16.tif" 279 | 280 | else 281 | 282 | gdal_merge.py -o $name"_stack.img" \ 283 | -separate \ 284 | -of ENVI \ 285 | $name"_sr_band1.tif" \ 286 | $name"_sr_band2.tif" \ 287 | $name"_sr_band3.tif" \ 288 | $name"_sr_band4.tif" \ 289 | $name"_sr_band5.tif" \ 290 | $name"_sr_band7.tif" \ 291 | $name"_toa_band6.tif" \ 292 | $name"_cfmask16.tif" 293 | 294 | fi 295 | 296 | echo "" 297 | 298 | 299 | ################################################################ 300 | # 301 | # Create a jpg just during testing if a sanity check is 302 | # required. 303 | # 304 | ################################################################ 305 | 306 | # echo "Creating JPEG." 307 | # echo "" 308 | # gdal_translate -of JPEG \ 309 | # -b 6 -b 4 -b 3 \ 310 | # -ot Byte \ 311 | # -scale \ 312 | # $name"_stack.img" \ 313 | # $name.jpg 314 | # echo "" 315 | 316 | 317 | ################################################################ 318 | # 319 | # Translate the envi file to BIP. This could be combined with 320 | # the initial merge if we do not need a bsq jpg. Eventually. 321 | # 322 | ################################################################ 323 | 324 | echo "Converting to BIP." 325 | echo "" 326 | gdal_translate -co "INTERLEAVE=BIP" \ 327 | -of ENVI \ 328 | $name"_stack.img" \ 329 | $name"_MTLstack.img" 330 | echo "" 331 | 332 | 333 | ################################################################ 334 | # 335 | # Clean up after yourself. Leaves only the required files and 336 | # reduces confusion. (was going to say "eliminates".) 
337 | # 338 | ################################################################ 339 | 340 | mv $name"_MTLstack.img" $name"_MTLstack" 341 | rm *.tif *.xml *_stack.hdr *_stack.img 342 | cd .. 343 | 344 | echo "" 345 | 346 | fi 347 | 348 | done 349 | 350 | echo "Processing complete." 351 | echo "" 352 | date 353 | 354 | exit 355 | 356 | -------------------------------------------------------------------------------- /ccdc/scripts/tileLandsatParent.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | tileLandsat.sh /shared/bdavis/WA/1982-1984/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 1982-M11.out & 4 | tileLandsat.sh /shared/bdavis/WA/1984-1990/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 1984-M11.out & 5 | tileLandsat.sh /shared/bdavis/WA/1990-1995/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 1990-M11.out & 6 | tileLandsat.sh /shared/bdavis/WA/1995-2000/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 1995-M11.out & 7 | tileLandsat.sh /shared/bdavis/WA/2000-2005/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 2000-M11.out & 8 | tileLandsat.sh /shared/bdavis/WA/2005-2015/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 2005-M11.out & 9 | 10 | exit 11 | 12 | tileLandsat.sh /shared/bdavis/WA/1982-1984/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 1982-M.out & 13 | tileLandsat.sh /shared/bdavis/WA/1984-1990/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 1984-M.out & 14 | tileLandsat.sh /shared/bdavis/WA/1990-1995/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 1990-M.out & 15 | tileLandsat.sh /shared/bdavis/WA/1995-2000/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 1995-M.out & 16 | tileLandsat.sh /shared/bdavis/WA/2000-2005/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 2000-M.out & 17 | tileLandsat.sh /shared/bdavis/WA/2005-2015/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 2005-M.out & 18 | 19 | exit 20 | 21 | #tileLandsat.sh /shared/bdavis/WA/1982-1984/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 1982-M.out & 22 | 
#tileLandsat.sh /shared/bdavis/WA/1984-1990/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 1984-M.out & 23 | #tileLandsat.sh /shared/bdavis/WA/1990-1995/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 1990-M.out & 24 | #tileLandsat.sh /shared/bdavis/WA/1995-2000/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 1995-M.out & 25 | #tileLandsat.sh /shared/bdavis/WA/2000-2005/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 2000-M.out & 26 | #tileLandsat.sh /shared/bdavis/WA/2005-2015/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 2005-M.out & 27 | 28 | exit 29 | 30 | #tileLandsat.sh /data/bdavis/WA/1982-1984/grid02 /data/bdavis/grid02/MatLAB MatLAB >& 1982-M.out & 31 | #tileLandsat.sh /data/bdavis/WA/1984-1990/grid02 /data/bdavis/grid02/MatLAB MatLAB >& 1984-M.out & 32 | #tileLandsat.sh /data/bdavis/WA/1990-1995/grid02 /data/bdavis/grid02/MatLAB MatLAB >& 1990-M.out & 33 | #tileLandsat.sh /data/bdavis/WA/1995-2000/grid02 /data/bdavis/grid02/MatLAB MatLAB >& 1995-M.out & 34 | #tileLandsat.sh /shared/bdavis/WA/2000-2005/grid02 /shared/bdavis/grid02/MatLAB MatLAB >& 2000-M.out & 35 | #tileLandsat.sh /shared/bdavis/WA/2005-2015/grid02 /shared/bdavis/grid02/MatLAB MatLAB >& 2005-M.out & 36 | 37 | exit 38 | 39 | #tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1982-1984/grid07 /data/bdavis/1982-1984/C C >& 1982-1984-C.out & 40 | #tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1982-1984/grid07 /data/bdavis/1982-1984/MatLAB MatLAB >& 1982-1984-M.out & 41 | 42 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1984-1990/grid07 /data/bdavis/grid07/C/1984-1990 C >& 1984-1990-C.out 43 | #tileLandsat.sh /data/bdavis/WA/1984-1990/grid07 /data/bdavis/test/MatLAB MatLAB >& 1984-1990-M.out 44 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1990-1995/grid07 /data/bdavis/grid07/C/1990-1995 C >& 1990-1995-C.out 45 | #tileLandsat.sh /data/bdavis/WA/1990-1995/grid07 /data/bdavis/test/MatLAB MatLAB >& 1990-1995-M.out 46 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1995-2000/grid07 
/data/bdavis/grid07/C/1995-2000 C >& 1995-2000-C.out 47 | #tileLandsat.sh /data/bdavis/WA/1995-2000/grid07 /data/bdavis/test/MatLAB MatLAB >& 1995-2000-M.out 48 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/2000-2005/grid07 /data/bdavis/grid07/C/2000-2005 C >& 2000-2005-C.out 49 | #tileLandsat.sh /data/bdavis/WA/2000-2005/grid07 /data/bdavis/test/MatLAB MatLAB >& 2000-2005-M.out 50 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/2005-2015/grid07 /data/bdavis/grid07/C/2005-2015 C >& 2005-2015-C.out 51 | #tileLandsat.sh /data/bdavis/WA/2005-2015/grid07 /data/bdavis/test/MatLAB MatLAB >& 2005-2015-M.out 52 | 53 | -------------------------------------------------------------------------------- /ccdc/utilities.c: -------------------------------------------------------------------------------- 1 | 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | 11 | #include "utilities.h" 12 | 13 | 14 | /***************************************************************************** 15 | NAME: write_message 16 | 17 | PURPOSE: Writes a formatted log message to the specified file handle. 
18 | 19 | RETURN VALUE: None 20 | 21 | NOTES: 22 | - Log Message Format: 23 | yyyy:mm:dd HH:mm:ss pid:module [filename]:line [type]:message 24 | *****************************************************************************/ 25 | 26 | void write_message 27 | ( 28 | const char *message, /* I: message to write to the log */ 29 | const char *module, /* I: module the message is from */ 30 | const char *type, /* I: type of the error */ 31 | char *file, /* I: file the message was generated in */ 32 | int line, /* I: line number in the file where the message was 33 | generated */ 34 | FILE *fd /* I: where to write the log message */ 35 | ) 36 | { 37 | time_t current_time; 38 | struct tm *time_info; 39 | int year; 40 | pid_t pid; 41 | 42 | time (&current_time); 43 | time_info = localtime (&current_time); 44 | year = time_info->tm_year + 1900; 45 | 46 | pid = getpid (); 47 | 48 | fprintf (fd, "%04d:%02d:%02d %02d:%02d:%02d %d:%s [%s]:%d [%s]:%s\n", 49 | year, 50 | time_info->tm_mon + 1, /* tm_mon is zero-based (0-11) */ 51 | time_info->tm_mday, 52 | time_info->tm_hour, 53 | time_info->tm_min, 54 | time_info->tm_sec, 55 | pid, module, basename (file), line, type, message); 56 | } 57 | 58 | 59 | /***************************************************************************** 60 | NAME: sub_string 61 | 62 | PURPOSE: To control the specific way in which a string is manipulated. 63 | 64 | RETURN VALUE: Sub-setted character string 65 | 66 | NOTES: Probably dangerous. 
67 | *****************************************************************************/ 68 | 69 | char *sub_string /* explicit control of a substring function */ 70 | ( 71 | const char *source, /* I: input string */ 72 | size_t start, /* I: index for start of sub string */ 73 | size_t length /* I: number of characters to grab */ 74 | ) 75 | { 76 | size_t i; 77 | char *target; 78 | 79 | target = malloc((length + 1)*sizeof(char)); /* +1 for the terminating NUL */ 80 | if (target == NULL) return NULL; 81 | for(i = 0; i != length; ++i) 82 | { 83 | target[i] = source[start + i]; 84 | } 85 | target[i] = 0; 86 | return target; 87 | } 88 | 89 | -------------------------------------------------------------------------------- /ccdc/utilities.h: -------------------------------------------------------------------------------- 1 | 2 | #ifndef UTILITIES_H 3 | #define UTILITIES_H 4 | 5 | 6 | #include <stdio.h> 7 | 8 | 9 | #define LOG_MESSAGE(message, module) \ 10 | write_message((message), (module), "INFO", \ 11 | __FILE__, __LINE__, stdout); 12 | 13 | 14 | #define WARNING_MESSAGE(message, module) \ 15 | write_message((message), (module), "WARNING", \ 16 | __FILE__, __LINE__, stdout); 17 | 18 | 19 | #define ERROR_MESSAGE(message, module) \ 20 | write_message((message), (module), "ERROR", \ 21 | __FILE__, __LINE__, stdout); 22 | 23 | 24 | #define RETURN_ERROR(message, module, status) \ 25 | {write_message((message), (module), "ERROR", \ 26 | __FILE__, __LINE__, stdout); \ 27 | return (status);} 28 | 29 | 30 | void write_message 31 | ( 32 | const char *message, /* I: message to write to the log */ 33 | const char *module, /* I: module the message is from */ 34 | const char *type, /* I: type of the error */ 35 | char *file, /* I: file the message was generated in */ 36 | int line, /* I: line number in the file where the message was 37 | generated */ 38 | FILE * fd /* I: where to write the log message */ 39 | ); 40 | 41 | 42 | char *sub_string /* explicit control of a substring function */ 43 | ( 44 | const char *source, /* I: input string */ 45 | size_t start, /* I: 
index for start of sub string */ 46 | size_t length /* I: number of characters to grab */ 47 | ); 48 | 49 | 50 | #endif /* UTILITIES_H */ 51 | -------------------------------------------------------------------------------- /classification/2d_array.c: -------------------------------------------------------------------------------- 1 | 2 | #include 3 | #include 4 | 5 | #include "const.h" 6 | #include "2d_array.h" 7 | #include "utilities.h" 8 | 9 | 10 | /* The 2D_ARRAY maintains a 2D array that can be sized at run-time. */ 11 | typedef struct lsrd_2d_array 12 | { 13 | unsigned int signature; /* Signature used to make sure the pointer 14 | math from a row_array_ptr actually gets back to 15 | the expected structure (helps detect errors). */ 16 | int rows; /* Rows in the 2D array */ 17 | int columns; /* Columns in the 2D array */ 18 | int member_size; /* Size of each entry in the array */ 19 | void *data_ptr; /* Pointer to the data storage for the array */ 20 | void **row_array_ptr; /* Pointer to an array of pointers to each row in 21 | the 2D array */ 22 | double memory_block[0]; /* Block of memory for storage of the array. 23 | It is broken into two blocks. The first 'rows * 24 | sizeof(void *)' block stores the pointers to the 25 | first column in each of the rows. The remainder 26 | of the block is for storing the actual data. 27 | Note: the type is double to force the worst case 28 | memory alignment on sparc boxes. */ 29 | } LSRD_2D_ARRAY; 30 | 31 | 32 | /* Define a unique (i.e. random) value that can be used to verify a pointer 33 | points to an LSRD_2D_ARRAY. This is used to verify the operation succeeds to 34 | get an LSRD_2D_ARRAY pointer from a row pointer. */ 35 | #define SIGNATURE 0x326589ab 36 | 37 | 38 | /* Given an address returned by the allocate routine, get a pointer to the 39 | entire structure. 
*/ 40 | #define GET_ARRAY_STRUCTURE_FROM_PTR(ptr) \ 41 | ((LSRD_2D_ARRAY *)((char *)(ptr) - offsetof(LSRD_2D_ARRAY, memory_block))) 42 | 43 | 44 | /************************************************************************* 45 | NAME: allocate_2d_array 46 | 47 | PURPOSE: Allocate memory for 2D array. 48 | 49 | RETURNS: A pointer to a 2D array, or NULL if the routine fails. A pointer 50 | to an array of void pointers to the storage for each row of the 51 | array is returned. The returned pointer must be freed by the 52 | free_2d_array routine. 53 | 54 | HISTORY: 55 | Date Programmer Reason 56 | -------- --------------- ------------------------------------- 57 | 3/15/2013 Song Guo Modified from LDCM IAS library 58 | **************************************************************************/ 59 | void **allocate_2d_array 60 | ( 61 | int rows, /* I: Number of rows for the 2D array */ 62 | int columns, /* I: Number of columns for the 2D array */ 63 | size_t member_size /* I: Size of the 2D array element */ 64 | ) 65 | { 66 | int row; 67 | LSRD_2D_ARRAY *array; 68 | size_t size; 69 | int data_start_index; 70 | 71 | /* Calculate the size needed for the array memory. The size includes the 72 | size of the base structure, an array of pointers to the rows in the 73 | 2D array, an array for the data, and additional space 74 | (2 * sizeof(void*)) to account for different memory alignment rules 75 | on some machine architectures. 
*/ 76 | size = sizeof (*array) + (rows * sizeof (void *)) 77 | + (rows * columns * member_size) + 2 * sizeof (void *); 78 | 79 | /* Allocate the structure */ 80 | array = malloc (size); 81 | if (!array) 82 | { 83 | RETURN_ERROR ("Failure to allocate memory for the array", 84 | "allocate_2d_array", NULL); 85 | } 86 | 87 | /* Initialize the member structures */ 88 | array->signature = SIGNATURE; 89 | array->rows = rows; 90 | array->columns = columns; 91 | array->member_size = member_size; 92 | 93 | /* The array of pointers to rows starts at the beginning of the memory 94 | block */ 95 | array->row_array_ptr = (void **) array->memory_block; 96 | 97 | /* The data starts after the row pointers, with the index adjusted in 98 | case the void pointer and memory block pointers are not the same 99 | size */ 100 | data_start_index = 101 | rows * sizeof (void *) / sizeof (array->memory_block[0]); 102 | if ((rows % 2) == 1) 103 | data_start_index++; 104 | array->data_ptr = &array->memory_block[data_start_index]; 105 | 106 | /* Initialize the row pointers */ 107 | for (row = 0; row < rows; row++) 108 | { 109 | array->row_array_ptr[row] = array->data_ptr 110 | + row * columns * member_size; 111 | } 112 | 113 | return array->row_array_ptr; 114 | } 115 | 116 | 117 | /************************************************************************* 118 | NAME: free_2d_array 119 | 120 | PURPOSE: Free memory for a 2D array allocated by allocate_2d_array 121 | 122 | RETURNS: SUCCESS or FAILURE 123 | 124 | HISTORY: 125 | Date Programmer Reason 126 | -------- --------------- ------------------------------------- 127 | 3/15/2013 Song Guo Modified from LDCM IAS library 128 | **************************************************************************/ 129 | int free_2d_array 130 | ( 131 | void **array_ptr /* I: Pointer returned by the alloc routine */ 132 | ) 133 | { 134 | if (array_ptr != NULL) 135 | { 136 | /* Convert the array_ptr into a pointer to the structure */ 137 | LSRD_2D_ARRAY *array = 
GET_ARRAY_STRUCTURE_FROM_PTR (array_ptr); 138 | 139 | /* Verify it is a valid 2D array */ 140 | if (array->signature != SIGNATURE) 141 | { 142 | /* Programming error of some sort - report failure to the caller */ 143 | RETURN_ERROR ("Invalid signature on 2D array - memory " 144 | "corruption or programming error?", "free_2d_array", 145 | FAILURE); 146 | } 147 | free (array); 148 | } 149 | 150 | return SUCCESS; 151 | } 152 | -------------------------------------------------------------------------------- /classification/2d_array.h: -------------------------------------------------------------------------------- 1 | #ifndef MISC_2D_ARRAY_H 2 | #define MISC_2D_ARRAY_H 3 | 4 | 5 | #include 6 | 7 | 8 | void **allocate_2d_array 9 | ( 10 | int rows, /* I: Number of rows for the 2D array */ 11 | int columns, /* I: Number of columns for the 2D array */ 12 | size_t member_size /* I: Size of the 2D array element */ 13 | ); 14 | 15 | 16 | int get_2d_array_size 17 | ( 18 | void **array_ptr, /* I: Pointer returned by the alloc routine */ 19 | int *rows, /* O: Pointer to number of rows */ 20 | int *columns /* O: Pointer to number of columns */ 21 | ); 22 | 23 | 24 | int free_2d_array 25 | ( 26 | void **array_ptr /* I: Pointer returned by the alloc routine */ 27 | ); 28 | 29 | 30 | #endif 31 | -------------------------------------------------------------------------------- /classification/Makefile: -------------------------------------------------------------------------------- 1 | BIN ?= ../bin 2 | SCRIPTS = ./scripts 3 | EXE = classification 4 | 5 | SRC_FILES = classRF.c classTree.c rfutils.c cokus.c utilities.c classification.c get_args.c 6 | 7 | CC = gcc 8 | FORTRAN = gfortran # or g77 whichever is present 9 | HDF5INC ?= /usr/include/hdf5 10 | HDF5LIB ?= /usr/lib/x86_64-linux-gnu/hdf5/serial 11 | MATIOLIB ?= /usr/lib/x86_64-linux-gnu 12 | INCDIR = -I. 
-I$(MATIO_INC) -I$(HDF5INC) 13 | CFLAGS = -fpic -O2 -funroll-loops -march=native -Wall $(INCDIR) 14 | FFLAGS = -O2 -fpic -march=native#-g -Wall 15 | LDFORTRAN = #-gfortran 16 | MEXFLAGS = -O 17 | INC = classification.h utilities.h rf.h 18 | LIB = -lgfortran -lm -DmxCalloc=calloc -DmxFree=free -L$(MATIOLIB) -L$(HDF5LIB) -lmatio -lhdf5 -lhdf5_hl 19 | 20 | all: clean rfsub $(EXE) 21 | 22 | $(EXE): clean rfsub 23 | $(CC) $(CFLAGS) $(INC) $(SRC_FILES) rfsub.o -o $(EXE) $(LIB) 24 | 25 | rfsub: $(SRC)rfsub.f 26 | echo 'Compiling rfsub.f (fortran subroutines)' 27 | $(FORTRAN) $(FFLAGS) -c rfsub.f -o rfsub.o 28 | 29 | install: 30 | mv $(EXE) $(BIN) 31 | 32 | clean: 33 | rm $(BIN)/$(EXE) -rf 34 | rm *~ -rf 35 | rm *.o -rf 36 | -------------------------------------------------------------------------------- /classification/classRF.c: -------------------------------------------------------------------------------- 1 | /************************************************************** 2 | * mex interface to Andy Liaw et al.'s C code (used in R package randomForest) 3 | * Added by Abhishek Jaiantilal ( abhishek.jaiantilal@colorado.edu ) 4 | * License: GPLv2 5 | * Version: 0.02 6 | * 7 | * File: contains all the supporting code for a standalone C or mex for 8 | * Classification RF. 9 | * Copied all the code from the randomForest 4.5-28 or was it -29? 10 | * 11 | * important changes (other than the many commented out printf's) 12 | * 1. realized that instead of changing individual S_allocs to callocs 13 | * a better way is to emulate them 14 | * 2. found some places where memory is not freed in classRF via valgrind so 15 | * added frees 16 | * 3. made sure that C can now interface with brieman's fortran code so added 17 | * externs "C"'s and the F77_* macros 18 | * 4. added cokus's mersenne twister. 
19 | * 20 | *************************************************************/ 21 | 22 | /***************************************************************** 23 | * Copyright (C) 2001-7 Leo Breiman, Adele Cutler and Merck & Co., Inc. 24 | * 25 | * This program is free software; you can redistribute it and/or 26 | * modify it under the terms of the GNU General Public License 27 | * as published by the Free Software Foundation; either version 2 28 | * of the License, or (at your option) any later version. 29 | * 30 | * This program is distributed in the hope that it will be useful, 31 | * but WITHOUT ANY WARRANTY; without even the implied warranty of 32 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 33 | * GNU General Public License for more details. 34 | * 35 | * You should have received a copy of the GNU General Public License 36 | * along with this program; if not, write to the Free Software 37 | * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 38 | * 39 | * C driver for Breiman & Cutler's random forest code. 40 | * Re-written from the original main program in Fortran. 41 | * Andy Liaw Feb. 7, 2002. 42 | * Modifications to get the forest out Matt Wiener Feb. 26, 2002. 
43 | *****************************************************************/ 44 | 45 | #include "stdlib.h" 46 | #include "memory.h" 47 | #include "rf.h" 48 | #include "stdio.h" 49 | #include "math.h" 50 | #include "time.h" 51 | 52 | #ifndef MATLAB 53 | #define Rprintf printf 54 | #endif 55 | 56 | #ifdef MATLAB 57 | #include "mex.h" 58 | #define Rprintf mexPrintf 59 | #endif 60 | 61 | #define F77_CALL(x) x ## _ 62 | #define F77_NAME(x) F77_CALL(x) 63 | #define F77_SUB(x) F77_CALL(x) 64 | 65 | 66 | #define MAX_UINT_COKUS 4294967295 //basically 2^32-1 67 | 68 | typedef unsigned long uint32; 69 | extern void seedMT(uint32 seed); 70 | extern uint32 reloadMT(void); 71 | extern uint32 randomMT(void); 72 | /*extern void F77_NAME(buildtree)(int *a, int *b, int *cl, int *cat, 73 | * int *maxcat, int *mdim, int *nsample, 74 | * int *nclass, int *treemap, int *bestvar, 75 | * int *bestsplit, int *bestsplitnext, 76 | * double *tgini, int *nodestatus, int *nodepop, 77 | * int *nodestart, double *classpop, 78 | * double *tclasspop, double *tclasscat, 79 | * int *ta, int *nrnodes, int *, 80 | * int *, int *, int *, int *, int *, int *, 81 | * double *, double *, double *, 82 | * int *, int *, int *); 83 | */ 84 | 85 | void buildtree_(int *a, int *b, int *cl, int *cat, 86 | int *maxcat, int *mdim, int *nsample, 87 | int *nclass, int *treemap, int *bestvar, 88 | int *bestsplit, int *bestsplitnext, 89 | double *tgini, int *nodestatus, int *nodepop, 90 | int *nodestart, double *classpop, 91 | double *tclasspop, double *tclasscat, 92 | int *ta, int *nrnodes, int *, 93 | int *, int *, int *, int *, int *, int *, 94 | double *, double *, double *, 95 | int *, int *, int *); 96 | 97 | void rrand_(double *r) ; 98 | 99 | double unif_rand(){ 100 | return (((double)randomMT())/((double)MAX_UINT_COKUS)); 101 | } 102 | 103 | void GetRNGstate(){}; 104 | void PutRNGstate(){}; 105 | 106 | void oob(int nsample, int nclass, int *jin, int *cl, int *jtr, int *jerr, 107 | int *counttr, int *out, double 
*errtr, int *jest, double *cutoff); 108 | 109 | void TestSetError(double *countts, int *jts, int *clts, int *jet, int ntest, 110 | int nclass, int nvote, double *errts, 111 | int labelts, int *nclts, double *cutoff); 112 | 113 | /* Define the R RNG for use from Fortran. */ 114 | #ifdef WIN64 115 | void _rrand_(double *r) { *r = unif_rand(); } 116 | #endif 117 | 118 | #ifndef WIN64 119 | void rrand_(double *r) { *r = unif_rand(); } 120 | #endif 121 | 122 | 123 | void classRF(double *x, int *dimx, int *cl, int *ncl, int *cat, int *maxcat, 124 | int *sampsize, int *strata, int *Options, int *ntree, int *nvar, 125 | int *ipi, double *classwt, double *cut, int *nodesize, 126 | int *outcl, int *counttr, double *prox, 127 | double *imprt, double *impsd, double *impmat, int *nrnodes, 128 | int *ndbigtree, int *nodestatus, int *bestvar, int *treemap, 129 | int *nodeclass, double *xbestsplit, double *errtr, 130 | int *testdat, double *xts, int *clts, int *nts, double *countts, 131 | int *outclts, int labelts, double *proxts, double *errts, 132 | int *inbag, int print_verbose_tree_progression) { 133 | /****************************************************************** 134 | * C wrapper for random forests: get input from R and drive 135 | * the Fortran routines. 136 | * 137 | * Input: 138 | * 139 | * x: matrix of predictors (transposed!) 140 | * dimx: two integers: number of variables and number of cases 141 | * cl: class labels of the data 142 | * ncl: number of classes in the response 143 | * cat: integer vector of number of classes in the predictor; 144 | * 1=continuous 145 | * maxcat: maximum of cat 146 | * Options: 7 integers: (0=no, 1=yes) 147 | * add a second class (for unsupervised RF)? 148 | * 1: sampling from product of marginals 149 | * 2: sampling from product of uniforms 150 | * assess variable importance? 151 | * calculate proximity? 152 | * calculate proximity based on OOB predictions? 153 | * calculate outlying measure? 154 | * how often to print output? 
155 | * keep the forest for future prediction? 156 | * ntree: number of trees 157 | * nvar: number of predictors to use for each split 158 | * ipi: 0=use class proportion as prob.; 1=use supplied priors 159 | * pi: double vector of class priors 160 | * nodesize: minimum node size: no node with fewer than ndsize 161 | * cases will be split 162 | * 163 | * Output: 164 | * 165 | * outcl: class predicted by RF 166 | * counttr: matrix of votes (transposed!) 167 | * imprt: matrix of variable importance measures 168 | * impmat: matrix of local variable importance measures 169 | * prox: matrix of proximity (if iprox=1) 170 | ******************************************************************/ 171 | 172 | int nsample0, mdim, nclass, addClass, mtry, ntest, nsample, ndsize, 173 | mimp, nimp, near, nuse, noutall, nrightall, nrightimpall, 174 | keepInbag; 175 | int nstrata = 0; 176 | int jb, j, n, m, k, idxByNnode, idxByNsample, imp, localImp, iprox, 177 | oobprox, keepf, replace, stratify, trace, *nright, 178 | *nrightimp, *nout, Ntree; 179 | int *nclts = NULL; 180 | int *out, *bestsplitnext, *bestsplit, *nodepop, *jin, *nodex, 181 | *nodexts, *nodestart, *ta, *ncase, *jerr, *varUsed, 182 | *jtr, *classFreq, *idmove, *jvr, 183 | *at, *a, *b, *mind, *jts; 184 | int *nind = NULL; 185 | int *oobpair = NULL; 186 | int last, ktmp, anyEmpty, ntry; 187 | int **strata_idx = NULL; 188 | int *strata_size = NULL; 189 | double av=0.0; 190 | 191 | double *tgini, *tx, *wl, *classpop, *tclasscat, *tclasspop, *win, 192 | *tp, *wr; 193 | 194 | srand(time(NULL)); 195 | //Do initialization for COKUS's Random generator 196 | seedMT(2*rand()+1); //works well with odd number so why don't use that 197 | 198 | addClass = Options[0]; 199 | imp = Options[1]; 200 | localImp = Options[2]; 201 | iprox = Options[3]; 202 | oobprox = Options[4]; 203 | trace = Options[5]; 204 | keepf = Options[6]; 205 | replace = Options[7]; 206 | stratify = Options[8]; 207 | keepInbag = Options[9]; 208 | mdim = dimx[0]; 209 | 
nsample0 = dimx[1]; 210 | nclass = (*ncl==1) ? 2 : *ncl; 211 | ndsize = *nodesize; 212 | Ntree = *ntree; 213 | mtry = *nvar; 214 | ntest = *nts; 215 | nsample = addClass ? (nsample0 + nsample0) : nsample0; 216 | mimp = imp ? mdim : 1; 217 | nimp = imp ? nsample : 1; 218 | near = iprox ? nsample0 : 1; 219 | if (trace == 0) trace = Ntree + 1; 220 | 221 | /*printf("\nmdim %d, nclass %d, nrnodes %d, nsample %d, ntest %d\n", mdim, nclass, *nrnodes, nsample, ntest); 222 | printf("\noobprox %d, mdim %d, nsample0 %d, Ntree %d, mtry %d, mimp %d", oobprox, mdim, nsample0, Ntree, mtry, mimp); 223 | printf("\nstratify %d, replace %d",stratify,replace); 224 | printf("\n");*/ 225 | tgini = (double *) mxCalloc(mdim, sizeof(double)); 226 | wl = (double *) mxCalloc(nclass, sizeof(double)); 227 | wr = (double *) mxCalloc(nclass, sizeof(double)); 228 | classpop = (double *) mxCalloc(nclass* *nrnodes, sizeof(double)); 229 | tclasscat = (double *) mxCalloc(nclass*32, sizeof(double)); 230 | tclasspop = (double *) mxCalloc(nclass, sizeof(double)); 231 | tx = (double *) mxCalloc(nsample, sizeof(double)); 232 | win = (double *) mxCalloc(nsample, sizeof(double)); 233 | tp = (double *) mxCalloc(nsample, sizeof(double)); 234 | out = (int *) mxCalloc(nsample, sizeof(int)); 235 | bestsplitnext = (int *) mxCalloc(*nrnodes, sizeof(int)); 236 | bestsplit = (int *) mxCalloc(*nrnodes, sizeof(int)); 237 | nodepop = (int *) mxCalloc(*nrnodes, sizeof(int)); 238 | nodestart = (int *) mxCalloc(*nrnodes, sizeof(int)); 239 | jin = (int *) mxCalloc(nsample, sizeof(int)); 240 | nodex = (int *) mxCalloc(nsample, sizeof(int)); 241 | nodexts = (int *) mxCalloc(ntest, sizeof(int)); 242 | ta = (int *) mxCalloc(nsample, sizeof(int)); 243 | ncase = (int *) mxCalloc(nsample, sizeof(int)); 244 | jerr = (int *) mxCalloc(nsample, sizeof(int)); 245 | varUsed = (int *) mxCalloc(mdim, sizeof(int)); 246 | jtr = (int *) mxCalloc(nsample, sizeof(int)); 247 | jvr = (int *) mxCalloc(nsample, sizeof(int)); 248 | classFreq = 
(int *) mxCalloc(nclass, sizeof(int)); 249 | jts = (int *) mxCalloc(ntest, sizeof(int)); 250 | idmove = (int *) mxCalloc(nsample, sizeof(int)); 251 | at = (int *) mxCalloc(mdim*nsample, sizeof(int)); 252 | a = (int *) mxCalloc(mdim*nsample, sizeof(int)); 253 | b = (int *) mxCalloc(mdim*nsample, sizeof(int)); 254 | mind = (int *) mxCalloc(mdim, sizeof(int)); 255 | nright = (int *) mxCalloc(nclass, sizeof(int)); 256 | nrightimp = (int *) mxCalloc(nclass, sizeof(int)); 257 | nout = (int *) mxCalloc(nclass, sizeof(int)); 258 | if (oobprox) { 259 | oobpair = (int *) mxCalloc(near*near, sizeof(int)); 260 | } 261 | //printf("nsample=%d mdim=%d nclass=%d nsample0=%d nsample=%d ntest=%d\n", nsample, mdim,nclass, nsample0, nsample, ntest); 262 | /* Count number of cases in each class. */ 263 | zeroInt(classFreq, nclass); 264 | for (n = 0; n < nsample; ++n) classFreq[cl[n] - 1] ++; 265 | /* Normalize class weights. */ 266 | //Rprintf("ipi %d ",*ipi); 267 | 268 | normClassWt(cl, nsample, nclass, *ipi, classwt, classFreq); 269 | 270 | 271 | if (stratify) { 272 | /* Find the number of strata (the largest stratum label). */ 273 | nstrata = 0; 274 | for (n = 0; n < nsample0; ++n) 275 | if (strata[n] > nstrata) nstrata = strata[n]; 276 | /* Create the array of pointers, each pointing to a vector 277 | * of indices of where data of each stratum is. */ 278 | strata_size = (int *) mxCalloc(nstrata, sizeof(int)); 279 | for (n = 0; n < nsample0; ++n) { 280 | strata_size[strata[n] - 1] ++; 281 | } 282 | strata_idx = (int **) mxCalloc(nstrata, sizeof(int *)); 283 | for (n = 0; n < nstrata; ++n) { 284 | strata_idx[n] = (int *) mxCalloc(strata_size[n], sizeof(int)); 285 | } 286 | zeroInt(strata_size, nstrata); 287 | for (n = 0; n < nsample0; ++n) { 288 | strata_size[strata[n] - 1] ++; 289 | strata_idx[strata[n] - 1][strata_size[strata[n] - 1] - 1] = n; 290 | } 291 | } else { 292 | nind = replace ?
NULL : (int *) mxCalloc(nsample, sizeof(int)); 293 | } 294 | 295 | /* INITIALIZE FOR RUN */ 296 | if (*testdat) zeroDouble(countts, ntest * nclass); 297 | zeroInt(counttr, nclass * nsample); 298 | zeroInt(out, nsample); 299 | zeroDouble(tgini, mdim); 300 | zeroDouble(errtr, (nclass + 1) * Ntree); 301 | 302 | if (labelts) { 303 | nclts = (int *) mxCalloc(nclass, sizeof(int)); 304 | for (n = 0; n < ntest; ++n) nclts[clts[n]-1]++; 305 | zeroDouble(errts, (nclass + 1) * Ntree); 306 | } 307 | //printf("labelts %d\n",labelts);fflush(stdout); 308 | if (imp) { 309 | zeroDouble(imprt, (nclass+2) * mdim); 310 | zeroDouble(impsd, (nclass+1) * mdim); 311 | if (localImp) zeroDouble(impmat, nsample * mdim); 312 | } 313 | if (iprox) { 314 | zeroDouble(prox, nsample0 * nsample0); 315 | if (*testdat) zeroDouble(proxts, ntest * (ntest + nsample0)); 316 | } 317 | makeA(x, mdim, nsample, cat, at, b); 318 | 319 | //R_CheckUserInterrupt(); 320 | 321 | 322 | /* Starting the main loop over number of trees. */ 323 | GetRNGstate(); 324 | if (trace <= Ntree) { 325 | /* Print header for running output. */ 326 | Rprintf("ntree OOB"); 327 | for (n = 1; n <= nclass; ++n) Rprintf("%7i", n); 328 | if (labelts) { 329 | Rprintf("| Test"); 330 | for (n = 1; n <= nclass; ++n) Rprintf("%7i", n); 331 | } 332 | Rprintf("\n"); 333 | } 334 | idxByNnode = 0; 335 | idxByNsample = 0; 336 | 337 | // time_t curr_time; 338 | //Rprintf("addclass %d, ntree %d, cl[300]=%d", addClass,Ntree,cl[299]); 339 | for(jb = 0; jb < Ntree; jb++) { 340 | //Rprintf("addclass %d, ntree %d, cl[300]=%d", addClass,Ntree,cl[299]); 341 | //printf("jb=%d,\n",jb); 342 | /* Do we need to simulate data for the second class? 
*/ 343 | if (addClass) createClass(x, nsample0, nsample, mdim); 344 | do { 345 | zeroInt(nodestatus + idxByNnode, *nrnodes); 346 | zeroInt(treemap + 2*idxByNnode, 2 * *nrnodes); 347 | zeroDouble(xbestsplit + idxByNnode, *nrnodes); 348 | zeroInt(nodeclass + idxByNnode, *nrnodes); 349 | zeroInt(varUsed, mdim); 350 | /* TODO: Put all sampling code into a function. */ 351 | /* drawSample(sampsize, nsample, ); */ 352 | if (stratify) { /* stratified sampling */ 353 | zeroInt(jin, nsample); 354 | zeroDouble(tclasspop, nclass); 355 | zeroDouble(win, nsample); 356 | if (replace) { /* with replacement */ 357 | for (n = 0; n < nstrata; ++n) { 358 | for (j = 0; j < sampsize[n]; ++j) { 359 | ktmp = (int) (unif_rand() * strata_size[n]); 360 | k = strata_idx[n][ktmp]; 361 | tclasspop[cl[k] - 1] += classwt[cl[k] - 1]; 362 | win[k] += classwt[cl[k] - 1]; 363 | jin[k] = 1; 364 | } 365 | } 366 | } else { /* stratified sampling w/o replacement */ 367 | /* re-initialize the index array */ 368 | zeroInt(strata_size, nstrata); 369 | for (j = 0; j < nsample; ++j) { 370 | strata_size[strata[j] - 1] ++; 371 | strata_idx[strata[j] - 1][strata_size[strata[j] - 1] - 1] = j; 372 | } 373 | /* sampling without replacement */ 374 | for (n = 0; n < nstrata; ++n) { 375 | last = strata_size[n] - 1; 376 | for (j = 0; j < sampsize[n]; ++j) { 377 | ktmp = (int) (unif_rand() * (last+1)); 378 | k = strata_idx[n][ktmp]; 379 | swapInt(strata_idx[n][last], strata_idx[n][ktmp]); 380 | last--; 381 | tclasspop[cl[k] - 1] += classwt[cl[k]-1]; 382 | win[k] += classwt[cl[k]-1]; 383 | jin[k] = 1; 384 | } 385 | } 386 | } 387 | } else { /* unstratified sampling */ 388 | anyEmpty = 0; 389 | ntry = 0; 390 | do { 391 | zeroInt(jin, nsample); 392 | zeroDouble(tclasspop, nclass); 393 | zeroDouble(win, nsample); 394 | if (replace) { 395 | for (n = 0; n < *sampsize; ++n) { 396 | k = unif_rand() * nsample; 397 | tclasspop[cl[k] - 1] += classwt[cl[k]-1]; 398 | win[k] += classwt[cl[k]-1]; 399 | jin[k] = 1; 400 | } 401 | } else 
{ 402 | for (n = 0; n < nsample; ++n) nind[n] = n; 403 | last = nsample - 1; 404 | for (n = 0; n < *sampsize; ++n) { 405 | ktmp = (int) (unif_rand() * (last+1)); 406 | k = nind[ktmp]; 407 | swapInt(nind[ktmp], nind[last]); 408 | last--; 409 | tclasspop[cl[k] - 1] += classwt[cl[k]-1]; 410 | win[k] += classwt[cl[k]-1]; 411 | jin[k] = 1; 412 | } 413 | } 414 | /* check if any class is missing in the sample */ 415 | for (n = 0; n < nclass; ++n) { 416 | if (tclasspop[n] == 0) anyEmpty = 1; 417 | } 418 | ntry++; 419 | } while (anyEmpty && ntry <= 10); 420 | } 421 | 422 | /* If need to keep indices of inbag data, do that here. */ 423 | if (keepInbag) { 424 | for (n = 0; n < nsample0; ++n) { 425 | inbag[n + idxByNsample] = jin[n]; 426 | } 427 | } 428 | 429 | /* Copy the original a matrix back. */ 430 | memcpy(a, at, sizeof(int) * mdim * nsample); 431 | modA(a, &nuse, nsample, mdim, cat, *maxcat, ncase, jin); 432 | 433 | #ifdef WIN64 434 | F77_CALL(_buildtree) 435 | #endif 436 | 437 | #ifndef WIN64 438 | F77_CALL(buildtree) 439 | #endif 440 | (a, b, cl, cat, maxcat, &mdim, &nsample, 441 | &nclass, 442 | treemap + 2*idxByNnode, bestvar + idxByNnode, 443 | bestsplit, bestsplitnext, tgini, 444 | nodestatus + idxByNnode, nodepop, 445 | nodestart, classpop, tclasspop, tclasscat, 446 | ta, nrnodes, idmove, &ndsize, ncase, 447 | &mtry, varUsed, nodeclass + idxByNnode, 448 | ndbigtree + jb, win, wr, wl, &mdim, 449 | &nuse, mind); 450 | /* if the "tree" has only the root node, start over */ 451 | } while (ndbigtree[jb] < 1); 452 | 453 | Xtranslate(x, mdim, *nrnodes, nsample, bestvar + idxByNnode, 454 | bestsplit, bestsplitnext, xbestsplit + idxByNnode, 455 | nodestatus + idxByNnode, cat, ndbigtree[jb]); 456 | 457 | /* Get test set error */ 458 | if (*testdat) { 459 | predictClassTree(xts, ntest, mdim, treemap + 2*idxByNnode, 460 | nodestatus + idxByNnode, xbestsplit + idxByNnode, 461 | bestvar + idxByNnode, 462 | nodeclass + idxByNnode, ndbigtree[jb], 463 | cat, nclass, jts, nodexts, 
*maxcat); 464 | TestSetError(countts, jts, clts, outclts, ntest, nclass, jb+1, 465 | errts + jb*(nclass+1), labelts, nclts, cut); 466 | } 467 | 468 | /* Get out-of-bag predictions and errors. */ 469 | predictClassTree(x, nsample, mdim, treemap + 2*idxByNnode, 470 | nodestatus + idxByNnode, xbestsplit + idxByNnode, 471 | bestvar + idxByNnode, 472 | nodeclass + idxByNnode, ndbigtree[jb], 473 | cat, nclass, jtr, nodex, *maxcat); 474 | 475 | zeroInt(nout, nclass); 476 | noutall = 0; 477 | for (n = 0; n < nsample; ++n) { 478 | if (jin[n] == 0) { 479 | /* increment the OOB votes */ 480 | counttr[n*nclass + jtr[n] - 1] ++; 481 | /* count number of times a case is OOB */ 482 | out[n]++; 483 | /* count number of OOB cases in the current iteration. 484 | * nout[n] is the number of OOB cases for the n-th class. 485 | * noutall is the number of OOB cases overall. */ 486 | nout[cl[n] - 1]++; 487 | noutall++; 488 | } 489 | } 490 | 491 | /* Compute out-of-bag error rate. */ 492 | oob(nsample, nclass, jin, cl, jtr, jerr, counttr, out, 493 | errtr + jb*(nclass+1), outcl, cut); 494 | 495 | if ((jb+1) % trace == 0) { 496 | Rprintf("%5i: %6.2f%%", jb+1, 100.0*errtr[jb * (nclass+1)]); 497 | for (n = 1; n <= nclass; ++n) { 498 | Rprintf("%6.2f%%", 100.0 * errtr[n + jb * (nclass+1)]); 499 | } 500 | if (labelts) { 501 | Rprintf("| "); 502 | for (n = 0; n <= nclass; ++n) { 503 | Rprintf("%6.2f%%", 100.0 * errts[n + jb * (nclass+1)]); 504 | } 505 | } 506 | Rprintf("\n"); 507 | 508 | //R_CheckUserInterrupt(); 509 | } 510 | 511 | /* DO VARIABLE IMPORTANCE */ 512 | if (imp) { 513 | nrightall = 0; 514 | /* Count the number of correct prediction by the current tree 515 | * among the OOB samples, by class. 
*/ 516 | zeroInt(nright, nclass); 517 | for (n = 0; n < nsample; ++n) { 518 | /* out-of-bag and predicted correctly: */ 519 | if (jin[n] == 0 && jtr[n] == cl[n]) { 520 | nright[cl[n] - 1]++; 521 | nrightall++; 522 | } 523 | } 524 | for (m = 0; m < mdim; ++m) { 525 | if (varUsed[m]) { 526 | nrightimpall = 0; 527 | zeroInt(nrightimp, nclass); 528 | for (n = 0; n < nsample; ++n) tx[n] = x[m + n*mdim]; 529 | /* Permute the m-th variable. */ 530 | permuteOOB(m, x, jin, nsample, mdim); 531 | /* Predict the modified data using the current tree. */ 532 | predictClassTree(x, nsample, mdim, treemap + 2*idxByNnode, 533 | nodestatus + idxByNnode, 534 | xbestsplit + idxByNnode, 535 | bestvar + idxByNnode, 536 | nodeclass + idxByNnode, ndbigtree[jb], 537 | cat, nclass, jvr, nodex, *maxcat); 538 | /* Count how often correct predictions are made with 539 | * the modified data. */ 540 | for (n = 0; n < nsample; n++) { 541 | if (jin[n] == 0) { 542 | if (jvr[n] == cl[n]) { 543 | nrightimp[cl[n] - 1]++; 544 | nrightimpall++; 545 | } 546 | if (localImp && jvr[n] != jtr[n]) { 547 | if (cl[n] == jvr[n]) { 548 | impmat[m + n*mdim] -= 1.0; 549 | } else { 550 | impmat[m + n*mdim] += 1.0; 551 | } 552 | } 553 | } 554 | /* Restore the original data for that variable. */ 555 | x[m + n*mdim] = tx[n]; 556 | } 557 | /* Accumulate decrease in proportions of correct 558 | * predictions. 
*/ 559 | for (n = 0; n < nclass; ++n) { 560 | if (nout[n] > 0) { 561 | imprt[m + n*mdim] += 562 | ((double) (nright[n] - nrightimp[n])) / 563 | nout[n]; 564 | impsd[m + n*mdim] += 565 | ((double) (nright[n] - nrightimp[n]) * 566 | (nright[n] - nrightimp[n])) / nout[n]; 567 | } 568 | } 569 | if (noutall > 0) { 570 | imprt[m + nclass*mdim] += 571 | ((double)(nrightall - nrightimpall)) / noutall; 572 | impsd[m + nclass*mdim] += 573 | ((double) (nrightall - nrightimpall) * 574 | (nrightall - nrightimpall)) / noutall; 575 | } 576 | } 577 | } 578 | } 579 | 580 | /* DO PROXIMITIES */ 581 | if (iprox) { 582 | computeProximity(prox, oobprox, nodex, jin, oobpair, near); 583 | /* proximity for test data */ 584 | if (*testdat) { 585 | computeProximity(proxts, 0, nodexts, jin, oobpair, ntest); 586 | /* Compute proximity between testset and training set. */ 587 | for (n = 0; n < ntest; ++n) { 588 | for (k = 0; k < near; ++k) { 589 | if (nodexts[n] == nodex[k]) 590 | proxts[n + ntest * (k+ntest)] += 1.0; 591 | } 592 | } 593 | } 594 | } 595 | 596 | if (keepf) idxByNnode += *nrnodes; 597 | if (keepInbag) idxByNsample += nsample0; 598 | 599 | if(print_verbose_tree_progression){ 600 | #ifdef MATLAB 601 | time_t curr_time; time(&curr_time); /* declared here; the function-level declaration is commented out */ 602 | mexPrintf("tree num %d created at %s", jb, ctime(&curr_time));mexEvalString("drawnow;"); 603 | #endif 604 | } 605 | } 606 | PutRNGstate(); 607 | 608 | 609 | /* Final processing of variable importance. */ 610 | for (m = 0; m < mdim; m++) tgini[m] /= Ntree; 611 | 612 | if (imp) { 613 | for (m = 0; m < mdim; ++m) { 614 | if (localImp) { /* casewise measures */ 615 | for (n = 0; n < nsample; ++n) impmat[m + n*mdim] /= out[n]; 616 | } 617 | /* class-specific measures */ 618 | for (k = 0; k < nclass; ++k) { 619 | av = imprt[m + k*mdim] / Ntree; 620 | impsd[m + k*mdim] = 621 | sqrt(((impsd[m + k*mdim] / Ntree) - av*av) / Ntree); 622 | imprt[m + k*mdim] = av; 623 | /* imprt[m + k*mdim] = (se <= 0.0) ?
-1000.0 - av : av / se; */ 624 | } 625 | /* overall measures */ 626 | av = imprt[m + nclass*mdim] / Ntree; 627 | impsd[m + nclass*mdim] = 628 | sqrt(((impsd[m + nclass*mdim] / Ntree) - av*av) / Ntree); 629 | imprt[m + nclass*mdim] = av; 630 | imprt[m + (nclass+1)*mdim] = tgini[m]; 631 | } 632 | } else { 633 | for (m = 0; m < mdim; ++m) imprt[m] = tgini[m]; 634 | } 635 | 636 | /* PROXIMITY DATA ++++++++++++++++++++++++++++++++*/ 637 | if (iprox) { 638 | for (n = 0; n < near; ++n) { 639 | for (k = n + 1; k < near; ++k) { 640 | prox[near*k + n] /= oobprox ? 641 | (oobpair[near*k + n] > 0 ? oobpair[near*k + n] : 1) : 642 | Ntree; 643 | prox[near*n + k] = prox[near*k + n]; 644 | } 645 | prox[near*n + n] = 1.0; 646 | } 647 | if (*testdat) { 648 | for (n = 0; n < ntest; ++n){ 649 | for (k = 0; k < ntest + nsample; ++k) 650 | proxts[ntest*k + n] /= Ntree; 651 | proxts[ntest*n + n]=1.0; 652 | } 653 | } 654 | } 655 | if (trace <= Ntree){ 656 | printf("\nmdim %d, nclass %d, nrnodes %d, nsample %d, ntest %d\n", mdim, nclass, *nrnodes, nsample, ntest); 657 | printf("\noobprox %d, mdim %d, nsample0 %d, Ntree %d, mtry %d, mimp %d", oobprox, mdim, nsample0, Ntree, mtry, mimp); 658 | printf("\nstratify %d, replace %d",stratify,replace); 659 | printf("\n"); 660 | } 661 | 662 | //frees up the memory 663 | mxFree(tgini); 664 | mxFree(wl); 665 | mxFree(wr); 666 | mxFree(classpop); 667 | mxFree(tclasscat); 668 | mxFree(tclasspop); 669 | mxFree(tx); 670 | mxFree(win); 671 | mxFree(tp); 672 | mxFree(out); 673 | 674 | mxFree(bestsplitnext); 675 | mxFree(bestsplit); 676 | mxFree(nodepop); 677 | mxFree(nodestart); 678 | mxFree(jin); 679 | mxFree(nodex); 680 | mxFree(nodexts); 681 | mxFree(ta); 682 | mxFree(ncase); 683 | mxFree(jerr); 684 | 685 | mxFree(varUsed); 686 | mxFree(jtr); 687 | mxFree(jvr); 688 | mxFree(classFreq); 689 | mxFree(jts); 690 | mxFree(idmove); 691 | mxFree(at); 692 | mxFree(a); 693 | mxFree(b); 694 | mxFree(mind); 695 | 696 | mxFree(nright); 697 | mxFree(nrightimp); 698 
| mxFree(nout); 699 | 700 | if (oobprox) { 701 | mxFree(oobpair); 702 | } 703 | 704 | if (stratify) { 705 | mxFree(strata_size); 706 | for (n = 0; n < nstrata; ++n) { 707 | mxFree(strata_idx[n]); 708 | } 709 | mxFree(strata_idx); 710 | } else { 711 | if (!replace) /* nind is only allocated when sampling without replacement */ 712 | mxFree(nind); 713 | } 714 | //printf("labelts %d\n",labelts);fflush(stdout); 715 | fflush(stdout); 716 | if (labelts) { 717 | mxFree(nclts); 718 | } 719 | //printf("stratify %d",stratify);fflush(stdout); 720 | } 721 | 722 | 723 | void classForest(int *mdim, int *ntest, int *nclass, int *maxcat, 724 | int *nrnodes, int *ntree, double *x, double *xbestsplit, 725 | double *pid, double *cutoff, double *countts, int *treemap, 726 | int *nodestatus, int *cat, int *nodeclass, int *jts, 727 | int *jet, int *bestvar, int *node, int *treeSize, 728 | int *keepPred, int *prox, double *proxMat, int *nodes) { 729 | int j, n, n1, n2, idxNodes, offset1, offset2, *junk, ntie; 730 | double crit, cmax; 731 | 732 | zeroDouble(countts, *nclass * *ntest); 733 | idxNodes = 0; 734 | offset1 = 0; 735 | offset2 = 0; 736 | junk = NULL; 737 | 738 | //Rprintf("nclass %d\n", *nclass); 739 | for (j = 0; j < *ntree; ++j) { 740 | //Rprintf("pCT nclass %d \n", *nclass); 741 | /* predict by the j-th tree */ 742 | predictClassTree(x, *ntest, *mdim, treemap + 2*idxNodes, 743 | nodestatus + idxNodes, xbestsplit + idxNodes, 744 | bestvar + idxNodes, nodeclass + idxNodes, 745 | treeSize[j], cat, *nclass, 746 | jts + offset1, node + offset2, *maxcat); 747 | 748 | /* accumulate votes: */ 749 | for (n = 0; n < *ntest; ++n) { 750 | countts[jts[n + offset1] - 1 + n * *nclass] += 1.0; 751 | } 752 | 753 | /* if desired, do proximities for this round */ 754 | if (*prox) computeProximity(proxMat, 0, node + offset2, junk, junk, 755 | *ntest); 756 | idxNodes += *nrnodes; 757 | if (*keepPred) offset1 += *ntest; 758 | if (*nodes) offset2 += *ntest; 759 | } 760 | 761 | //Rprintf("ntest %d\n", *ntest); 762 | /* Aggregated prediction is the class with the
maximum votes/cutoff */ 763 | for (n = 0; n < *ntest; ++n) { 764 | //Rprintf("Ap: ntest %d\n", *ntest); 765 | cmax = 0.0; 766 | ntie = 1; 767 | for (j = 0; j < *nclass; ++j) { 768 | crit = (countts[j + n * *nclass] / *ntree) / cutoff[j]; 769 | if (crit > cmax) { 770 | jet[n] = j + 1; 771 | cmax = crit; 772 | } 773 | /* Break ties at random: */ 774 | if (crit == cmax) { 775 | ntie++; 776 | if (unif_rand() > 1.0 / ntie) jet[n] = j + 1; 777 | } 778 | } 779 | } 780 | 781 | //Rprintf("ntest %d\n", *ntest); 782 | /* if proximities requested, do the final adjustment 783 | * (division by number of trees) */ 784 | 785 | //Rprintf("prox %d",*prox); 786 | if (*prox) { 787 | //Rprintf("prox: ntest %d\n", *ntest); 788 | for (n1 = 0; n1 < *ntest; ++n1) { 789 | for (n2 = n1 + 1; n2 < *ntest; ++n2) { 790 | proxMat[n1 + n2 * *ntest] /= *ntree; 791 | proxMat[n2 + n1 * *ntest] = proxMat[n1 + n2 * *ntest]; 792 | } 793 | proxMat[n1 + n1 * *ntest] = 1.0; 794 | } 795 | } 796 | //Rprintf("END ntest %d\n", *ntest); 797 | 798 | } 799 | 800 | /* 801 | * Modified by A. Liaw 1/10/2003 (Deal with cutoff) 802 | * Re-written in C by A. Liaw 3/08/2004 803 | */ 804 | void oob(int nsample, int nclass, int *jin, int *cl, int *jtr, int *jerr, 805 | int *counttr, int *out, double *errtr, int *jest, 806 | double *cutoff) { 807 | int j, n, noob, *noobcl, ntie; 808 | double qq, smax, smaxtr; 809 | 810 | noobcl = (int *) mxCalloc(nclass, sizeof(int)); 811 | zeroInt(jerr, nsample); 812 | zeroDouble(errtr, nclass+1); 813 | 814 | noob = 0; 815 | for (n = 0; n < nsample; ++n) { 816 | if (out[n]) { 817 | noob++; 818 | noobcl[cl[n]-1]++; 819 | smax = 0.0; 820 | smaxtr = 0.0; 821 | ntie = 1; 822 | for (j = 0; j < nclass; ++j) { 823 | qq = (((double) counttr[j + n*nclass]) / out[n]) / cutoff[j]; 824 | if (j+1 != cl[n]) smax = (qq > smax) ? 
qq : smax; 825 | /* if vote / cutoff is larger than current max, re-set max and 826 | * change predicted class to the current class */ 827 | if (qq > smaxtr) { 828 | smaxtr = qq; 829 | jest[n] = j+1; 830 | } 831 | /* break tie at random */ 832 | if (qq == smaxtr) { 833 | ntie++; 834 | if (unif_rand() > 1.0 / ntie) { 835 | smaxtr = qq; 836 | jest[n] = j+1; 837 | } 838 | } 839 | } 840 | if (jest[n] != cl[n]) { 841 | errtr[cl[n]] += 1.0; 842 | errtr[0] += 1.0; 843 | jerr[n] = 1; 844 | } 845 | } 846 | } 847 | errtr[0] /= noob; 848 | for (n = 1; n <= nclass; ++n) errtr[n] /= noobcl[n-1]; 849 | mxFree(noobcl); 850 | } 851 | 852 | 853 | void TestSetError(double *countts, int *jts, int *clts, int *jet, int ntest, 854 | int nclass, int nvote, double *errts, 855 | int labelts, int *nclts, double *cutoff) { 856 | int j, n, ntie; 857 | double cmax, crit; 858 | 859 | for (n = 0; n < ntest; ++n) countts[jts[n]-1 + n*nclass] += 1.0; 860 | 861 | /* Prediction is the class with the maximum votes */ 862 | for (n = 0; n < ntest; ++n) { 863 | cmax=0.0; 864 | ntie = 1; 865 | for (j = 0; j < nclass; ++j) { 866 | crit = (countts[j + n*nclass] / nvote) / cutoff[j]; 867 | if (crit > cmax) { 868 | jet[n] = j+1; 869 | cmax = crit; 870 | } 871 | /* Break ties at random: */ 872 | if (crit == cmax) { 873 | ntie++; 874 | if (unif_rand() > 1.0 / ntie) { 875 | jet[n] = j+1; 876 | cmax = crit; 877 | } 878 | } 879 | } 880 | } 881 | if (labelts) { 882 | zeroDouble(errts, nclass + 1); 883 | for (n = 0; n < ntest; ++n) { 884 | if (jet[n] != clts[n]) { 885 | errts[0] += 1.0; 886 | errts[clts[n]] += 1.0; 887 | } 888 | } 889 | errts[0] /= ntest; 890 | for (n = 1; n <= nclass; ++n) errts[n] /= nclts[n-1]; 891 | } 892 | } 893 | 894 | -------------------------------------------------------------------------------- /classification/classTree.c: -------------------------------------------------------------------------------- 1 | /************************************************************** 2 | * mex interface 
to Andy Liaw et al.'s C code (used in R package randomForest) 3 | * Added by Abhishek Jaiantilal ( abhishek.jaiantilal@colorado.edu ) 4 | * License: GPLv2 5 | * Version: 0.02 6 | * 7 | * File: contains all the other supporting code for a standalone C or mex for 8 | * Classification RF. 9 | * Copied all the code from the randomForest 4.5-28 or was it -29? 10 | * added externs "C"'s and the F77_* macros 11 | * 12 | *************************************************************/ 13 | 14 | /******************************************************************* 15 | Copyright (C) 2001-7 Leo Breiman, Adele Cutler and Merck & Co., Inc. 16 | 17 | This program is free software; you can redistribute it and/or 18 | modify it under the terms of the GNU General Public License 19 | as published by the Free Software Foundation; either version 2 20 | of the License, or (at your option) any later version. 21 | 22 | This program is distributed in the hope that it will be useful, 23 | but WITHOUT ANY WARRANTY; without even the implied warranty of 24 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 25 | GNU General Public License for more details. 
26 | *******************************************************************/ 27 | #include "rf.h" 28 | #include "memory.h" 29 | #include "stdlib.h" 30 | #include "math.h" 31 | 32 | #ifdef MATLAB 33 | #define Rprintf mexPrintf 34 | #include "mex.h" 35 | #endif 36 | 37 | #ifndef MATLAB 38 | #define Rprintf printf 39 | #include "stdio.h" 40 | #endif 41 | 42 | typedef unsigned long uint32; 43 | extern void seedMT(uint32 seed); 44 | extern uint32 reloadMT(void); 45 | extern uint32 randomMT(void); 46 | extern double unif_rand(); 47 | extern void R_qsort_I(double *v, int *I, int i, int j); 48 | 49 | void catmax_(double *parentDen, double *tclasscat, 50 | double *tclasspop, int *nclass, int *lcat, 51 | int *ncatsp, double *critmax, int *nhit, 52 | int *maxcat, int *ncmax, int *ncsplit); 53 | 54 | void catmaxb_(double *totalWt, double *tclasscat, double *classCount, 55 | int *nclass, int *nCat, int *nbest, double *critmax, 56 | int *nhit, double *catCount) ; 57 | 58 | #ifdef WIN64 59 | void F77_NAME(_catmax) 60 | #endif 61 | 62 | #ifndef WIN64 63 | void F77_NAME(catmax) 64 | #endif 65 | (double *parentDen, double *tclasscat, 66 | double *tclasspop, int *nclass, int *lcat, 67 | int *ncatsp, double *critmax, int *nhit, 68 | int *maxcat, int *ncmax, int *ncsplit) { 69 | /* This finds the best split of a categorical variable with lcat 70 | categories and nclass classes, where tclasscat(j, k) is the number 71 | of cases in class j with category value k. The method uses an 72 | exhaustive search over all partitions of the category values if the 73 | number of categories is 10 or fewer. Otherwise ncsplit randomly 74 | selected splits are tested and best used. */ 75 | int j, k, n, icat[32], nsplit; 76 | double leftNum, leftDen, rightNum, decGini, *leftCatClassCount; 77 | 78 | leftCatClassCount = (double *) mxCalloc(*nclass, sizeof(double)); 79 | *nhit = 0; 80 | nsplit = *lcat > *ncmax ? 
81 | *ncsplit : (int) pow(2.0, (double) *lcat - 1) - 1; 82 | 83 | for (n = 0; n < nsplit; ++n) { 84 | zeroInt(icat, 32); 85 | if (*lcat > *ncmax) { 86 | /* Generate random split. 87 | TODO: consider changing to generating random bits with more 88 | efficient algorithm */ 89 | for (j = 0; j < *lcat; ++j) icat[j] = unif_rand() > 0.5 ? 1 : 0; 90 | } else { 91 | unpack((unsigned int) n + 1, icat); 92 | } 93 | for (j = 0; j < *nclass; ++j) { 94 | leftCatClassCount[j] = 0; 95 | for (k = 0; k < *lcat; ++k) { 96 | if (icat[k]) { 97 | leftCatClassCount[j] += tclasscat[j + k * *nclass]; 98 | } 99 | } 100 | } 101 | leftNum = 0.0; 102 | leftDen = 0.0; 103 | for (j = 0; j < *nclass; ++j) { 104 | leftNum += leftCatClassCount[j] * leftCatClassCount[j]; 105 | leftDen += leftCatClassCount[j]; 106 | } 107 | /* If either node is empty, try another split. */ 108 | if (leftDen <= 1.0e-8 || *parentDen - leftDen <= 1.0e-5) continue; 109 | rightNum = 0.0; 110 | for (j = 0; j < *nclass; ++j) { 111 | leftCatClassCount[j] = tclasspop[j] - leftCatClassCount[j]; 112 | rightNum += leftCatClassCount[j] * leftCatClassCount[j]; 113 | } 114 | decGini = (leftNum / leftDen) + (rightNum / (*parentDen - leftDen)); 115 | if (decGini > *critmax) { 116 | *critmax = decGini; 117 | *nhit = 1; 118 | *ncatsp = *lcat > *ncmax ? 
pack((unsigned int) *lcat, icat) : n + 1; 119 | } 120 | } 121 | mxFree(leftCatClassCount); 122 | } 123 | 124 | 125 | 126 | /* Find best split of with categorical variable when there are two classes */ 127 | #ifdef WIN64 128 | void F77_NAME(_catmaxb) 129 | #endif 130 | #ifndef WIN64 131 | void F77_NAME(catmaxb) 132 | #endif 133 | (double *totalWt, double *tclasscat, double *classCount, 134 | int *nclass, int *nCat, int *nbest, double *critmax, 135 | int *nhit, double *catCount) { 136 | 137 | double catProportion[32], cp[32], cm[32]; 138 | int kcat[32]; 139 | int i, j; 140 | double bestsplit=0.0, rightDen, leftDen, leftNum, rightNum, crit; 141 | 142 | *nhit = 0; 143 | for (i = 0; i < *nCat; ++i) { 144 | catProportion[i] = catCount[i] ? 145 | tclasscat[i * *nclass] / catCount[i] : 0.0; 146 | kcat[i] = i + 1; 147 | } 148 | R_qsort_I(catProportion, kcat, 1, *nCat); 149 | for (i = 0; i < *nclass; ++i) { 150 | cp[i] = 0; 151 | cm[i] = classCount[i]; 152 | } 153 | rightDen = *totalWt; 154 | leftDen = 0.0; 155 | for (i = 0; i < *nCat - 1; ++i) { 156 | leftDen += catCount[kcat[i]-1]; 157 | rightDen -= catCount[kcat[i]-1]; 158 | leftNum = 0.0; 159 | rightNum = 0.0; 160 | for (j = 0; j < *nclass; ++j) { 161 | cp[j] += tclasscat[j + (kcat[i]-1) * *nclass]; 162 | cm[j] -= tclasscat[j + (kcat[i]-1) * *nclass]; 163 | leftNum += cp[j] * cp[j]; 164 | rightNum += cm[j] * cm[j]; 165 | } 166 | if (catProportion[i] < catProportion[i + 1]) { 167 | /* If neither node is empty, check the split. */ 168 | if (rightDen > 1.0e-5 && leftDen > 1.0e-5) { 169 | crit = (leftNum / leftDen) + (rightNum / rightDen); 170 | if (crit > *critmax) { 171 | *critmax = crit; 172 | bestsplit = .5 * (catProportion[i] + catProportion[i + 1]); 173 | *nhit = 1; 174 | } 175 | } 176 | } 177 | } 178 | if (*nhit == 1) { 179 | zeroInt(kcat, *nCat); 180 | for (i = 0; i < *nCat; ++i) { 181 | catProportion[i] = catCount[i] ? 182 | tclasscat[i * *nclass] / catCount[i] : 0.0; 183 | kcat[i] = catProportion[i] < bestsplit ? 
1 : 0; 184 | } 185 | *nbest = pack(*nCat, kcat); 186 | } 187 | } 188 | 189 | 190 | 191 | void predictClassTree(double *x, int n, int mdim, int *treemap, 192 | int *nodestatus, double *xbestsplit, 193 | int *bestvar, int *nodeclass, 194 | int treeSize, int *cat, int nclass, 195 | int *jts, int *nodex, int maxcat) { 196 | int m, i, j, k; 197 | int *cbestsplit = NULL; 198 | unsigned int npack; 199 | 200 | //Rprintf("maxcat %d\n",maxcat); 201 | /* decode the categorical splits */ 202 | if (maxcat > 1) { 203 | cbestsplit = (int *) mxCalloc(maxcat * treeSize, sizeof(int)); 204 | zeroInt(cbestsplit, maxcat * treeSize); 205 | for (i = 0; i < treeSize; ++i) { 206 | if (nodestatus[i] != NODE_TERMINAL) { 207 | if (cat[bestvar[i] - 1] > 1) { 208 | npack = (unsigned int) xbestsplit[i]; 209 | /* unpack `npack' into bits */ 210 | for (j = 0; npack; npack >>= 1, ++j) { 211 | cbestsplit[j + i*maxcat] = npack & 01; 212 | } 213 | } 214 | } 215 | } 216 | } 217 | for (i = 0; i < n; ++i) { 218 | k = 0; 219 | while (nodestatus[k] != NODE_TERMINAL) { 220 | m = bestvar[k] - 1; 221 | if (cat[m] == 1) { 222 | /* Split by a numerical predictor */ 223 | k = (x[m + i * mdim] <= xbestsplit[k]) ? 224 | treemap[k * 2] - 1 : treemap[1 + k * 2] - 1; 225 | } else { 226 | /* Split by a categorical predictor */ 227 | k = cbestsplit[(int) x[m + i * mdim] - 1 + k * maxcat] ? 
228 | treemap[k * 2] - 1 : treemap[1 + k * 2] - 1; 229 | } 230 | } 231 | /* Terminal node: assign class label */ 232 | jts[i] = nodeclass[k]; 233 | nodex[i] = k + 1; 234 | } 235 | if (maxcat > 1) mxFree(cbestsplit); 236 | } 237 | -------------------------------------------------------------------------------- /classification/classification.c: -------------------------------------------------------------------------------- 1 | /****************************************************************************** 2 | MODULE: usage 3 | 4 | PURPOSE: Classification part of the CCDC algorithm 5 | 6 | RETURN VALUE: 7 | Type = None 8 | 9 | HISTORY: 10 | Date Programmer Reason 11 | -------- --------------- ------------------------------------- 12 | 8/25/2015 Song Guo Original Development 13 | 14 | ******************************************************************************/ 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | 21 | #include "classification.h" 22 | #include "utilities.h" 23 | #include "matio.h" 24 | 25 | int main(int argc, char *argv[]) 26 | { 27 | mat_t *mat, *mat2; 28 | matvar_t *matvar, *matvar2; 29 | char *data; 30 | double *x; 31 | double *ref_x; 32 | int *y; 33 | int i, j = 0; 34 | size_t stride; 35 | FILE *fp; 36 | char FUNC_NAME[] = "main"; 37 | char msg_str[MAX_STR_LEN]; /* input data scene name */ 38 | int rows; 39 | int ref_rows; 40 | int cols; 41 | int nclass; 42 | bool verbose; /* verbose flag for printing messages */ 43 | int status; 44 | 45 | time_t now; 46 | time (&now); 47 | snprintf (msg_str, sizeof(msg_str), 48 | "CCDC start_time=%s\n", ctime (&now)); 49 | LOG_MESSAGE (msg_str, FUNC_NAME); 50 | 51 | /* Read the command-line arguments */ 52 | status = get_args (argc, argv, &rows, &cols, &ref_rows, &nclass, &verbose); 53 | if (status != SUCCESS) 54 | { 55 | RETURN_ERROR ("calling get_args", FUNC_NAME, EXIT_FAILURE); 56 | } 57 | 58 | /* Allocate memory */ 59 | x = malloc(rows * cols * sizeof(double)); 60 | ref_x = 
malloc(ref_rows * cols * sizeof(double)); 61 | y = malloc(rows * sizeof(int)); 62 | if (x == NULL || ref_x == NULL || y == NULL) 63 | { 64 | fprintf(stderr,"Error allocating memory\n"); 65 | return -1; 66 | } 67 | 68 | mat = Mat_Open("/data1/sguo/CCDC/classification/Xs.mat",MAT_ACC_RDONLY); 69 | mat2 = Mat_Open("/data1/sguo/CCDC/classification/Ys.mat",MAT_ACC_RDONLY); 70 | if ( NULL == mat || NULL == mat2) { 71 | fprintf(stderr,"Error opening MAT file \n"); 72 | return -1; 73 | } 74 | 75 | while((matvar=Mat_VarReadNext(mat)) != NULL) 76 | { 77 | if ( matvar->rank == 2 ) 78 | { 79 | stride = Mat_SizeOf(matvar->data_type); 80 | data = matvar->data; 81 | for ( i = 0; i < rows; i++ ) { 82 | for ( j = 0; j < matvar->dims[1]; j++ ) { 83 | size_t idx = matvar->dims[0]*j+i; 84 | x[i * matvar->dims[1] + j] = *(double*)(data+idx*stride); 85 | #if 0 86 | printf("i,j,x[i][j]=%d,%d,%g\n",i,j,x[i * matvar->dims[1] + j]); 87 | printf(" "); 88 | #endif 89 | } 90 | // printf("\n"); 91 | } 92 | 93 | } 94 | break; 95 | } 96 | Mat_VarFree(matvar); 97 | matvar = NULL; 98 | 99 | while((matvar2=Mat_VarReadNext(mat2)) != NULL) 100 | { 101 | 102 | if ( matvar2->rank == 2 ) 103 | { 104 | stride = Mat_SizeOf(matvar2->data_type); 105 | data = matvar2->data; 106 | for ( i = 0; i < rows; i++ ) { 107 | y[i] = (int) *(double*)(data+i*stride); 108 | #if 0 109 | printf("i,j,y[i]=%d,%d,%d\n",i,j,y[i]); 110 | printf("\n"); 111 | #endif 112 | } 113 | } 114 | } 115 | Mat_VarFree(matvar2); 116 | matvar2 = NULL; 117 | 118 | Mat_Close(mat); 119 | Mat_Close(mat2); 120 | 121 | /***START: NO NEED TO CHANGE ANYTHING FROM HERE TO THERE***************/ 122 | int p_size=cols,n_size=rows; 123 | int nsample=n_size; 124 | 125 | /* the classifcation version requires {D,N}, where D=(num) dimensions, N=(num) examples */ 126 | int dimx[2]; 127 | dimx[0]=p_size; 128 | dimx[1]=n_size; 129 | 130 | int* cat = (int*)calloc(p_size,sizeof(int)); 131 | 132 | /***END: NO NEED TO CHANGE ANYTHING FROM HERE TO 
THERE*****************/ 133 | 134 | /* write prediction OUTPUT into Y_hat.txt */ 135 | fp = fopen("Y_hat.txt","w"); 136 | 137 | /* this must be set or everything blows up; it represents the number of categories for 138 | every dimension - */ 139 | for(i=0;i 143 | 144 | int sampsize=n_size; /* if replace then sampsize=n_size or sampsize=0.632*n_size */ 145 | 146 | /* no need to change this */ 147 | int nsum = sampsize; 148 | 149 | int strata = 1; 150 | /* other options */ 151 | int addclass = 0; 152 | int importance=0; 153 | int localImp=0; 154 | int proximity=0; 155 | int oob_prox=0; 156 | int do_trace; //this variable prints verbosely each step 157 | if(verbose) 158 | do_trace=1; 159 | else 160 | do_trace=0; 161 | int keep_forest=1; 162 | int replace=1; 163 | int stratify=0; 164 | int keep_inbag=0; 165 | int Options[]={addclass,importance,localImp,proximity,oob_prox 166 | ,do_trace,keep_forest,replace,stratify,keep_inbag}; 167 | 168 | 169 | //ntree= number of tree. mtry=mtry :) 170 | int ntree=500; int nt=ntree; 171 | int mtry=(int)floor(sqrt(p_size)); /* - */ 172 | if(verbose) printf("ntree %d, mtry %d\n",ntree,mtry); 173 | 174 | int ipi=0; 175 | double* classwt=(double*)calloc(nclass,sizeof(double)); 176 | double* cutoff=(double*)calloc(nclass,sizeof(double)); 177 | for(i=0;i" 324 | " --cols=" 325 | " --ref_rows=" 326 | " --nclass=" 327 | " [--verbose]\n"); 328 | 329 | printf ("\n"); 330 | printf ("where the following parameters are required:\n"); 331 | printf (" --rows=: number of rows\n"); 332 | printf (" --cols=: number of columns\n"); 333 | printf (" --ref_rows=: number of rows for reference data\n"); 334 | printf (" --nclass=: number of classes\n"); 335 | printf ("\n"); 336 | printf ("and the following parameters are optional:\n"); 337 | printf (" --verbose: should intermediate messages be printed?"
338 | " (default is false)\n"); 339 | printf ("\n"); 340 | printf ("classification --help will print the usage statement\n"); 341 | printf ("\n"); 342 | printf ("Example:\n"); 343 | printf ("classification" 344 | " --rows=1000" 345 | " --cols=71" 346 | " --ref_rows=500" 347 | " --nclass=11" 348 | " --verbose\n"); 349 | printf ("Note: The classification must run from the directory" 350 | " where the input data are located.\n\n"); 351 | } 352 | -------------------------------------------------------------------------------- /classification/classification.h: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | 6 | #ifndef SUCCESS 7 | #define SUCCESS 0 8 | #endif 9 | 10 | #ifndef ERROR 11 | #define ERROR -1 12 | #endif 13 | 14 | #ifndef FAILURE 15 | #define FAILURE 1 16 | #endif 17 | 18 | #ifndef TRUE 19 | #define TRUE 1 20 | #endif 21 | 22 | #ifndef FALSE 23 | #define FALSE 0 24 | #endif 25 | 26 | #define MAX_STR_LEN 100 27 | 28 | typedef enum { false = 0, true = !false } bool; 29 | 30 | void classRF(double *x, int *dimx, int *cl, int *ncl, int *cat, int *maxcat, 31 | int *sampsize, int *strata, int *Options, int *ntree, int *nvar, 32 | int *ipi, double *classwt, double *cut, int *nodesize, 33 | int *outcl, int *counttr, double *prox, 34 | double *imprt, double *impsd, double *impmat, int *nrnodes, 35 | int *ndbigtree, int *nodestatus, int *bestvar, int *treemap, 36 | int *nodeclass, double *xbestsplit, double *errtr, 37 | int *testdat, double *xts, int *clts, int *nts, double *countts, 38 | int *outclts, int labelts, double *proxts, double *errts, 39 | int *inbag, int print_verbose_tree_progression); 40 | 41 | 42 | void classForest(int *mdim, int *ntest, int *nclass, int *maxcat, 43 | int *nrnodes, int *ntree, double *x, double *xbestsplit, 44 | double *pid, double *cutoff, double *countts, int *treemap, 45 | int *nodestatus, int *cat, int *nodeclass, int *jts, 46 | int *jet, int
*bestvar, int *node, int *treeSize, 47 | int *keepPred, int *prox, double *proxMat, int *nodes); 48 | 49 | int get_args 50 | ( 51 | int argc, /* I: number of cmd-line args */ 52 | char *argv[], /* I: string of cmd-line args */ 53 | int *rows, /* O: number of rows */ 54 | int *cols, /* O: number of columns */ 55 | int *ref_rows, /* O: number of rows for reference data */ 56 | int *nclass, /* O: number of classification types */ 57 | bool *verbose /* O: verbose flag */ 58 | ); 59 | 60 | void usage(); 61 | -------------------------------------------------------------------------------- /classification/cokus.c: -------------------------------------------------------------------------------- 1 | // This is the Mersenne Twister random number generator MT19937, which 2 | // generates pseudorandom integers uniformly distributed in 0..(2^32 - 1) 3 | // starting from any odd seed in 0..(2^32 - 1). This version is a recode 4 | // by Shawn Cokus (Cokus@math.washington.edu) on March 8, 1998 of a version by 5 | // Takuji Nishimura (who had suggestions from Topher Cooper and Marc Rieffel in 6 | // July-August 1997). 7 | // 8 | // Effectiveness of the recoding (on Goedel2.math.washington.edu, a DEC Alpha 9 | // running OSF/1) using GCC -O3 as a compiler: before recoding: 51.6 sec. to 10 | // generate 300 million random numbers; after recoding: 24.0 sec. for the same 11 | // (i.e., 46.5% of original time), so speed is now about 12.5 million random 12 | // number generations per second on this machine. 13 | // 14 | // According to the URL 15 | // (and paraphrasing a bit in places), the Mersenne Twister is ``designed 16 | // with consideration of the flaws of various existing generators,'' has 17 | // a period of 2^19937 - 1, gives a sequence that is 623-dimensionally 18 | // equidistributed, and ``has passed many stringent tests, including the 19 | // die-hard test of G. Marsaglia and the load test of P. Hellekalek and 20 | // S. 
Wegenkittl.'' It is efficient in memory usage (typically using 2506 21 | // to 5012 bytes of static data, depending on data type sizes, and the code 22 | // is quite short as well). It generates random numbers in batches of 624 23 | // at a time, so the caching and pipelining of modern systems is exploited. 24 | // It is also divide- and mod-free. 25 | // 26 | // This library is free software; you can redistribute it and/or modify it 27 | // under the terms of the GNU Library General Public License as published by 28 | // the Free Software Foundation (either version 2 of the License or, at your 29 | // option, any later version). This library is distributed in the hope that 30 | // it will be useful, but WITHOUT ANY WARRANTY, without even the implied 31 | // warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See 32 | // the GNU Library General Public License for more details. You should have 33 | // received a copy of the GNU Library General Public License along with this 34 | // library; if not, write to the Free Software Foundation, Inc., 59 Temple 35 | // Place, Suite 330, Boston, MA 02111-1307, USA. 36 | // 37 | // The code as Shawn received it included the following notice: 38 | // 39 | // Copyright (C) 1997 Makoto Matsumoto and Takuji Nishimura. When 40 | // you use this, send an e-mail to with 41 | // an appropriate reference to your work. 42 | // 43 | // It would be nice to CC: when you write. 
44 | // 45 | 46 | //#include 47 | //#include 48 | 49 | // 50 | // uint32 must be an unsigned integer type capable of holding at least 32 51 | // bits; exactly 32 should be fastest, but 64 is better on an Alpha with 52 | // GCC at -O3 optimization so try your options and see whats best for you 53 | // 54 | 55 | typedef unsigned long uint32; 56 | 57 | #define N (624) // length of state vector 58 | #define M (397) // a period parameter 59 | #define K (0x9908B0DFU) // a magic constant 60 | #define hiBit(u) ((u) & 0x80000000U) // mask all but highest bit of u 61 | #define loBit(u) ((u) & 0x00000001U) // mask all but lowest bit of u 62 | #define loBits(u) ((u) & 0x7FFFFFFFU) // mask the highest bit of u 63 | #define mixBits(u, v) (hiBit(u)|loBits(v)) // move hi bit of u to hi bit of v 64 | 65 | static uint32 state[N+1]; // state vector + 1 extra to not violate ANSI C 66 | static uint32 *next; // next random value is computed from here 67 | static int left = -1; // can *next++ this many times before reloading 68 | 69 | 70 | void seedMT(uint32 seed) 71 | { 72 | // 73 | // We initialize state[0..(N-1)] via the generator 74 | // 75 | // x_new = (69069 * x_old) mod 2^32 76 | // 77 | // from Line 15 of Table 1, p. 106, Sec. 3.3.4 of Knuths 78 | // _The Art of Computer Programming_, Volume 2, 3rd ed. 79 | // 80 | // Notes (SJC): I do not know what the initial state requirements 81 | // of the Mersenne Twister are, but it seems this seeding generator 82 | // could be better. It achieves the maximum period for its modulus 83 | // (2^30) iff x_initial is odd (p. 20-21, Sec. 3.2.1.2, Knuth); if 84 | // x_initial can be even, you have sequences like 0, 0, 0, ...; 85 | // 2^31, 2^31, 2^31, ...; 2^30, 2^30, 2^30, ...; 2^29, 2^29 + 2^31, 86 | // 2^29, 2^29 + 2^31, ..., etc. so I force seed to be odd below. 
87 | // 88 | // Even if x_initial is odd, if x_initial is 1 mod 4 then 89 | // 90 | // the lowest bit of x is always 1, 91 | // the next-to-lowest bit of x is always 0, 92 | // the 2nd-from-lowest bit of x alternates ... 0 1 0 1 0 1 0 1 ... , 93 | // the 3rd-from-lowest bit of x 4-cycles ... 0 1 1 0 0 1 1 0 ... , 94 | // the 4th-from-lowest bit of x has the 8-cycle ... 0 0 0 1 1 1 1 0 ... , 95 | // ... 96 | // 97 | // and if x_initial is 3 mod 4 then 98 | // 99 | // the lowest bit of x is always 1, 100 | // the next-to-lowest bit of x is always 1, 101 | // the 2nd-from-lowest bit of x alternates ... 0 1 0 1 0 1 0 1 ... , 102 | // the 3rd-from-lowest bit of x 4-cycles ... 0 0 1 1 0 0 1 1 ... , 103 | // the 4th-from-lowest bit of x has the 8-cycle ... 0 0 1 1 1 1 0 0 ... , 104 | // ... 105 | // 106 | // The generators potency (min. s>=0 with (69069-1)^s = 0 mod 2^32) is 107 | // 16, which seems to be alright by p. 25, Sec. 3.2.1.3 of Knuth. It 108 | // also does well in the dimension 2..5 spectral tests, but it could be 109 | // better in dimension 6 (Line 15, Table 1, p. 106, Sec. 3.3.4, Knuth). 110 | // 111 | // Note that the random number user does not see the values generated 112 | // here directly since reloadMT() will always munge them first, so maybe 113 | // none of all of this matters. In fact, the seed values made here could 114 | // even be extra-special desirable if the Mersenne Twister theory says 115 | // so-- thats why the only change I made is to restrict to odd seeds. 
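The seeding comment above can be checked with a small standalone sketch. It is not part of cokus.c: `lcg_seed_fill` and `low_two_bits_fixed` are hypothetical helper names, and `uint32_t` from `<stdint.h>` is assumed in place of the file's own `uint32` typedef. The sketch forces the seed odd, iterates x_new = 69069 * x_old mod 2^32, and tests the claim that the two lowest bits never change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in cokus.c): write the seeding sequence
 * x, 69069*x, 69069^2*x, ... (mod 2^32) into out[0..n-1], after
 * forcing the seed to be odd as seedMT() does. */
void lcg_seed_fill(uint32_t seed, uint32_t *out, size_t n)
{
    uint32_t x = seed | 1U;            /* force an odd starting value */
    for (size_t i = 0; i < n; ++i) {
        out[i] = x;
        x = (uint32_t)(x * 69069U);    /* unsigned product, reduced mod 2^32 */
    }
}

/* Hypothetical helper: return 1 if every value is odd and shares the
 * same two lowest bits as the first value, 0 otherwise. */
int low_two_bits_fixed(const uint32_t *buf, size_t n)
{
    if (n == 0 || (buf[0] & 1U) == 0U)
        return 0;
    for (size_t i = 1; i < n; ++i) {
        if ((buf[i] & 3U) != (buf[0] & 3U))
            return 0;
    }
    return 1;
}
```

Because 69069 is congruent to 1 mod 4, multiplying by it leaves the residue mod 4 unchanged. That is why a seed that is 1 mod 4 keeps its next-to-lowest bit at 0 and a seed that is 3 mod 4 keeps it at 1.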
116 | // 117 | 118 | register uint32 x = (seed | 1U) & 0xFFFFFFFFU, *s = state; 119 | register int j; 120 | 121 | for(left=0, *s++=x, j=N; --j; 122 | *s++ = (x*=69069U) & 0xFFFFFFFFU); 123 | } 124 | 125 | 126 | uint32 reloadMT(void) 127 | { 128 | register uint32 *p0=state, *p2=state+2, *pM=state+M, s0, s1; 129 | register int j; 130 | 131 | if(left < -1) 132 | seedMT(4357U); 133 | 134 | left=N-1, next=state+1; 135 | 136 | for(s0=state[0], s1=state[1], j=N-M+1; --j; s0=s1, s1=*p2++) 137 | *p0++ = *pM++ ^ (mixBits(s0, s1) >> 1) ^ (loBit(s1) ? K : 0U); 138 | 139 | for(pM=state, j=M; --j; s0=s1, s1=*p2++) 140 | *p0++ = *pM++ ^ (mixBits(s0, s1) >> 1) ^ (loBit(s1) ? K : 0U); 141 | 142 | s1=state[0], *p0 = *pM ^ (mixBits(s0, s1) >> 1) ^ (loBit(s1) ? K : 0U); 143 | s1 ^= (s1 >> 11); 144 | s1 ^= (s1 << 7) & 0x9D2C5680U; 145 | s1 ^= (s1 << 15) & 0xEFC60000U; 146 | return(s1 ^ (s1 >> 18)); 147 | } 148 | 149 | 150 | uint32 randomMT(void) 151 | { 152 | uint32 y; 153 | 154 | if(--left < 0) 155 | return(reloadMT()); 156 | 157 | y = *next++; 158 | y ^= (y >> 11); 159 | y ^= (y << 7) & 0x9D2C5680U; 160 | y ^= (y << 15) & 0xEFC60000U; 161 | y ^= (y >> 18); 162 | return(y); 163 | } 164 | 165 | /* 166 | #define uint32 unsigned long 167 | #define SMALL_INT char 168 | #define SMALL_INT_CLASS mxCHAR_CLASS 169 | void seedMT(uint32 seed); 170 | uint32 randomMT(void); 171 | 172 | #include "stdio.h" 173 | #include "math.h" 174 | 175 | int main(void) 176 | { 177 | int j; 178 | 179 | // you can seed with any uint32, but the best are odds in 0..(2^32 - 1) 180 | 181 | seedMT(4357U); 182 | uint32 MAX=pow(2,32)-1; 183 | // print the first 2,002 random numbers seven to a line as an example 184 | 185 | for(j=0; j<2002; j++) 186 | printf(" %10lu%s", (unsigned long) randomMT(), (j%7)==6 ? "\n" : ""); 187 | 188 | for(j=0; j<2002; j++) 189 | printf(" %f%s", ((double)randomMT()/(double)MAX), (j%7)==6 ? 
"\n" : ""); 190 | 191 | 192 | return(1); 193 | } 194 | */ 195 | 196 | 197 | -------------------------------------------------------------------------------- /classification/get_args.c: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | 5 | #include "classification.h" 6 | #include "utilities.h" 7 | 8 | /****************************************************************************** 9 | MODULE: get_args 10 | 11 | PURPOSE: Gets the command-line arguments and validates that the required 12 | arguments were specified. 13 | 14 | RETURN VALUE: 15 | Type = int 16 | Value Description 17 | ----- ----------- 18 | FAILURE Error getting the command-line arguments or a command-line 19 | argument and associated value were not specified 20 | SUCCESS No errors encountered 21 | 22 | HISTORY: 23 | Date Programmer Reason 24 | -------- --------------- ------------------------------------- 25 | 8/5/2015 Song Guo Original Development 26 | ******************************************************************************/ 27 | int get_args 28 | ( 29 | int argc, /* I: number of cmd-line args */ 30 | char *argv[], /* I: string of cmd-line args */ 31 | int *rows, /* O: number of rows */ 32 | int *cols, /* O: number of columns */ 33 | int *ref_rows, /* O: number of rows for reference data */ 34 | int *nclass, /* O: number of classification types */ 35 | bool *verbose /* O: verbose flag */ 36 | ) 37 | { 38 | int c; /* current argument index */ 39 | int option_index; /* index for the command-line option */ 40 | static int verbose_flag = 0; /* verbose flag */ 41 | static int cols_default = 71; /* Default buffer for number of columns */ 42 | static int nclass_default = 11; /* Default buffer for number of classes */ 43 | char errmsg[MAX_STR_LEN]; /* error message */ 44 | char FUNC_NAME[] = "get_args"; /* function name */ 45 | static struct option long_options[] = { 46 | {"verbose", no_argument, &verbose_flag, 1}, 47 | {"rows", 
required_argument, 0, 'r'}, 48 | {"cols", required_argument, 0, 'c'}, 49 | {"ref_rows", required_argument, 0, 'f'}, 50 | {"nclass", required_argument, 0, 'n'}, 51 | {"help", no_argument, 0, 'h'}, 52 | {0, 0, 0, 0} 53 | }; 54 | 55 | /* Assign the default values */ 56 | *cols = cols_default; 57 | *nclass = nclass_default; 58 | 59 | /* Loop through all the cmd-line options */ 60 | opterr = 0; /* turn off getopt_long error msgs as we'll print our own */ 61 | while (1) 62 | { 63 | /* optstring in call to getopt_long is empty since we will only 64 | support the long options */ 65 | c = getopt_long (argc, argv, "", long_options, &option_index); 66 | if (c == -1) 67 | { /* Out of cmd-line options */ 68 | break; 69 | } 70 | 71 | switch (c) 72 | { 73 | case 0: 74 | /* If this option set a flag, do nothing else now. */ 75 | if (long_options[option_index].flag != 0) 76 | { 77 | break; 78 | } 79 | sprintf (errmsg, "option %s\n", long_options[option_index].name); 80 | if (optarg) 81 | { 82 | sprintf (errmsg, "option %s with arg %s\n", 83 | long_options[option_index].name, optarg); 84 | } 85 | RETURN_ERROR (errmsg, FUNC_NAME, ERROR); 86 | break; 87 | 88 | case 'h': /* help */ 89 | usage (); 90 | return FAILURE; 91 | break; 92 | 93 | case 'r': 94 | *rows = atoi (optarg); 95 | break; 96 | 97 | case 'c': 98 | *cols = atoi (optarg); 99 | break; 100 | 101 | case 'f': 102 | *ref_rows = atoi (optarg); 103 | break; 104 | 105 | case 'n': 106 | *nclass = atoi (optarg); 107 | break; 108 | 109 | case '?': 110 | default: 111 | sprintf (errmsg, "Unknown option %s", argv[optind - 1]); 112 | usage (); 113 | RETURN_ERROR (errmsg, FUNC_NAME, ERROR); 114 | break; 115 | } 116 | } 117 | 118 | /* Check the input values */ 119 | if (*rows < 0) 120 | { 121 | sprintf (errmsg, "number of rows must be > 0"); 122 | RETURN_ERROR(errmsg, FUNC_NAME, FAILURE); 123 | } 124 | 125 | if (*cols < 0) 126 | { 127 | sprintf (errmsg, "number of columns must be > 0"); 128 | RETURN_ERROR(errmsg, FUNC_NAME, FAILURE); 129 | 
} 130 | 131 | if (*ref_rows < 0) 132 | { 133 | sprintf (errmsg, "number of reference rows must be > 0"); 134 | RETURN_ERROR(errmsg, FUNC_NAME, FAILURE); 135 | } 136 | 137 | if (*nclass < 0) 138 | { 139 | sprintf (errmsg, "number of classes must be > 0"); 140 | RETURN_ERROR(errmsg, FUNC_NAME, FAILURE); 141 | } 142 | 143 | 144 | /* Check the verbose flag */ 145 | if (verbose_flag) 146 | *verbose = true; 147 | else 148 | *verbose = false; 149 | 150 | if (*verbose) 151 | { 152 | printf ("rows = %d\n", *rows); 153 | printf ("cols = %d\n", *cols); 154 | printf ("ref_rows = %d\n", *ref_rows); 155 | printf ("nclass = %d\n", *nclass); 156 | printf ("verbose = %d\n", *verbose); 157 | } 158 | 159 | return SUCCESS; 160 | } 161 | 162 | -------------------------------------------------------------------------------- /classification/qsort.c: -------------------------------------------------------------------------------- 1 | /************************************************************** 2 | * Blatantly copied Quick sort algorithm from R's source code 3 | *************************************************************/ 4 | 5 | 6 | #define qsort_Index 7 | #define NUMERIC double 8 | void R_qsort_I(double *v, int *I, int i, int j) 9 | /*====== BODY of R_qsort() and R_qsorti() functions ==================== 10 | * 11 | * is included in ./qsort.c with and without ``qsort_Index'' defined 12 | *====================================================================== 13 | */ 14 | { 15 | /* Orders v[] increasingly. Puts into I[] the permutation vector: 16 | * new v[k] = old v[I[k]] 17 | * Only elements [i : j] (in 1-indexing !) are considered. 18 | 19 | * This is a modification of CACM algorithm #347 by R. C. Singleton, 20 | * which is a modified Hoare quicksort. 21 | * This version incorporates the modification in the remark by Peto. 
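The contract described above (sort `v` over the 1-indexed range [i : j] and carry `I` along, so that new v[k] = old v[I[k]] when `I` starts as the identity) can be illustrated with a much simpler routine. The sketch below is an assumption-laden stand-in, not the Singleton quicksort used here: `sort_range_with_index` is a hypothetical name, and a plain insertion sort is swapped in to keep the example short.

```c
// Hypothetical stand-in for R_qsort_I: same argument order and the same
// 1-indexed [i : j] convention, but implemented as an insertion sort.
// Position p (1-based) lives at array index p - 1.
void sort_range_with_index(double *v, int *I, int i, int j)
{
    for (int a = i + 1; a <= j; ++a) {
        double vt = v[a - 1];   // value being inserted
        int it = I[a - 1];      // its index tag travels with it
        int b = a - 1;          // 1-based position just left of the gap

        // Shift larger elements one position to the right.
        while (b >= i && v[b - 1] > vt) {
            v[b] = v[b - 1];
            I[b] = I[b - 1];
            --b;
        }
        v[b] = vt;              // fill the gap at 1-based position b + 1
        I[b] = it;
    }
}
```

With `I` initialised to 1..n by the caller, each `I[k]` afterwards names the original 1-based position of the value now stored at position k. Elements outside [i : j] are left untouched.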
22 | */ 23 | 24 | int il[31], iu[31]; 25 | /* Arrays iu[k] and il[k] permit sorting up to 2^(k+1)-1 elements; 26 | * originally k = 20 -> n_max = 2'097'151 27 | * now k = 31 -> n_max = 4294'967'295 28 | */ 29 | NUMERIC vt, vtt; 30 | double R = 0.375; 31 | int ii, ij, k, l, m; 32 | #ifdef qsort_Index 33 | int it, tt; 34 | #endif 35 | 36 | 37 | /* 1-indexing for I[], v[] (and `i' and `j') : */ 38 | --v; 39 | #ifdef qsort_Index 40 | --I; 41 | #endif 42 | 43 | ii = i;/* save */ 44 | m = 1; 45 | 46 | L10: 47 | if (i < j) { 48 | if (R < 0.5898437) R += 0.0390625; else R -= 0.21875; 49 | L20: 50 | k = i; 51 | /* ij = (j + i) >> 1; midpoint */ 52 | ij = i + (int)((j - i)*R); 53 | #ifdef qsort_Index 54 | it = I[ij]; 55 | #endif 56 | vt = v[ij]; 57 | if (v[i] > vt) { 58 | #ifdef qsort_Index 59 | I[ij] = I[i]; I[i] = it; it = I[ij]; 60 | #endif 61 | v[ij] = v[i]; v[i] = vt; vt = v[ij]; 62 | } 63 | /* L30:*/ 64 | l = j; 65 | if (v[j] < vt) { 66 | #ifdef qsort_Index 67 | I[ij] = I[j]; I[j] = it; it = I[ij]; 68 | #endif 69 | v[ij] = v[j]; v[j] = vt; vt = v[ij]; 70 | if (v[i] > vt) { 71 | #ifdef qsort_Index 72 | I[ij] = I[i]; I[i] = it; it = I[ij]; 73 | #endif 74 | v[ij] = v[i]; v[i] = vt; vt = v[ij]; 75 | } 76 | } 77 | 78 | for(;;) { /*L50:*/ 79 | //do l--; while (v[l] > vt); 80 | l--;for(;v[l]>vt;l--); 81 | 82 | 83 | #ifdef qsort_Index 84 | tt = I[l]; 85 | #endif 86 | vtt = v[l]; 87 | /*L60:*/ 88 | //do k++; while (v[k] < vt); 89 | k=k+1;for(;v[k]<vt;k++); 90 | 91 | if (k > l) break; 92 | 93 | /* else (k <= l) : */ 94 | #ifdef qsort_Index 95 | I[l] = I[k]; I[k] = tt; 96 | #endif 97 | v[l] = v[k]; v[k] = vtt; 98 | } 99 | 100 | m++; 101 | if (l - i <= j - k) { 102 | /*L70: */ 103 | il[m] = k; 104 | iu[m] = j; 105 | j = l; 106 | } 107 | else { 108 | il[m] = i; 109 | iu[m] = l; 110 | i = k; 111 | } 112 | }else { /* i >= j : */ 113 | 114 | L80: 115 | if (m == 1) return; 116 | 117 | /* else */ 118 | i = il[m]; 119 | j = iu[m]; 120 | m--; 121 | } 122 | 123 | if (j - i > 10) goto L20; 124 | 125 | if (i == ii)
goto L10; 126 | 127 | --i; 128 | L100: 129 | do { 130 | ++i; 131 | if (i == j) { 132 | goto L80; 133 | } 134 | #ifdef qsort_Index 135 | it = I[i + 1]; 136 | #endif 137 | vt = v[i + 1]; 138 | } while (v[i] <= vt); 139 | 140 | k = i; 141 | 142 | do { /*L110:*/ 143 | #ifdef qsort_Index 144 | I[k + 1] = I[k]; 145 | #endif 146 | v[k + 1] = v[k]; 147 | --k; 148 | } while (vt < v[k]); 149 | 150 | #ifdef qsort_Index 151 | I[k + 1] = it; 152 | #endif 153 | v[k + 1] = vt; 154 | goto L100; 155 | } /* R_qsort{i} */ 156 | -------------------------------------------------------------------------------- /classification/rf.h: -------------------------------------------------------------------------------- 1 | /************************************************************** 2 | * mex interface to Andy Liaw et al.'s C code (used in R package randomForest) 3 | * Added by Abhishek Jaiantilal ( abhishek.jaiantilal@colorado.edu ) 4 | * License: GPLv2 5 | * Version: 0.02 6 | * 7 | * other than adding the macros for F77_* and adding this message 8 | * nothing changed . 9 | *************************************************************/ 10 | 11 | /******************************************************************* 12 | Copyright (C) 2001-7 Leo Breiman, Adele Cutler and Merck & Co., Inc. 13 | 14 | This program is free software; you can redistribute it and/or 15 | modify it under the terms of the GNU General Public License 16 | as published by the Free Software Foundation; either version 2 17 | of the License, or (at your option) any later version. 18 | 19 | This program is distributed in the hope that it will be useful, 20 | but WITHOUT ANY WARRANTY; without even the implied warranty of 21 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 22 | GNU General Public License for more details. 
23 | *******************************************************************/ 24 | #ifndef RF_H 25 | #define RF_H 26 | 27 | /* test if the bit at position pos is turned on */ 28 | #define isBitOn(x,pos) (((x) & (1 << (pos))) > 0) 29 | /* swap two integers */ 30 | #define swapInt(a, b) ((a ^= b), (b ^= a), (a ^= b)) 31 | /* 32 | void classRF(double *x, int *dimx, int *cl, int *ncl, int *cat, int *maxcat, 33 | int *sampsize, int *Options, int *ntree, int *nvar, 34 | int *ipi, double *pi, double *cut, int *nodesize, 35 | int *outcl, int *counttr, double *prox, 36 | double *imprt, double *, double *impmat, int *nrnodes, int *ndbigtree, 37 | int *nodestatus, int *bestvar, int *treemap, int *nodeclass, 38 | double *xbestsplit, double *pid, double *errtr, 39 | int *testdat, double *xts, int *clts, int *nts, double *countts, 40 | int *outclts, int *labelts, double *proxts, double *errts); 41 | */ 42 | 43 | #define F77_CALL(x) x ## _ 44 | #define F77_NAME(x) F77_CALL(x) 45 | #define F77_SUB(x) F77_CALL(x) 46 | 47 | 48 | void normClassWt(int *cl, const int nsample, const int nclass, 49 | const int useWt, double *classwt, int *classFreq); 50 | 51 | void classForest(int *mdim, int *ntest, int *nclass, int *maxcat, 52 | int *nrnodes, int *jbt, double *xts, double *xbestsplit, 53 | double *pid, double *cutoff, double *countts, int *treemap, 54 | int *nodestatus, int *cat, int *nodeclass, int *jts, 55 | int *jet, int *bestvar, int *nodexts, int *ndbigtree, 56 | int *keepPred, int *prox, double *proxmatrix, int *nodes); 57 | 58 | void regTree(double *x, double *y, int mdim, int nsample, 59 | int *lDaughter, int *rDaughter, double *upper, double *avnode, 60 | int *nodestatus, int nrnodes, int *treeSize, int nthsize, 61 | int mtry, int *mbest, int *cat, double *tgini, int *varUsed); 62 | 63 | void findBestSplit(double *x, int *jdex, double *y, int mdim, int nsample, 64 | int ndstart, int ndend, int *msplit, double *decsplit, 65 | double *ubest, int *ndendl, int *jstat, int mtry, 66 | 
double sumnode, int nodecnt, int *cat); 67 | 68 | void predictRegTree(double *x, int nsample, int mdim, 69 | int *lDaughter, int *rDaughter, int *nodestatus, 70 | double *ypred, double *split, double *nodepred, 71 | int *splitVar, int treeSize, int *cat, int maxcat, 72 | int *nodex); 73 | 74 | void predictClassTree(double *x, int n, int mdim, int *treemap, 75 | int *nodestatus, double *xbestsplit, 76 | int *bestvar, int *nodeclass, 77 | int ndbigtree, int *cat, int nclass, 78 | int *jts, int *nodex, int maxcat); 79 | 80 | int pack(int l, int *icat); 81 | void unpack(unsigned int npack, int *icat); 82 | 83 | void zeroInt(int *x, int length); 84 | void zeroDouble(double *x, int length); 85 | void createClass(double *x, int realN, int totalN, int mdim); 86 | void prepare(int *cl, const int nsample, const int nclass, const int ipi, 87 | double *pi, double *pid, int *nc, double *wtt); 88 | void makeA(double *x, const int mdim, const int nsample, int *cat, int *a, 89 | int *b); 90 | void modA(int *a, int *nuse, const int nsample, const int mdim, int *cat, 91 | const int maxcat, int *ncase, int *jin); 92 | void Xtranslate(double *x, int mdim, int nrnodes, int nsample, 93 | int *bestvar, int *bestsplit, int *bestsplitnext, 94 | double *xbestsplit, int *nodestatus, int *cat, int treeSize); 95 | void permuteOOB(int m, double *x, int *in, int nsample, int mdim); 96 | void computeProximity(double *prox, int oobprox, int *node, int *inbag, 97 | int *oobpair, int n); 98 | 99 | /* Template of Fortran subroutines to be called from the C wrapper */ 100 | /*extern void F77_NAME(buildtree)(int *a, int *b, int *cl, int *cat, 101 | int *maxcat, int *mdim, int *nsample, 102 | int *nclass, int *treemap, int *bestvar, 103 | int *bestsplit, int *bestsplitnext, 104 | double *tgini, int *nodestatus, int *nodepop, 105 | int *nodestart, double *classpop, 106 | double *tclasspop, double *tclasscat, 107 | int *ta, int *nrnodes, int *, 108 | int *, int *, int *, int *, int *, int *, 109 | double 
*, double *, double *, 110 | int *, int *, int *); 111 | */ 112 | /* Node status */ 113 | #define NODE_TERMINAL -1 114 | #define NODE_TOSPLIT -2 115 | #define NODE_INTERIOR -3 116 | 117 | #endif /* RF_H */ 118 | -------------------------------------------------------------------------------- /classification/rfsub.f: -------------------------------------------------------------------------------- 1 | c Copyright (C) 2001-7 Leo Breiman and Adele Cutler and Merck & Co, Inc. 2 | c This program is free software; you can redistribute it and/or 3 | c modify it under the terms of the GNU General Public License 4 | c as published by the Free Software Foundation; either version 2 5 | c of the License, or (at your option) any later version. 6 | 7 | c This program is distributed in the hope that it will be useful, 8 | c but WITHOUT ANY WARRANTY; without even the implied warranty of 9 | c MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 10 | c GNU General Public License for more details. 11 | c 12 | c Modified by Andy Liaw and Matt Wiener: 13 | c The main program is re-written as a C function to be called from R. 14 | c All calls to the uniform RNG is replaced with R's RNG. Some subroutines 15 | c not called are excluded. Variables and arrays declared as double as 16 | c needed. Unused variables are deleted. 17 | c 18 | c SUBROUTINE BUILDTREE 19 | 20 | subroutine buildtree(a, b, cl, cat, maxcat, mdim, nsample, 21 | 1 nclass, treemap, bestvar, bestsplit, bestsplitnext, tgini, 22 | 1 nodestatus,nodepop, nodestart, classpop, tclasspop, 23 | 1 tclasscat,ta,nrnodes, idmove, ndsize, ncase, mtry, iv, 24 | 1 nodeclass, ndbigtree, win, wr, wl, mred, nuse, mind) 25 | 26 | c Buildtree consists of repeated calls to two subroutines, Findbestsplit 27 | c and Movedata. Findbestsplit does just that--it finds the best split of 28 | c the current node. Movedata moves the data in the split node right and 29 | c left so that the data corresponding to each child node is contiguous. 
30 | c The buildtree bookkeeping is different from that in Friedman's original 31 | c CART program. ncur is the total number of nodes to date. 32 | c nodestatus(k)=1 if the kth node has been split. nodestatus(k)=2 if the 33 | c node exists but has not yet been split, and =-1 if the node is terminal. 34 | c A node is terminal if its size is below a threshold value, or if it is 35 | c all one class, or if all the x-values are equal. If the current node k 36 | c is split, then its children are numbered ncur+1 (left), and 37 | c ncur+2(right), ncur increases to ncur+2 and the next node to be split is 38 | c numbered k+1. When no more nodes can be split, buildtree returns to the 39 | c main program. 40 | 41 | implicit double precision(a-h,o-z) 42 | integer a(mdim, nsample), cl(nsample), cat(mdim), 43 | 1 treemap(2,nrnodes), bestvar(nrnodes), 44 | 1 bestsplit(nrnodes), nodestatus(nrnodes), ta(nsample), 45 | 1 nodepop(nrnodes), nodestart(nrnodes), 46 | 1 bestsplitnext(nrnodes), idmove(nsample), 47 | 1 ncase(nsample), b(mdim,nsample), 48 | 1 iv(mred), nodeclass(nrnodes), mind(mred) 49 | 50 | double precision tclasspop(nclass), classpop(nclass, nrnodes), 51 | 1 tclasscat(nclass, 32), win(nsample), wr(nclass), 52 | 1 wl(nclass), tgini(mdim), xrand 53 | integer msplit, ntie 54 | 55 | msplit = 0 56 | call zerv(nodestatus,nrnodes) 57 | call zerv(nodestart,nrnodes) 58 | call zerv(nodepop,nrnodes) 59 | call zermr(classpop,nclass,nrnodes) 60 | 61 | do j=1,nclass 62 | classpop(j, 1) = tclasspop(j) 63 | end do 64 | ncur = 1 65 | nodestart(1) = 1 66 | nodepop(1) = nuse 67 | nodestatus(1) = 2 68 | c start main loop 69 | do 30 kbuild = 1, nrnodes 70 | if (kbuild .gt. ncur) goto 50 71 | if (nodestatus(kbuild) .ne.
2) goto 30 72 | c initialize for next call to findbestsplit 73 | ndstart = nodestart(kbuild) 74 | ndend = ndstart + nodepop(kbuild) - 1 75 | do j = 1, nclass 76 | tclasspop(j) = classpop(j,kbuild) 77 | end do 78 | jstat = 0 79 | 80 | call findbestsplit(a,b,cl,mdim,nsample,nclass,cat,maxcat, 81 | 1 ndstart, ndend,tclasspop,tclasscat,msplit, decsplit, 82 | 1 nbest,ncase, jstat,mtry,win,wr,wl,mred,mind) 83 | if (jstat .eq. -1) then 84 | nodestatus(kbuild) = -1 85 | goto 30 86 | else 87 | bestvar(kbuild) = msplit 88 | iv(msplit) = 1 89 | if (decsplit .lt. 0.0) decsplit = 0.0 90 | tgini(msplit) = tgini(msplit) + decsplit 91 | if (cat(msplit) .eq. 1) then 92 | bestsplit(kbuild) = a(msplit,nbest) 93 | bestsplitnext(kbuild) = a(msplit,nbest+1) 94 | else 95 | bestsplit(kbuild) = nbest 96 | bestsplitnext(kbuild) = 0 97 | endif 98 | endif 99 | 100 | call movedata(a,ta,mdim,nsample,ndstart,ndend,idmove,ncase, 101 | 1 msplit,cat,nbest,ndendl) 102 | 103 | c leftnode no.= ncur+1, rightnode no. = ncur+2. 
104 | nodepop(ncur+1) = ndendl - ndstart + 1 105 | nodepop(ncur+2) = ndend - ndendl 106 | nodestart(ncur+1) = ndstart 107 | nodestart(ncur+2) = ndendl + 1 108 | 109 | c find class populations in both nodes 110 | do n = ndstart, ndendl 111 | nc = ncase(n) 112 | j=cl(nc) 113 | classpop(j,ncur+1) = classpop(j,ncur+1) + win(nc) 114 | end do 115 | do n = ndendl+1, ndend 116 | nc = ncase(n) 117 | j = cl(nc) 118 | classpop(j,ncur+2) = classpop(j,ncur+2) + win(nc) 119 | end do 120 | c check on nodestatus 121 | nodestatus(ncur+1) = 2 122 | nodestatus(ncur+2) = 2 123 | if (nodepop(ncur+1).le.ndsize) nodestatus(ncur+1) = -1 124 | if (nodepop(ncur+2).le.ndsize) nodestatus(ncur+2) = -1 125 | popt1 = 0 126 | popt2 = 0 127 | do j = 1, nclass 128 | popt1 = popt1 + classpop(j,ncur+1) 129 | popt2 = popt2 + classpop(j,ncur+2) 130 | end do 131 | 132 | do j=1,nclass 133 | if (classpop(j,ncur+1).eq.popt1) nodestatus(ncur+1) = -1 134 | if (classpop(j,ncur+2).eq.popt2) nodestatus(ncur+2) = -1 135 | end do 136 | 137 | treemap(1,kbuild) = ncur + 1 138 | treemap(2,kbuild) = ncur + 2 139 | nodestatus(kbuild) = 1 140 | ncur = ncur+2 141 | if (ncur.ge.nrnodes) goto 50 142 | 143 | 30 continue 144 | 50 continue 145 | 146 | ndbigtree = nrnodes 147 | do k=nrnodes, 1, -1 148 | if (nodestatus(k) .eq. 0) ndbigtree = ndbigtree - 1 149 | if (nodestatus(k) .eq. 2) nodestatus(k) = -1 150 | end do 151 | 152 | c form prediction in terminal nodes 153 | do kn = 1, ndbigtree 154 | if(nodestatus(kn) .eq. -1) then 155 | pp = 0 156 | ntie = 1 157 | do j = 1, nclass 158 | if (classpop(j,kn) .gt. pp) then 159 | nodeclass(kn) = j 160 | pp = classpop(j,kn) 161 | end if 162 | c Break ties at random: 163 | if (classpop(j,kn) .eq. pp) then 164 | ntie = ntie + 1 165 | call rrand(xrand) 166 | if (xrand .lt. 
1.0 / ntie) then 167 | nodeclass(kn)=j 168 | pp=classpop(j,kn) 169 | end if 170 | end if 171 | end do 172 | end if 173 | end do 174 | 175 | end 176 | 177 | c SUBROUTINE FINDBESTSPLIT 178 | c For the best split, msplit is the variable split on. decsplit is the 179 | c dec. in impurity. If msplit is numerical, nsplit is the case number 180 | c of value of msplit split on, and nsplitnext is the case number of the 181 | c next larger value of msplit. If msplit is categorical, then nsplit is 182 | c the coding into an integer of the categories going left. 183 | subroutine findbestsplit(a, b, cl, mdim, nsample, nclass, cat, 184 | 1 maxcat, ndstart, ndend, tclasspop, tclasscat, msplit, 185 | 2 decsplit, nbest, ncase, jstat, mtry, win, wr, wl, 186 | 3 mred, mind) 187 | implicit double precision(a-h,o-z) 188 | integer a(mdim,nsample), cl(nsample), cat(mdim), 189 | 1 ncase(nsample), b(mdim,nsample), nn, j 190 | double precision tclasspop(nclass), tclasscat(nclass,32), dn(32), 191 | 1 win(nsample), wr(nclass), wl(nclass), xrand 192 | integer mind(mred), ncmax, ncsplit,nhit, ntie 193 | ncmax = 10 194 | ncsplit = 512 195 | c compute initial values of numerator and denominator of Gini 196 | pno = 0.0 197 | pdo = 0.0 198 | do j = 1, nclass 199 | pno = pno + tclasspop(j) * tclasspop(j) 200 | pdo = pdo + tclasspop(j) 201 | end do 202 | crit0 = pno / pdo 203 | jstat = 0 204 | 205 | c start main loop through variables to find best split 206 | critmax = -1.0e25 207 | do k = 1, mred 208 | mind(k) = k 209 | end do 210 | nn = mred 211 | c sampling mtry variables w/o replacement. 212 | do mt = 1, mtry 213 | call rrand(xrand) 214 | j = int(nn * xrand) + 1 215 | mvar = mind(j) 216 | mind(j) = mind(nn) 217 | mind(nn) = mvar 218 | nn = nn - 1 219 | lcat = cat(mvar) 220 | if (lcat .eq. 1) then 221 | c Split on a numerical predictor. 
222 | rrn = pno 223 | rrd = pdo 224 | rln = 0 225 | rld = 0 226 | call zervr(wl, nclass) 227 | do j = 1, nclass 228 | wr(j) = tclasspop(j) 229 | end do 230 | ntie = 1 231 | do nsp = ndstart, ndend-1 232 | nc = a(mvar, nsp) 233 | u = win(nc) 234 | k = cl(nc) 235 | rln = rln + u * (2 * wl(k) + u) 236 | rrn = rrn + u * (-2 * wr(k) + u) 237 | rld = rld + u 238 | rrd = rrd - u 239 | wl(k) = wl(k) + u 240 | wr(k) = wr(k) - u 241 | if (b(mvar, nc) .lt. b(mvar, a(mvar, nsp + 1))) then 242 | c If neither node is empty, check the split. 243 | if (dmin1(rrd, rld) .gt. 1.0e-5) then 244 | crit = (rln / rld) + (rrn / rrd) 245 | if (crit .gt. critmax) then 246 | nbest = nsp 247 | critmax = crit 248 | msplit = mvar 249 | end if 250 | c Break ties at random: 251 | if (crit .eq. critmax) then 252 | ntie = ntie + 1 253 | call rrand(xrand) 254 | if (xrand .lt. 1.0 / ntie) then 255 | nbest = nsp 256 | critmax = crit 257 | msplit = mvar 258 | end if 259 | end if 260 | end if 261 | end if 262 | end do 263 | else 264 | c Split on a categorical predictor. Compute the decrease in impurity. 265 | call zermr(tclasscat, nclass, 32) 266 | do nsp = ndstart, ndend 267 | nc = ncase(nsp) 268 | l = a(mvar, ncase(nsp)) 269 | tclasscat(cl(nc), l) = tclasscat(cl(nc), l) + win(nc) 270 | end do 271 | nnz = 0 272 | do i = 1, lcat 273 | su = 0 274 | do j = 1, nclass 275 | su = su + tclasscat(j, i) 276 | end do 277 | dn(i) = su 278 | if(su .gt. 0) nnz = nnz + 1 279 | end do 280 | nhit = 0 281 | if (nnz .gt. 1) then 282 | if (nclass .eq. 2 .and. lcat .gt. ncmax) then 283 | call catmaxb(pdo, tclasscat, tclasspop, nclass, 284 | & lcat, nbest, critmax, nhit, dn) 285 | else 286 | call catmax(pdo, tclasscat, tclasspop, nclass, lcat, 287 | & nbest, critmax, nhit, maxcat, ncmax, ncsplit) 288 | end if 289 | if (nhit .eq. 1) msplit = mvar 290 | c else 291 | c critmax = -1.0e25 292 | end if 293 | end if 294 | end do 295 | if (critmax .lt. -1.0e10 .or. msplit .eq. 
0) jstat = -1 296 | decsplit = critmax - crit0 297 | return 298 | end 299 | 300 | C ============================================================== 301 | c SUBROUTINE MOVEDATA 302 | c This subroutine is the heart of the buildtree construction. Based on the 303 | c best split the data in the part of the a matrix corresponding to the 304 | c current node is moved to the left if it belongs to the left child and 305 | c right if it belongs to the right child. 306 | 307 | subroutine movedata(a,ta,mdim,nsample,ndstart,ndend,idmove, 308 | 1 ncase,msplit,cat,nbest,ndendl) 309 | implicit double precision(a-h,o-z) 310 | integer a(mdim,nsample),ta(nsample),idmove(nsample), 311 | 1 ncase(nsample),cat(mdim),icat(32) 312 | 313 | c compute idmove=indicator of case nos. going left 314 | 315 | if (cat(msplit).eq.1) then 316 | do nsp=ndstart,nbest 317 | nc=a(msplit,nsp) 318 | idmove(nc)=1 319 | end do 320 | do nsp=nbest+1, ndend 321 | nc=a(msplit,nsp) 322 | idmove(nc)=0 323 | end do 324 | ndendl=nbest 325 | else 326 | ndendl=ndstart-1 327 | l=cat(msplit) 328 | call myunpack(l,nbest,icat) 329 | do nsp=ndstart,ndend 330 | nc=ncase(nsp) 331 | if (icat(a(msplit,nc)).eq.1) then 332 | idmove(nc)=1 333 | ndendl=ndendl+1 334 | else 335 | idmove(nc)=0 336 | endif 337 | end do 338 | endif 339 | 340 | c shift case. nos. right and left for numerical variables. 341 | 342 | do 40 msh=1,mdim 343 | if (cat(msh).eq.1) then 344 | k=ndstart-1 345 | do 50 n=ndstart,ndend 346 | ih=a(msh,n) 347 | if (idmove(ih).eq.1) then 348 | k=k+1 349 | ta(k)=a(msh,n) 350 | endif 351 | 50 continue 352 | do 60 n=ndstart,ndend 353 | ih=a(msh,n) 354 | if (idmove(ih).eq.0) then 355 | k=k+1 356 | ta(k)=a(msh,n) 357 | endif 358 | 60 continue 359 | 360 | do 70 k=ndstart,ndend 361 | a(msh,k)=ta(k) 362 | 70 continue 363 | endif 364 | 365 | 40 continue 366 | ndo=0 367 | if (ndo.eq.1) then 368 | do 140 msh = 1, mdim 369 | if (cat(msh) .gt. 
1) then 370 | k = ndstart - 1 371 | do 150 n = ndstart, ndend 372 | ih = ncase(n) 373 | if (idmove(ih) .eq. 1) then 374 | k = k + 1 375 | ta(k) = a(msh, ih) 376 | endif 377 | 150 continue 378 | do 160 n = ndstart, ndend 379 | ih = ncase(n) 380 | if (idmove(ih) .eq. 0) then 381 | k = k + 1 382 | ta(k) = a(msh,ih) 383 | endif 384 | 160 continue 385 | 386 | do 170 k = ndstart, ndend 387 | a(msh,k) = ta(k) 388 | 170 continue 389 | endif 390 | 391 | 140 continue 392 | end if 393 | 394 | c compute case nos. for right and left nodes. 395 | 396 | if (cat(msplit).eq.1) then 397 | do 80 n=ndstart,ndend 398 | ncase(n)=a(msplit,n) 399 | 80 continue 400 | else 401 | k=ndstart-1 402 | do 90 n=ndstart, ndend 403 | if (idmove(ncase(n)).eq.1) then 404 | k=k+1 405 | ta(k)=ncase(n) 406 | endif 407 | 90 continue 408 | do 100 n=ndstart,ndend 409 | if (idmove(ncase(n)).eq.0) then 410 | k=k+1 411 | ta(k)=ncase(n) 412 | endif 413 | 100 continue 414 | do 110 k=ndstart,ndend 415 | ncase(k)=ta(k) 416 | 110 continue 417 | endif 418 | 419 | end 420 | 421 | subroutine myunpack(l,npack,icat) 422 | 423 | c npack is a long integer. The sub. returns icat, an integer of zeroes and 424 | c ones corresponding to the coefficients in the binary expansion of npack. 
425 | 426 | integer icat(32),npack 427 | do j=1,32 428 | icat(j)=0 429 | end do 430 | n=npack 431 | icat(1)=mod(n,2) 432 | do k=2,l 433 | n=(n-icat(k-1))/2 434 | icat(k)=mod(n,2) 435 | end do 436 | end 437 | 438 | subroutine zerv(ix,m1) 439 | integer ix(m1) 440 | do n=1,m1 441 | ix(n)=0 442 | end do 443 | end 444 | 445 | subroutine zervr(rx,m1) 446 | double precision rx(m1) 447 | do n=1,m1 448 | rx(n)=0.0d0 449 | end do 450 | end 451 | 452 | subroutine zerm(mx,m1,m2) 453 | integer mx(m1,m2) 454 | do i=1,m1 455 | do j=1,m2 456 | mx(i,j)=0 457 | end do 458 | end do 459 | end 460 | 461 | subroutine zermr(rx,m1,m2) 462 | double precision rx(m1,m2) 463 | do i=1,m1 464 | do j=1,m2 465 | rx(i,j)=0.0d0 466 | end do 467 | end do 468 | end 469 | 470 | subroutine zermd(rx,m1,m2) 471 | double precision rx(m1,m2) 472 | do i=1,m1 473 | do j=1,m2 474 | rx(i,j)=0.0d0 475 | end do 476 | end do 477 | end 478 | -------------------------------------------------------------------------------- /classification/rfutils.c: -------------------------------------------------------------------------------- 1 | /******************************************************************* 2 | Copyright (C) 2001-7 Leo Breiman, Adele Cutler and Merck & Co., Inc. 3 | 4 | This program is free software; you can redistribute it and/or 5 | modify it under the terms of the GNU General Public License 6 | as published by the Free Software Foundation; either version 2 7 | of the License, or (at your option) any later version. 8 | 9 | This program is distributed in the hope that it will be useful, 10 | but WITHOUT ANY WARRANTY; without even the implied warranty of 11 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 | GNU General Public License for more details. 
13 | *******************************************************************/ 14 | //#include 15 | #include "rf.h" 16 | #include "memory.h" 17 | #include "stdlib.h" 18 | #include "qsort.c" 19 | 20 | #define MAX_UINT_COKUS 4294967295 //basically 2^32-1 21 | 22 | typedef unsigned long uint32; 23 | extern void seedMT(uint32 seed); 24 | extern uint32 reloadMT(void); 25 | extern uint32 randomMT(void); 26 | extern double unif_rand(); 27 | 28 | void zeroInt(int *x, int length) { 29 | memset(x, 0, length * sizeof(int)); 30 | } 31 | 32 | void zeroDouble(double *x, int length) { 33 | memset(x, 0, length * sizeof(double)); 34 | } 35 | 36 | void createClass(double *x, int realN, int totalN, int mdim) { 37 | /* Create the second class by bootstrapping each variable independently. */ 38 | int i, j, k; 39 | for (i = realN; i < totalN; ++i) { 40 | for (j = 0; j < mdim; ++j) { 41 | k = (int) (unif_rand() * realN); 42 | x[j + i * mdim] = x[j + k * mdim]; 43 | } 44 | } 45 | } 46 | 47 | #include "stdio.h" 48 | void normClassWt(int *cl, const int nsample, const int nclass, 49 | const int useWt, double *classwt, int *classFreq) { 50 | int i; 51 | double sumwt = 0.0; 52 | //printf("useWt %d",useWt); 53 | if (useWt) { 54 | //printf("User supplied via priors classwt"); 55 | /* Normalize user-supplied weights so they sum to one. */ 56 | for (i = 0; i < nclass; ++i) sumwt += classwt[i]; 57 | //printf("\n sumwt %f",sumwt); 58 | for (i = 0; i < nclass; ++i) classwt[i] /= sumwt; 59 | } else { 60 | for (i = 0; i < nclass; ++i) { 61 | classwt[i] = ((double) classFreq[i]) / nsample; 62 | } 63 | } 64 | for (i = 0; i < nclass; ++i) { 65 | classwt[i] = classFreq[i] ? classwt[i] * nsample / classFreq[i] : 0.0; 66 | } 67 | } 68 | 69 | void makeA(double *x, const int mdim, const int nsample, int *cat, int *a, 70 | int *b) { 71 | /* makeA() constructs the mdim by nsample integer array a. 
For each 72 | numerical variable with values x(m, n), n=1, ...,nsample, the x-values 73 | are sorted from lowest to highest. Denote these by xs(m, n). Then 74 | a(m,n) is the case number in which xs(m, n) occurs. The b matrix is 75 | also constructed here. If the mth variable is categorical, then 76 | a(m, n) is the category of the nth case number. */ 77 | int i, j, n1, n2; 78 | double *v= (double *) calloc(nsample, sizeof(double)); 79 | int *index = (int *) calloc(nsample, sizeof(int)); 80 | 81 | for (i = 0; i < mdim; ++i) { 82 | if (cat[i] == 1) { /* numerical predictor */ 83 | for (j = 0; j < nsample; ++j) { 84 | v[j] = x[i + j * mdim]; 85 | index[j] = j + 1; 86 | } 87 | R_qsort_I(v, index, 1, nsample); 88 | 89 | /* this sorts the v(n) in ascending order. index(n) is the case 90 | number of that v(n) nth from the lowest (assume the original 91 | case numbers are 1,2,...). */ 92 | for (j = 0; j < nsample-1; ++j) { 93 | n1 = index[j]; 94 | n2 = index[j + 1]; 95 | a[i + j * mdim] = n1; 96 | if (j == 0) b[i + (n1-1) * mdim] = 1; 97 | b[i + (n2-1) * mdim] = (v[j] < v[j + 1]) ? 
98 | b[i + (n1-1) * mdim] + 1 : b[i + (n1-1) * mdim]; 99 | } 100 | a[i + (nsample-1) * mdim] = index[nsample-1]; 101 | } else { /* categorical predictor */ 102 | for (j = 0; j < nsample; ++j) 103 | a[i + j*mdim] = (int) x[i + j * mdim]; 104 | } 105 | } 106 | free(index); 107 | free(v); 108 | } 109 | 110 | 111 | void modA(int *a, int *nuse, const int nsample, const int mdim, 112 | int *cat, const int maxcat, int *ncase, int *jin) { 113 | int i, j, k, m, nt; 114 | 115 | *nuse = 0; 116 | for (i = 0; i < nsample; ++i) if (jin[i]) (*nuse)++; 117 | 118 | for (i = 0; i < mdim; ++i) { 119 | k = 0; 120 | nt = 0; 121 | if (cat[i] == 1) { 122 | for (j = 0; j < nsample; ++j) { 123 | if (jin[a[i + k * mdim] - 1]) { 124 | a[i + nt * mdim] = a[i + k * mdim]; 125 | k++; 126 | } else { 127 | for (m = 0; m < nsample - k; ++m) { 128 | if (jin[a[i + (k + m) * mdim] - 1]) { 129 | a[i + nt * mdim] = a[i + (k + m) * mdim]; 130 | k += m + 1; 131 | break; 132 | } 133 | } 134 | } 135 | nt++; 136 | if (nt >= *nuse) break; 137 | } 138 | } 139 | } 140 | if (maxcat > 1) { 141 | k = 0; 142 | nt = 0; 143 | for (i = 0; i < nsample; ++i) { 144 | if (jin[k]) { 145 | k++; 146 | ncase[nt] = k; 147 | } else { 148 | for (j = 0; j < nsample - k; ++j) { 149 | if (jin[k + j]) { 150 | ncase[nt] = k + j + 1; 151 | k += j + 1; 152 | break; 153 | } 154 | } 155 | } 156 | nt++; 157 | if (nt >= *nuse) break; 158 | } 159 | } 160 | } 161 | 162 | void Xtranslate(double *x, int mdim, int nrnodes, int nsample, 163 | int *bestvar, int *bestsplit, int *bestsplitnext, 164 | double *xbestsplit, int *nodestatus, int *cat, int treeSize) { 165 | /* 166 | this subroutine takes the splits on numerical variables and translates them 167 | back into x-values. It also unpacks each categorical split into a 168 | 32-dimensional vector with components of zero or one--a one indicates that 169 | the corresponding category goes left in the split. 
170 | */ 171 | 172 | int i, m; 173 | 174 | for (i = 0; i < treeSize; ++i) { 175 | if (nodestatus[i] == 1) { 176 | m = bestvar[i] - 1; 177 | if (cat[m] == 1) { 178 | xbestsplit[i] = 0.5 * (x[m + (bestsplit[i] - 1) * mdim] + 179 | x[m + (bestsplitnext[i] - 1) * mdim]); 180 | } else { 181 | xbestsplit[i] = (double) bestsplit[i]; 182 | } 183 | } 184 | } 185 | } 186 | 187 | void permuteOOB(int m, double *x, int *in, int nsample, int mdim) { 188 | /* Permute the OOB part of a variable in x. 189 | * Argument: 190 | * m: the variable to be permuted 191 | * x: the data matrix (variables in rows) 192 | * in: vector indicating which case is OOB 193 | * nsample: number of cases in the data 194 | * mdim: number of variables in the data 195 | */ 196 | double *tp, tmp; 197 | int i, last, k, nOOB = 0; 198 | 199 | tp = (double *) calloc(nsample, sizeof(double)); 200 | 201 | for (i = 0; i < nsample; ++i) { 202 | /* make a copy of the OOB part of the data into tp (for permuting) */ 203 | if (in[i] == 0) { 204 | tp[nOOB] = x[m + i*mdim]; 205 | nOOB++; 206 | } 207 | } 208 | /* Permute tp */ 209 | last = nOOB; 210 | for (i = 0; i < nOOB; ++i) { 211 | k = (int) (last * unif_rand()); 212 | tmp = tp[last - 1]; 213 | tp[last - 1] = tp[k]; 214 | tp[k] = tmp; 215 | last--; 216 | } 217 | 218 | /* Copy the permuted OOB data back into x. */ 219 | nOOB = 0; 220 | for (i = 0; i < nsample; ++i) { 221 | if (in[i] == 0) { 222 | x[m + i*mdim] = tp[nOOB]; 223 | nOOB++; 224 | } 225 | } 226 | free(tp); 227 | } 228 | 229 | /* Compute proximity. */ 230 | void computeProximity(double *prox, int oobprox, int *node, int *inbag, 231 | int *oobpair, int n) { 232 | /* Accumulate the number of times a pair of points fall in the same node. 233 | prox: n x n proximity matrix 234 | oobprox: should the accumulation only count OOB cases? 
(0=no, 1=yes) 235 | node: vector of terminal node labels 236 | inbag: indicator of whether a case is in-bag 237 | oobpair: matrix to accumulate the number of times a pair is OOB together 238 | n: total number of cases 239 | */ 240 | int i, j; 241 | for (i = 0; i < n; ++i) { 242 | for (j = i+1; j < n; ++j) { 243 | if (oobprox) { 244 | /* if (jin[k] == 0 && jin[n] == 0) { */ 245 | if ((inbag[i] > 0) || (inbag[j] > 0)) { 246 | oobpair[j*n + i] ++; 247 | oobpair[i*n + j] ++; 248 | if (node[i] == node[j]) { 249 | prox[j*n + i] += 1.0; 250 | prox[i*n + j] += 1.0; 251 | } 252 | } 253 | } else { 254 | if (node[i] == node[j]) { 255 | prox[j*n + i] += 1.0; 256 | prox[i*n + j] += 1.0; 257 | } 258 | } 259 | } 260 | } 261 | } 262 | 263 | int pack(int nBits, int *bits) { 264 | int i = nBits, pack = 0; 265 | while (--i >= 0) pack += bits[i] << i; 266 | return(pack); 267 | } 268 | 269 | void unpack(unsigned int pack, int *bits) { 270 | /* pack is a 4-byte integer. The sub. returns icat, an integer array of 271 | zeroes and ones corresponding to the coefficients in the binary expansion 272 | of pack. */ 273 | int i; 274 | for (i = 0; pack != 0; pack >>= 1, ++i) bits[i] = pack & 1; 275 | } 276 | 277 | #ifdef OLD 278 | 279 | double oldpack(int l, int *icat) { 280 | /* icat is a binary integer with ones for categories going left 281 | * and zeroes for those going right. The sub returns npack- the integer */ 282 | int k; 283 | double pack = 0.0; 284 | 285 | for (k = 0; k < l; ++k) { 286 | if (icat[k]) pack += R_pow_di(2.0, k); 287 | } 288 | return(pack); 289 | } 290 | 291 | 292 | void oldunpack(int l, int npack, int *icat) { 293 | /* 294 | * npack is a long integer. The sub. returns icat, an integer of zeroes and 295 | * ones corresponding to the coefficients in the binary expansion of npack. 
296 | */ 297 | int i; 298 | zeroInt(icat, 32); 299 | icat[0] = npack % 2; 300 | for (i = 1; i < l; ++i) { 301 | npack = (npack - icat[i-1]) / 2; 302 | icat[i] = npack % 2; 303 | } 304 | } 305 | 306 | 307 | 308 | #endif /* OLD */ 309 | -------------------------------------------------------------------------------- /classification/utilities.c: -------------------------------------------------------------------------------- 1 | 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | 11 | #include "utilities.h" 12 | 13 | 14 | /***************************************************************************** 15 | NAME: write_message 16 | 17 | PURPOSE: Writes a formatted log message to the specified file handle. 18 | 19 | RETURN VALUE: None 20 | 21 | NOTES: 22 | - Log Message Format: 23 | yyyy-mm-dd HH:mm:ss pid:module [filename]:line message 24 | *****************************************************************************/ 25 | void write_message 26 | ( 27 | const char *message, /* I: message to write to the log */ 28 | const char *module, /* I: module the message is from */ 29 | const char *type, /* I: type of the error */ 30 | char *file, /* I: file the message was generated in */ 31 | int line, /* I: line number in the file where the message was 32 | generated */ 33 | FILE *fd /* I: where to write the log message */ 34 | ) 35 | { 36 | time_t current_time; 37 | struct tm *time_info; 38 | int year; 39 | pid_t pid; 40 | 41 | time (&current_time); 42 | time_info = localtime (&current_time); 43 | year = time_info->tm_year + 1900; 44 | 45 | pid = getpid (); 46 | 47 | fprintf (fd, "%04d:%02d:%02d %02d:%02d:%02d %d:%s [%s]:%d [%s]:%s\n", 48 | year, 49 | time_info->tm_mon + 1, /* tm_mon is 0-based (0 = January) */ 50 | time_info->tm_mday, 51 | time_info->tm_hour, 52 | time_info->tm_min, 53 | time_info->tm_sec, 54 | pid, module, basename (file), line, type, message); 55 | } 56 | -------------------------------------------------------------------------------- /classification/utilities.h: 
-------------------------------------------------------------------------------- 1 | 2 | #ifndef UTILITIES_H 3 | #define UTILITIES_H 4 | 5 | 6 | #include 7 | 8 | 9 | #define LOG_MESSAGE(message, module) \ 10 | write_message((message), (module), "INFO", \ 11 | __FILE__, __LINE__, stdout); 12 | 13 | 14 | #define WARNING_MESSAGE(message, module) \ 15 | write_message((message), (module), "WARNING", \ 16 | __FILE__, __LINE__, stdout); 17 | 18 | 19 | #define ERROR_MESSAGE(message, module) \ 20 | write_message((message), (module), "ERROR", \ 21 | __FILE__, __LINE__, stdout); 22 | 23 | 24 | #define RETURN_ERROR(message, module, status) \ 25 | {write_message((message), (module), "ERROR", \ 26 | __FILE__, __LINE__, stdout); \ 27 | return (status);} 28 | 29 | 30 | void write_message 31 | ( 32 | const char *message, /* I: message to write to the log */ 33 | const char *module, /* I: module the message is from */ 34 | const char *type, /* I: type of the error */ 35 | char *file, /* I: file the message was generated in */ 36 | int line, /* I: line number in the file where the message was 37 | generated */ 38 | FILE * fd /* I: where to write the log message */ 39 | ); 40 | 41 | 42 | #endif /* UTILITIES_H */ 43 | -------------------------------------------------------------------------------- /docker/Makefile: -------------------------------------------------------------------------------- 1 | TAG_PREFIX = usgseros 2 | DOCKERHUB_ORG = $(TAG_PREFIX) 3 | 4 | all: debian-c-ccdc ubuntu-c-ccdc 5 | 6 | publish-docker: all debian-publish-c-ccdc ubuntu-publish-c-ccdc 7 | 8 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 9 | # Common 10 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 11 | # 12 | # Note that the format of these base/common targets is: 13 | # >-$ docker build -t 14 | 15 | base-c-ccdc: 16 | @docker build -t $(TAG_PREFIX)/$(SYSTEM)-c-ccdc $(SYSTEM) 17 | 18 | base-publish: 19 | @docker push $(DOCKERHUB_ORG)/$(REPO) 
20 | 21 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 22 | # Debian 23 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 24 | 25 | debian-c-ccdc: 26 | @SYSTEM=debian make base-c-ccdc 27 | 28 | debian-publish-c-ccdc: 29 | @REPO=debian-c-ccdc make base-publish 30 | 31 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 32 | # Ubuntu 33 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 34 | 35 | ubuntu-c-ccdc: 36 | @SYSTEM=ubuntu make base-c-ccdc 37 | 38 | ubuntu-publish-c-ccdc: 39 | @REPO=ubuntu-c-ccdc make base-publish 40 | 41 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 42 | # CentOS 43 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 44 | 45 | # TBD 46 | 47 | -------------------------------------------------------------------------------- /docker/debian/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM usgseros/debian-python 2 | MAINTAINER USGS LCMAP http://eros.usgs.gov 3 | 4 | RUN apt-get install -y libgsl0-dev libgsl0ldbl gsl-bin \ 5 | libmatio-dev libmatio2 gfortran 6 | 7 | ENV CCDC_BIN /root/bin 8 | RUN mkdir $CCDC_BIN 9 | 10 | RUN git clone https://github.com/USGS-EROS/lcmap-change-detection-c.git 11 | RUN cd lcmap-change-detection-c && \ 12 | BIN=$CCDC_BIN make 13 | 14 | ENTRYPOINT ["/root/bin/ccdc"] 15 | 16 | -------------------------------------------------------------------------------- /docker/debian/run.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import subprocess 3 | 4 | 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument("--arg-1", help="an arg") 7 | parser.add_argument("--arg-2", help="another arg") 8 | args = parser.parse_args() 9 | #print(subprocess.check_output(["echo", "Testing output; args:", args.arg_1, args.arg_2])) 10 | 
print("Testing output; args:", args.arg_1, args.arg_2) 11 | 12 | -------------------------------------------------------------------------------- /docker/ubuntu/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM usgseros/ubuntu-python 2 | MAINTAINER USGS LCMAP http://eros.usgs.gov 3 | 4 | RUN apt-get install -y libgsl0-dev libgsl0ldbl gsl-bin \ 5 | libmatio-dev libmatio2 gfortran 6 | 7 | ENV CCDC_BIN /root/bin 8 | RUN mkdir $CCDC_BIN 9 | 10 | RUN git clone https://github.com/USGS-EROS/lcmap-change-detection-c.git 11 | RUN cd lcmap-change-detection-c && \ 12 | BIN=$CCDC_BIN make 13 | 14 | ENTRYPOINT ["/root/bin/ccdc"] 15 | 16 | -------------------------------------------------------------------------------- /docker/ubuntu/run.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import subprocess 3 | 4 | 5 | parser = argparse.ArgumentParser() 6 | parser.add_argument("--arg-1", help="an arg") 7 | parser.add_argument("--arg-2", help="another arg") 8 | args = parser.parse_args() 9 | #print(subprocess.check_output(["echo", "Testing output; args:", args.arg_1, args.arg_2])) 10 | print("Testing output; args:", args.arg_1, args.arg_2) 11 | 12 | -------------------------------------------------------------------------------- /docs/CCDC ADD CY5 V1.0.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/repository-preservation/lcmap-change-detection-c/fee9b1303719cd0e0045234a3f2a594303692ac0/docs/CCDC ADD CY5 V1.0.docx -------------------------------------------------------------------------------- /docs/ccdc_work_flow.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/repository-preservation/lcmap-change-detection-c/fee9b1303719cd0e0045234a3f2a594303692ac0/docs/ccdc_work_flow.pdf 
-------------------------------------------------------------------------------- /docs/flowchart_description.txt: -------------------------------------------------------------------------------- 1 | 1. Only Landsat scenes with more than 20% clear-sky pixels are used in CCDC analysis. 2 | 2. Only run CCDC for pixels where more than 50% of images have data. 3 | 3. If more than 75% of valid data show a pixel is snow, then the pixel is considered permanent 4 | snow. 5 | 4. For a permanent snow pixel, get model coefficients and RMSEs either from lasso regression if there are more than 6 | 12 valid data points or from median values if there are fewer than 12 points. 7 | 5. At least 12 data points and 1 year of data are needed for clear-sky land/water pixel analysis. 8 | 6. Model initialization: Further remove outliers (cloud/cloud shadow/snow) using robust fit. 9 | 7. Model fitting: Do lasso regression and get the right starting point with observed values and 10 | lasso regression predicted values. 11 | 8. Stable start: 12 data points and 1 year of data? If no, increase data points; otherwise, next step. 12 | 9. Model ready: Find the previous break point and use the lasso model to detect change if there are enough points (6), 13 | or use median values if there are fewer than conse (6) points. 14 | 10. Real change detected (minimum magnitude of change is larger than a threshold and the first point of change 15 | is less than another threshold)? 16 | 11. If yes, then model coefficients, RMSEs, and magnitude of change are generated through lasso regression. 17 | 12. If no, the falsely detected point is removed. 18 | 13. Check whether there are enough points for model fitting and break point confirmation. If yes, do lasso regression; 19 | otherwise, use median values to calculate RMSEs. 20 | 14. Continuous monitoring: If first 24 points, dynamic model fitting for model coeffs, rmse, v_dif and calculate 21 | norm of v_dif & update IDs; otherwise, next step. 22 | 15. 
Update data points as needed to use the closest 24 data points for model fitting. 23 | 16. Detect change based on RMSEs and normalized value v_dif. 24 | 17. If a break happens close to the end of the time series, the robust fit outlier removal process is first applied; 25 | then if the data points after the detected change (break point) are fewer than conse (6) points, median 26 | values are used; otherwise, lasso model fitting is used. 27 | 18. Model outputs: model coefficients, RMSEs, magnitude of change, times of start, end, break, change probability, 28 | change category 29 | 30 | 31 | 32 | --------------------------------------------------------------------------------
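The screening rules in steps 1 to 5 of the flowchart description can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the repository: the names `scene_is_usable`, `choose_procedure`, and the `pixel_procedure` enum are hypothetical, and only the thresholds (20%, 50%, 75%, 12 points, 1 year) come from the text. The description does not say what happens with exactly 12 valid observations on a permanent snow pixel; the sketch assumes that case falls back to median values.

```c
#include <stdbool.h>

/* Hypothetical labels for the branch a pixel would take (not from ccdc.c). */
typedef enum {
    PIXEL_SKIP,           /* step 2: too few images contain data           */
    PIXEL_SNOW_LASSO,     /* step 4: permanent snow, lasso regression      */
    PIXEL_SNOW_MEDIAN,    /* step 4: permanent snow, median values         */
    PIXEL_NEED_MORE_DATA, /* step 5: fewer than 12 points or under 1 year  */
    PIXEL_STANDARD        /* step 5 satisfied: continue to initialization  */
} pixel_procedure;

/* Step 1: a scene is used only if more than 20% of its pixels are clear. */
bool scene_is_usable(double clear_fraction)
{
    return clear_fraction > 0.20;
}

/* Steps 2 to 5: pick a branch from simple observation counts.
   Integer arithmetic avoids floating-point comparisons on the ratios. */
pixel_procedure choose_procedure(int n_images, int n_valid, int n_snow,
                                 double span_years)
{
    /* Step 2: more than 50% of images must have data for this pixel. */
    if (n_images <= 0 || 2 * n_valid <= n_images)
        return PIXEL_SKIP;

    /* Step 3: more than 75% of valid data flagged as snow. */
    if (4 * n_snow > 3 * n_valid) {
        /* Step 4: assumption, exactly 12 points is treated as "median". */
        return (n_valid > 12) ? PIXEL_SNOW_LASSO : PIXEL_SNOW_MEDIAN;
    }

    /* Step 5: at least 12 data points and 1 year of data. */
    if (n_valid < 12 || span_years < 1.0)
        return PIXEL_NEED_MORE_DATA;

    return PIXEL_STANDARD;
}
```

For example, a pixel with 80 valid observations out of 100 images, 70 of them snow, would take the `PIXEL_SNOW_LASSO` branch, while one with 60 valid observations spanning only half a year would return `PIXEL_NEED_MORE_DATA`.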