├── .gitignore
├── Makefile
├── README.md
├── ccdc
│   ├── 2d_array.c
│   ├── 2d_array.h
│   ├── Makefile
│   ├── ccdc.c
│   ├── ccdc.h
│   ├── const.h
│   ├── defines.h
│   ├── gdb.args
│   ├── glmnet5.f
│   ├── input.c
│   ├── input.h
│   ├── misc.c
│   ├── multirobust.c
│   ├── output.h
│   ├── scripts
│   │   ├── cloudCover.pl
│   │   ├── renameMTLfiles.sh
│   │   ├── tileLandsat.sh
│   │   └── tileLandsatParent.sh
│   ├── utilities.c
│   └── utilities.h
├── classification
│   ├── 2d_array.c
│   ├── 2d_array.h
│   ├── Makefile
│   ├── classRF.c
│   ├── classTree.c
│   ├── classification.c
│   ├── classification.h
│   ├── cokus.c
│   ├── get_args.c
│   ├── qsort.c
│   ├── rf.h
│   ├── rfsub.f
│   ├── rfutils.c
│   ├── utilities.c
│   └── utilities.h
├── docker
│   ├── Makefile
│   ├── debian
│   │   ├── Dockerfile
│   │   └── run.py
│   └── ubuntu
│       ├── Dockerfile
│       └── run.py
└── docs
    ├── CCDC ADD CY5 V1.0.docx
    ├── ccdc_work_flow.pdf
    └── flowchart_description.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | Y_hat.txt
2 | *.o
3 | *~
4 | bin/*
5 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | all:
2 | make ccdc
3 | make classification
4 |
5 | ccdc:
6 | cd ccdc && \
7 | make && \
8 | make install
9 |
10 | classification:
11 | cd classification && \
12 | make && \
13 | make install
14 |
15 | clean:
16 | cd ccdc && make clean
17 | cd classification && make clean
18 |
19 | docker:
20 | cd docker && make
21 |
22 | dockerhub:
23 | cd docker && make publish-docker
24 |
25 | ubuntu-bash:
26 | docker run -i -t --entrypoint=/bin/bash usgseros/ubuntu-c-ccdc -s
27 |
28 | debian-bash:
29 | docker run -i -t --entrypoint=/bin/bash usgseros/debian-c-ccdc -s
30 |
31 | .PHONY: all ccdc classification clean docker dockerhub ubuntu-bash debian-bash
32 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## LCMAP Change Detection
2 |
3 | This project contains the application source code for the Change Detection C library
4 | and related scripts. **This code is deprecated, non-reviewed and non-verified. All future work is being directed toward USGS-EROS/lcmap-pyccd.**
5 |
6 |
7 | ## Installation
8 |
9 | To install, simply run the top-level ``make`` target:
10 |
11 | ```bash
12 | $ git clone git@github.com:USGS-EROS/lcmap-change-detection-c.git
13 | $ cd lcmap-change-detection-c
14 | $ make
15 | ```
16 |
17 | The executables and scripts will be installed into ``./bin`` by default. This
18 | can be overridden by setting a ``BIN`` environment variable or using a ``BIN``
19 | variable when running the target:
20 |
21 | ```bash
22 | $ BIN=/my/path/bin make
23 | ```
24 |
25 | For additional notes, such as installing dependencies (Ubuntu), overriding
26 | ``Makefile`` variables, etc., see:
27 |
28 | * [Building CCDC](../../wiki/Building-CCDC)
29 |
30 |
31 | ## Usage
32 |
33 | [We're in active development on making this not only work, but be usable.
34 | Ticket #5 has some early usability notes + tasks that we're trying to hit right
35 | away, if you're interested in tracking this.]
36 |
37 |
38 | ## Development
39 |
40 | Development notes for C-CCDC are maintained in the project wiki. For more
41 | details, see:
42 |
43 | * [CCDC Development](../../wiki/CCDC-Development)
44 | * [Using CCDC with Docker](../../wiki/CCDC-%26-Docker)
45 |
46 |
47 | ## Implementation
48 |
49 | ### CCDC - Continuous Change Detection and Classification (Algorithm)
50 |
51 | * NOTE: This algorithm is not validated and is considered a prototype.
52 | * See [CCDC ADD](https://landsat.usgs.gov/sites/default/files/documents/ccdc_add.pdf) for the
53 | detailed description.
54 |
55 |
56 | ## More Information
57 |
58 | This project is hosted by the US Geological Survey (USGS) Earth Resources
59 | Observation and Science (EROS) Land Change Monitoring, Assessment, and
60 | Projection ([LCMAP](https://github.com/USGS-EROS?utf8=%E2%9C%93&query=lcmap))
61 | Project. For questions regarding this source code, please use the
62 | [Landsat Contact Us](https://landsat.usgs.gov/contactus.php) page and specify
63 | ``USGS LCMAP`` in the "Regarding" section.
64 |
--------------------------------------------------------------------------------
/ccdc/2d_array.c:
--------------------------------------------------------------------------------
1 |
2 | #include
3 | #include
4 |
5 | #include "const.h"
6 | #include "2d_array.h"
7 | #include "utilities.h"
8 | #include "defines.h"
9 |
10 |
11 | /* The 2D_ARRAY maintains a 2D array that can be sized at run-time. */
12 | typedef struct lsrd_2d_array
13 | {
14 | unsigned int signature; /* Signature used to make sure the pointer
15 | math from a row_array_ptr actually gets back to
16 | the expected structure (helps detect errors). */
17 | int rows; /* Rows in the 2D array */
18 | int columns; /* Columns in the 2D array */
19 | int member_size; /* Size of each entry in the array */
20 | void *data_ptr; /* Pointer to the data storage for the array */
21 | void **row_array_ptr; /* Pointer to an array of pointers to each row in
22 | the 2D array */
23 | double memory_block[0]; /* Block of memory for storage of the array.
24 | It is broken into two blocks. The first 'rows *
25 | sizeof(void *)' block stores the pointers to the
26 | first column in each of the rows. The remainder
27 | of the block is for storing the actual data.
28 | Note: the type is double to force the worst case
29 | memory alignment on sparc boxes. */
30 | } LSRD_2D_ARRAY;
31 |
32 |
33 | /*************************************************************************
34 | NAME: allocate_2d_array
35 |
36 | PURPOSE: Allocate memory for 2D array.
37 |
38 | RETURNS: A pointer to a 2D array, or NULL if the routine fails. A pointer
39 | to an array of void pointers to the storage for each row of the
40 | array is returned. The returned pointer must be freed by the
41 | free_2d_array routine.
42 |
43 | HISTORY:
44 | Date Programmer Reason
45 | -------- --------------- -------------------------------------
46 | 3/15/2013 Song Guo Modified from LDCM IAS library
47 | **************************************************************************/
48 | void **allocate_2d_array
49 | (
50 | int rows, /* I: Number of rows for the 2D array */
51 | int columns, /* I: Number of columns for the 2D array */
52 | size_t member_size /* I: Size of the 2D array element */
53 | )
54 | {
55 | int row;
56 | LSRD_2D_ARRAY *array;
57 | size_t size;
58 | int data_start_index;
59 |
60 | /* Calculate the size needed for the array memory. The size includes the
61 | size of the base structure, an array of pointers to the rows in the
62 | 2D array, an array for the data, and additional space
63 | (2 * sizeof(void*)) to account for different memory alignment rules
64 | on some machine architectures. */
65 | size = sizeof (*array) + (rows * sizeof (void *))
66 | + (rows * columns * member_size) + 2 * sizeof (void *);
67 |
68 | /* Allocate the structure */
69 | array = malloc (size);
70 | if (!array)
71 | {
72 | RETURN_ERROR ("Failure to allocate memory for the array",
73 | "allocate_2d_array", NULL);
74 | }
75 |
76 | /* Initialize the member structures */
77 | array->signature = SIGNATURE;
78 | array->rows = rows;
79 | array->columns = columns;
80 | array->member_size = member_size;
81 |
82 | /* The array of pointers to rows starts at the beginning of the memory
83 | block */
84 | array->row_array_ptr = (void **) array->memory_block;
85 |
86 | /* The data starts after the row pointers, with the index adjusted in
87 | case the void pointer and memory block pointers are not the same
88 | size */
89 | data_start_index =
90 | rows * sizeof (void *) / sizeof (array->memory_block[0]);
91 | if ((rows % 2) == 1)
92 | data_start_index++;
93 | array->data_ptr = &array->memory_block[data_start_index];
94 |
95 | /* Initialize the row pointers */
96 | for (row = 0; row < rows; row++)
97 | {
98 | array->row_array_ptr[row] = (char *) array->data_ptr
99 | + row * columns * member_size;
100 | }
101 |
102 | return array->row_array_ptr;
103 | }
104 |
105 |
106 | /*************************************************************************
107 | NAME: free_2d_array
108 |
109 | PURPOSE: Free memory for a 2D array allocated by allocate_2d_array
110 |
111 | RETURNS: SUCCESS or FAILURE
112 |
113 | HISTORY:
114 | Date Programmer Reason
115 | -------- --------------- -------------------------------------
116 | 3/15/2013 Song Guo Modified from LDCM IAS library
117 | **************************************************************************/
118 | int free_2d_array
119 | (
120 | void **array_ptr /* I: Pointer returned by the alloc routine */
121 | )
122 | {
123 | if (array_ptr != NULL)
124 | {
125 | /* Convert the array_ptr into a pointer to the structure */
126 | LSRD_2D_ARRAY *array = GET_ARRAY_STRUCTURE_FROM_PTR (array_ptr);
127 |
128 | /* Verify it is a valid 2D array */
129 | if (array->signature != SIGNATURE)
130 | {
131 | /* Programming error of some sort - exit the program */
132 | RETURN_ERROR ("Invalid signature on 2D array - memory "
133 | "corruption or programming error?", "free_2d_array",
134 | FAILURE);
135 | }
136 | free (array);
137 | }
138 |
139 | return SUCCESS;
140 | }
141 |
--------------------------------------------------------------------------------
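The allocator above packs a header, a row-pointer table, and the element data into one `malloc` block, and `free_2d_array` recovers the header from the row pointer before checking its signature. A minimal, self-contained sketch of the same technique in standard C follows. It is not the project's code: `grid_header`, `grid_alloc`, `grid_size`, `grid_free`, and `GRID_SIGNATURE` are hypothetical names, the sketch uses a C99 flexible array member and `char *` arithmetic where the listing relies on a zero-length array, and it omits overflow checks on the size computation.

```c
#include <stddef.h>
#include <stdlib.h>

#define GRID_SIGNATURE 0x326589abu /* hypothetical; same role as SIGNATURE */

/* Worst-case scalar alignment, so the data area suits any element type. */
typedef union { long long ll; long double ld; double d; void *p; } grid_align;
#define GRID_ALIGN sizeof (grid_align)

/* Header, row-pointer table and element data share one malloc block. */
typedef struct grid_header
{
    unsigned int signature; /* detects pointers not made by grid_alloc */
    int rows;
    int columns;
    size_t member_size;
    void *row_table[];      /* C99 flexible array member */
} grid_header;

/* Step back from the row table to the header; NULL if it is not ours. */
static grid_header *grid_from_rows (void **rows_ptr)
{
    grid_header *g;

    if (rows_ptr == NULL)
        return NULL;
    g = (grid_header *) ((char *) rows_ptr - offsetof (grid_header, row_table));
    return (g->signature == GRID_SIGNATURE) ? g : NULL;
}

void **grid_alloc (int rows, int columns, size_t member_size)
{
    size_t table_end, data_offset, total;
    grid_header *g;
    char *data;
    int r;

    if (rows <= 0 || columns <= 0 || member_size == 0)
        return NULL;

    /* Byte offset from the start of the block to the end of the row table,
       rounded up so the data that follows is suitably aligned. */
    table_end = offsetof (grid_header, row_table)
        + (size_t) rows * sizeof (void *);
    data_offset = (table_end + GRID_ALIGN - 1) / GRID_ALIGN * GRID_ALIGN;
    total = data_offset + (size_t) rows * (size_t) columns * member_size;

    g = malloc (total);
    if (g == NULL)
        return NULL;

    g->signature = GRID_SIGNATURE;
    g->rows = rows;
    g->columns = columns;
    g->member_size = member_size;

    /* Point each table entry at the first element of its row. */
    data = (char *) g + data_offset;
    for (r = 0; r < rows; r++)
        g->row_table[r] = data + (size_t) r * (size_t) columns * member_size;

    return g->row_table;
}

/* Report the dimensions; returns 0 on success, 1 on a bad pointer. */
int grid_size (void **rows_ptr, int *rows, int *columns)
{
    grid_header *g = grid_from_rows (rows_ptr);

    if (g == NULL)
        return 1;
    *rows = g->rows;
    *columns = g->columns;
    return 0;
}

/* Free the whole block; returns 0 on success, 1 on a signature mismatch. */
int grid_free (void **rows_ptr)
{
    grid_header *g;

    if (rows_ptr == NULL)
        return 0;
    g = grid_from_rows (rows_ptr);
    if (g == NULL)
        return 1;
    free (g);
    return 0;
}
```

A caller would fetch one row at a time, for example `float *row = rows[1];`, and release everything with a single `grid_free (rows)` call, because the table and the data live in the same block.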
/ccdc/2d_array.h:
--------------------------------------------------------------------------------
1 | #ifndef MISC_2D_ARRAY_H
2 | #define MISC_2D_ARRAY_H
3 |
4 |
5 | #include
6 |
7 |
8 | void **allocate_2d_array
9 | (
10 | int rows, /* I: Number of rows for the 2D array */
11 | int columns, /* I: Number of columns for the 2D array */
12 | size_t member_size /* I: Size of the 2D array element */
13 | );
14 |
15 |
16 | int get_2d_array_size
17 | (
18 | void **array_ptr, /* I: Pointer returned by the alloc routine */
19 | int *rows, /* O: Pointer to number of rows */
20 | int *columns /* O: Pointer to number of columns */
21 | );
22 |
23 |
24 | int free_2d_array
25 | (
26 | void **array_ptr /* I: Pointer returned by the alloc routine */
27 | );
28 |
29 |
30 | #endif
31 |
--------------------------------------------------------------------------------
/ccdc/Makefile:
--------------------------------------------------------------------------------
1 | # Configuration
2 | SRC_DIR = .
3 | SCRIPTS = ./scripts
4 | BIN ?= ../bin
5 | XML2INC ?= /usr/include/libxml2/libxml
6 | ESPAINC ?=
7 | GSL_SCI_INC ?= /usr/include/gsl
8 | GSL_SCI_LIB ?= /usr/lib
9 |
10 | # Set up compile options
11 | CC = gcc
12 | FORTRAN = gfortran
13 | RM = rm -f
14 | MV = mv
15 | EXTRA = -Wall -Wextra -g
16 | FFLAGS=-g -fdefault-real-8
17 |
18 | # Define the include files
19 | INC = $(wildcard $(SRC_DIR)/*.h)
20 | INCDIR = -I. -I$(SRC_DIR) -I$(GSL_SCI_INC) -I$(XML2INC) -I$(ESPAINC) -I$(GSL_SCI_INC)
21 | NCFLAGS = $(EXTRA) $(INCDIR)
22 |
23 | # Define the source code and object files
24 | #SRC = input.c 2d_array.c ccdc.c utilities.c misc.c
25 | SRC = $(wildcard $(SRC_DIR)/*.c)
26 | OBJ = $(SRC:.c=.o)
27 |
28 | # Define the object libraries
29 | LIB = -L$(GSL_SCI_LIB) -lz -lpthread -lrt -lgsl -lgslcblas -lgfortran -lm
30 |
31 | # Define the executable
32 | EXE = ccdc
33 |
34 | # Target for the executable
35 | all: $(EXE)
36 |
37 | ccdc: $(OBJ) glmnet5 $(INC)
38 | $(CC) $(NCFLAGS) -o ccdc $(OBJ) glmnet5.o $(LIB)
39 |
40 | glmnet5: $(SRC) glmnet5.f
41 | $(FORTRAN) $(FFLAGS) -c glmnet5.f -o glmnet5.o
42 |
43 |
44 | $(BIN):
45 | mkdir -p $(BIN)
46 |
47 | install: $(BIN)
48 | mv $(EXE) $(BIN)
49 | cp $(SCRIPTS)/* $(BIN)
50 |
51 | clean:
52 | $(RM) $(BIN)/$(EXE)
53 | $(RM) $(BIN)/*.r
54 | $(RM) *.o
55 |
56 | $(OBJ): $(INC)
57 |
58 | .c.o:
59 | $(CC) $(NCFLAGS) $(INCDIR) -c $<
60 |
61 |
--------------------------------------------------------------------------------
/ccdc/ccdc.h:
--------------------------------------------------------------------------------
1 | #ifndef CCDC_H
2 | #define CCDC_H
3 |
4 |
5 | #include
6 | #include
7 | #include
8 | #include
9 | #include
10 | #include
11 |
12 | #include "input.h"
13 |
14 | int get_args
15 | (
16 | int argc, /* I: number of cmd-line args */
17 | char *argv[], /* I: string of cmd-line args */
18 | int *row, /* O: row number for the pixel */
19 | int *col, /* O: col number for the pixel */
20 | char *in_path, /* O: directory location of input data */
21 | char *out_path, /* O: directory location of output files */
22 | char *data_type, /* O: data type: tifs, bip, stdin, bip_lines. */
23 | char *scene_list_file, /* O: optional file name of list of sceneIDs */
24 | bool *verbose /* O: verbose flag */
25 | );
26 |
27 | void get_scenename
28 | (
29 | const char *filename, /* I: Name of file to split */
30 | char *directory, /* O: Directory portion of file name */
31 | char *scene_name, /* O: Scene name portion of the file name */
32 | char *appendix /* O: Appendix portion of the file name */
33 | );
34 |
35 | int create_scene_list
36 | (
37 | const char *item, /* I: string of file items to be found */
38 | int *num_scenes, /* I/O: number of scenes */
39 | char *sceneListFileName /* I: file name containing list of scene IDs */
40 | );
41 |
42 | int convert_year_doy_to_jday_from_0000
43 | (
44 | int year, /* I: year */
45 | int doy, /* I: day of the year */
46 | int *jday /* O: julian date since year 0000 */
47 | );
48 |
49 | int sort_scene_based_on_year_doy_row
50 | (
51 | char **scene_list, /* I/O: scene_list, sorted as output */
52 | int num_scenes, /* I: number of scenes in the scene list */
53 | int *sdate /* O: year plus date since 0000 */
54 | );
55 |
56 | void quick_sort_2d_float
57 | (
58 | float arr[],
59 | float *brr[],
60 | int left,
61 | int right
62 | );
63 |
64 | void update_cft
65 | (
66 | int i_span,
67 | int n_times,
68 | int min_num_c,
69 | int mid_num_c,
70 | int max_num_c,
71 | int num_c,
72 | int *update_number_c
73 | );
74 |
75 | int median_variogram
76 | (
77 | float **array, /* I: input array */
78 | int dim1_len, /* I: dimension 1 length in input array */
79 | int dim2_start, /* I: dimension 2 start index */
80 | int dim2_end, /* I: dimension 2 end index */
81 | float *output_array /* O: output array */
82 | );
83 |
84 | void split_directory_scenename
85 | (
86 | const char *filename, /* I: Name of scene with path to split */
87 | char *directory, /* O: Directory portion of file name */
88 | char *scene_name /* O: Scene name portion of the file name */
89 | );
90 |
91 | void rmse_from_square_root_mean
92 | (
93 | float **array, /* I: input array */
94 | float fit_cft, /* I: input fit_cft value */
95 | int dim1_index, /* I: dimension 1 index in input array */
96 | int dim2_len, /* I: dimension 2 length */
97 | float *rmse /* O: output rmse */
98 | );
99 |
100 | void partial_square_root_mean
101 | (
102 | float **array, /* I: input array */
103 | int dim1_index, /* I: 1st dimension index */
104 | int start, /* I: number of start elements in 1st dim */
105 | int end, /* I: number of end elements in 1st dim */
106 | float **fit_ctf, /* I: */
107 | float *rmse /* O: output rmse value */
108 | );
109 |
110 | void matlab_2d_array_mean
111 | (
112 | float **array, /* I: input array */
113 | int dim1_index, /* I: 1st dimension index */
114 | int dim2_len, /* I: number of input elements in 2nd dim */
115 | float *output_mean /* O: output mean value */
116 | );
117 |
118 | void matlab_2d_float_median
119 | (
120 | float **array, /* I: input array */
121 | int dim1_index, /* I: 1st dimension index */
122 | int dim2_len, /* I: number of input elements in 2nd dim */
123 | float *output_median /* O: output median value */
124 | );
125 |
126 | void matlab_2d_partial_mean
127 | (
128 | float **array, /* I: input array */
129 | int dim1_index, /* I: 1st dimension index */
130 | int start, /* I: number of start elements in 2nd dim */
131 | int end, /* I: number of end elements in 2nd dim */
132 | float *output_mean /* O: output mean value */
133 | );
134 |
135 | void matlab_float_2d_partial_median
136 | (
137 | float **array, /* I: input array */
138 | int dim1_index, /* I: 1st dimension index */
139 | int start, /* I: number of start elements in 2nd dim */
140 | int end, /* I: number of end elements in 2nd dim */
141 | float *output_median /* O: output median value */
142 | );
143 |
144 | void matlab_2d_partial_square_mean
145 | (
146 | float **array, /* I: input array */
147 | int dim1_index, /* I: 1st dimension index */
148 | int start, /* I: number of start elements in 2nd dim */
149 | int end, /* I: number of end elements in 2nd dim */
150 | float *output_mean /* O: output mean value */
151 | );
152 |
153 | void matlab_2d_array_norm
154 | (
155 | float **array, /* I: input array */
156 | int dim1_index, /* I: 1st dimension index */
157 | int dim2_len, /* I: number of input elements in 2nd dim */
158 | float *output_norm /* O: output norm value */
159 | );
160 |
161 | void get_ids_length
162 | (
163 | int *id_array, /* I: input array */
164 | int start, /* I: array start index */
165 | int end, /* I: array end index */
166 | int *id_len /* O: number of non-zero values in the array */
167 | );
168 |
169 | void matlab_unique
170 | (
171 | int *clrx,
172 | float **clry,
173 | int nums,
174 | int *new_nums
175 | );
176 |
177 | int auto_mask
178 | (
179 | int *clrx,
180 | float **clry,
181 | int start,
182 | int end,
183 | float years,
184 | float t_b1,
185 | float t_b2,
186 | float n_t,
187 | int *bl_ids
188 | );
189 |
190 | int auto_ts_fit
191 | (
192 | int *clrx,
193 | float **clry,
194 | int band_index,
195 | int start,
196 | int end,
197 | int df,
198 | float **coefs,
199 | float *rmse,
200 | float **v_dif
201 | );
202 |
203 | int auto_ts_predict
204 | (
205 | int *clrx,
206 | float **coefs,
207 | int df,
208 | int band_index,
209 | int start,
210 | int end,
211 | float *pred_y
212 | );
213 |
214 | extern void elnet_(
215 |
216 | // input:
217 |
218 | int *ka, // ka = algorithm flag
219 | // ka=1 => covariance updating algorithm
220 | // ka=2 => naive algorithm
221 | double *parm, // parm = penalty member index (0 <= parm <= 1)
222 | // = 0.0 => ridge
223 | // = 1.0 => lasso
224 | int *no, // no = number of observations
225 | int *ni, // ni = number of predictor variables
226 | double *x, // x[ni][no] = predictor data matrix flat file (overwritten)
227 | double *y, // y[no] = response vector (overwritten)
228 | double *w, // w[no]= observation weights (overwritten)
229 | int *jd, // jd(jd(1)+1) = predictor variable deletion flag
230 | // jd(1) = 0 => use all variables
231 | // jd(1) != 0 => do not use variables jd(2)...jd(jd(1)+1)
232 | double *vp, // vp(ni) = relative penalties for each predictor variable
233 | // vp(j) = 0 => jth variable unpenalized
234 | double cl[][2], // cl(2,ni) = interval constraints on coefficient values (overwritten)
235 | // cl(1,j) = lower bound for jth coefficient value (<= 0.0)
236 | // cl(2,j) = upper bound for jth coefficient value (>= 0.0)
237 | int *ne, // ne = maximum number of variables allowed to enter largest model
238 | // (stopping criterion)
239 | int *nx, // nx = maximum number of variables allowed to enter all models
240 | // along path (memory allocation, nx > ne).
241 | int *nlam, // nlam = (maximum) number of lamda values
242 | double *flmin, // flmin = user control of lamda values (>=0)
243 | // flmin < 1.0 => minimum lamda = flmin*(largest lamda value)
244 | // flmin >= 1.0 => use supplied lamda values (see below)
245 | double *ulam, // ulam(nlam) = user supplied lamda values (ignored if flmin < 1.0)
246 | double *thr, // thr = convergence threshold for each lamda solution.
247 | // iterations stop when the maximum reduction in the criterion value
248 | // as a result of each parameter update over a single pass
249 | // is less than thr times the null criterion value.
250 | // (suggested value, thr=1.0e-5)
251 | int *isd, // isd = predictor variable standardization flag:
252 | // isd = 0 => regression on original predictor variables
253 | // isd = 1 => regression on standardized predictor variables
254 | // Note: output solutions always reference original
255 | // variables locations and scales.
256 | int *intr, // intr = intercept flag
257 | // intr = 0/1 => don't/do include intercept in model
258 | int *maxit, // maxit = maximum allowed number of passes over the data for all lambda
259 | // values (suggested values, maxit = 100000)
260 |
261 | // output:
262 |
263 | int *lmu, // lmu = actual number of lamda values (solutions)
264 | double *a0, // a0(lmu) = intercept values for each solution
265 | double *ca, // ca(nx,lmu) = compressed coefficient values for each solution
266 | int *ia, // ia(nx) = pointers to compressed coefficients
267 | int *nin, // nin(lmu) = number of compressed coefficients for each solution
268 | double *rsq, // rsq(lmu) = R**2 values for each solution
269 | double *alm, // alm(lmu) = lamda values corresponding to each solution
270 | int *nlp, // nlp = actual number of passes over the data for all lamda values
271 | int *jerr // jerr = error flag:
272 | // jerr = 0 => no error
273 | // jerr > 0 => fatal error - no output returned
274 | // jerr < 7777 => memory allocation error
275 | // jerr = 7777 => all used predictors have zero variance
276 | // jerr = 10000 => maxval(vp) <= 0.0
277 | // jerr < 0 => non fatal error - partial output:
278 | // Solutions for larger lamdas (1:(k-1)) returned.
279 | // jerr = -k => convergence for kth lamda value not reached
280 | // after maxit (see above) iterations.
281 | // jerr = -10000-k => number of non zero coefficients along path
282 | // exceeds nx (see above) at kth lamda value.
283 | );
284 |
285 | extern void spelnet_(
286 |
287 | // input:
288 |
289 | int *ka, // ka = algorithm flag
290 | // ka=1 => covariance updating algorithm
291 | // ka=2 => naive algorithm
292 | double *parm, // parm = penalty member index (0 <= parm <= 1)
293 | // = 0.0 => ridge
294 | // = 1.0 => lasso
295 | int *no, // no = number of observations
296 | int *ni, // ni = number of predictor variables
297 | double *x, // x[ni][no] = predictor data matrix flat file (overwritten)
298 | double *y, // y[no] = response vector (overwritten)
299 | double *w, // w[no]= observation weights (overwritten)
300 | int *jd, // jd(jd(1)+1) = predictor variable deletion flag
301 | // jd(1) = 0 => use all variables
302 | // jd(1) != 0 => do not use variables jd(2)...jd(jd(1)+1)
303 | double *vp, // vp(ni) = relative penalties for each predictor variable
304 | // vp(j) = 0 => jth variable unpenalized
305 | double cl[][2], // cl(2,ni) = interval constraints on coefficient values (overwritten)
306 | // cl(1,j) = lower bound for jth coefficient value (<= 0.0)
307 | // cl(2,j) = upper bound for jth coefficient value (>= 0.0)
308 | int *ne, // ne = maximum number of variables allowed to enter largest model
309 | // (stopping criterion)
310 | int *nx, // nx = maximum number of variables allowed to enter all models
311 | // along path (memory allocation, nx > ne).
312 | int *nlam, // nlam = (maximum) number of lamda values
313 | double *flmin, // flmin = user control of lamda values (>=0)
314 | // flmin < 1.0 => minimum lamda = flmin*(largest lamda value)
315 | // flmin >= 1.0 => use supplied lamda values (see below)
316 | double *ulam, // ulam(nlam) = user supplied lamda values (ignored if flmin < 1.0)
317 | double *thr, // thr = convergence threshold for each lamda solution.
318 | // iterations stop when the maximum reduction in the criterion value
319 | // as a result of each parameter update over a single pass
320 | // is less than thr times the null criterion value.
321 | // (suggested value, thr=1.0e-5)
322 | int *isd, // isd = predictor variable standardization flag:
323 | // isd = 0 => regression on original predictor variables
324 | // isd = 1 => regression on standardized predictor variables
325 | // Note: output solutions always reference original
326 | // variables locations and scales.
327 | int *intr, // intr = intercept flag
328 | // intr = 0/1 => don't/do include intercept in model
329 | int *maxit, // maxit = maximum allowed number of passes over the data for all lambda
330 | // values (suggested values, maxit = 100000)
331 |
332 | // output:
333 |
334 | int *lmu, // lmu = actual number of lamda values (solutions)
335 | double *a0, // a0(lmu) = intercept values for each solution
336 | double *ca, // ca(nx,lmu) = compressed coefficient values for each solution
337 | int *ia, // ia(nx) = pointers to compressed coefficients
338 | int *nin, // nin(lmu) = number of compressed coefficients for each solution
339 | double *rsq, // rsq(lmu) = R**2 values for each solution
340 | double *alm, // alm(lmu) = lamda values corresponding to each solution
341 | int *nlp, // nlp = actual number of passes over the data for all lamda values
342 | int *jerr // jerr = error flag:
343 | // jerr = 0 => no error
344 | // jerr > 0 => fatal error - no output returned
345 | // jerr < 7777 => memory allocation error
346 | // jerr = 7777 => all used predictors have zero variance
347 | // jerr = 10000 => maxval(vp) <= 0.0
348 | // jerr < 0 => non fatal error - partial output:
349 | // Solutions for larger lamdas (1:(k-1)) returned.
350 | // jerr = -k => convergence for kth lamda value not reached
351 | // after maxit (see above) iterations.
352 | // jerr = -10000-k => number of non zero coefficients along path
353 | // exceeds nx (see above) at kth lamda value.
354 | );
355 |
356 | /*--------------------------------------------------------------------
357 | c uncompress coefficient vectors for all solutions:
358 | c
359 | c call solns(ni,nx,lmu,ca,ia,nin,b)
360 | c
361 | c input:
362 | c
363 | c ni,nx = input to elnet
364 | c lmu,ca,ia,nin = output from elnet
365 | c
366 | c output:
367 | c
368 | c b(ni,lmu) = all elnet returned solutions in uncompressed format
369 | ----------------------------------------------------------------------*/
370 | extern int solns_(
371 | int *ni, // ni = number of predictor variables
372 | int *nx, // nx = maximum number of variables allowed to enter all models
373 | int *lmu, // lmu = actual number of lamda values (solutions)
374 | double *ca, // ca(nx,lmu) = compressed coefficient values for each solution
375 | int *ia, // ia(nx) = pointers to compressed coefficients
376 | int *nin, // nin(lmu) = number of compressed coefficients for each solution
377 | double *b // b(ni,lmu) = uncompressed coefficient values for each solution
378 | );
379 |
380 | extern int c_glmnet(
381 | int no, // number of observations (no)
382 | int ni, // number of predictor variables (ni)
383 | double *x, // input matrix, x[ni][no]
384 | double *y, // response variable, of dimension (no)
385 | int nlam, // number of lambda values
386 | double *ulam, // lambda values, of dimension (nlam)
387 | double parm, // the alpha variable
388 |
389 | int *lmu, // lmu = actual number of lamda values (solutions)
390 | double cfs[nlam][ni+1] // results = cfs[lmu][ni + 1]
391 | );
392 |
393 | #endif /* CCDC_H */
394 |
--------------------------------------------------------------------------------
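The `jerr` comments in the `elnet_` and `spelnet_` prototypes above describe a small error-code scheme. As a reading aid, the sketch below classifies a `jerr` value using only what those comments state. The enum `jerr_kind` and the function `classify_jerr` are hypothetical names, not part of glmnet or of this project, and positive codes that the comments do not list are reported as an unspecified fatal error.

```c
/* Hypothetical classification of the jerr values documented above. */
typedef enum
{
    JERR_OK,               /* jerr == 0: no error */
    JERR_FATAL_MEMORY,     /* 0 < jerr < 7777: memory allocation error */
    JERR_FATAL_ZERO_VAR,   /* jerr == 7777: all used predictors have
                              zero variance */
    JERR_FATAL_BAD_VP,     /* jerr == 10000: maxval(vp) <= 0.0 */
    JERR_FATAL_OTHER,      /* positive code not listed in the comments */
    JERR_WARN_NO_CONVERGE, /* jerr == -k: kth lamda did not converge */
    JERR_WARN_NX_EXCEEDED  /* jerr == -10000-k: more than nx non-zero
                              coefficients at the kth lamda */
} jerr_kind;

/* Classify jerr. For the two non-fatal cases, the 1-based index k of the
   affected lamda value is stored in *k when k is not a null pointer;
   otherwise 0 is stored. */
jerr_kind classify_jerr (int jerr, int *k)
{
    int index = 0;
    jerr_kind kind;

    if (jerr == 0)
        kind = JERR_OK;
    else if (jerr > 0)
    {
        if (jerr < 7777)
            kind = JERR_FATAL_MEMORY;
        else if (jerr == 7777)
            kind = JERR_FATAL_ZERO_VAR;
        else if (jerr == 10000)
            kind = JERR_FATAL_BAD_VP;
        else
            kind = JERR_FATAL_OTHER;
    }
    else if (jerr < -10000)
    {
        /* -10000-k with k >= 1 */
        index = -jerr - 10000;
        kind = JERR_WARN_NX_EXCEEDED;
    }
    else
    {
        /* -k; a value of exactly -10000 is treated as k = 10000 here */
        index = -jerr;
        kind = JERR_WARN_NO_CONVERGE;
    }

    if (k)
        *k = index;
    return kind;
}
```

A caller could treat the two `JERR_WARN_*` results as usable partial output, since the comments state that solutions for the larger lamda values are still returned in those cases.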
/ccdc/const.h:
--------------------------------------------------------------------------------
1 | #ifndef CONST_H
2 | #define CONST_H
3 |
4 |
5 | #include
6 |
7 | typedef signed short int16;
8 | typedef unsigned char uint8;
9 | typedef signed char int8;
10 |
11 | #ifndef min
12 | #define min(a,b) (((a) < (b)) ? (a) : (b))
13 | #endif
14 |
15 |
16 | #ifndef max
17 | #define max(a,b) (((a) > (b)) ? (a) : (b))
18 | #endif
19 |
20 | #define TOTAL_BANDS 8
21 |
22 | #define PI 3.1415926535897935
23 | #define TWO_PI (2.0 * PI)
24 | #define HALF_PI (PI / 2.0)
25 |
26 |
27 | #define DEG (180.0 / PI)
28 | #define RAD (PI / 180.0)
29 |
30 |
31 | #ifndef SUCCESS
32 | #define SUCCESS 0
33 | #endif
34 |
35 | #ifndef ERROR
36 | #define ERROR -1
37 | #endif
38 |
39 |
40 | #ifndef FAILURE
41 | #define FAILURE 1
42 | #endif
43 |
44 | #ifndef TRUE
45 | #define TRUE 1
46 | #endif
47 |
48 | #ifndef FALSE
49 | #define FALSE 0
50 | #endif
51 |
52 |
53 | #define MINSIGMA 1e-5
54 |
55 | #define MAX_STR_LEN 512
56 | #define MAX_SCENE_LIST 3922
57 |
58 | #endif
59 |
--------------------------------------------------------------------------------
/ccdc/defines.h:
--------------------------------------------------------------------------------
1 | /* this is an effort to consolidate all defines. Previously, they were */
2 | /* scattered throughout .c and/or .h files, and not always included */
3 | /* and/or available everywhere or where needed. Also, some redundancy */
4 | /* and conflicts existed. */
5 |
6 | /* from ccdc.c */
7 | #define NUM_LASSO_BANDS 5 /* Number of bands for Least Absolute Shrinkage */
8 | /* and Selection Operator LASSO regressions */
9 | #define TOTAL_IMAGE_BANDS 7 /* Number of image bands, for loops. */
10 | #define TOTAL_BANDS 8 /* Total image plus mask bands, for loops. */
11 | #define MIN_NUM_C 4 /* Minimum number of coefficients */
12 | #define MID_NUM_C 6 /* Mid-point number of coefficients */
13 | #define MAX_NUM_C 8 /* Maximum number of coefficients */
14 | #define CONSE 6 /* No. of CONSEquential pixels 4 bldg. model*/
15 | #define N_TIMES 3 /* number of clear observations/coefficients*/
16 | #define NUM_YEARS 365.25 /* average number of days per year */
17 | #define NUM_FC 10 /* Values change with number of pixels run */
18 | #define T_CONST 4.89 /* Threshold for cloud, shadow, and snow detection */
19 | #define MIN_YEARS 1 /* minimum year for model initialization */
20 | #define T_SN 0.75 /* no change detection for permanent snow pixels */
21 | #define T_CLR 0.25 /* Fmask fails threshold */
22 | #define T_CG 15.0863 /* chi-square inversed T_cg (0.99) for noise removal */
23 | #define T_MAX_CG 35.8882 /* chi-square inversed T_max_cg (1e-6) for
24 | last step noise removal */
25 |
26 |
27 | /* from 2d_array.c */
28 | /* Define a unique (i.e. random) value that can be used to verify a pointer
29 | points to an LSRD_2D_ARRAY. This is used to verify the operation succeeds to
30 | get an LSRD_2D_ARRAY pointer from a row pointer. */
31 | #define SIGNATURE 0x326589ab
32 |
33 | /* Given an address returned by the allocate routine, get a pointer to the
34 | entire structure. */
35 | #define GET_ARRAY_STRUCTURE_FROM_PTR(ptr) \
36 | ((LSRD_2D_ARRAY *)((char *)(ptr) - offsetof(LSRD_2D_ARRAY, memory_block)))
37 |
38 |
39 | /* from input.c */
40 | //#define TOTAL_IMAGE_BANDS 7
41 |
42 | /* from misc.c */
43 | /* 12-31-1972 is 720624 in julian day since year 0000 */
44 | #define JULIAN_DATE_LAST_DAY_1972 720624
45 | #define LANDSAT_START_YEAR 1973
46 | #define LEAP_YEAR_DAYS 366
47 | #define NON_LEAP_YEAR_DAYS 365
48 | #define AVE_DAYS_IN_A_YEAR 365.25
49 | #define ROBUST_COEFFS 5
50 | #define LASSO_COEFFS 8
51 | //#define TOTAL_IMAGE_BANDS 7
52 |
53 | /* from input.h */
54 | /* possible cfmask values */
55 | #define CFMASK_CLEAR 0
56 | #define CFMASK_WATER 1
57 | #define CFMASK_SHADOW 2
58 | #define CFMASK_SNOW 3
59 | #define CFMASK_CLOUD 4
60 | #define CFMASK_FILL 255
61 | #define IMAGE_FILL -9999
62 | #define CFMASK_BAND 7
63 |
64 | /* from output.h */
65 | #define FILL_VALUE 255
66 | #define NUM_COEFFS 8
67 | #define NUM_BANDS 7
68 |
--------------------------------------------------------------------------------
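The `misc.c` constants above anchor the day count at 31 December 1972 (`JULIAN_DATE_LAST_DAY_1972`) and count forward from `LANDSAT_START_YEAR`. The sketch below shows one way a year and day-of-year could be turned into such a day count with those constants. It is an assumption about how the constants fit together, not the contents of `misc.c`, which is not shown here; `year_doy_to_jday` and `is_leap_year` are hypothetical names, the Gregorian leap-year rule is assumed, and the 0/1 return values mirror `SUCCESS` and `FAILURE` from `const.h`.

```c
/* Values copied from defines.h so the sketch stands alone. */
#define JULIAN_DATE_LAST_DAY_1972 720624
#define LANDSAT_START_YEAR 1973
#define LEAP_YEAR_DAYS 366
#define NON_LEAP_YEAR_DAYS 365

/* Gregorian leap-year rule (assumed). */
static int is_leap_year (int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Convert (year, day of year) to a day count since year 0000.
   Returns 0 on success, 1 on invalid input. */
int year_doy_to_jday (int year, int doy, int *jday)
{
    int y;
    int days_in_year;
    int total;

    if (jday == 0 || year < LANDSAT_START_YEAR)
        return 1;

    days_in_year = is_leap_year (year) ? LEAP_YEAR_DAYS : NON_LEAP_YEAR_DAYS;
    if (doy < 1 || doy > days_in_year)
        return 1;

    /* Start at the last day of 1972, add every full year that has
       elapsed since then, then add the day of the requested year. */
    total = JULIAN_DATE_LAST_DAY_1972;
    for (y = LANDSAT_START_YEAR; y < year; y++)
        total += is_leap_year (y) ? LEAP_YEAR_DAYS : NON_LEAP_YEAR_DAYS;

    *jday = total + doy;
    return 0;
}
```

Under these assumptions, day 1 of 1973 maps to 720625, one past the anchor value, and each later date differs from it by the number of elapsed calendar days.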
/ccdc/gdb.args:
--------------------------------------------------------------------------------
1 | run --row=23 --col=3022 --inDir=/shared/users/bdavis/ARD_out/grid07/C --outDir=/shared/bdavis/grid07/C_out/all --sceneList=/shared/users/bdavis/ARD_out/grid07/all.txt --verbose
2 |
3 | inDir = /shared/users/bdavis/ARD_out/grid07/C
4 | outDir = /shared/bdavis/grid07/C_out/all
5 | sceneList = /shared/users/bdavis/ARD_out/grid07/0001-0064.txt
6 | verbose = 1
7 |
8 |
--------------------------------------------------------------------------------
/ccdc/input.h:
--------------------------------------------------------------------------------
1 | #ifndef INPUT_H
2 | #define INPUT_H
3 |
4 | #include
5 | #include
6 | #include
7 | #include
8 | #include
9 | #include "const.h"
10 |
11 | /* possible cfmask values */
12 | #include "defines.h"
13 |
14 | /* Input file type definition */
15 | typedef enum {
16 | INPUT_TYPE_NULL = -1,
17 | INPUT_TYPE_BINARY = 0,
18 | INPUT_TYPE_MAX
19 | } Input_type_t;
20 |
21 | /* Structure for the metadata */
22 | typedef struct {
23 | int lines; /* number of lines in a scene */
24 | int samples; /* number of samples in a scene */
25 | int data_type; /* envi data type */
26 | int byte_order; /* envi byte order */
27 | int utm_zone; /* UTM zone; use a negative number if this is a
28 | southern zone */
29 | int pixel_size; /* pixel size */
30 | char interleave[MAX_STR_LEN]; /* envi save format */
31 | int upper_left_x; /* upper left x coordinates */
32 | int upper_left_y; /* upper left y coordinates */
33 | } Input_meta_t;
34 |
35 | /* Structure for the 'input' data type */
36 | typedef struct {
37 | Input_type_t file_type; /* Type of the input image files */
38 | Input_meta_t meta; /* Input metadata */
39 | FILE *fp_bin[TOTAL_BANDS][MAX_SCENE_LIST];
40 | } Input_t;
41 |
42 | /* Prototypes */
43 | FILE *open_raw_binary
44 | (
45 | char *infile, /* I: name of the input file to be opened */
46 | char *access_type /* I: string for the access type for reading the
47 | input file */
48 | );
49 |
50 | void close_raw_binary
51 | (
52 | FILE *fptr /* I: pointer to raw binary file to be closed */
53 | );
54 |
55 | int write_raw_binary
56 | (
57 | FILE *rb_fptr, /* I: pointer to the raw binary file */
58 | int nlines, /* I: number of lines to write to the file */
59 | int nsamps, /* I: number of samples to write to the file */
60 | int size, /* I: number of bytes per pixel (ex. sizeof(uint8)) */
61 | void *img_array /* I: array of nlines * nsamps * size to be written
62 | to the raw binary file */
63 | );
64 |
65 | int read_raw_binary
66 | (
67 | FILE *rb_fptr, /* I: pointer to the raw binary file */
68 | int nlines, /* I: number of lines to read from the file */
69 | int nsamps, /* I: number of samples to read from the file */
70 | int size, /* I: number of bytes per pixel (ex. sizeof(uint8)) */
71 | void *img_array /* O: array of nlines * nsamps * size to be read from
72 | the raw binary file (sufficient space should
73 | already have been allocated) */
74 | );
75 |
76 | int read_envi_header
77 | (
78 | char *data_type, /* I: input data type */
79 | char *scene_name, /* I: scene name */
80 | Input_meta_t *meta /* O: saved header file info */
81 | );
82 |
83 | int read_cfmask
84 | (
85 | int curr_scene_num, /* I: current num. in list of scenes to read */
86 | char *data_type, /* I: type of files, tifs or single BIP */
87 | char **scene_list, /* I: current scene name in list of sceneIDs */
88 | int row, /* I: the row (Y) location within img/grid */
89 | int col, /* I: the col (X) location within img/grid */
90 | int num_samples, /* I: number of image samples (X width) */
91 | FILE ***fp_tifs, /* I/O: file ptr array for tif band file names */
92 | FILE **fp_bip, /* I/O: file pointer array for BIP file names */
93 | unsigned char *fmask_buf,/* O: pointer to cfmask band values */
94 | /* I/O: Worldwide Reference System path and row for */
95 | /* I/O: the current swath. This group of variables */
96 | /* I/O: is for filtering out swath overlap and */
97 | int *prev_wrs_path, /* I/O: using the first of two scenes in a swath, */
98 | int *prev_wrs_row, /* I/O: because it is recommended to use the */
99 | int *prev_year, /* I/O: metadata from the first for things like sun */
100 | int *prev_jday, /* I/O: angle, etc. However, always removing a */
101 | unsigned char *prev_fmask_buf,/* I/O: redundant x/y location is not always valid */
102 | int *valid_scene_count,/* I/O: for gridded data, because the specified x/y */
103 | int *swath_overlap_count,/* I/O: may or may not be in the swath overlap area. */
104 | char **valid_scene_list,/* I/O: 2-D array for list of filtered scenes */
105 | int *clear_sum, /* I/O: Total number of clear cfmask pixels */
106 | int *water_sum, /* I/O: counter for cfmask water pixels. */
107 | int *shadow_sum, /* I/O: counter for cfmask shadow pixels. */
108 | int *sn_sum, /* I/O: Total number of snow cfmask pixels */
109 | int *cloud_sum, /* I/O: counter for cfmask cloud pixels. */
110 | int *fill_sum, /* I/O: counter for cfmask fill pixels. */
111 | int *all_sum, /* I/O: Total of all cfmask pixels */
112 | unsigned char *updated_fmask_buf, /* I/O: new entry in valid fmask values */
113 | int *updated_sdate_array, /* I/O: new buf of valid date values */
114 | int *sdate, /* I: Original array of julian date values */
115 | int *valid_num_scenes/* I/O: number of valid scenes after reading cfmask */
116 | );
117 |
118 |
119 | int read_stdin
120 | (
121 | int *updated_sdate_array, /* pointer to date values buffer. */
122 | int **buf, /* pointer to image bands buffer. */
123 | unsigned char *updated_cfmask_buf, /* pointer to cfmask pixel buffer.*/
124 | int num_bands, /* total number of bands. */
125 | int *clear_sum, /* accumulator for clear pixels. */
126 | int *water_sum, /* accumulator for water pixels. */
127 | int *shadow_sum, /* accumulator for shadow pixels. */
128 | int *snow_sum, /* accumulator for snow pixels. */
129 | int *cloud_sum, /* accumulator for cloud pixels. */
130 | int *fill_sum, /* accumulator for fill pixels. */
131 | int *all_sum, /* accumulator for all pixels. */
132 | int *valid_num_scenes, /* total scenes read. */
133 | bool debug /* flag for printing mesgs/info. */
134 | );
135 |
136 |
137 | int assign_cfmask_values
138 | (
139 | unsigned char cfmask_value, /* I: current cfmask pixel value. */
140 | int *clear_sum, /* O: accumulator for clear pixels */
141 | int *water_sum, /* O: accumulator for water pixels */
142 | int *shadow_sum, /* O: accumulator for shadow pixels */
143 | int *snow_sum, /* O: accumulator for snow pixels */
144 | int *cloud_sum, /* O: accumulator for cloud pixels */
145 | int *fill_sum, /* O: accumulator for fill pixels */
146 | int *all_sum /* O: accumulator for all pixels */
147 | );
148 |
149 |
150 | int read_tifs
151 | (
152 | char *sceneID_name, /* I: current file name in list of sceneIDs */
153 | FILE ***fp_tifs, /* I/O: file pointer array for band file names */
154 | int curr_scene_num, /* I: current num. in list of scenes to read */
155 | int row, /* I: the row (Y) location within img/grid */
156 | int col, /* I: the col (X) location within img/grid */
157 | int num_samples, /* I: number of image samples (X width) */
158 | bool debug, /* I: flag for printing debug messages */
159 | int **image_buf /* O: pointer to 2-D image band values array */
160 | );
161 |
162 |
163 | int read_bip
164 | (
165 | char *current_scene_name, /* I: current file name in list of sceneIDs */
166 | FILE **fp_bip, /* I/O: file pointer array for BIP file names */
167 | int curr_scene_num, /* I: current num. in list of scenes to read */
168 | int row, /* I: the row (Y) location within img/grid */
169 | int col, /* I: the col (X) location within img/grid */
170 | int num_samples, /* I: number of image samples (X width) */
171 | int **image_buf /* O: pointer to 2-D image band values array */
172 | );
173 |
174 |
175 | void usage ();
176 |
177 | #endif
178 |
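The `assign_cfmask_values` prototype above takes one cfmask value and a set of per-class accumulators. A minimal sketch of how such a tally might work is shown below. The struct `Cfmask_tally_t` and the function `tally_cfmask_value` are hypothetical stand-ins, not the repository's implementation, and the sketch assumes that the "all" counter includes fill pixels.

```c
#include <assert.h>
#include <stddef.h>

/* cfmask values copied from defines.h. */
#define CFMASK_CLEAR 0
#define CFMASK_WATER 1
#define CFMASK_SHADOW 2
#define CFMASK_SNOW 3
#define CFMASK_CLOUD 4
#define CFMASK_FILL 255

/* Hypothetical container for the per-class counters. */
typedef struct {
    int clear, water, shadow, snow, cloud, fill, all;
} Cfmask_tally_t;

/* Hypothetical helper: increment the counter that matches one cfmask value.
   Returns 0 on success, -1 if the value is not a known cfmask class. */
int tally_cfmask_value(unsigned char cfmask_value, Cfmask_tally_t *t)
{
    assert(t != NULL);

    switch (cfmask_value)
    {
        case CFMASK_CLEAR:  t->clear++;  break;
        case CFMASK_WATER:  t->water++;  break;
        case CFMASK_SHADOW: t->shadow++; break;
        case CFMASK_SNOW:   t->snow++;   break;
        case CFMASK_CLOUD:  t->cloud++;  break;
        case CFMASK_FILL:   t->fill++;   break;
        default:
            return -1; /* unknown value: leave all counters untouched */
    }

    /* Assumption: the total counts every recognized value, fill included. */
    t->all++;
    return 0;
}
```

Grouping the counters in one struct is a design choice for the sketch only; the prototype in the header passes each accumulator as a separate pointer.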
--------------------------------------------------------------------------------
/ccdc/multirobust.c:
--------------------------------------------------------------------------------
1 | /* multirobust.c
2 | *
3 | * Copyright (C) 2013 Patrick Alken
4 | *
5 | * This program is free software; you can redistribute it and/or modify
6 | * it under the terms of the GNU General Public License as published by
7 | * the Free Software Foundation; either version 3 of the License, or (at
8 | * your option) any later version.
9 | *
10 | * This program is distributed in the hope that it will be useful, but
11 | * WITHOUT ANY WARRANTY; without even the implied warranty of
12 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13 | * General Public License for more details.
14 | *
15 | * You should have received a copy of the GNU General Public License
16 | * along with this program; if not, write to the Free Software
17 | * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
18 | *
19 | * This module contains routines related to robust linear least squares. The
20 | * algorithm used closely follows the publications:
21 | *
22 | * [1] DuMouchel, W. and F. O'Brien (1989), "Integrating a robust
23 | * option into a multiple regression computing environment,"
24 | * Computer Science and Statistics: Proceedings of the 21st
25 | * Symposium on the Interface, American Statistical Association
26 | *
27 | * [2] Street, J.O., R.J. Carroll, and D. Ruppert (1988), "A note on
28 | * computing robust regression estimates via iteratively
29 | * reweighted least squares," The American Statistician, v. 42,
30 | * pp. 152-154.
31 | */
32 |
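33 | //#include <config.h>
34 | #include <gsl/gsl_math.h>
35 | #include <gsl/gsl_multifit.h>
36 | #include <gsl/gsl_vector.h>
37 | #include <gsl/gsl_matrix.h>
38 | #include <gsl/gsl_statistics.h>
39 | #include <gsl/gsl_sort.h>
40 | #include <gsl/gsl_sort_vector.h>
41 | #include <gsl/gsl_blas.h>
42 | #include <gsl/gsl_linalg.h>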
43 |
44 | static int robust_test_convergence(const gsl_vector *c_prev, const gsl_vector *c,
45 | const double tol);
46 | static double robust_madsigma(const gsl_vector *x, gsl_multifit_robust_workspace *w);
47 | static double robust_robsigma(const gsl_vector *r, const double s,
48 | const double tune, gsl_multifit_robust_workspace *w);
49 | static double robust_sigma(const double s_ols, const double s_rob,
50 | gsl_multifit_robust_workspace *w);
51 | static int robust_covariance(const double sigma, gsl_matrix *cov,
52 | gsl_multifit_robust_workspace *w);
53 |
54 | /*
55 | gsl_multifit_robust_alloc
56 | Allocate a robust workspace
57 |
58 | Inputs: T - robust weighting algorithm
59 | n - number of observations
60 | p - number of model parameters
61 |
62 | Return: pointer to workspace
63 | */
64 |
65 | gsl_multifit_robust_workspace *
66 | gsl_multifit_robust_alloc(const gsl_multifit_robust_type *T,
67 | const size_t n, const size_t p)
68 | {
69 | gsl_multifit_robust_workspace *w;
70 |
71 | if (n < p)
72 | {
73 | GSL_ERROR_VAL("observations n must be >= p", GSL_EINVAL, 0);
74 | }
75 |
76 | w = calloc(1, sizeof(gsl_multifit_robust_workspace));
77 | if (w == 0)
78 | {
79 | GSL_ERROR_VAL("failed to allocate space for multifit_robust struct",
80 | GSL_ENOMEM, 0);
81 | }
82 |
83 | w->n = n;
84 | w->p = p;
85 | w->type = T;
86 | /* bdavis */
87 | //w->maxiter = 100; /* maximum iterations */
88 | w->maxiter = 5; /* maximum iterations */
89 | /* bdavis */
90 | w->tune = w->type->tuning_default;
91 |
92 | w->multifit_p = gsl_multifit_linear_alloc(n, p);
93 | if (w->multifit_p == 0)
94 | {
95 | GSL_ERROR_VAL("failed to allocate space for multifit_linear struct",
96 | GSL_ENOMEM, 0);
97 | }
98 |
99 | w->r = gsl_vector_alloc(n);
100 | if (w->r == 0)
101 | {
102 | GSL_ERROR_VAL("failed to allocate space for residuals",
103 | GSL_ENOMEM, 0);
104 | }
105 |
106 | w->weights = gsl_vector_alloc(n);
107 | if (w->weights == 0)
108 | {
109 | GSL_ERROR_VAL("failed to allocate space for weights", GSL_ENOMEM, 0);
110 | }
111 |
112 | w->c_prev = gsl_vector_alloc(p);
113 | if (w->c_prev == 0)
114 | {
115 | GSL_ERROR_VAL("failed to allocate space for c_prev", GSL_ENOMEM, 0);
116 | }
117 |
118 | w->resfac = gsl_vector_alloc(n);
119 | if (w->resfac == 0)
120 | {
121 | GSL_ERROR_VAL("failed to allocate space for residual factors",
122 | GSL_ENOMEM, 0);
123 | }
124 |
125 | w->psi = gsl_vector_alloc(n);
126 | if (w->psi == 0)
127 | {
128 | GSL_ERROR_VAL("failed to allocate space for psi", GSL_ENOMEM, 0);
129 | }
130 |
131 | w->dpsi = gsl_vector_alloc(n);
132 | if (w->dpsi == 0)
133 | {
134 | GSL_ERROR_VAL("failed to allocate space for dpsi", GSL_ENOMEM, 0);
135 | }
136 |
137 | w->QSI = gsl_matrix_alloc(p, p);
138 | if (w->QSI == 0)
139 | {
140 | GSL_ERROR_VAL("failed to allocate space for QSI", GSL_ENOMEM, 0);
141 | }
142 |
143 | w->D = gsl_vector_alloc(p);
144 | if (w->D == 0)
145 | {
146 | GSL_ERROR_VAL("failed to allocate space for D", GSL_ENOMEM, 0);
147 | }
148 |
149 | w->workn = gsl_vector_alloc(n);
150 | if (w->workn == 0)
151 | {
152 | GSL_ERROR_VAL("failed to allocate space for workn", GSL_ENOMEM, 0);
153 | }
154 |
155 | w->stats.sigma_ols = 0.0;
156 | w->stats.sigma_mad = 0.0;
157 | w->stats.sigma_rob = 0.0;
158 | w->stats.sigma = 0.0;
159 | w->stats.Rsq = 0.0;
160 | w->stats.adj_Rsq = 0.0;
161 | w->stats.rmse = 0.0;
162 | w->stats.sse = 0.0;
163 | w->stats.dof = n - p;
164 | w->stats.weights = w->weights;
165 | w->stats.r = w->r;
166 |
167 | return w;
168 | } /* gsl_multifit_robust_alloc() */
169 |
170 | /*
171 | gsl_multifit_robust_free()
172 | Free memory associated with robust workspace
173 | */
174 |
175 | void
176 | gsl_multifit_robust_free(gsl_multifit_robust_workspace *w)
177 | {
178 | if (w->multifit_p)
179 | gsl_multifit_linear_free(w->multifit_p);
180 |
181 | if (w->r)
182 | gsl_vector_free(w->r);
183 |
184 | if (w->weights)
185 | gsl_vector_free(w->weights);
186 |
187 | if (w->c_prev)
188 | gsl_vector_free(w->c_prev);
189 |
190 | if (w->resfac)
191 | gsl_vector_free(w->resfac);
192 |
193 | if (w->psi)
194 | gsl_vector_free(w->psi);
195 |
196 | if (w->dpsi)
197 | gsl_vector_free(w->dpsi);
198 |
199 | if (w->QSI)
200 | gsl_matrix_free(w->QSI);
201 |
202 | if (w->D)
203 | gsl_vector_free(w->D);
204 |
205 | if (w->workn)
206 | gsl_vector_free(w->workn);
207 |
208 | free(w);
209 | } /* gsl_multifit_robust_free() */
210 |
211 | int
212 | gsl_multifit_robust_tune(const double tune, gsl_multifit_robust_workspace *w)
213 | {
214 | w->tune = tune;
215 | return GSL_SUCCESS;
216 | }
217 |
218 | const char *
219 | gsl_multifit_robust_name(const gsl_multifit_robust_workspace *w)
220 | {
221 | return w->type->name;
222 | }
223 |
224 | gsl_multifit_robust_stats
225 | gsl_multifit_robust_statistics(const gsl_multifit_robust_workspace *w)
226 | {
227 | return w->stats;
228 | }
229 |
230 | /*
231 | gsl_multifit_robust()
232 | Perform robust iteratively reweighted linear least squares
233 | fit
234 |
235 | Inputs: X - design matrix of basis functions
236 | y - right hand side vector
237 | c - (output) model coefficients
238 | cov - (output) covariance matrix
239 | w - workspace
240 | */
241 |
242 | int
243 | gsl_multifit_robust(const gsl_matrix * X,
244 | const gsl_vector * y,
245 | gsl_vector * c,
246 | gsl_matrix * cov,
247 | gsl_multifit_robust_workspace *w)
248 | {
249 | /* check matrix and vector sizes */
250 | if (X->size1 != y->size)
251 | {
252 | GSL_ERROR
253 | ("number of observations in y does not match rows of matrix X",
254 | GSL_EBADLEN);
255 | }
256 | else if (X->size2 != c->size)
257 | {
258 | GSL_ERROR ("number of parameters c does not match columns of matrix X",
259 | GSL_EBADLEN);
260 | }
261 | else if (cov->size1 != cov->size2)
262 | {
263 | GSL_ERROR ("covariance matrix is not square", GSL_ENOTSQR);
264 | }
265 | else if (c->size != cov->size1)
266 | {
267 | GSL_ERROR
268 | ("number of parameters does not match size of covariance matrix",
269 | GSL_EBADLEN);
270 | }
271 | else if (X->size1 != w->n || X->size2 != w->p)
272 | {
273 | GSL_ERROR
274 | ("size of workspace does not match size of observation matrix",
275 | GSL_EBADLEN);
276 | }
277 | else
278 | {
279 | int s;
280 | double chisq;
281 | const double tol = GSL_SQRT_DBL_EPSILON;
282 | int converged = 0;
283 | size_t numit = 0;
284 | const size_t n = y->size;
285 | double sigy = gsl_stats_sd(y->data, y->stride, n);
286 | double sig_lower;
287 | size_t i;
288 |
289 | /*
290 | * if the initial fit is very good, then finding outliers by comparing
291 | * them to the residual standard deviation is difficult. Therefore we
292 | * set a lower bound on the standard deviation estimate that is a small
293 | * fraction of the standard deviation of the data values
294 | */
295 | sig_lower = 1.0e-6 * sigy;
296 | if (sig_lower == 0.0)
297 | sig_lower = 1.0;
298 |
299 | /* compute initial estimates using ordinary least squares */
300 | s = gsl_multifit_linear(X, y, c, cov, &chisq, w->multifit_p);
301 | if (s)
302 | return s;
303 |
304 | /* save Q S^{-1} of original matrix */
305 | gsl_matrix_memcpy(w->QSI, w->multifit_p->QSI);
306 | gsl_vector_memcpy(w->D, w->multifit_p->D);
307 |
308 | /* compute statistical leverage of each data point */
309 | s = gsl_linalg_SV_leverage(w->multifit_p->A, w->resfac);
310 | if (s)
311 | return s;
312 |
313 | /* correct residuals with factor 1 / sqrt(1 - h) */
314 | for (i = 0; i < n; ++i)
315 | {
316 | double h = gsl_vector_get(w->resfac, i);
317 |
318 | if (h > 0.9999)
319 | h = 0.9999;
320 |
321 | gsl_vector_set(w->resfac, i, 1.0 / sqrt(1.0 - h));
322 | }
323 |
324 | /* compute residuals from OLS fit r = y - X c */
325 | s = gsl_multifit_linear_residuals(X, y, c, w->r);
326 | if (s)
327 | return s;
328 |
329 | /* compute estimate of sigma from ordinary least squares */
330 | w->stats.sigma_ols = gsl_blas_dnrm2(w->r) / sqrt((double) w->stats.dof);
331 |
332 | while (!converged && ++numit <= w->maxiter)
333 | {
334 | double sig;
335 |
336 | /* adjust residuals by statistical leverage (see DuMouchel and O'Brien) */
337 | s = gsl_vector_mul(w->r, w->resfac);
338 | if (s)
339 | return s;
340 |
341 | /* compute estimate of standard deviation using MAD */
342 | sig = robust_madsigma(w->r, w);
343 |
344 | /* scale residuals by standard deviation and tuning parameter */
345 | gsl_vector_scale(w->r, 1.0 / (GSL_MAX(sig, sig_lower) * w->tune));
346 |
347 | /* compute weights using these residuals */
348 | s = w->type->wfun(w->r, w->weights);
349 | if (s)
350 | return s;
351 |
352 | gsl_vector_memcpy(w->c_prev, c);
353 |
354 | /* solve weighted least squares with new weights */
355 | s = gsl_multifit_wlinear(X, w->weights, y, c, cov, &chisq, w->multifit_p);
356 | if (s)
357 | return s;
358 |
359 | /* compute new residuals r = y - X c */
360 | s = gsl_multifit_linear_residuals(X, y, c, w->r);
361 | if (s)
362 | return s;
363 |
364 | converged = robust_test_convergence(w->c_prev, c, tol);
365 | }
366 |
367 | /* compute final MAD sigma */
368 | w->stats.sigma_mad = robust_madsigma(w->r, w);
369 |
370 | /* compute robust estimate of sigma */
371 | w->stats.sigma_rob = robust_robsigma(w->r, w->stats.sigma_mad, w->tune, w);
372 |
373 | /* compute final estimate of sigma */
374 | w->stats.sigma = robust_sigma(w->stats.sigma_ols, w->stats.sigma_rob, w);
375 |
376 | /* store number of iterations */
377 | w->stats.numit = numit;
378 |
379 | {
380 | double dof = (double) w->stats.dof;
381 | double rnorm = w->stats.sigma * sqrt(dof); /* see DuMouchel, sec 4.2 */
382 | double ss_err = rnorm * rnorm;
383 | double ss_tot = gsl_stats_tss(y->data, y->stride, n);
384 |
385 | /* compute R^2 */
386 | w->stats.Rsq = 1.0 - ss_err / ss_tot;
387 |
388 | /* compute adjusted R^2 */
389 | w->stats.adj_Rsq = 1.0 - (1.0 - w->stats.Rsq) * (n - 1.0) / dof;
390 |
391 | /* compute rmse */
392 | w->stats.rmse = sqrt(ss_err / dof);
393 |
394 | /* store SSE */
395 | w->stats.sse = ss_err;
396 | }
397 |
398 | /* calculate covariance matrix = sigma^2 (X^T X)^{-1} */
399 | s = robust_covariance(w->stats.sigma, cov, w);
400 | if (s)
401 | return s;
402 |
403 | /* raise an error if not converged */
404 | /* bdavis */
405 | /* This check is disabled to avoid an error when the iteration */
406 | /* count exceeds the reduced maxiter of 5. A better solution */
407 | /* would be to revert to the default of 100 unless an input */
408 | /* option is enabled and defined. */
409 | /* bdavis@usgs.gov */
410 | /*
411 | if (numit > w->maxiter)
412 | {
413 | GSL_ERROR("maximum iterations exceeded", GSL_EMAXITER);
414 | }
415 | */
416 | /* bdavis */
417 |
418 | return s;
419 | }
420 | } /* gsl_multifit_robust() */
421 |
422 | /* Estimation of values for given x */
423 | int
424 | gsl_multifit_robust_est(const gsl_vector * x, const gsl_vector * c,
425 | const gsl_matrix * cov, double *y, double *y_err)
426 | {
427 | int s = gsl_multifit_linear_est(x, c, cov, y, y_err);
428 |
429 | return s;
430 | }
431 |
432 | /***********************************
433 | * INTERNAL ROUTINES *
434 | ***********************************/
435 |
436 | /*
437 | robust_test_convergence()
438 | Test for convergence in robust least squares
439 |
440 | Convergence criteria:
441 |
442 | |c_i^(k) - c_i^(k-1)| <= tol * max(|c_i^(k)|, |c_i^(k-1)|)
443 |
444 | for all i. k refers to iteration number.
445 |
446 | Inputs: c_prev - coefficients from previous iteration
447 | c - coefficients from current iteration
448 | tol - tolerance
449 |
450 | Return: 1 if converged, 0 if not
451 | */
452 |
453 | static int
454 | robust_test_convergence(const gsl_vector *c_prev, const gsl_vector *c,
455 | const double tol)
456 | {
457 | size_t p = c->size;
458 | size_t i;
459 |
460 | for (i = 0; i < p; ++i)
461 | {
462 | double ai = gsl_vector_get(c_prev, i);
463 | double bi = gsl_vector_get(c, i);
464 |
465 | if (fabs(bi - ai) > tol * GSL_MAX(fabs(ai), fabs(bi)))
466 | return 0; /* not yet converged */
467 | }
468 |
469 | /* converged */
470 | return 1;
471 | } /* robust_test_convergence() */
472 |
473 | /*
474 | robust_madsigma()
475 | Estimate the standard deviation of the residuals using
476 | the Median-Absolute-Deviation (MAD) of the residuals,
477 | throwing away the smallest p residuals.
478 |
479 | See: Street et al, 1988
480 |
481 | Inputs: r - vector of residuals
482 | w - workspace
483 | */
484 |
485 | static double
486 | robust_madsigma(const gsl_vector *r, gsl_multifit_robust_workspace *w)
487 | {
488 | gsl_vector_view v;
489 | double sigma;
490 | size_t n = r->size;
491 | const size_t p = w->p;
492 | size_t i;
493 |
494 | /* copy |r| into workn */
495 | for (i = 0; i < n; ++i)
496 | {
497 | gsl_vector_set(w->workn, i, fabs(gsl_vector_get(r, i)));
498 | }
499 |
500 | gsl_sort_vector(w->workn);
501 |
502 | /*
503 | * ignore the smallest p residuals when computing the median
504 | * (see Street et al 1988)
505 | */
506 | v = gsl_vector_subvector(w->workn, p - 1, n - p + 1);
507 | sigma = gsl_stats_median_from_sorted_data(v.vector.data, v.vector.stride, v.vector.size) / 0.6745;
508 |
509 | return sigma;
510 | } /* robust_madsigma() */
511 |
512 | /*
513 | robust_robsigma()
514 | Compute robust estimate of sigma so that
515 | sigma^2 * inv(X' * X) is a reasonable estimate of
516 | the covariance for robust regression. Based heavily
517 | on the equations of Street et al, 1988.
518 |
519 | Inputs: r - vector of residuals y - X c
520 | s - sigma estimate using MAD
521 | tune - tuning constant
522 | w - workspace
523 | */
524 |
525 | static double
526 | robust_robsigma(const gsl_vector *r, const double s,
527 | const double tune, gsl_multifit_robust_workspace *w)
528 | {
529 | double sigma;
530 | size_t i;
531 | const size_t n = w->n;
532 | const size_t p = w->p;
533 | const double st = s * tune;
534 | double a, b, lambda;
535 |
536 | /* compute u = r / sqrt(1 - h) / st */
537 | gsl_vector_memcpy(w->workn, r);
538 | gsl_vector_mul(w->workn, w->resfac);
539 | gsl_vector_scale(w->workn, 1.0 / st);
540 |
541 | /* compute w(u) and psi'(u) */
542 | w->type->wfun(w->workn, w->psi);
543 | w->type->psi_deriv(w->workn, w->dpsi);
544 |
545 | /* compute psi(u) = u*w(u) */
546 | gsl_vector_mul(w->psi, w->workn);
547 |
548 | /* Street et al, Eq (3) */
549 | a = gsl_stats_mean(w->dpsi->data, w->dpsi->stride, n);
550 |
551 | /* Street et al, Eq (5) */
552 | b = 0.0;
553 | for (i = 0; i < n; ++i)
554 | {
555 | double psi_i = gsl_vector_get(w->psi, i);
556 | double resfac = gsl_vector_get(w->resfac, i);
557 | double fac = 1.0 / (resfac*resfac); /* 1 - h */
558 |
559 | b += fac * psi_i * psi_i;
560 | }
561 | b /= (double) (n - p);
562 |
563 | /* Street et al, Eq (5) */
564 | lambda = 1.0 + ((double)p)/((double)n) * (1.0 - a) / a;
565 |
566 | sigma = lambda * sqrt(b) * st / a;
567 |
568 | return sigma;
569 | } /* robust_robsigma() */
570 |
571 | /*
572 | robust_sigma()
573 | Compute final estimate of residual standard deviation, using
574 | the OLS and robust sigma estimates.
575 |
576 | This equation is taken from DuMouchel and O'Brien, sec 4.1:
577 | \hat{\sigma_R}
578 |
579 | Inputs: s_ols - OLS sigma
580 | s_rob - robust sigma
581 | w - workspace
582 |
583 | Return: final estimate of sigma
584 | */
585 |
586 | static double
587 | robust_sigma(const double s_ols, const double s_rob,
588 | gsl_multifit_robust_workspace *w)
589 | {
590 | double sigma;
591 | const size_t p = w->p;
592 | const size_t n = w->n;
593 |
594 | /* see DuMouchel and O'Brien, sec 4.1 */
595 | sigma = GSL_MAX(s_rob,
596 | sqrt((s_ols*s_ols*p*p + s_rob*s_rob*n) /
597 | (p*p + n)));
598 |
599 | return sigma;
600 | } /* robust_sigma() */
601 |
602 | /*
603 | robust_covariance()
604 | Calculate final covariance matrix, defined as:
605 |
606 | sigma^2 * (X^T X)^{-1}
607 |
608 | Inputs: sigma - residual standard deviation
609 | cov - (output) covariance matrix
610 | w - workspace
611 | */
612 |
613 | static int
614 | robust_covariance(const double sigma, gsl_matrix *cov,
615 | gsl_multifit_robust_workspace *w)
616 | {
617 | int s = 0;
618 | const size_t p = w->p;
619 | const double s2 = sigma * sigma;
620 | size_t i, j;
621 | gsl_matrix *QSI = w->QSI;
622 | gsl_vector *D = w->D;
623 |
624 | /* Form variance-covariance matrix cov = s2 * (Q S^-1) (Q S^-1)^T */
625 |
626 | for (i = 0; i < p; i++)
627 | {
628 | gsl_vector_view row_i = gsl_matrix_row (QSI, i);
629 | double d_i = gsl_vector_get (D, i);
630 |
631 | for (j = i; j < p; j++)
632 | {
633 | gsl_vector_view row_j = gsl_matrix_row (QSI, j);
634 | double d_j = gsl_vector_get (D, j);
635 | double s;
636 |
637 | gsl_blas_ddot (&row_i.vector, &row_j.vector, &s);
638 |
639 | gsl_matrix_set (cov, i, j, s * s2 / (d_i * d_j));
640 | gsl_matrix_set (cov, j, i, s * s2 / (d_i * d_j));
641 | }
642 | }
643 |
644 | return s;
645 | } /* robust_covariance() */
646 |
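The MAD-based scale estimate in `robust_madsigma()` can be illustrated without GSL. The sketch below works on a plain array: it sorts the absolute residuals, skips the smallest p - 1 entries (matching the subvector that starts at index p - 1 above), takes the median of the rest, and divides by 0.6745. The function name `mad_sigma` and the comparison helper are hypothetical, and the sketch uses `qsort` in place of `gsl_sort_vector`.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparison callback for doubles, ascending order. */
static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* Hypothetical plain-array counterpart of robust_madsigma().
   r: residuals, n: number of residuals, p: number of model parameters.
   Returns the MAD-based sigma, or -1.0 if memory cannot be allocated. */
double mad_sigma(const double *r, size_t n, size_t p)
{
    double *abs_r;
    double median;
    size_t i, start, len;

    assert(r != NULL && p >= 1 && n >= p);

    abs_r = malloc(n * sizeof *abs_r);
    if (abs_r == NULL)
        return -1.0;

    /* copy |r| into the scratch buffer and sort it */
    for (i = 0; i < n; i++)
        abs_r[i] = r[i] < 0.0 ? -r[i] : r[i];
    qsort(abs_r, n, sizeof *abs_r, compare_doubles);

    /* median of the sorted window [p - 1, n - 1] */
    start = p - 1;
    len = n - p + 1;
    if (len % 2 == 1)
        median = abs_r[start + len / 2];
    else
        median = 0.5 * (abs_r[start + len / 2 - 1] + abs_r[start + len / 2]);

    free(abs_r);

    /* 0.6745 rescales the MAD to a standard deviation for normal data */
    return median / 0.6745;
}
```

For residuals {-1, 2, -3, 4, -5} with p = 2, the window is {2, 3, 4, 5}, so the median is 3.5 and the estimate is 3.5 / 0.6745.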
--------------------------------------------------------------------------------
/ccdc/output.h:
--------------------------------------------------------------------------------
1 | #ifndef OUTPUT_H
2 | #define OUTPUT_H
3 |
4 | #include "defines.h"
5 |
6 | typedef struct {
7 | int row;
8 | int col;
9 | } Position_t;
10 |
11 | /* Structure for the 'output' data type */
12 | typedef struct
13 | {
14 | int t_start; /* time when the time series model starts */
15 | int t_end; /* time when the time series model ends */
16 | int t_break; /* time when the first break (change) is observed */
17 | float coefs[NUM_BANDS][NUM_COEFFS];
18 | /* coefficients for each time series model for each
19 | spectral band*/
20 | float rmse[NUM_BANDS];
21 | /* RMSE for each time series model for each
22 | spectral band*/
23 | Position_t pos; /* the location of each time series model */
24 | float change_prob; /* the probability that a pixel has undergone
25 | change (between 0 and 100) */
26 | int num_obs; /* the number of "good" observations used for model
27 | estimation */
28 | int category; /* the quality of the model estimation (what model
29 | is used, what process is used)
30 | 1x: persistent snow 2x: persistent water
31 | 3x: Fmask fails 4x: normal procedure
32 | x1: mean value (1) x4: simple fit (4)
33 | x6: basic fit (6) x8: full fit (8) */
34 | float magnitude[NUM_BANDS];/* the magnitude of change (difference between model
35 | prediction and observation for each spectral band)*/
36 | } Output_t;
37 |
38 | #endif
39 |
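The `category` field of `Output_t` packs two pieces of information into one integer: the tens digit names the process and the ones digit names the model. A minimal sketch of decoding it follows. The enum and function names are hypothetical, and the sketch assumes the two-digit layout described in the comment above.

```c
#include <assert.h>

/* Hypothetical labels for the tens digit of Output_t.category. */
typedef enum {
    PROCESS_UNKNOWN = 0,
    PROCESS_PERSISTENT_SNOW = 1,  /* 1x */
    PROCESS_PERSISTENT_WATER = 2, /* 2x */
    PROCESS_FMASK_FAILS = 3,      /* 3x */
    PROCESS_NORMAL = 4            /* 4x */
} Category_process_t;

/* Hypothetical helper: extract the process from the tens digit. */
Category_process_t category_process(int category)
{
    int tens;

    assert(category >= 0);
    tens = category / 10;

    if (tens >= PROCESS_PERSISTENT_SNOW && tens <= PROCESS_NORMAL)
        return (Category_process_t) tens;
    return PROCESS_UNKNOWN;
}

/* Hypothetical helper: extract the ones digit, which the header comment
   lists as 1 (mean value), 4 (simple fit), 6 (basic fit), or 8 (full fit). */
int category_num_coeffs(int category)
{
    assert(category >= 0);
    return category % 10;
}
```

With this layout, a category of 48 decodes as the normal procedure with a full fit of 8 coefficients.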
--------------------------------------------------------------------------------
/ccdc/scripts/cloudCover.pl:
--------------------------------------------------------------------------------
1 | #!/usr/bin/perl
2 |
3 | # ######################################################################
4 | #
5 | # Name: cloudCover.pl
6 | #
7 | # Author: bdavis
8 | #
9 | # Date: 20151029
10 | #
11 | # Description:
12 | # Copied from cc.pl. Receives as arguments the output from gdal -hist
13 | # the first 5 values of which are number of clear, water, cloud shadow,
14 | # snow, and cloud, respectively. It returns the percentage of clear
15 | # pixels as determined by (clear + water) / total. cc.pl returns percent
16 | # cloud cover. This is called from bash scripts because floating
17 | # point math is difficult in bash, and rounding errors were causing
18 | # divide by zero when calculating the total pixels as a percentage of
19 | # possible pixels in a grid.
20 | #
21 | # As it turns out, we have come across 1 ESPA scene which was all fill
22 | # values, so zero valid pixels, resulting in divide by zero, hence the
23 | # check for total not equal to zero.
24 | #
25 | # Multiply the result by 100 because tileLandsat.sh is expecting
26 | # percents as integer values.
27 | #
28 | # ######################################################################
29 |
30 | $clear = $ARGV[0];
31 | $water = $ARGV[1];
32 | $cloudShadow = $ARGV[2];
33 | $snow = $ARGV[3];
34 | $cloud = $ARGV[4];
35 | ##print "clear $clear\n";
36 | ##print "water $water\n";
37 | ##print "cloudShadow $cloudShadow\n";
38 | ##print "snow $snow\n";
39 | ##print "cloud $cloud\n";
40 |
41 | $total = $clear + $water + $cloudShadow + $snow + $cloud;
42 | #$cc = (($cloudShadow + $cloud) / $total ) * 100;
43 | if ($total != 0)
44 | {
45 | $pctClear = (($clear + $water) / $total ) * 100;
46 | }
47 | else
48 | {
49 | $pctClear = 0;
50 | }
51 |
52 | ## debug
53 | # test of 1 for total gave a value of 4e-08 for pctTotal,
54 | # which still was interpreted as a valid non-zero value
55 | # and did not give a divide by zero error.
56 | #$pctTotal = $total / (5000 * 5000);
57 | #print "pctTotal $pctTotal\n";
58 | #print "total $total\n";
59 | #print "cc $cc\n";
60 | ## debug
61 | #print "$cc\n";
62 | print "$pctClear\n";
63 |
64 | exit;
65 |
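The percent-clear calculation and its divide-by-zero guard can also be sketched in C, the language used by the rest of the project. The function name `percent_clear` is hypothetical; the arithmetic mirrors the Perl script above, including the result of 0 for a scene that contains only fill values.

```c
#include <assert.h>

/* Hypothetical C counterpart of cloudCover.pl: percentage of pixels that
   are clear or water, out of all non-fill pixels. Returns 0.0 when the
   scene has no valid pixels, which avoids a divide by zero. */
double percent_clear(long clear, long water, long cloud_shadow,
                     long snow, long cloud)
{
    long total;

    assert(clear >= 0 && water >= 0 && cloud_shadow >= 0
           && snow >= 0 && cloud >= 0);

    total = clear + water + cloud_shadow + snow + cloud;
    if (total == 0)
        return 0.0; /* all-fill scene */

    return 100.0 * (double) (clear + water) / (double) total;
}
```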
--------------------------------------------------------------------------------
/ccdc/scripts/renameMTLfiles.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | for sceneID in *
4 | do
5 |
6 |
7 | cd $sceneID
8 | echo $sceneID
9 |
10 | img=`ls -1 | grep -v hdr`
11 | echo img $img
12 | hdr=`ls -1 | grep hdr`
13 | echo hdr $hdr
14 |
15 | new_img=$img"_MTLstack"
16 | new_hdr=`echo $hdr|sed 's/\.hdr$//'`
17 | new_hdr=$new_hdr"_MTLstack.hdr"
18 |
19 | echo img $img new_img $new_img
20 | echo hdr $hdr new_hdr $new_hdr
21 | mv $img $new_img
22 | mv $hdr $new_hdr
23 |
24 | exit
25 |
26 | cd ..
27 |
28 | done
29 |
--------------------------------------------------------------------------------
/ccdc/scripts/tileLandsat.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | ########################################################################
4 | #
5 | # tileLandsat.sh
6 | #
7 | # Parent script to "tile" landsat scenes into files of stacked bands
8 | # in ENVI format, in BIP order, sequenced as required by the CCDC.
9 | #
10 | # Assumes inputs are in a directory and ESPA-packaged .tar.gz files.
11 | # Output is written to sub-directories under a parent directory,
12 | # one per scene ID.
13 | #
14 | # 20151019 bdavis
15 | # Original development, plagiarizing from fireMapping.sh. Input
16 | # scene IDs should be arguments, somehow.
17 | #
18 | ########################################################################
19 |
20 |
21 | ########################################################################
22 | #
23 | # Set up environment. This should be valid for any of the SLURM nodes
24 | # in the EROS YetiJr environment.
25 | #
26 | ########################################################################
27 |
28 | export PATH=.:/alt/local/run:/alt/local/bin:/usr/local/local.host/bin:/usr/lib64/qt-3.3/bin:/usr/local/local.host/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/home/bdavis/bin
29 | export LD_LIBRARY_PATH=/alt/local/lib:/usr/local/local.host/ssl/lib:/usr/local/local.host/lib64:/usr/local/local.host/lib:/lib64:/usr/lib64:/usr/local/local.host/mysql/lib
30 | export PYTHONPATH=/alt/local/bin/:/alt/local/lib/python2.7/site-packages
31 |
32 |
33 | ########################################################################
34 | #
35 | # Parse arguments. So far just in and out paths.
36 | #
37 | ########################################################################
38 |
39 | date
40 | echo ""
41 |
42 | if [ $# -ne 3 ]; then
43 | echo "Usage: tileLandsat.sh input-path output-path exe"
44 | echo "Where input-path is the directory containing ESPA-created .tar.gz files,"
45 | echo "output-path is the parent directory in which to write results in sub-directories,"
46 | echo "and exe is the type of executable to prepare for, MatLAB or C."
47 | echo ""
48 | echo "For Example: tileLandsat.sh /data/bdavis/WA/2005-2015/Albers_grid07 /data/bdavis/WA/2005-2015/Tiles_grid07 C"
49 | echo ""
50 | exit 1
51 | fi
52 |
53 |
54 | #in=/data/bdavis/WA/2005-2015/Albers_grid07
55 | #out=/data/bdavis/WA/2005-2015/Tiles_grid07
56 |
57 | in=$1
58 | out=$2
59 | exe=$3
60 | originalDir=`pwd`
61 |
62 | echo Reading from $in, writing to $out
63 | echo ""
64 | #echo in $in
65 | #echo out $out
66 | #echo originalDir $originalDir
67 |
68 | ########################################################################
69 | #
70 | # It seems one could just read from /hsm, if it were mounted, but it's
71 | # not.
72 | #
73 | # Also, the 20% clear requirement is outdated thinking, because WA data
74 | # is ESPA rectangles (Area of Interest), with much more temporal density
75 | # (and fill), but also the chance for any one x/y pixel location to be
76 | # clear/snow/water, even though the total scene/ESPA-AOI is less than
77 | # 20% clear. Therefore, process without checking the clear threshold
78 | # and skipping, but for the C version only. Retain the 20% restriction
79 | # for the MatLAB version, because that is what is expected.
80 | #
81 | ########################################################################
82 |
83 | ########################################################################
84 | #
85 | # ls -1 LE7*.gz|sort -n --key=1.10,1.16
86 | # will give all path/row file names in a gridNN directory, sorted by
87 | # date, if that is required. Zhe says no, his software sorts
88 | # internally. Checking with Song.
89 | # There are 174 46/27 LE7 in grid07, n of them are clear enough.
90 | #
91 | # This list of scenes needs to be parameterized. Piping an ls,
92 | # cat-ing a text file, etc. Limited to the first 24 path 46 row 27
93 | # LE7 files for initial testing purposes.
94 | #
95 | ########################################################################
96 |
97 | #for sceneID in LE70460272005010 LE70460272005042 LE70460272005058 LE70460272005074 LE70460272005106 LE70460272005154 LE70460272005170 LE70460272005186 LE70460272005202 LE70460272005218 LE70460272005234 LE70460272005250 LE70460272005266 LE70460272005282 LE70460272005298 LE70460272005314 LE70460272005330 LE70460272005346 LE70460272006045 LE70460272006061 LE70460272006077 LE70460272006109 LE70460272006125 LE70460272006141
98 |
99 | cd $in
100 |
101 | #just zz scenes
102 | #for sceneID in `ls -1 LE7046027*.gz|sort -n --key=1.10,1.16`
103 | #production
104 | for sceneID in `ls -1 *.gz|sort -n --key=1.10,1.16`
105 | #testing
106 | #for sceneID in `ls -1 LT4*.gz|sort -n --key=1.10,1.16`
107 | #for sceneID in `ls -1 LT5*.gz|sort -n --key=1.10,1.16`
108 | #for sceneID in `ls -1 LE7*.gz|sort -n --key=1.10,1.16`
109 | #for sceneID in `ls -1 LC8*.gz|sort -n --key=1.10,1.16`
110 | #for sceneID in LT50450281996227 # test of zero valid pixels
111 | #for sceneID in LT50450281996243 # test of 100 pct cloud cover
112 | #for sceneID in LT50450281995112 # test of valid scene
113 |
114 | do
115 |
116 | ####################################################################
117 | #
118 | # Set up the names to use, and unpackage the scene.
119 | #
120 | ####################################################################
121 |
122 | sceneID=`echo $sceneID|cut -c 1-16`
123 | pkg=`ls -1 $in/*.gz|grep $sceneID`
124 | echo sceneID $sceneID
125 | echo pkg $pkg
126 | echo ""
127 |
128 | # if the directory and hdr already exist, assume this is done, skip.
129 | if [ -d $out/$sceneID ]; then
130 |
131 | hdr=`ls -1 $out/$sceneID/*.hdr|wc|awk '{print $1}'`
132 | echo hdr
133 | ls $out/$sceneID/*hdr
134 |
135 | if [ $hdr -ne 0 ]; then
136 | echo skipping hdr
137 | ls $out/$sceneID
138 | echo ""
139 | continue
140 | fi
141 | fi
142 |
143 | mkdir -p $out/$sceneID
144 | tar -C $out/$sceneID -zxvf $pkg
145 | cd $out/$sceneID
146 | name=`ls -1 *cfmask.tif|cut -c 1-21`
147 | echo name $name
148 |
149 |
150 |
151 | ####################################################################
152 | #
153 | # At some point, we may need to extract the sensor, to determine
154 | # sensor-specific processing of which bands, etc.
155 | #
156 | ####################################################################
157 |
158 | sensor=`echo $sceneID|cut -c 1-3`
159 |
160 |
161 | ####################################################################
162 | #
163 | # If the scene is not at least 20% clear, skip. Clear is defined
164 | # as clear + water / total. Call a perl tool modified from
165 | # fireMapping.sh which calculated cloud cover, because we need
166 | # clear. FYI, "clear" is a reserved word, hence "cleer".
167 | # Attempting to do floating point math in bash was causing some
168 | # divide by zero errors because of rounding very small values
169 | # (less than 0.01) to integer values. cloudCover.pl also returns
170 | # zero for scenes whose values are all fill, so zero valid pixels.
171 | # (yes, I've found one.)
172 | #
173 | ####################################################################
174 |
175 | ccargs=`gdalinfo -hist *cfmask.tif|tail -2|head -1|awk '{print $1 " " $2 " " $3 " " $4 " " $5}'`
176 | echo ccargs $ccargs
177 | pctClear=`cloudCover.pl $ccargs`
178 | intPctClear=`echo $pctClear | cut -d '.' -f 1`
179 |
180 | echo Percent clear pixels in total valid pixels: $intPctClear
181 | echo ""
182 |
183 | if [ $intPctClear -lt 20 ] && [ $exe == "MatLAB" ]; then
184 | echo Percent clear pixels less than 20: $intPctClear
185 | echo ""
186 | #rm -f *.tif *.xml *.txt
187 | cd ..
188 | rm -f -r $sceneID
189 | cd $originalDir
190 | continue
191 | fi
192 |
193 |
194 | ####################################################################
195 | #
196 | # For the Song C version of ccdc, individual envi files for each
197 | # band are required. For the Zhe MatLAB version, stacked band envi
198 | # files in BIP format are required.
199 | #
200 | ####################################################################
201 |
202 | if [ $exe == "C" ]; then
203 |
204 | if [ $sensor == "LC8" ]; then
205 |
206 | gdal_translate -of ENVI $name"_sr_band2.tif" $name"_sr_band2.img"
207 | gdal_translate -of ENVI $name"_sr_band3.tif" $name"_sr_band3.img"
208 | gdal_translate -of ENVI $name"_sr_band4.tif" $name"_sr_band4.img"
209 | gdal_translate -of ENVI $name"_sr_band5.tif" $name"_sr_band5.img"
210 | gdal_translate -of ENVI $name"_sr_band6.tif" $name"_sr_band6.img"
211 | gdal_translate -of ENVI $name"_sr_band7.tif" $name"_sr_band7.img"
212 | gdal_translate -of ENVI $name"_toa_band10.tif" $name"_toa_band10.img"
213 | gdal_translate -of ENVI $name"_cfmask.tif" $name"_cfmask.img"
214 |
215 | mv *.img *sr_band2.hdr $out/.
216 |
217 | else
218 |
219 | gdal_translate -of ENVI $name"_sr_band1.tif" $name"_sr_band1.img"
220 | gdal_translate -of ENVI $name"_sr_band2.tif" $name"_sr_band2.img"
221 | gdal_translate -of ENVI $name"_sr_band3.tif" $name"_sr_band3.img"
222 | gdal_translate -of ENVI $name"_sr_band4.tif" $name"_sr_band4.img"
223 | gdal_translate -of ENVI $name"_sr_band5.tif" $name"_sr_band5.img"
224 | gdal_translate -of ENVI $name"_sr_band7.tif" $name"_sr_band7.img"
225 | gdal_translate -of ENVI $name"_toa_band6.tif" $name"_toa_band6.img"
226 | gdal_translate -of ENVI $name"_cfmask.tif" $name"_cfmask.img"
227 |
228 | mv *.img *sr_band1.hdr $out/.
229 |
230 | fi
231 |
232 | cd ..
233 | rm -r $sceneID
234 | cd $originalDir
235 |
236 | else
237 |
238 |
239 | ################################################################
240 | #
241 | # Convert the cfmask from byte to 16bit. All "bands" in the
242 | # merged tif need to be of the same data type.
243 | #
244 | ################################################################
245 |
246 | cfmask=`ls -1 *_cfmask.tif`
247 | cfmask16=`echo $cfmask | sed 's/cfmask/cfmask16/'`
248 | echo cfmask $cfmask cfmask16 $cfmask16
249 | echo ""
250 | echo "Converting cfmask to 16-bit."
251 | echo ""
252 | gdal_translate -ot UInt16 $cfmask $cfmask16
253 | echo ""
254 |
255 |
256 | ################################################################
257 | #
258 | # Merge all the bands together, in the specified order, in ENVI
259 | # format.
260 | #
261 | ################################################################
262 |
263 | echo "Merging bands."
264 | echo ""
265 |
266 | if [ $sensor == "LC8" ]; then
267 |
268 | gdal_merge.py -o $name"_stack.img" \
269 | -separate \
270 | -of ENVI \
271 | $name"_sr_band2.tif" \
272 | $name"_sr_band3.tif" \
273 | $name"_sr_band4.tif" \
274 | $name"_sr_band5.tif" \
275 | $name"_sr_band6.tif" \
276 | $name"_sr_band7.tif" \
277 | $name"_toa_band10.tif" \
278 | $name"_cfmask16.tif"
279 |
280 | else
281 |
282 | gdal_merge.py -o $name"_stack.img" \
283 | -separate \
284 | -of ENVI \
285 | $name"_sr_band1.tif" \
286 | $name"_sr_band2.tif" \
287 | $name"_sr_band3.tif" \
288 | $name"_sr_band4.tif" \
289 | $name"_sr_band5.tif" \
290 | $name"_sr_band7.tif" \
291 | $name"_toa_band6.tif" \
292 | $name"_cfmask16.tif"
293 |
294 | fi
295 |
296 | echo ""
297 |
298 |
299 | ################################################################
300 | #
301 | # Create a jpg just during testing if a sanity check is
302 | # required.
303 | #
304 | ################################################################
305 |
306 | # echo "Creating JPEG."
307 | # echo ""
308 | # gdal_translate -of JPEG \
309 | # -b 6 -b 4 -b 3 \
310 | # -ot Byte \
311 | # -scale \
312 | # $name"_stack.img" \
313 | # $name.jpg
314 | # echo ""
315 |
316 |
317 | ################################################################
318 | #
319 | # Translate the envi file to BIP. This could be combined with
320 | # the initial merge if we do not need a bsq jpg. Eventually.
321 | #
322 | ################################################################
323 |
324 | echo "Converting to BIP."
325 | echo ""
326 | gdal_translate -co "INTERLEAVE=BIP" \
327 | -of ENVI \
328 | $name"_stack.img" \
329 | $name"_MTLstack.img"
330 | echo ""
331 |
332 |
333 | ################################################################
334 | #
335 | # Clean up after yourself. Leaves only the required files and
336 | # reduces confusion. (was going to say "eliminates".)
337 | #
338 | ################################################################
339 |
340 | mv $name"_MTLstack.img" $name"_MTLstack"
341 | rm *.tif *.xml *_stack.hdr *_stack.img
342 | cd ..
343 |
344 | echo ""
345 |
346 | fi
347 |
348 | done
349 |
350 | echo "Processing complete."
351 | echo ""
352 | date
353 |
354 | exit
355 |
356 |
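The clear-pixel threshold described in tileLandsat.sh ("clear + water / total", with zero returned for all-fill scenes) can be sketched in C as follows. This is only an illustration of the arithmetic, not the logic of cloudCover.pl itself. The function name percent_clear is hypothetical, and the argument layout is an assumption: five histogram counts for cfmask values 0 through 4 (clear, water, cloud shadow, snow, cloud), with fill pixels already excluded.

```c
/* Hypothetical helper: percentage of valid (non-fill) pixels that are
   clear or water.  counts[] is ASSUMED to hold the cfmask histogram
   buckets for values 0..4: clear, water, cloud shadow, snow, cloud. */
double percent_clear(const long counts[5])
{
    long total = 0;
    int i;

    for (i = 0; i < 5; i++)
        total += counts[i];

    /* All-fill scene: no valid pixels, so report zero instead of
       dividing by zero. */
    if (total == 0)
        return 0.0;

    /* Do the division in floating point so small fractions are not
       rounded away to an integer. */
    return 100.0 * (double)(counts[0] + counts[1]) / (double)total;
}
```

A caller would compare the result against 20.0 to reproduce the MatLAB-mode skip test; the C mode described in the script would ignore the result.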
--------------------------------------------------------------------------------
/ccdc/scripts/tileLandsatParent.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | tileLandsat.sh /shared/bdavis/WA/1982-1984/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 1982-M11.out &
4 | tileLandsat.sh /shared/bdavis/WA/1984-1990/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 1984-M11.out &
5 | tileLandsat.sh /shared/bdavis/WA/1990-1995/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 1990-M11.out &
6 | tileLandsat.sh /shared/bdavis/WA/1995-2000/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 1995-M11.out &
7 | tileLandsat.sh /shared/bdavis/WA/2000-2005/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 2000-M11.out &
8 | tileLandsat.sh /shared/bdavis/WA/2005-2015/grid11 /shared/bdavis/grid11/MatLAB MatLAB >& 2005-M11.out &
9 |
10 | exit
11 |
12 | tileLandsat.sh /shared/bdavis/WA/1982-1984/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 1982-M.out &
13 | tileLandsat.sh /shared/bdavis/WA/1984-1990/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 1984-M.out &
14 | tileLandsat.sh /shared/bdavis/WA/1990-1995/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 1990-M.out &
15 | tileLandsat.sh /shared/bdavis/WA/1995-2000/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 1995-M.out &
16 | tileLandsat.sh /shared/bdavis/WA/2000-2005/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 2000-M.out &
17 | tileLandsat.sh /shared/bdavis/WA/2005-2015/grid08 /shared/bdavis/grid08/MatLAB MatLAB >& 2005-M.out &
18 |
19 | exit
20 |
21 | #tileLandsat.sh /shared/bdavis/WA/1982-1984/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 1982-M.out &
22 | #tileLandsat.sh /shared/bdavis/WA/1984-1990/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 1984-M.out &
23 | #tileLandsat.sh /shared/bdavis/WA/1990-1995/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 1990-M.out &
24 | #tileLandsat.sh /shared/bdavis/WA/1995-2000/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 1995-M.out &
25 | #tileLandsat.sh /shared/bdavis/WA/2000-2005/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 2000-M.out &
26 | #tileLandsat.sh /shared/bdavis/WA/2005-2015/grid03 /shared/bdavis/grid03/MatLAB MatLAB >& 2005-M.out &
27 |
28 | exit
29 |
30 | #tileLandsat.sh /data/bdavis/WA/1982-1984/grid02 /data/bdavis/grid02/MatLAB MatLAB >& 1982-M.out &
31 | #tileLandsat.sh /data/bdavis/WA/1984-1990/grid02 /data/bdavis/grid02/MatLAB MatLAB >& 1984-M.out &
32 | #tileLandsat.sh /data/bdavis/WA/1990-1995/grid02 /data/bdavis/grid02/MatLAB MatLAB >& 1990-M.out &
33 | #tileLandsat.sh /data/bdavis/WA/1995-2000/grid02 /data/bdavis/grid02/MatLAB MatLAB >& 1995-M.out &
34 | #tileLandsat.sh /shared/bdavis/WA/2000-2005/grid02 /shared/bdavis/grid02/MatLAB MatLAB >& 2000-M.out &
35 | #tileLandsat.sh /shared/bdavis/WA/2005-2015/grid02 /shared/bdavis/grid02/MatLAB MatLAB >& 2005-M.out &
36 |
37 | exit
38 |
39 | #tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1982-1984/grid07 /data/bdavis/1982-1984/C C >& 1982-1984-C.out &
40 | #tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1982-1984/grid07 /data/bdavis/1982-1984/MatLAB MatLAB >& 1982-1984-M.out &
41 |
42 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1984-1990/grid07 /data/bdavis/grid07/C/1984-1990 C >& 1984-1990-C.out
43 | #tileLandsat.sh /data/bdavis/WA/1984-1990/grid07 /data/bdavis/test/MatLAB MatLAB >& 1984-1990-M.out
44 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1990-1995/grid07 /data/bdavis/grid07/C/1990-1995 C >& 1990-1995-C.out
45 | #tileLandsat.sh /data/bdavis/WA/1990-1995/grid07 /data/bdavis/test/MatLAB MatLAB >& 1990-1995-M.out
46 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/1995-2000/grid07 /data/bdavis/grid07/C/1995-2000 C >& 1995-2000-C.out
47 | #tileLandsat.sh /data/bdavis/WA/1995-2000/grid07 /data/bdavis/test/MatLAB MatLAB >& 1995-2000-M.out
48 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/2000-2005/grid07 /data/bdavis/grid07/C/2000-2005 C >& 2000-2005-C.out
49 | #tileLandsat.sh /data/bdavis/WA/2000-2005/grid07 /data/bdavis/test/MatLAB MatLAB >& 2000-2005-M.out
50 | tileLandsat.sh /stornext/nlcdsnfs1/bdavis/WA/2005-2015/grid07 /data/bdavis/grid07/C/2005-2015 C >& 2005-2015-C.out
51 | #tileLandsat.sh /data/bdavis/WA/2005-2015/grid07 /data/bdavis/test/MatLAB MatLAB >& 2005-2015-M.out
52 |
53 |
--------------------------------------------------------------------------------
/ccdc/utilities.c:
--------------------------------------------------------------------------------
1 |
2 | #include <stdio.h>
3 | #include <stdlib.h>
4 | #include <string.h>
5 | #include <time.h>
6 | #include <sys/types.h>
7 | #include <unistd.h>
8 | #include <libgen.h>
9 |
10 |
11 | #include "utilities.h"
12 |
13 |
14 | /*****************************************************************************
15 | NAME: write_message
16 |
17 | PURPOSE: Writes a formatted log message to the specified file handle.
18 |
19 | RETURN VALUE: None
20 |
21 | NOTES:
22 | - Log Message Format:
23 | yyyy:mm:dd HH:mm:ss pid:module [filename]:line [type]:message
24 | *****************************************************************************/
25 |
26 | void write_message
27 | (
28 | const char *message, /* I: message to write to the log */
29 | const char *module, /* I: module the message is from */
30 | const char *type, /* I: type of the error */
31 | char *file, /* I: file the message was generated in */
32 | int line, /* I: line number in the file where the message was
33 | generated */
34 | FILE *fd /* I: where to write the log message */
35 | )
36 | {
37 | time_t current_time;
38 | struct tm *time_info;
39 | int year;
40 | pid_t pid;
41 |
42 | time (&current_time);
43 | time_info = localtime (&current_time);
44 | year = time_info->tm_year + 1900;
45 |
46 | pid = getpid ();
47 |
48 | fprintf (fd, "%04d:%02d:%02d %02d:%02d:%02d %d:%s [%s]:%d [%s]:%s\n",
49 | year,
50 | time_info->tm_mon + 1, /* tm_mon is zero-based (0 = January) */
51 | time_info->tm_mday,
52 | time_info->tm_hour,
53 | time_info->tm_min,
54 | time_info->tm_sec,
55 | pid, module, basename (file), line, type, message);
56 | }
57 |
58 |
59 | /*****************************************************************************
60 | NAME: sub_string
61 |
62 | PURPOSE: Copies length characters of source, beginning at start, into a new string.
63 |
64 | RETURN VALUE: Sub-setted character string
65 |
66 | NOTES: Probably dangerous: source is not bounds-checked. The caller frees the result.
67 | *****************************************************************************/
68 |
69 | char *sub_string /* explicit control of a substring function */
70 | (
71 | const char *source, /* I: input string */
72 | size_t start, /* I: index for start of sub string */
73 | size_t length /* I: number of characters to grab */
74 | )
75 | {
76 | size_t i;
77 | char *target;
78 |
79 | target = malloc(length + 1); /* one extra byte for the terminating NUL */
80 | if (target == NULL) return NULL;
81 | for(i = 0; i != length; ++i)
82 | {
83 | target[i] = source[start + i];
84 | }
85 | target[i] = 0;
86 | return target;
87 | }
88 |
89 |
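sub_string copies length characters starting at start without checking that the range lies inside source. A bounds-checked variant might look like the sketch below. The name safe_sub_string and its clamping behavior are assumptions made for illustration; nothing in this repository defines or calls it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bounds-checked variant of sub_string.  Returns a newly
   allocated, NUL-terminated copy, or NULL on bad input or allocation
   failure.  The caller frees the result. */
char *safe_sub_string(const char *source, size_t start, size_t length)
{
    size_t source_len;
    char *target;

    if (source == NULL)
        return NULL;

    source_len = strlen(source);
    if (start > source_len)
        return NULL;

    /* Clamp the request so the copy never reads past the terminator. */
    if (length > source_len - start)
        length = source_len - start;

    target = malloc(length + 1);
    if (target == NULL)
        return NULL;

    memcpy(target, source + start, length);
    target[length] = '\0';
    return target;
}
```

Clamping rather than failing on an over-long request is a design choice for this sketch; a stricter version could return NULL whenever start + length exceeds the source length.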
--------------------------------------------------------------------------------
/ccdc/utilities.h:
--------------------------------------------------------------------------------
1 |
2 | #ifndef UTILITIES_H
3 | #define UTILITIES_H
4 |
5 |
6 | #include <stdio.h>
7 |
8 |
9 | #define LOG_MESSAGE(message, module) \
10 | write_message((message), (module), "INFO", \
11 | __FILE__, __LINE__, stdout);
12 |
13 |
14 | #define WARNING_MESSAGE(message, module) \
15 | write_message((message), (module), "WARNING", \
16 | __FILE__, __LINE__, stdout);
17 |
18 |
19 | #define ERROR_MESSAGE(message, module) \
20 | write_message((message), (module), "ERROR", \
21 | __FILE__, __LINE__, stdout);
22 |
23 |
24 | #define RETURN_ERROR(message, module, status) \
25 | {write_message((message), (module), "ERROR", \
26 | __FILE__, __LINE__, stdout); \
27 | return (status);}
28 |
29 |
30 | void write_message
31 | (
32 | const char *message, /* I: message to write to the log */
33 | const char *module, /* I: module the message is from */
34 | const char *type, /* I: type of the error */
35 | char *file, /* I: file the message was generated in */
36 | int line, /* I: line number in the file where the message was
37 | generated */
38 | FILE * fd /* I: where to write the log message */
39 | );
40 |
41 |
42 | char *sub_string /* explicit control of a substring function */
43 | (
44 | const char *source, /* I: input string */
45 | size_t start, /* I: index for start of sub string */
46 | size_t length /* I: number of characters to grab */
47 | );
48 |
49 |
50 | #endif /* UTILITIES_H */
51 |
--------------------------------------------------------------------------------
/classification/2d_array.c:
--------------------------------------------------------------------------------
1 |
2 | #include <stdlib.h>
3 | #include <stddef.h>
4 |
5 | #include "const.h"
6 | #include "2d_array.h"
7 | #include "utilities.h"
8 |
9 |
10 | /* The 2D_ARRAY maintains a 2D array that can be sized at run-time. */
11 | typedef struct lsrd_2d_array
12 | {
13 | unsigned int signature; /* Signature used to make sure the pointer
14 | math from a row_array_ptr actually gets back to
15 | the expected structure (helps detect errors). */
16 | int rows; /* Rows in the 2D array */
17 | int columns; /* Columns in the 2D array */
18 | int member_size; /* Size of each entry in the array */
19 | void *data_ptr; /* Pointer to the data storage for the array */
20 | void **row_array_ptr; /* Pointer to an array of pointers to each row in
21 | the 2D array */
22 | double memory_block[0]; /* Block of memory for storage of the array.
23 | It is broken into two blocks. The first 'rows *
24 | sizeof(void *)' block stores the pointer to the
25 | first column in each of the rows. The remainder
26 | of the block is for storing the actual data.
27 | Note: the type is double to force the worst case
28 | memory alignment on sparc boxes. */
29 | } LSRD_2D_ARRAY;
30 |
31 |
32 | /* Define a unique (i.e. random) value that can be used to verify a pointer
33 | points to an LSRD_2D_ARRAY. This is used to verify the operation succeeds to
34 | get an LSRD_2D_ARRAY pointer from a row pointer. */
35 | #define SIGNATURE 0x326589ab
36 |
37 |
38 | /* Given an address returned by the allocate routine, get a pointer to the
39 | entire structure. */
40 | #define GET_ARRAY_STRUCTURE_FROM_PTR(ptr) \
41 | ((LSRD_2D_ARRAY *)((char *)(ptr) - offsetof(LSRD_2D_ARRAY, memory_block)))
42 |
43 |
44 | /*************************************************************************
45 | NAME: allocate_2d_array
46 |
47 | PURPOSE: Allocate memory for 2D array.
48 |
49 | RETURNS: A pointer to a 2D array, or NULL if the routine fails. A pointer
50 | to an array of void pointers to the storage for each row of the
51 | array is returned. The returned pointer must be freed by the
52 | free_2d_array routine.
53 |
54 | HISTORY:
55 | Date Programmer Reason
56 | -------- --------------- -------------------------------------
57 | 3/15/2013 Song Guo Modified from LDCM IAS library
58 | **************************************************************************/
59 | void **allocate_2d_array
60 | (
61 | int rows, /* I: Number of rows for the 2D array */
62 | int columns, /* I: Number of columns for the 2D array */
63 | size_t member_size /* I: Size of the 2D array element */
64 | )
65 | {
66 | int row;
67 | LSRD_2D_ARRAY *array;
68 | size_t size;
69 | int data_start_index;
70 |
71 | /* Calculate the size needed for the array memory. The size includes the
72 | size of the base structure, an array of pointers to the rows in the
73 | 2D array, an array for the data, and additional space
74 | (2 * sizeof(void*)) to account for different memory alignment rules
75 | on some machine architectures. */
76 | size = sizeof (*array) + (rows * sizeof (void *))
77 | + (rows * columns * member_size) + 2 * sizeof (void *);
78 |
79 | /* Allocate the structure */
80 | array = malloc (size);
81 | if (!array)
82 | {
83 | RETURN_ERROR ("Failure to allocate memory for the array",
84 | "allocate_2d_array", NULL);
85 | }
86 |
87 | /* Initialize the member structures */
88 | array->signature = SIGNATURE;
89 | array->rows = rows;
90 | array->columns = columns;
91 | array->member_size = member_size;
92 |
93 | /* The array of pointers to rows starts at the beginning of the memory
94 | block */
95 | array->row_array_ptr = (void **) array->memory_block;
96 |
97 | /* The data starts after the row pointers, with the index adjusted in
98 | case the void pointer and memory block pointers are not the same
99 | size */
100 | data_start_index =
101 | rows * sizeof (void *) / sizeof (array->memory_block[0]);
102 | if ((rows % 2) == 1)
103 | data_start_index++;
104 | array->data_ptr = &array->memory_block[data_start_index];
105 |
106 | /* Initialize the row pointers */
107 | for (row = 0; row < rows; row++)
108 | {
109 | array->row_array_ptr[row] = (char *) array->data_ptr
110 | + row * columns * member_size;
111 | }
112 |
113 | return array->row_array_ptr;
114 | }
115 |
116 |
117 | /*************************************************************************
118 | NAME: free_2d_array
119 |
120 | PURPOSE: Free memory for a 2D array allocated by allocate_2d_array
121 |
122 | RETURNS: SUCCESS or FAILURE
123 |
124 | HISTORY:
125 | Date Programmer Reason
126 | -------- --------------- -------------------------------------
127 | 3/15/2013 Song Guo Modified from LDCM IAS library
128 | **************************************************************************/
129 | int free_2d_array
130 | (
131 | void **array_ptr /* I: Pointer returned by the alloc routine */
132 | )
133 | {
134 | if (array_ptr != NULL)
135 | {
136 | /* Convert the array_ptr into a pointer to the structure */
137 | LSRD_2D_ARRAY *array = GET_ARRAY_STRUCTURE_FROM_PTR (array_ptr);
138 |
139 | /* Verify it is a valid 2D array */
140 | if (array->signature != SIGNATURE)
141 | {
142 | /* Programming error of some sort - report failure to the caller */
143 | RETURN_ERROR ("Invalid signature on 2D array - memory "
144 | "corruption or programming error?", "free_2d_array",
145 | FAILURE);
146 | }
147 | free (array);
148 | }
149 |
150 | return SUCCESS;
151 | }
152 |
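The layout used by allocate_2d_array (one malloc holding a row-pointer table followed by the data, released with one free) can be illustrated with a stripped-down sketch. The function alloc_double_rows below is hypothetical and is not part of this library: it handles only double elements and omits the LSRD_2D_ARRAY header, the signature check, and the stored dimensions.

```c
#include <stdlib.h>

/* Hypothetical simplified allocator: a single malloc holds the
   row-pointer table followed by the row data, so one free() on the
   returned pointer releases everything. */
double **alloc_double_rows(int rows, int columns)
{
    size_t table_bytes;
    double **row_ptrs;
    double *data;
    int row;

    if (rows <= 0 || columns <= 0)
        return NULL;

    table_bytes = (size_t)rows * sizeof(double *);
    /* Round the table size up to a multiple of sizeof(double) so the
       data that follows it is suitably aligned. */
    table_bytes = (table_bytes + sizeof(double) - 1)
                  / sizeof(double) * sizeof(double);

    row_ptrs = malloc(table_bytes
                      + (size_t)rows * (size_t)columns * sizeof(double));
    if (row_ptrs == NULL)
        return NULL;

    /* Point each table entry at the start of its row in the data area. */
    data = (double *)((char *)row_ptrs + table_bytes);
    for (row = 0; row < rows; row++)
        row_ptrs[row] = data + (size_t)row * (size_t)columns;

    return row_ptrs;
}
```

Elements are then addressed as m[row][column], and free(m) releases both the table and the data. The real routine returns a pointer into the middle of its structure, so its result must go to free_2d_array rather than free.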
--------------------------------------------------------------------------------
/classification/2d_array.h:
--------------------------------------------------------------------------------
1 | #ifndef MISC_2D_ARRAY_H
2 | #define MISC_2D_ARRAY_H
3 |
4 |
5 | #include <stddef.h>
6 |
7 |
8 | void **allocate_2d_array
9 | (
10 | int rows, /* I: Number of rows for the 2D array */
11 | int columns, /* I: Number of columns for the 2D array */
12 | size_t member_size /* I: Size of the 2D array element */
13 | );
14 |
15 |
16 | int get_2d_array_size
17 | (
18 | void **array_ptr, /* I: Pointer returned by the alloc routine */
19 | int *rows, /* O: Pointer to number of rows */
20 | int *columns /* O: Pointer to number of columns */
21 | );
22 |
23 |
24 | int free_2d_array
25 | (
26 | void **array_ptr /* I: Pointer returned by the alloc routine */
27 | );
28 |
29 |
30 | #endif
31 |
--------------------------------------------------------------------------------
/classification/Makefile:
--------------------------------------------------------------------------------
1 | BIN ?= ../bin
2 | SCRIPTS = ./scripts
3 | EXE = classification
4 |
5 | SRC_FILES = classRF.c classTree.c rfutils.c cokus.c utilities.c classification.c get_args.c
6 |
7 | CC = gcc
8 | FORTRAN = gfortran # or g77 whichever is present
9 | HDF5INC ?= /usr/include/hdf5
10 | HDF5LIB ?= /usr/lib/x86_64-linux-gnu/hdf5/serial
11 | MATIOLIB ?= /usr/lib/x86_64-linux-gnu
12 | INCDIR = -I. -I$(MATIO_INC) -I$(HDF5INC)
13 | CFLAGS = -fpic -O2 -funroll-loops -march=native -Wall $(INCDIR)
14 | FFLAGS = -O2 -fpic -march=native#-g -Wall
15 | LDFORTRAN = #-gfortran
16 | MEXFLAGS = -O
17 | INC = classification.h utilities.h rf.h
18 | LIB = -lgfortran -lm -DmxCalloc=calloc -DmxFree=free -L$(MATIOLIB) -L$(HDF5LIB) -lmatio -lhdf5 -lhdf5_hl
19 |
20 | all: clean rfsub $(EXE)
21 |
22 | $(EXE): clean rfsub
23 | $(CC) $(CFLAGS) $(INC) $(SRC_FILES) rfsub.o -o $(EXE) $(LIB)
24 |
25 | rfsub: $(SRC)rfsub.f
26 | echo 'Compiling rfsub.f (fortran subroutines)'
27 | $(FORTRAN) $(FFLAGS) -c rfsub.f -o rfsub.o
28 |
29 | install:
30 | mv $(EXE) $(BIN)
31 |
32 | clean:
33 | rm $(BIN)/$(EXE) -rf
34 | rm *~ -rf
35 | rm *.o -rf
36 |
--------------------------------------------------------------------------------
/classification/classRF.c:
--------------------------------------------------------------------------------
1 | /**************************************************************
2 | * mex interface to Andy Liaw et al.'s C code (used in R package randomForest)
3 | * Added by Abhishek Jaiantilal ( abhishek.jaiantilal@colorado.edu )
4 | * License: GPLv2
5 | * Version: 0.02
6 | *
7 | * File: contains all the supporting code for a standalone C or mex for
8 | * Classification RF.
9 | * Copied all the code from the randomForest 4.5-28 or was it -29?
10 | *
11 | * important changes (other than the many commented out printf's)
12 | * 1. realized that instead of changing individual S_allocs to callocs
13 | * a better way is to emulate them
14 | * 2. found some places where memory is not freed in classRF via valgrind so
15 | * added frees
16 | * 3. made sure that C can now interface with Breiman's fortran code so added
17 | * externs "C"'s and the F77_* macros
18 | * 4. added cokus's mersenne twister.
19 | *
20 | *************************************************************/
21 |
22 | /*****************************************************************
23 | * Copyright (C) 2001-7 Leo Breiman, Adele Cutler and Merck & Co., Inc.
24 | *
25 | * This program is free software; you can redistribute it and/or
26 | * modify it under the terms of the GNU General Public License
27 | * as published by the Free Software Foundation; either version 2
28 | * of the License, or (at your option) any later version.
29 | *
30 | * This program is distributed in the hope that it will be useful,
31 | * but WITHOUT ANY WARRANTY; without even the implied warranty of
32 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
33 | * GNU General Public License for more details.
34 | *
35 | * You should have received a copy of the GNU General Public License
36 | * along with this program; if not, write to the Free Software
37 | * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
38 | *
39 | * C driver for Breiman & Cutler's random forest code.
40 | * Re-written from the original main program in Fortran.
41 | * Andy Liaw Feb. 7, 2002.
42 | * Modifications to get the forest out Matt Wiener Feb. 26, 2002.
43 | *****************************************************************/
44 |
45 | #include "stdlib.h"
46 | #include "memory.h"
47 | #include "rf.h"
48 | #include "stdio.h"
49 | #include "math.h"
50 | #include "time.h"
51 |
52 | #ifndef MATLAB
53 | #define Rprintf printf
54 | #endif
55 |
56 | #ifdef MATLAB
57 | #include "mex.h"
58 | #define Rprintf mexPrintf
59 | #endif
60 |
61 | #define F77_CALL(x) x ## _
62 | #define F77_NAME(x) F77_CALL(x)
63 | #define F77_SUB(x) F77_CALL(x)
64 |
65 |
66 | #define MAX_UINT_COKUS 4294967295 //basically 2^32-1
67 |
68 | typedef unsigned long uint32;
69 | extern void seedMT(uint32 seed);
70 | extern uint32 reloadMT(void);
71 | extern uint32 randomMT(void);
72 | /*extern void F77_NAME(buildtree)(int *a, int *b, int *cl, int *cat,
73 | * int *maxcat, int *mdim, int *nsample,
74 | * int *nclass, int *treemap, int *bestvar,
75 | * int *bestsplit, int *bestsplitnext,
76 | * double *tgini, int *nodestatus, int *nodepop,
77 | * int *nodestart, double *classpop,
78 | * double *tclasspop, double *tclasscat,
79 | * int *ta, int *nrnodes, int *,
80 | * int *, int *, int *, int *, int *, int *,
81 | * double *, double *, double *,
82 | * int *, int *, int *);
83 | */
84 |
85 | void buildtree_(int *a, int *b, int *cl, int *cat,
86 | int *maxcat, int *mdim, int *nsample,
87 | int *nclass, int *treemap, int *bestvar,
88 | int *bestsplit, int *bestsplitnext,
89 | double *tgini, int *nodestatus, int *nodepop,
90 | int *nodestart, double *classpop,
91 | double *tclasspop, double *tclasscat,
92 | int *ta, int *nrnodes, int *,
93 | int *, int *, int *, int *, int *, int *,
94 | double *, double *, double *,
95 | int *, int *, int *);
96 |
97 | void rrand_(double *r) ;
98 |
99 | double unif_rand(){
100 | return (((double)randomMT())/((double)MAX_UINT_COKUS));
101 | }
102 |
103 | void GetRNGstate(){};
104 | void PutRNGstate(){};
105 |
106 | void oob(int nsample, int nclass, int *jin, int *cl, int *jtr, int *jerr,
107 | int *counttr, int *out, double *errtr, int *jest, double *cutoff);
108 |
109 | void TestSetError(double *countts, int *jts, int *clts, int *jet, int ntest,
110 | int nclass, int nvote, double *errts,
111 | int labelts, int *nclts, double *cutoff);
112 |
113 | /* Define the R RNG for use from Fortran. */
114 | #ifdef WIN64
115 | void _rrand_(double *r) { *r = unif_rand(); }
116 | #endif
117 |
118 | #ifndef WIN64
119 | void rrand_(double *r) { *r = unif_rand(); }
120 | #endif
121 |
122 |
123 | void classRF(double *x, int *dimx, int *cl, int *ncl, int *cat, int *maxcat,
124 | int *sampsize, int *strata, int *Options, int *ntree, int *nvar,
125 | int *ipi, double *classwt, double *cut, int *nodesize,
126 | int *outcl, int *counttr, double *prox,
127 | double *imprt, double *impsd, double *impmat, int *nrnodes,
128 | int *ndbigtree, int *nodestatus, int *bestvar, int *treemap,
129 | int *nodeclass, double *xbestsplit, double *errtr,
130 | int *testdat, double *xts, int *clts, int *nts, double *countts,
131 | int *outclts, int labelts, double *proxts, double *errts,
132 | int *inbag, int print_verbose_tree_progression) {
133 | /******************************************************************
134 | * C wrapper for random forests: get input from R and drive
135 | * the Fortran routines.
136 | *
137 | * Input:
138 | *
139 | * x: matrix of predictors (transposed!)
140 | * dimx: two integers: number of variables and number of cases
141 | * cl: class labels of the data
142 | * ncl: number of classes in the response
143 | * cat: integer vector of number of classes in the predictor;
144 | * 1=continuous
145 | * maxcat: maximum of cat
146 | * Options: 10 integers (0=no, 1=yes unless noted), by index:
147 | * [0] add a second class (for unsupervised RF)?
148 | * 1: sampling from product of marginals
149 | * 2: sampling from product of uniforms
150 | * [1] assess variable importance? [2] casewise (local) importance?
151 | * [3] calculate proximity?
152 | * [4] calculate proximity based on OOB predictions?
153 | * [5] how often to print output (0=never)?
154 | * [6] keep the forest for future prediction?
155 | * [7] sample with replacement? [8] stratify? [9] keep in-bag flags?
156 | * ntree: number of trees
157 | * nvar: number of predictors to use for each split
158 | * ipi: 0=use class proportion as prob.; 1=use supplied priors
159 | * classwt: double vector of class priors
160 | * nodesize: minimum node size: no node with fewer than ndsize
161 | * cases will be split
162 | *
163 | * Output:
164 | *
165 | * outcl: class predicted by RF
166 | * counttr: matrix of votes (transposed!)
167 | * imprt: matrix of variable importance measures
168 | * impmat: matrix of local variable importance measures
169 | * prox: matrix of proximity (if iprox=1)
170 | ******************************************************************/
171 |
172 | int nsample0, mdim, nclass, addClass, mtry, ntest, nsample, ndsize,
173 | mimp, nimp, near, nuse, noutall, nrightall, nrightimpall,
174 | keepInbag;
175 | int nstrata = 0;
176 | int jb, j, n, m, k, idxByNnode, idxByNsample, imp, localImp, iprox,
177 | oobprox, keepf, replace, stratify, trace, *nright,
178 | *nrightimp, *nout, Ntree;
179 | int *nclts = NULL;
180 | int *out, *bestsplitnext, *bestsplit, *nodepop, *jin, *nodex,
181 | *nodexts, *nodestart, *ta, *ncase, *jerr, *varUsed,
182 | *jtr, *classFreq, *idmove, *jvr,
183 | *at, *a, *b, *mind, *jts;
184 | int *nind = NULL;
185 | int *oobpair = NULL;
186 | int last, ktmp, anyEmpty, ntry;
187 | int **strata_idx = NULL;
188 | int *strata_size = NULL;
189 | double av=0.0;
190 |
191 | double *tgini, *tx, *wl, *classpop, *tclasscat, *tclasspop, *win,
192 | *tp, *wr;
193 |
194 | srand(time(NULL));
195 | // Seed the Cokus random number generator.
196 | seedMT(2*rand()+1); // force an odd seed, which the generator works well with
197 |
198 | addClass = Options[0];
199 | imp = Options[1];
200 | localImp = Options[2];
201 | iprox = Options[3];
202 | oobprox = Options[4];
203 | trace = Options[5];
204 | keepf = Options[6];
205 | replace = Options[7];
206 | stratify = Options[8];
207 | keepInbag = Options[9];
208 | mdim = dimx[0];
209 | nsample0 = dimx[1];
210 | nclass = (*ncl==1) ? 2 : *ncl;
211 | ndsize = *nodesize;
212 | Ntree = *ntree;
213 | mtry = *nvar;
214 | ntest = *nts;
215 | nsample = addClass ? (nsample0 + nsample0) : nsample0;
216 | mimp = imp ? mdim : 1;
217 | nimp = imp ? nsample : 1;
218 | near = iprox ? nsample0 : 1;
219 | if (trace == 0) trace = Ntree + 1;
220 |
221 | /*printf("\nmdim %d, nclass %d, nrnodes %d, nsample %d, ntest %d\n", mdim, nclass, *nrnodes, nsample, ntest);
222 | printf("\noobprox %d, mdim %d, nsample0 %d, Ntree %d, mtry %d, mimp %d", oobprox, mdim, nsample0, Ntree, mtry, mimp);
223 | printf("\nstratify %d, replace %d",stratify,replace);
224 | printf("\n");*/
225 | tgini = (double *) mxCalloc(mdim, sizeof(double));
226 | wl = (double *) mxCalloc(nclass, sizeof(double));
227 | wr = (double *) mxCalloc(nclass, sizeof(double));
228 | classpop = (double *) mxCalloc(nclass* *nrnodes, sizeof(double));
229 | tclasscat = (double *) mxCalloc(nclass*32, sizeof(double));
230 | tclasspop = (double *) mxCalloc(nclass, sizeof(double));
231 | tx = (double *) mxCalloc(nsample, sizeof(double));
232 | win = (double *) mxCalloc(nsample, sizeof(double));
233 | tp = (double *) mxCalloc(nsample, sizeof(double));
234 | out = (int *) mxCalloc(nsample, sizeof(int));
235 | bestsplitnext = (int *) mxCalloc(*nrnodes, sizeof(int));
236 | bestsplit = (int *) mxCalloc(*nrnodes, sizeof(int));
237 | nodepop = (int *) mxCalloc(*nrnodes, sizeof(int));
238 | nodestart = (int *) mxCalloc(*nrnodes, sizeof(int));
239 | jin = (int *) mxCalloc(nsample, sizeof(int));
240 | nodex = (int *) mxCalloc(nsample, sizeof(int));
241 | nodexts = (int *) mxCalloc(ntest, sizeof(int));
242 | ta = (int *) mxCalloc(nsample, sizeof(int));
243 | ncase = (int *) mxCalloc(nsample, sizeof(int));
244 | jerr = (int *) mxCalloc(nsample, sizeof(int));
245 | varUsed = (int *) mxCalloc(mdim, sizeof(int));
246 | jtr = (int *) mxCalloc(nsample, sizeof(int));
247 | jvr = (int *) mxCalloc(nsample, sizeof(int));
248 | classFreq = (int *) mxCalloc(nclass, sizeof(int));
249 | jts = (int *) mxCalloc(ntest, sizeof(int));
250 | idmove = (int *) mxCalloc(nsample, sizeof(int));
251 | at = (int *) mxCalloc(mdim*nsample, sizeof(int));
252 | a = (int *) mxCalloc(mdim*nsample, sizeof(int));
253 | b = (int *) mxCalloc(mdim*nsample, sizeof(int));
254 | mind = (int *) mxCalloc(mdim, sizeof(int));
255 | nright = (int *) mxCalloc(nclass, sizeof(int));
256 | nrightimp = (int *) mxCalloc(nclass, sizeof(int));
257 | nout = (int *) mxCalloc(nclass, sizeof(int));
258 | if (oobprox) {
259 | oobpair = (int *) mxCalloc(near*near, sizeof(int));
260 | }
261 | //printf("nsample=%d mdim=%d nclass=%d nsample0=%d nsample=%d ntest=%d\n", nsample, mdim,nclass, nsample0, nsample, ntest);
262 | /* Count number of cases in each class. */
263 | zeroInt(classFreq, nclass);
264 | for (n = 0; n < nsample; ++n) classFreq[cl[n] - 1] ++;
265 | /* Normalize class weights. */
266 | //Rprintf("ipi %d ",*ipi);
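268 | normClassWt(cl, nsample, nclass, *ipi, classwt, classFreq);
270 |
271 | if (stratify) {
272 | /* Count number of strata and frequency of each stratum. */
273 | nstrata = 0;
274 | for (n = 0; n < nsample0; ++n)
275 | if (strata[n] > nstrata) nstrata = strata[n];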
276 | /* Create the array of pointers, each pointing to a vector
277 | * of indices of where data of each stratum is. */
278 | strata_size = (int *) mxCalloc(nstrata, sizeof(int));
279 | for (n = 0; n < nsample0; ++n) {
280 | strata_size[strata[n] - 1] ++;
281 | }
282 | strata_idx = (int **) mxCalloc(nstrata, sizeof(int *));
283 | for (n = 0; n < nstrata; ++n) {
284 | strata_idx[n] = (int *) mxCalloc(strata_size[n], sizeof(int));
285 | }
286 | zeroInt(strata_size, nstrata);
287 | for (n = 0; n < nsample0; ++n) {
288 | strata_size[strata[n] - 1] ++;
289 | strata_idx[strata[n] - 1][strata_size[strata[n] - 1] - 1] = n;
290 | }
291 | } else {
292 | nind = replace ? NULL : (int *) mxCalloc(nsample, sizeof(int));
293 | }
294 |
295 | /* INITIALIZE FOR RUN */
296 | if (*testdat) zeroDouble(countts, ntest * nclass);
297 | zeroInt(counttr, nclass * nsample);
298 | zeroInt(out, nsample);
299 | zeroDouble(tgini, mdim);
300 | zeroDouble(errtr, (nclass + 1) * Ntree);
301 |
302 | if (labelts) {
303 | nclts = (int *) mxCalloc(nclass, sizeof(int));
304 | for (n = 0; n < ntest; ++n) nclts[clts[n]-1]++;
305 | zeroDouble(errts, (nclass + 1) * Ntree);
306 | }
307 | //printf("labelts %d\n",labelts);fflush(stdout);
308 | if (imp) {
309 | zeroDouble(imprt, (nclass+2) * mdim);
310 | zeroDouble(impsd, (nclass+1) * mdim);
311 | if (localImp) zeroDouble(impmat, nsample * mdim);
312 | }
313 | if (iprox) {
314 | zeroDouble(prox, nsample0 * nsample0);
315 | if (*testdat) zeroDouble(proxts, ntest * (ntest + nsample0));
316 | }
317 | makeA(x, mdim, nsample, cat, at, b);
318 |
319 | //R_CheckUserInterrupt();
320 |
321 |
322 | /* Starting the main loop over number of trees. */
323 | GetRNGstate();
324 | if (trace <= Ntree) {
325 | /* Print header for running output. */
326 | Rprintf("ntree OOB");
327 | for (n = 1; n <= nclass; ++n) Rprintf("%7i", n);
328 | if (labelts) {
329 | Rprintf("| Test");
330 | for (n = 1; n <= nclass; ++n) Rprintf("%7i", n);
331 | }
332 | Rprintf("\n");
333 | }
334 | idxByNnode = 0;
335 | idxByNsample = 0;
336 |
337 | // time_t curr_time;
338 | //Rprintf("addclass %d, ntree %d, cl[300]=%d", addClass,Ntree,cl[299]);
339 | for(jb = 0; jb < Ntree; jb++) {
340 | //Rprintf("addclass %d, ntree %d, cl[300]=%d", addClass,Ntree,cl[299]);
341 | //printf("jb=%d,\n",jb);
342 | /* Do we need to simulate data for the second class? */
343 | if (addClass) createClass(x, nsample0, nsample, mdim);
344 | do {
345 | zeroInt(nodestatus + idxByNnode, *nrnodes);
346 | zeroInt(treemap + 2*idxByNnode, 2 * *nrnodes);
347 | zeroDouble(xbestsplit + idxByNnode, *nrnodes);
348 | zeroInt(nodeclass + idxByNnode, *nrnodes);
349 | zeroInt(varUsed, mdim);
350 | /* TODO: Put all sampling code into a function. */
351 | /* drawSample(sampsize, nsample, ); */
352 | if (stratify) { /* stratified sampling */
353 | zeroInt(jin, nsample);
354 | zeroDouble(tclasspop, nclass);
355 | zeroDouble(win, nsample);
356 | if (replace) { /* with replacement */
357 | for (n = 0; n < nstrata; ++n) {
358 | for (j = 0; j < sampsize[n]; ++j) {
359 | ktmp = (int) (unif_rand() * strata_size[n]);
360 | k = strata_idx[n][ktmp];
361 | tclasspop[cl[k] - 1] += classwt[cl[k] - 1];
362 | win[k] += classwt[cl[k] - 1];
363 | jin[k] = 1;
364 | }
365 | }
366 | } else { /* stratified sampling w/o replacement */
367 | /* re-initialize the index array */
368 | zeroInt(strata_size, nstrata);
369 | for (j = 0; j < nsample; ++j) {
370 | strata_size[strata[j] - 1] ++;
371 | strata_idx[strata[j] - 1][strata_size[strata[j] - 1] - 1] = j;
372 | }
373 | /* sampling without replacement */
374 | for (n = 0; n < nstrata; ++n) {
375 | last = strata_size[n] - 1;
376 | for (j = 0; j < sampsize[n]; ++j) {
377 | ktmp = (int) (unif_rand() * (last+1));
378 | k = strata_idx[n][ktmp];
379 | swapInt(strata_idx[n][last], strata_idx[n][ktmp]);
380 | last--;
381 | tclasspop[cl[k] - 1] += classwt[cl[k]-1];
382 | win[k] += classwt[cl[k]-1];
383 | jin[k] = 1;
384 | }
385 | }
386 | }
387 | } else { /* unstratified sampling */
388 | anyEmpty = 0;
389 | ntry = 0;
390 | do {
391 | zeroInt(jin, nsample);
392 | zeroDouble(tclasspop, nclass);
393 | zeroDouble(win, nsample);
394 | if (replace) {
395 | for (n = 0; n < *sampsize; ++n) {
396 | k = unif_rand() * nsample;
397 | tclasspop[cl[k] - 1] += classwt[cl[k]-1];
398 | win[k] += classwt[cl[k]-1];
399 | jin[k] = 1;
400 | }
401 | } else {
402 | for (n = 0; n < nsample; ++n) nind[n] = n;
403 | last = nsample - 1;
404 | for (n = 0; n < *sampsize; ++n) {
405 | ktmp = (int) (unif_rand() * (last+1));
406 | k = nind[ktmp];
407 | swapInt(nind[ktmp], nind[last]);
408 | last--;
409 | tclasspop[cl[k] - 1] += classwt[cl[k]-1];
410 | win[k] += classwt[cl[k]-1];
411 | jin[k] = 1;
412 | }
413 | }
414 | /* check if any class is missing in the sample */
415 | for (n = 0; n < nclass; ++n) {
416 | if (tclasspop[n] == 0) anyEmpty = 1;
417 | }
418 | ntry++;
419 | } while (anyEmpty && ntry <= 10);
420 | }
421 |
422 | /* If need to keep indices of inbag data, do that here. */
423 | if (keepInbag) {
424 | for (n = 0; n < nsample0; ++n) {
425 | inbag[n + idxByNsample] = jin[n];
426 | }
427 | }
428 |
429 | /* Copy the original a matrix back. */
430 | memcpy(a, at, sizeof(int) * mdim * nsample);
431 | modA(a, &nuse, nsample, mdim, cat, *maxcat, ncase, jin);
432 |
433 | #ifdef WIN64
434 | F77_CALL(_buildtree)
435 | #endif
436 |
437 | #ifndef WIN64
438 | F77_CALL(buildtree)
439 | #endif
440 | (a, b, cl, cat, maxcat, &mdim, &nsample,
441 | &nclass,
442 | treemap + 2*idxByNnode, bestvar + idxByNnode,
443 | bestsplit, bestsplitnext, tgini,
444 | nodestatus + idxByNnode, nodepop,
445 | nodestart, classpop, tclasspop, tclasscat,
446 | ta, nrnodes, idmove, &ndsize, ncase,
447 | &mtry, varUsed, nodeclass + idxByNnode,
448 | ndbigtree + jb, win, wr, wl, &mdim,
449 | &nuse, mind);
450 | /* if the "tree" has only the root node, start over */
451 | } while (ndbigtree[jb] < 1);
452 |
453 | Xtranslate(x, mdim, *nrnodes, nsample, bestvar + idxByNnode,
454 | bestsplit, bestsplitnext, xbestsplit + idxByNnode,
455 | nodestatus + idxByNnode, cat, ndbigtree[jb]);
456 |
457 | /* Get test set error */
458 | if (*testdat) {
459 | predictClassTree(xts, ntest, mdim, treemap + 2*idxByNnode,
460 | nodestatus + idxByNnode, xbestsplit + idxByNnode,
461 | bestvar + idxByNnode,
462 | nodeclass + idxByNnode, ndbigtree[jb],
463 | cat, nclass, jts, nodexts, *maxcat);
464 | TestSetError(countts, jts, clts, outclts, ntest, nclass, jb+1,
465 | errts + jb*(nclass+1), labelts, nclts, cut);
466 | }
467 |
468 | /* Get out-of-bag predictions and errors. */
469 | predictClassTree(x, nsample, mdim, treemap + 2*idxByNnode,
470 | nodestatus + idxByNnode, xbestsplit + idxByNnode,
471 | bestvar + idxByNnode,
472 | nodeclass + idxByNnode, ndbigtree[jb],
473 | cat, nclass, jtr, nodex, *maxcat);
474 |
475 | zeroInt(nout, nclass);
476 | noutall = 0;
477 | for (n = 0; n < nsample; ++n) {
478 | if (jin[n] == 0) {
479 | /* increment the OOB votes */
480 | counttr[n*nclass + jtr[n] - 1] ++;
481 | /* count number of times a case is OOB */
482 | out[n]++;
483 | /* count number of OOB cases in the current iteration.
484 | * nout[n] is the number of OOB cases for the n-th class.
485 | * noutall is the number of OOB cases overall. */
486 | nout[cl[n] - 1]++;
487 | noutall++;
488 | }
489 | }
490 |
491 | /* Compute out-of-bag error rate. */
492 | oob(nsample, nclass, jin, cl, jtr, jerr, counttr, out,
493 | errtr + jb*(nclass+1), outcl, cut);
494 |
495 | if ((jb+1) % trace == 0) {
496 | Rprintf("%5i: %6.2f%%", jb+1, 100.0*errtr[jb * (nclass+1)]);
497 | for (n = 1; n <= nclass; ++n) {
498 | Rprintf("%6.2f%%", 100.0 * errtr[n + jb * (nclass+1)]);
499 | }
500 | if (labelts) {
501 | Rprintf("| ");
502 | for (n = 0; n <= nclass; ++n) {
503 | Rprintf("%6.2f%%", 100.0 * errts[n + jb * (nclass+1)]);
504 | }
505 | }
506 | Rprintf("\n");
507 |
508 | //R_CheckUserInterrupt();
509 | }
510 |
511 | /* DO VARIABLE IMPORTANCE */
512 | if (imp) {
513 | nrightall = 0;
514 | /* Count the number of correct predictions by the current tree
515 | * among the OOB samples, by class. */
516 | zeroInt(nright, nclass);
517 | for (n = 0; n < nsample; ++n) {
518 | /* out-of-bag and predicted correctly: */
519 | if (jin[n] == 0 && jtr[n] == cl[n]) {
520 | nright[cl[n] - 1]++;
521 | nrightall++;
522 | }
523 | }
524 | for (m = 0; m < mdim; ++m) {
525 | if (varUsed[m]) {
526 | nrightimpall = 0;
527 | zeroInt(nrightimp, nclass);
528 | for (n = 0; n < nsample; ++n) tx[n] = x[m + n*mdim];
529 | /* Permute the m-th variable. */
530 | permuteOOB(m, x, jin, nsample, mdim);
531 | /* Predict the modified data using the current tree. */
532 | predictClassTree(x, nsample, mdim, treemap + 2*idxByNnode,
533 | nodestatus + idxByNnode,
534 | xbestsplit + idxByNnode,
535 | bestvar + idxByNnode,
536 | nodeclass + idxByNnode, ndbigtree[jb],
537 | cat, nclass, jvr, nodex, *maxcat);
538 | /* Count how often correct predictions are made with
539 | * the modified data. */
540 | for (n = 0; n < nsample; n++) {
541 | if (jin[n] == 0) {
542 | if (jvr[n] == cl[n]) {
543 | nrightimp[cl[n] - 1]++;
544 | nrightimpall++;
545 | }
546 | if (localImp && jvr[n] != jtr[n]) {
547 | if (cl[n] == jvr[n]) {
548 | impmat[m + n*mdim] -= 1.0;
549 | } else {
550 | impmat[m + n*mdim] += 1.0;
551 | }
552 | }
553 | }
554 | /* Restore the original data for that variable. */
555 | x[m + n*mdim] = tx[n];
556 | }
557 | /* Accumulate decrease in proportions of correct
558 | * predictions. */
559 | for (n = 0; n < nclass; ++n) {
560 | if (nout[n] > 0) {
561 | imprt[m + n*mdim] +=
562 | ((double) (nright[n] - nrightimp[n])) /
563 | nout[n];
564 | impsd[m + n*mdim] +=
565 | ((double) (nright[n] - nrightimp[n]) *
566 | (nright[n] - nrightimp[n])) / nout[n];
567 | }
568 | }
569 | if (noutall > 0) {
570 | imprt[m + nclass*mdim] +=
571 | ((double)(nrightall - nrightimpall)) / noutall;
572 | impsd[m + nclass*mdim] +=
573 | ((double) (nrightall - nrightimpall) *
574 | (nrightall - nrightimpall)) / noutall;
575 | }
576 | }
577 | }
578 | }
579 |
580 | /* DO PROXIMITIES */
581 | if (iprox) {
582 | computeProximity(prox, oobprox, nodex, jin, oobpair, near);
583 | /* proximity for test data */
584 | if (*testdat) {
585 | computeProximity(proxts, 0, nodexts, jin, oobpair, ntest);
586 | /* Compute proximity between test set and training set. */
587 | for (n = 0; n < ntest; ++n) {
588 | for (k = 0; k < near; ++k) {
589 | if (nodexts[n] == nodex[k])
590 | proxts[n + ntest * (k+ntest)] += 1.0;
591 | }
592 | }
593 | }
594 | }
595 |
596 | if (keepf) idxByNnode += *nrnodes;
597 | if (keepInbag) idxByNsample += nsample0;
598 |
599 | if(print_verbose_tree_progression){
600 | #ifdef MATLAB
601 | time_t curr_time = time(NULL);
602 | mexPrintf("tree num %d created at %s", jb, ctime(&curr_time));mexEvalString("drawnow;");
603 | #endif
604 | }
605 | }
606 | PutRNGstate();
607 |
608 |
609 | /* Final processing of variable importance. */
610 | for (m = 0; m < mdim; m++) tgini[m] /= Ntree;
611 |
612 | if (imp) {
613 | for (m = 0; m < mdim; ++m) {
614 | if (localImp) { /* casewise measures */
615 | for (n = 0; n < nsample; ++n) impmat[m + n*mdim] /= out[n];
616 | }
617 | /* class-specific measures */
618 | for (k = 0; k < nclass; ++k) {
619 | av = imprt[m + k*mdim] / Ntree;
620 | impsd[m + k*mdim] =
621 | sqrt(((impsd[m + k*mdim] / Ntree) - av*av) / Ntree);
622 | imprt[m + k*mdim] = av;
623 | /* imprt[m + k*mdim] = (se <= 0.0) ? -1000.0 - av : av / se; */
624 | }
625 | /* overall measures */
626 | av = imprt[m + nclass*mdim] / Ntree;
627 | impsd[m + nclass*mdim] =
628 | sqrt(((impsd[m + nclass*mdim] / Ntree) - av*av) / Ntree);
629 | imprt[m + nclass*mdim] = av;
630 | imprt[m + (nclass+1)*mdim] = tgini[m];
631 | }
632 | } else {
633 | for (m = 0; m < mdim; ++m) imprt[m] = tgini[m];
634 | }
635 |
636 | /* PROXIMITY DATA ++++++++++++++++++++++++++++++++*/
637 | if (iprox) {
638 | for (n = 0; n < near; ++n) {
639 | for (k = n + 1; k < near; ++k) {
640 | prox[near*k + n] /= oobprox ?
641 | (oobpair[near*k + n] > 0 ? oobpair[near*k + n] : 1) :
642 | Ntree;
643 | prox[near*n + k] = prox[near*k + n];
644 | }
645 | prox[near*n + n] = 1.0;
646 | }
647 | if (*testdat) {
648 | for (n = 0; n < ntest; ++n){
649 | for (k = 0; k < ntest + nsample; ++k)
650 | proxts[ntest*k + n] /= Ntree;
651 | proxts[ntest*n + n]=1.0;
652 | }
653 | }
654 | }
655 | if (trace <= Ntree){
656 | printf("\nmdim %d, nclass %d, nrnodes %d, nsample %d, ntest %d\n", mdim, nclass, *nrnodes, nsample, ntest);
657 | printf("\noobprox %d, mdim %d, nsample0 %d, Ntree %d, mtry %d, mimp %d", oobprox, mdim, nsample0, Ntree, mtry, mimp);
658 | printf("\nstratify %d, replace %d",stratify,replace);
659 | printf("\n");
660 | }
661 |
662 | //frees up the memory
663 | mxFree(tgini);
664 | mxFree(wl);
665 | mxFree(wr);
666 | mxFree(classpop);
667 | mxFree(tclasscat);
668 | mxFree(tclasspop);
669 | mxFree(tx);
670 | mxFree(win);
671 | mxFree(tp);
672 | mxFree(out);
673 |
674 | mxFree(bestsplitnext);
675 | mxFree(bestsplit);
676 | mxFree(nodepop);
677 | mxFree(nodestart);
678 | mxFree(jin);
679 | mxFree(nodex);
680 | mxFree(nodexts);
681 | mxFree(ta);
682 | mxFree(ncase);
683 | mxFree(jerr);
684 |
685 | mxFree(varUsed);
686 | mxFree(jtr);
687 | mxFree(jvr);
688 | mxFree(classFreq);
689 | mxFree(jts);
690 | mxFree(idmove);
691 | mxFree(at);
692 | mxFree(a);
693 | mxFree(b);
694 | mxFree(mind);
695 |
696 | mxFree(nright);
697 | mxFree(nrightimp);
698 | mxFree(nout);
699 |
700 | if (oobprox) {
701 | mxFree(oobpair);
702 | }
703 |
704 | if (stratify) {
705 | mxFree(strata_size);
706 | for (n = 0; n < nstrata; ++n) {
707 | mxFree(strata_idx[n]);
708 | }
709 | mxFree(strata_idx);
710 | } else {
711 | if (!replace) /* nind is only allocated when sampling w/o replacement */
712 | mxFree(nind);
713 | }
714 | //printf("labelts %d\n",labelts);fflush(stdout);
715 | fflush(stdout);
716 | if (labelts) {
717 | mxFree(nclts);
718 | }
719 | //printf("stratify %d",stratify);fflush(stdout);
720 | }
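/*
 * Illustrative sketch, not part of the original source. The unstratified
 * "without replacement" branch of classRF() above is a partial Fisher-Yates
 * shuffle over an index array: pick a random position among the slots that
 * are still live, record it, swap it to the end, and shrink the live range.
 * The standalone version below isolates that idea. The names sketch_unif()
 * and sample_without_replacement() are hypothetical, and rand() is only a
 * stand-in for the Cokus-backed unif_rand(). The draw is scaled to stay
 * strictly below 1.0 so the computed position never exceeds `last`.
 */

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for unif_rand(): uniform draw in [0, 1). */
static double sketch_unif(void) {
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/*
 * Draw k distinct indices from 0..n-1 into picked[0..k-1].
 * scratch must hold n ints. Returns 0 on success, -1 on bad arguments.
 */
static int sample_without_replacement(int n, int k, int *scratch, int *picked) {
    int i, last, pos, tmp;

    if (n <= 0 || k < 0 || k > n) return -1;

    for (i = 0; i < n; ++i) scratch[i] = i;   /* identity index array */
    last = n - 1;                             /* last live slot */
    for (i = 0; i < k; ++i) {
        pos = (int) (sketch_unif() * (last + 1));
        assert(pos >= 0 && pos <= last);
        picked[i] = scratch[pos];
        /* Move the chosen index out of the live range. */
        tmp = scratch[pos];
        scratch[pos] = scratch[last];
        scratch[last] = tmp;
        --last;
    }
    return 0;
}
```

/*
 * In classRF() the same loop also accumulates win[k] and tclasspop[] for
 * each drawn case; the sketch leaves that bookkeeping out.
 */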
721 |
722 |
723 | void classForest(int *mdim, int *ntest, int *nclass, int *maxcat,
724 | int *nrnodes, int *ntree, double *x, double *xbestsplit,
725 | double *pid, double *cutoff, double *countts, int *treemap,
726 | int *nodestatus, int *cat, int *nodeclass, int *jts,
727 | int *jet, int *bestvar, int *node, int *treeSize,
728 | int *keepPred, int *prox, double *proxMat, int *nodes) {
729 | int j, n, n1, n2, idxNodes, offset1, offset2, *junk, ntie;
730 | double crit, cmax;
731 |
732 | zeroDouble(countts, *nclass * *ntest);
733 | idxNodes = 0;
734 | offset1 = 0;
735 | offset2 = 0;
736 | junk = NULL;
737 |
738 | //Rprintf("nclass %d\n", *nclass);
739 | for (j = 0; j < *ntree; ++j) {
740 | //Rprintf("pCT nclass %d \n", *nclass);
741 | /* predict by the j-th tree */
742 | predictClassTree(x, *ntest, *mdim, treemap + 2*idxNodes,
743 | nodestatus + idxNodes, xbestsplit + idxNodes,
744 | bestvar + idxNodes, nodeclass + idxNodes,
745 | treeSize[j], cat, *nclass,
746 | jts + offset1, node + offset2, *maxcat);
747 |
748 | /* accumulate votes: */
749 | for (n = 0; n < *ntest; ++n) {
750 | countts[jts[n + offset1] - 1 + n * *nclass] += 1.0;
751 | }
752 |
753 | /* if desired, do proximities for this round */
754 | if (*prox) computeProximity(proxMat, 0, node + offset2, junk, junk,
755 | *ntest);
756 | idxNodes += *nrnodes;
757 | if (*keepPred) offset1 += *ntest;
758 | if (*nodes) offset2 += *ntest;
759 | }
760 |
761 | //Rprintf("ntest %d\n", *ntest);
762 | /* Aggregated prediction is the class with the maximum votes/cutoff */
763 | for (n = 0; n < *ntest; ++n) {
764 | //Rprintf("Ap: ntest %d\n", *ntest);
765 | cmax = 0.0;
766 | ntie = 1;
767 | for (j = 0; j < *nclass; ++j) {
768 | crit = (countts[j + n * *nclass] / *ntree) / cutoff[j];
769 | if (crit > cmax) {
770 | jet[n] = j + 1;
771 | cmax = crit;
772 | }
773 | /* Break ties at random: */
774 | if (crit == cmax) {
775 | ntie++;
776 | if (unif_rand() > 1.0 / ntie) jet[n] = j + 1;
777 | }
778 | }
779 | }
780 |
781 | //Rprintf("ntest %d\n", *ntest);
782 | /* if proximities requested, do the final adjustment
783 | * (division by number of trees) */
784 |
785 | //Rprintf("prox %d",*prox);
786 | if (*prox) {
787 | //Rprintf("prox: ntest %d\n", *ntest);
788 | for (n1 = 0; n1 < *ntest; ++n1) {
789 | for (n2 = n1 + 1; n2 < *ntest; ++n2) {
790 | proxMat[n1 + n2 * *ntest] /= *ntree;
791 | proxMat[n2 + n1 * *ntest] = proxMat[n1 + n2 * *ntest];
792 | }
793 | proxMat[n1 + n1 * *ntest] = 1.0;
794 | }
795 | }
796 | //Rprintf("END ntest %d\n", *ntest);
797 |
798 | }
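/*
 * Illustrative sketch, not part of the original source. classForest() above
 * picks, for each test case, the class whose vote fraction divided by its
 * cutoff is largest. The helper below shows that rule for a single case.
 * predict_from_votes() is a hypothetical name. As a simplifying assumption
 * it keeps the first maximum it sees, whereas the original breaks ties at
 * random through unif_rand().
 */

```c
#include <stddef.h>

/*
 * votes[j]  : number of trees voting for class j+1 (length nclass)
 * cutoff[j] : per-class cutoff, assumed > 0
 * Returns the 1-based predicted class, or 0 if every vote count is zero.
 */
static int predict_from_votes(const double *votes, const double *cutoff,
                              int nclass, int ntree) {
    int j, best = 0;
    double crit, cmax = 0.0;

    for (j = 0; j < nclass; ++j) {
        crit = (votes[j] / ntree) / cutoff[j];
        if (crit > cmax) {        /* strict '>' keeps the first maximum */
            cmax = crit;
            best = j + 1;
        }
    }
    return best;
}
```

/*
 * With equal cutoffs this reduces to a plain majority vote; lowering one
 * class's cutoff makes that class easier to predict.
 */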
799 |
800 | /*
801 | * Modified by A. Liaw 1/10/2003 (Deal with cutoff)
802 | * Re-written in C by A. Liaw 3/08/2004
803 | */
804 | void oob(int nsample, int nclass, int *jin, int *cl, int *jtr, int *jerr,
805 | int *counttr, int *out, double *errtr, int *jest,
806 | double *cutoff) {
807 | int j, n, noob, *noobcl, ntie;
808 | double qq, smax, smaxtr;
809 |
810 | noobcl = (int *) mxCalloc(nclass, sizeof(int));
811 | zeroInt(jerr, nsample);
812 | zeroDouble(errtr, nclass+1);
813 |
814 | noob = 0;
815 | for (n = 0; n < nsample; ++n) {
816 | if (out[n]) {
817 | noob++;
818 | noobcl[cl[n]-1]++;
819 | smax = 0.0;
820 | smaxtr = 0.0;
821 | ntie = 1;
822 | for (j = 0; j < nclass; ++j) {
823 | qq = (((double) counttr[j + n*nclass]) / out[n]) / cutoff[j];
824 | if (j+1 != cl[n]) smax = (qq > smax) ? qq : smax;
825 | /* if vote / cutoff is larger than current max, re-set max and
826 | * change predicted class to the current class */
827 | if (qq > smaxtr) {
828 | smaxtr = qq;
829 | jest[n] = j+1;
830 | }
831 | /* break tie at random */
832 | if (qq == smaxtr) {
833 | ntie++;
834 | if (unif_rand() > 1.0 / ntie) {
835 | smaxtr = qq;
836 | jest[n] = j+1;
837 | }
838 | }
839 | }
840 | if (jest[n] != cl[n]) {
841 | errtr[cl[n]] += 1.0;
842 | errtr[0] += 1.0;
843 | jerr[n] = 1;
844 | }
845 | }
846 | }
847 | errtr[0] /= noob;
848 | for (n = 1; n <= nclass; ++n) errtr[n] /= noobcl[n-1];
849 | mxFree(noobcl);
850 | }
851 |
852 |
853 | void TestSetError(double *countts, int *jts, int *clts, int *jet, int ntest,
854 | int nclass, int nvote, double *errts,
855 | int labelts, int *nclts, double *cutoff) {
856 | int j, n, ntie;
857 | double cmax, crit;
858 |
859 | for (n = 0; n < ntest; ++n) countts[jts[n]-1 + n*nclass] += 1.0;
860 |
861 | /* Prediction is the class with the maximum votes */
862 | for (n = 0; n < ntest; ++n) {
863 | cmax=0.0;
864 | ntie = 1;
865 | for (j = 0; j < nclass; ++j) {
866 | crit = (countts[j + n*nclass] / nvote) / cutoff[j];
867 | if (crit > cmax) {
868 | jet[n] = j+1;
869 | cmax = crit;
870 | }
871 | /* Break ties at random: */
872 | if (crit == cmax) {
873 | ntie++;
874 | if (unif_rand() > 1.0 / ntie) {
875 | jet[n] = j+1;
876 | cmax = crit;
877 | }
878 | }
879 | }
880 | }
881 | if (labelts) {
882 | zeroDouble(errts, nclass + 1);
883 | for (n = 0; n < ntest; ++n) {
884 | if (jet[n] != clts[n]) {
885 | errts[0] += 1.0;
886 | errts[clts[n]] += 1.0;
887 | }
888 | }
889 | errts[0] /= ntest;
890 | for (n = 1; n <= nclass; ++n) errts[n] /= nclts[n-1];
891 | }
892 | }
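/*
 * Illustrative sketch, not part of the original source. Both oob() and
 * TestSetError() above fill an error vector of length nclass + 1: slot 0
 * holds the overall error rate and slots 1..nclass hold the per-class
 * rates, indexed by the 1-based true label. The helper below reproduces
 * that layout for plain label arrays. class_error_rates() is a hypothetical
 * name, and unlike the original it skips the division for classes with no
 * cases instead of dividing by zero.
 */

```c
#include <stdlib.h>

/*
 * truth, pred : 1-based class labels, length n
 * err         : output, length nclass + 1
 * Returns 0 on success, -1 if the scratch allocation fails.
 */
static int class_error_rates(const int *truth, const int *pred, int n,
                             int nclass, double *err) {
    int i, c;
    int *count = (int *) calloc((size_t) nclass, sizeof(int));

    if (count == NULL) return -1;
    for (c = 0; c <= nclass; ++c) err[c] = 0.0;

    for (i = 0; i < n; ++i) {
        count[truth[i] - 1]++;
        if (pred[i] != truth[i]) {
            err[0] += 1.0;            /* overall misclassifications */
            err[truth[i]] += 1.0;     /* misclassifications of this class */
        }
    }
    if (n > 0) err[0] /= n;
    for (c = 1; c <= nclass; ++c) {
        if (count[c - 1] > 0) err[c] /= count[c - 1];
    }
    free(count);
    return 0;
}
```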
893 |
894 |
--------------------------------------------------------------------------------
/classification/classTree.c:
--------------------------------------------------------------------------------
1 | /**************************************************************
2 | * mex interface to Andy Liaw et al.'s C code (used in R package randomForest)
3 | * Added by Abhishek Jaiantilal ( abhishek.jaiantilal@colorado.edu )
4 | * License: GPLv2
5 | * Version: 0.02
6 | *
7 | * File: contains all the other supporting code for a standalone C or mex for
8 | * Classification RF.
9 | * Copied all the code from the randomForest 4.5-28 or was it -29?
10 | * added externs "C"'s and the F77_* macros
11 | *
12 | *************************************************************/
13 |
14 | /*******************************************************************
15 | Copyright (C) 2001-7 Leo Breiman, Adele Cutler and Merck & Co., Inc.
16 |
17 | This program is free software; you can redistribute it and/or
18 | modify it under the terms of the GNU General Public License
19 | as published by the Free Software Foundation; either version 2
20 | of the License, or (at your option) any later version.
21 |
22 | This program is distributed in the hope that it will be useful,
23 | but WITHOUT ANY WARRANTY; without even the implied warranty of
24 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
25 | GNU General Public License for more details.
26 | *******************************************************************/
27 | #include "rf.h"
28 | #include "memory.h"
29 | #include "stdlib.h"
30 | #include "math.h"
31 |
32 | #ifdef MATLAB
33 | #define Rprintf mexPrintf
34 | #include "mex.h"
35 | #endif
36 |
37 | #ifndef MATLAB
38 | #define Rprintf printf
39 | #include "stdio.h"
40 | #endif
41 |
42 | typedef unsigned long uint32;
43 | extern void seedMT(uint32 seed);
44 | extern uint32 reloadMT(void);
45 | extern uint32 randomMT(void);
46 | extern double unif_rand();
47 | extern void R_qsort_I(double *v, int *I, int i, int j);
48 |
49 | void catmax_(double *parentDen, double *tclasscat,
50 | double *tclasspop, int *nclass, int *lcat,
51 | int *ncatsp, double *critmax, int *nhit,
52 | int *maxcat, int *ncmax, int *ncsplit);
53 |
54 | void catmaxb_(double *totalWt, double *tclasscat, double *classCount,
55 | int *nclass, int *nCat, int *nbest, double *critmax,
56 | int *nhit, double *catCount) ;
57 |
58 | #ifdef WIN64
59 | void F77_NAME(_catmax)
60 | #endif
61 |
62 | #ifndef WIN64
63 | void F77_NAME(catmax)
64 | #endif
65 | (double *parentDen, double *tclasscat,
66 | double *tclasspop, int *nclass, int *lcat,
67 | int *ncatsp, double *critmax, int *nhit,
68 | int *maxcat, int *ncmax, int *ncsplit) {
69 | /* This finds the best split of a categorical variable with lcat
70 | categories and nclass classes, where tclasscat(j, k) is the number
71 | of cases in class j with category value k. The method uses an
72 | exhaustive search over all partitions of the category values if the
73 | number of categories does not exceed ncmax. Otherwise ncsplit randomly
74 | selected splits are tested and the best one is used. */
75 | int j, k, n, icat[32], nsplit;
76 | double leftNum, leftDen, rightNum, decGini, *leftCatClassCount;
77 |
78 | leftCatClassCount = (double *) mxCalloc(*nclass, sizeof(double));
79 | *nhit = 0;
80 | nsplit = *lcat > *ncmax ?
81 | *ncsplit : (int) pow(2.0, (double) *lcat - 1) - 1;
82 |
83 | for (n = 0; n < nsplit; ++n) {
84 | zeroInt(icat, 32);
85 | if (*lcat > *ncmax) {
86 | /* Generate random split.
87 | TODO: consider changing to generating random bits with more
88 | efficient algorithm */
89 | for (j = 0; j < *lcat; ++j) icat[j] = unif_rand() > 0.5 ? 1 : 0;
90 | } else {
91 | unpack((unsigned int) n + 1, icat);
92 | }
93 | for (j = 0; j < *nclass; ++j) {
94 | leftCatClassCount[j] = 0;
95 | for (k = 0; k < *lcat; ++k) {
96 | if (icat[k]) {
97 | leftCatClassCount[j] += tclasscat[j + k * *nclass];
98 | }
99 | }
100 | }
101 | leftNum = 0.0;
102 | leftDen = 0.0;
103 | for (j = 0; j < *nclass; ++j) {
104 | leftNum += leftCatClassCount[j] * leftCatClassCount[j];
105 | leftDen += leftCatClassCount[j];
106 | }
107 | /* If either node is empty, try another split. */
108 | if (leftDen <= 1.0e-8 || *parentDen - leftDen <= 1.0e-5) continue;
109 | rightNum = 0.0;
110 | for (j = 0; j < *nclass; ++j) {
111 | leftCatClassCount[j] = tclasspop[j] - leftCatClassCount[j];
112 | rightNum += leftCatClassCount[j] * leftCatClassCount[j];
113 | }
114 | decGini = (leftNum / leftDen) + (rightNum / (*parentDen - leftDen));
115 | if (decGini > *critmax) {
116 | *critmax = decGini;
117 | *nhit = 1;
118 | *ncatsp = *lcat > *ncmax ? pack((unsigned int) *lcat, icat) : n + 1;
119 | }
120 | }
121 | mxFree(leftCatClassCount);
122 | }
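/*
 * Illustrative sketch, not part of the original source. The exhaustive
 * branch of catmax above scores every way of sending a subset of the
 * categories to the left child, using
 *     sum_j left_j^2 / sum_j left_j  +  sum_j right_j^2 / sum_j right_j
 * as the criterion (larger is better). Enumerating the masks
 * 1 .. 2^(ncat-1) - 1 visits each two-way partition once, because the last
 * category always stays on the right. split_criterion() and
 * best_subset_split() are hypothetical names; the bit mask plays the role
 * of the icat[] array filled by unpack().
 */

```c
#include <stddef.h>

/*
 * tclasscat[j + k*nclass] is the weight of class j in category k.
 * Bit k of mask set means category k goes left.
 * Returns the criterion, or -1.0 if either child would be empty.
 */
static double split_criterion(const double *tclasscat, int nclass, int ncat,
                              unsigned int mask) {
    double leftNum = 0.0, leftDen = 0.0, rightNum = 0.0, rightDen = 0.0;
    int j, k;

    for (j = 0; j < nclass; ++j) {
        double left = 0.0, total = 0.0;
        for (k = 0; k < ncat; ++k) {
            total += tclasscat[j + k * nclass];
            if ((mask >> k) & 1u) left += tclasscat[j + k * nclass];
        }
        leftNum += left * left;
        leftDen += left;
        rightNum += (total - left) * (total - left);
        rightDen += total - left;
    }
    if (leftDen <= 0.0 || rightDen <= 0.0) return -1.0;
    return leftNum / leftDen + rightNum / rightDen;
}

/* Returns the best mask (0 if none is valid) and stores its score. */
static unsigned int best_subset_split(const double *tclasscat, int nclass,
                                      int ncat, double *best_crit) {
    unsigned int mask, best = 0u, nsplit;

    *best_crit = -1.0;
    if (ncat < 2 || ncat > 32) return 0u;
    nsplit = (1u << (ncat - 1)) - 1u;
    for (mask = 1u; mask <= nsplit; ++mask) {
        double crit = split_criterion(tclasscat, nclass, ncat, mask);
        if (crit > *best_crit) {
            *best_crit = crit;
            best = mask;
        }
    }
    return best;
}
```

/*
 * The search is exponential in ncat, which is why catmax switches to
 * ncsplit random splits once lcat exceeds ncmax.
 */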
123 |
124 |
125 |
126 | /* Find the best split of a categorical variable when there are two classes */
127 | #ifdef WIN64
128 | void F77_NAME(_catmaxb)
129 | #endif
130 | #ifndef WIN64
131 | void F77_NAME(catmaxb)
132 | #endif
133 | (double *totalWt, double *tclasscat, double *classCount,
134 | int *nclass, int *nCat, int *nbest, double *critmax,
135 | int *nhit, double *catCount) {
136 |
137 | double catProportion[32], cp[32], cm[32];
138 | int kcat[32];
139 | int i, j;
140 | double bestsplit=0.0, rightDen, leftDen, leftNum, rightNum, crit;
141 |
142 | *nhit = 0;
143 | for (i = 0; i < *nCat; ++i) {
144 | catProportion[i] = catCount[i] ?
145 | tclasscat[i * *nclass] / catCount[i] : 0.0;
146 | kcat[i] = i + 1;
147 | }
148 | R_qsort_I(catProportion, kcat, 1, *nCat);
149 | for (i = 0; i < *nclass; ++i) {
150 | cp[i] = 0;
151 | cm[i] = classCount[i];
152 | }
153 | rightDen = *totalWt;
154 | leftDen = 0.0;
155 | for (i = 0; i < *nCat - 1; ++i) {
156 | leftDen += catCount[kcat[i]-1];
157 | rightDen -= catCount[kcat[i]-1];
158 | leftNum = 0.0;
159 | rightNum = 0.0;
160 | for (j = 0; j < *nclass; ++j) {
161 | cp[j] += tclasscat[j + (kcat[i]-1) * *nclass];
162 | cm[j] -= tclasscat[j + (kcat[i]-1) * *nclass];
163 | leftNum += cp[j] * cp[j];
164 | rightNum += cm[j] * cm[j];
165 | }
166 | if (catProportion[i] < catProportion[i + 1]) {
167 | /* If neither node is empty, check the split. */
168 | if (rightDen > 1.0e-5 && leftDen > 1.0e-5) {
169 | crit = (leftNum / leftDen) + (rightNum / rightDen);
170 | if (crit > *critmax) {
171 | *critmax = crit;
172 | bestsplit = .5 * (catProportion[i] + catProportion[i + 1]);
173 | *nhit = 1;
174 | }
175 | }
176 | }
177 | }
178 | if (*nhit == 1) {
179 | zeroInt(kcat, *nCat);
180 | for (i = 0; i < *nCat; ++i) {
181 | catProportion[i] = catCount[i] ?
182 | tclasscat[i * *nclass] / catCount[i] : 0.0;
183 | kcat[i] = catProportion[i] < bestsplit ? 1 : 0;
184 | }
185 | *nbest = pack(*nCat, kcat);
186 | }
187 | }
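/*
 * Illustrative sketch, not part of the original source. For two classes,
 * catmaxb above avoids the exponential subset search: it sorts the
 * categories by the proportion of the first class and then only tries the
 * ncat - 1 splits along that ordering. The helper below shows the ordering
 * step alone. CatProp, cmp_catprop() and order_categories() are
 * hypothetical names, and the standard qsort() stands in for R_qsort_I().
 */

```c
#include <stdlib.h>

/* A category index paired with its first-class proportion. */
typedef struct {
    double prop;
    int cat;      /* 0-based category index */
} CatProp;

static int cmp_catprop(const void *a, const void *b) {
    const CatProp *pa = (const CatProp *) a;
    const CatProp *pb = (const CatProp *) b;
    if (pa->prop < pb->prop) return -1;
    if (pa->prop > pb->prop) return 1;
    return pa->cat - pb->cat;   /* deterministic order for equal proportions */
}

/*
 * tclasscat[j + k*nclass] is the weight of class j in category k and
 * catCount[k] is the total weight in category k.
 * Fills order[0..ncat-1] with category indices in ascending proportion.
 * Returns 0 on success, -1 if ncat is outside 1..32.
 */
static int order_categories(const double *tclasscat, const double *catCount,
                            int nclass, int ncat, int *order) {
    CatProp tmp[32];
    int i;

    if (ncat < 1 || ncat > 32) return -1;
    for (i = 0; i < ncat; ++i) {
        tmp[i].prop = catCount[i] > 0.0 ?
            tclasscat[i * nclass] / catCount[i] : 0.0;
        tmp[i].cat = i;
    }
    qsort(tmp, (size_t) ncat, sizeof(CatProp), cmp_catprop);
    for (i = 0; i < ncat; ++i) order[i] = tmp[i].cat;
    return 0;
}
```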
188 |
189 |
190 |
191 | void predictClassTree(double *x, int n, int mdim, int *treemap,
192 | int *nodestatus, double *xbestsplit,
193 | int *bestvar, int *nodeclass,
194 | int treeSize, int *cat, int nclass,
195 | int *jts, int *nodex, int maxcat) {
196 | int m, i, j, k;
197 | int *cbestsplit = NULL;
198 | unsigned int npack;
199 |
200 | //Rprintf("maxcat %d\n",maxcat);
201 | /* decode the categorical splits */
202 | if (maxcat > 1) {
203 | cbestsplit = (int *) mxCalloc(maxcat * treeSize, sizeof(int));
204 | zeroInt(cbestsplit, maxcat * treeSize);
205 | for (i = 0; i < treeSize; ++i) {
206 | if (nodestatus[i] != NODE_TERMINAL) {
207 | if (cat[bestvar[i] - 1] > 1) {
208 | npack = (unsigned int) xbestsplit[i];
209 | /* unpack `npack' into bits */
210 | for (j = 0; npack; npack >>= 1, ++j) {
211 | cbestsplit[j + i*maxcat] = npack & 01;
212 | }
213 | }
214 | }
215 | }
216 | }
217 | for (i = 0; i < n; ++i) {
218 | k = 0;
219 | while (nodestatus[k] != NODE_TERMINAL) {
220 | m = bestvar[k] - 1;
221 | if (cat[m] == 1) {
222 | /* Split by a numerical predictor */
223 | k = (x[m + i * mdim] <= xbestsplit[k]) ?
224 | treemap[k * 2] - 1 : treemap[1 + k * 2] - 1;
225 | } else {
226 | /* Split by a categorical predictor */
227 | k = cbestsplit[(int) x[m + i * mdim] - 1 + k * maxcat] ?
228 | treemap[k * 2] - 1 : treemap[1 + k * 2] - 1;
229 | }
230 | }
231 | /* Terminal node: assign class label */
232 | jts[i] = nodeclass[k];
233 | nodex[i] = k + 1;
234 | }
235 | if (maxcat > 1) mxFree(cbestsplit);
236 | }
237 |
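predictClassTree() above decodes a categorical split by reading the integer part of xbestsplit one bit at a time, one bit per category. The round trip can be sketched as follows. pack_bits and unpack_bits are hypothetical names chosen for this illustration; rf.h declares the repository's own pack() and unpack(), whose bodies are not shown here. The sketch assumes at most 32 categories and a 32-bit unsigned int.

```c
/* Hypothetical helper: set bit j of the result when flags[j] is non-zero.
   Assumes ncat <= 32 so the mask fits in a 32-bit unsigned int. */
unsigned int pack_bits(int ncat, const int *flags)
{
    unsigned int mask = 0U;
    int j;

    for (j = 0; j < ncat; ++j) {
        if (flags[j]) {
            mask |= 1U << j;
        }
    }
    return mask;
}

/* Hypothetical helper: expand mask into ncat flags, each 0 or 1.
   Every entry of flags[0..ncat-1] is written, so no prior zeroing is needed. */
void unpack_bits(unsigned int mask, int ncat, int *flags)
{
    int j;

    for (j = 0; j < ncat; ++j) {
        flags[j] = (int) ((mask >> j) & 1U);
    }
}
```

A set bit j means category j + 1 goes to the left daughter, which matches the lookup cbestsplit[category - 1 + k * maxcat] in the prediction loop.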
--------------------------------------------------------------------------------
/classification/classification.c:
--------------------------------------------------------------------------------
1 | /******************************************************************************
2 | MODULE: main
3 |
4 | PURPOSE: Classification part of the CCDC algorithm
5 |
6 | RETURN VALUE:
7 | Type = None
8 |
9 | HISTORY:
10 | Date Programmer Reason
11 | -------- --------------- -------------------------------------
12 | 8/25/2015 Song Guo Original Development
13 |
14 | ******************************************************************************/
15 | #include
16 | #include
17 | #include
18 | #include
19 | #include
20 |
21 | #include "classification.h"
22 | #include "utilities.h"
23 | #include "matio.h"
24 |
25 | int main(int argc, char *argv[])
26 | {
27 | mat_t *mat, *mat2;
28 | matvar_t *matvar, *matvar2;
29 | char *data;
30 | double *x;
31 | double *ref_x;
32 | int *y;
33 | int i, j = 0;
34 | size_t stride;
35 | FILE *fp;
36 | char FUNC_NAME[] = "main";
37 | char msg_str[MAX_STR_LEN]; /* message string for logging */
38 | int rows;
39 | int ref_rows;
40 | int cols;
41 | int nclass;
42 | bool verbose; /* verbose flag for printing messages */
43 | int status;
44 |
45 | time_t now;
46 | time (&now);
47 | snprintf (msg_str, sizeof(msg_str),
48 | "CCDC start_time=%s\n", ctime (&now));
49 | LOG_MESSAGE (msg_str, FUNC_NAME);
50 |
51 | /* Read the command-line arguments */
52 | status = get_args (argc, argv, &rows, &cols, &ref_rows, &nclass, &verbose);
53 | if (status != SUCCESS)
54 | {
55 | RETURN_ERROR ("calling get_args", FUNC_NAME, EXIT_FAILURE);
56 | }
57 |
58 | /* Allocate memory */
59 | x = malloc(rows * cols * sizeof(double));
60 | ref_x = malloc(ref_rows * cols * sizeof(double));
61 | y = malloc(rows * sizeof(int));
62 | if (x == NULL || ref_x == NULL || y == NULL)
63 | {
64 | fprintf(stderr,"Error allocating memory\n");
65 | return -1;
66 | }
67 |
68 | mat = Mat_Open("/data1/sguo/CCDC/classification/Xs.mat",MAT_ACC_RDONLY);
69 | mat2 = Mat_Open("/data1/sguo/CCDC/classification/Ys.mat",MAT_ACC_RDONLY);
70 | if ( NULL == mat || NULL == mat2) {
71 | fprintf(stderr,"Error opening MAT file \n");
72 | return -1;
73 | }
74 |
75 | while((matvar=Mat_VarReadNext(mat)) != NULL)
76 | {
77 | if ( matvar->rank == 2 )
78 | {
79 | stride = Mat_SizeOf(matvar->data_type);
80 | data = matvar->data;
81 | for ( i = 0; i < rows; i++ ) {
82 | for ( j = 0; j < matvar->dims[1]; j++ ) {
83 | size_t idx = matvar->dims[0]*j+i;
84 | x[i * matvar->dims[1] + j] = *(double*)(data+idx*stride);
85 | #if 0
86 | printf("i,j,x[i][j]=%d,%d,%g\n",i,j,x[i * matvar->dims[1] + j]);
87 | printf(" ");
88 | #endif
89 | }
90 | // printf("\n");
91 | }
92 |
93 | }
94 | break;
95 | }
96 | Mat_VarFree(matvar);
97 | matvar = NULL;
98 |
99 | while((matvar2=Mat_VarReadNext(mat2)) != NULL)
100 | {
101 |
102 | if ( matvar2->rank == 2 )
103 | {
104 | stride = Mat_SizeOf(matvar2->data_type);
105 | data = matvar2->data;
106 | for ( i = 0; i < rows; i++ ) {
107 | y[i] = (int) *(double*)(data+i*stride);
108 | #if 0
109 | printf("i,j,y[i]=%d,%d,%d\n",i,j,y[i]);
110 | printf("\n");
111 | #endif
112 | }
113 | }
114 | }
115 | Mat_VarFree(matvar2);
116 | matvar2 = NULL;
117 |
118 | Mat_Close(mat);
119 | Mat_Close(mat2);
120 |
121 | /***START: NO NEED TO CHANGE ANYTHING FROM HERE TO THERE***************/
122 | int p_size=cols,n_size=rows;
123 | int nsample=n_size;
124 |
125 | /* the classification version requires {D,N}, where D=(num) dimensions, N=(num) examples */
126 | int dimx[2];
127 | dimx[0]=p_size;
128 | dimx[1]=n_size;
129 |
130 | int* cat = (int*)calloc(p_size,sizeof(int));
131 |
132 | /***END: NO NEED TO CHANGE ANYTHING FROM HERE TO THERE*****************/
133 |
134 | /* write prediction OUTPUT into Y_hat.txt */
135 | fp = fopen("Y_hat.txt","w");
136 |
137 | /* this must be set, else everything blows up: it holds the number of categories for
138 | every dimension */
139 | for(i=0;i
143 |
144 | int sampsize=n_size; /* if replace then sampsize=n_size or sampsize=0.632*n_size */
145 |
146 | /* no need to change this */
147 | int nsum = sampsize;
148 |
149 | int strata = 1;
150 | /* other options */
151 | int addclass = 0;
152 | int importance=0;
153 | int localImp=0;
154 | int proximity=0;
155 | int oob_prox=0;
156 | int do_trace; //this variable prints verbosely each step
157 | if(verbose)
158 | do_trace=1;
159 | else
160 | do_trace=0;
161 | int keep_forest=1;
162 | int replace=1;
163 | int stratify=0;
164 | int keep_inbag=0;
165 | int Options[]={addclass,importance,localImp,proximity,oob_prox
166 | ,do_trace,keep_forest,replace,stratify,keep_inbag};
167 |
168 |
169 | //ntree = number of trees; mtry = number of variables tried at each split
170 | int ntree=500; int nt=ntree;
171 | int mtry=(int)floor(sqrt(p_size)); /* - */
172 | if(verbose) printf("ntree %d, mtry %d\n",ntree,mtry);
173 |
174 | int ipi=0;
175 | double* classwt=(double*)calloc(nclass,sizeof(double));
176 | double* cutoff=(double*)calloc(nclass,sizeof(double));
177 | for(i=0;i"
324 | " --cols="
325 | " --ref_rows="
326 | " --nclass="
327 | " [--verbose]\n");
328 |
329 | printf ("\n");
330 | printf ("where the following parameters are required:\n");
331 | printf (" --rows=: number of rows\n");
332 | printf (" --cols=: number of columns\n");
333 | printf (" --ref_rows=: number of rows for reference data\n");
334 | printf (" --nclass=: number of classes\n");
335 | printf ("\n");
336 | printf ("and the following parameters are optional:\n");
337 | printf (" --verbose: should intermediate messages be printed?"
338 | " (default is false)\n");
339 | printf ("\n");
340 | printf ("classification --help will print the usage statement\n");
341 | printf ("\n");
342 | printf ("Example:\n");
343 | printf ("classification"
344 | " --rows=1000"
345 | " --cols=71"
346 | " --ref_rows=500"
347 | " --nclass=11"
348 | " --verbose\n");
349 | printf ("Note: The classification must run from the directory"
350 | " where the input data are located.\n\n");
351 | }
352 |
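main() above copies the first MAT variable into x while converting from column-major storage (index dims[0]*j + i) to row-major storage (index i*dims[1] + j). A minimal sketch of that conversion as a standalone function follows. The name colmajor_to_rowmajor is hypothetical and the sketch assumes the source already holds doubles; it is not the repository's code.

```c
#include <stddef.h>

/* Copy a rows x cols matrix of doubles from column-major order
   (element (i, j) at src[j * rows + i]) into row-major order
   (element (i, j) at dst[i * cols + j]). src and dst must not overlap. */
void colmajor_to_rowmajor(const double *src, double *dst,
                          size_t rows, size_t cols)
{
    size_t i, j;

    for (i = 0; i < rows; ++i) {
        for (j = 0; j < cols; ++j) {
            dst[i * cols + j] = src[j * rows + i];
        }
    }
}
```

Keeping the conversion in one function makes it easier to check that the loop bounds agree with the dimensions stored in the file rather than with the command-line values alone.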
--------------------------------------------------------------------------------
/classification/classification.h:
--------------------------------------------------------------------------------
1 | #include
2 | #include
3 | #include
4 | #include
5 |
6 | #ifndef SUCCESS
7 | #define SUCCESS 0
8 | #endif
9 |
10 | #ifndef ERROR
11 | #define ERROR -1
12 | #endif
13 |
14 | #ifndef FAILURE
15 | #define FAILURE 1
16 | #endif
17 |
18 | #ifndef TRUE
19 | #define TRUE 1
20 | #endif
21 |
22 | #ifndef FALSE
23 | #define FALSE 0
24 | #endif
25 |
26 | #define MAX_STR_LEN 100
27 |
28 | typedef enum { false = 0, true = !false } bool;
29 |
30 | void classRF(double *x, int *dimx, int *cl, int *ncl, int *cat, int *maxcat,
31 | int *sampsize, int *strata, int *Options, int *ntree, int *nvar,
32 | int *ipi, double *classwt, double *cut, int *nodesize,
33 | int *outcl, int *counttr, double *prox,
34 | double *imprt, double *impsd, double *impmat, int *nrnodes,
35 | int *ndbigtree, int *nodestatus, int *bestvar, int *treemap,
36 | int *nodeclass, double *xbestsplit, double *errtr,
37 | int *testdat, double *xts, int *clts, int *nts, double *countts,
38 | int *outclts, int labelts, double *proxts, double *errts,
39 | int *inbag, int print_verbose_tree_progression);
40 |
41 |
42 | void classForest(int *mdim, int *ntest, int *nclass, int *maxcat,
43 | int *nrnodes, int *ntree, double *x, double *xbestsplit,
44 | double *pid, double *cutoff, double *countts, int *treemap,
45 | int *nodestatus, int *cat, int *nodeclass, int *jts,
46 | int *jet, int *bestvar, int *node, int *treeSize,
47 | int *keepPred, int *prox, double *proxMat, int *nodes);
48 |
49 | int get_args
50 | (
51 | int argc, /* I: number of cmd-line args */
52 | char *argv[], /* I: string of cmd-line args */
53 | int *rows, /* O: number of rows */
54 | int *cols, /* O: number of columns */
55 | int *ref_rows, /* O: number of rows for reference data */
56 | int *nclass, /* O: number of classification types */
57 | bool *verbose /* O: verbose flag */
58 | );
59 |
60 | void usage();
61 |
--------------------------------------------------------------------------------
/classification/cokus.c:
--------------------------------------------------------------------------------
1 | // This is the Mersenne Twister random number generator MT19937, which
2 | // generates pseudorandom integers uniformly distributed in 0..(2^32 - 1)
3 | // starting from any odd seed in 0..(2^32 - 1). This version is a recode
4 | // by Shawn Cokus (Cokus@math.washington.edu) on March 8, 1998 of a version by
5 | // Takuji Nishimura (who had suggestions from Topher Cooper and Marc Rieffel in
6 | // July-August 1997).
7 | //
8 | // Effectiveness of the recoding (on Goedel2.math.washington.edu, a DEC Alpha
9 | // running OSF/1) using GCC -O3 as a compiler: before recoding: 51.6 sec. to
10 | // generate 300 million random numbers; after recoding: 24.0 sec. for the same
11 | // (i.e., 46.5% of original time), so speed is now about 12.5 million random
12 | // number generations per second on this machine.
13 | //
14 | // According to the URL
15 | // (and paraphrasing a bit in places), the Mersenne Twister is ``designed
16 | // with consideration of the flaws of various existing generators,'' has
17 | // a period of 2^19937 - 1, gives a sequence that is 623-dimensionally
18 | // equidistributed, and ``has passed many stringent tests, including the
19 | // die-hard test of G. Marsaglia and the load test of P. Hellekalek and
20 | // S. Wegenkittl.'' It is efficient in memory usage (typically using 2506
21 | // to 5012 bytes of static data, depending on data type sizes, and the code
22 | // is quite short as well). It generates random numbers in batches of 624
23 | // at a time, so the caching and pipelining of modern systems is exploited.
24 | // It is also divide- and mod-free.
25 | //
26 | // This library is free software; you can redistribute it and/or modify it
27 | // under the terms of the GNU Library General Public License as published by
28 | // the Free Software Foundation (either version 2 of the License or, at your
29 | // option, any later version). This library is distributed in the hope that
30 | // it will be useful, but WITHOUT ANY WARRANTY, without even the implied
31 | // warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
32 | // the GNU Library General Public License for more details. You should have
33 | // received a copy of the GNU Library General Public License along with this
34 | // library; if not, write to the Free Software Foundation, Inc., 59 Temple
35 | // Place, Suite 330, Boston, MA 02111-1307, USA.
36 | //
37 | // The code as Shawn received it included the following notice:
38 | //
39 | // Copyright (C) 1997 Makoto Matsumoto and Takuji Nishimura. When
40 | // you use this, send an e-mail to with
41 | // an appropriate reference to your work.
42 | //
43 | // It would be nice to CC: when you write.
44 | //
45 |
46 | //#include
47 | //#include
48 |
49 | //
50 | // uint32 must be an unsigned integer type capable of holding at least 32
51 | // bits; exactly 32 should be fastest, but 64 is better on an Alpha with
52 | // GCC at -O3 optimization so try your options and see what's best for you
53 | //
54 |
55 | typedef unsigned long uint32;
56 |
57 | #define N (624) // length of state vector
58 | #define M (397) // a period parameter
59 | #define K (0x9908B0DFU) // a magic constant
60 | #define hiBit(u) ((u) & 0x80000000U) // mask all but highest bit of u
61 | #define loBit(u) ((u) & 0x00000001U) // mask all but lowest bit of u
62 | #define loBits(u) ((u) & 0x7FFFFFFFU) // mask the highest bit of u
63 | #define mixBits(u, v) (hiBit(u)|loBits(v)) // move hi bit of u to hi bit of v
64 |
65 | static uint32 state[N+1]; // state vector + 1 extra to not violate ANSI C
66 | static uint32 *next; // next random value is computed from here
67 | static int left = -1; // can *next++ this many times before reloading
68 |
69 |
70 | void seedMT(uint32 seed)
71 | {
72 | //
73 | // We initialize state[0..(N-1)] via the generator
74 | //
75 | // x_new = (69069 * x_old) mod 2^32
76 | //
77 | // from Line 15 of Table 1, p. 106, Sec. 3.3.4 of Knuth's
78 | // _The Art of Computer Programming_, Volume 2, 3rd ed.
79 | //
80 | // Notes (SJC): I do not know what the initial state requirements
81 | // of the Mersenne Twister are, but it seems this seeding generator
82 | // could be better. It achieves the maximum period for its modulus
83 | // (2^30) iff x_initial is odd (p. 20-21, Sec. 3.2.1.2, Knuth); if
84 | // x_initial can be even, you have sequences like 0, 0, 0, ...;
85 | // 2^31, 2^31, 2^31, ...; 2^30, 2^30, 2^30, ...; 2^29, 2^29 + 2^31,
86 | // 2^29, 2^29 + 2^31, ..., etc. so I force seed to be odd below.
87 | //
88 | // Even if x_initial is odd, if x_initial is 1 mod 4 then
89 | //
90 | // the lowest bit of x is always 1,
91 | // the next-to-lowest bit of x is always 0,
92 | // the 2nd-from-lowest bit of x alternates ... 0 1 0 1 0 1 0 1 ... ,
93 | // the 3rd-from-lowest bit of x 4-cycles ... 0 1 1 0 0 1 1 0 ... ,
94 | // the 4th-from-lowest bit of x has the 8-cycle ... 0 0 0 1 1 1 1 0 ... ,
95 | // ...
96 | //
97 | // and if x_initial is 3 mod 4 then
98 | //
99 | // the lowest bit of x is always 1,
100 | // the next-to-lowest bit of x is always 1,
101 | // the 2nd-from-lowest bit of x alternates ... 0 1 0 1 0 1 0 1 ... ,
102 | // the 3rd-from-lowest bit of x 4-cycles ... 0 0 1 1 0 0 1 1 ... ,
103 | // the 4th-from-lowest bit of x has the 8-cycle ... 0 0 1 1 1 1 0 0 ... ,
104 | // ...
105 | //
106 | // The generator's potency (min. s>=0 with (69069-1)^s = 0 mod 2^32) is
107 | // 16, which seems to be alright by p. 25, Sec. 3.2.1.3 of Knuth. It
108 | // also does well in the dimension 2..5 spectral tests, but it could be
109 | // better in dimension 6 (Line 15, Table 1, p. 106, Sec. 3.3.4, Knuth).
110 | //
111 | // Note that the random number user does not see the values generated
112 | // here directly since reloadMT() will always munge them first, so maybe
113 | // none of all of this matters. In fact, the seed values made here could
114 | // even be extra-special desirable if the Mersenne Twister theory says
115 | // so-- that's why the only change I made is to restrict to odd seeds.
116 | //
117 |
118 | register uint32 x = (seed | 1U) & 0xFFFFFFFFU, *s = state;
119 | register int j;
120 |
121 | for(left=0, *s++=x, j=N; --j;
122 | *s++ = (x*=69069U) & 0xFFFFFFFFU);
123 | }
124 |
125 |
126 | uint32 reloadMT(void)
127 | {
128 | register uint32 *p0=state, *p2=state+2, *pM=state+M, s0, s1;
129 | register int j;
130 |
131 | if(left < -1)
132 | seedMT(4357U);
133 |
134 | left=N-1, next=state+1;
135 |
136 | for(s0=state[0], s1=state[1], j=N-M+1; --j; s0=s1, s1=*p2++)
137 | *p0++ = *pM++ ^ (mixBits(s0, s1) >> 1) ^ (loBit(s1) ? K : 0U);
138 |
139 | for(pM=state, j=M; --j; s0=s1, s1=*p2++)
140 | *p0++ = *pM++ ^ (mixBits(s0, s1) >> 1) ^ (loBit(s1) ? K : 0U);
141 |
142 | s1=state[0], *p0 = *pM ^ (mixBits(s0, s1) >> 1) ^ (loBit(s1) ? K : 0U);
143 | s1 ^= (s1 >> 11);
144 | s1 ^= (s1 << 7) & 0x9D2C5680U;
145 | s1 ^= (s1 << 15) & 0xEFC60000U;
146 | return(s1 ^ (s1 >> 18));
147 | }
148 |
149 |
150 | uint32 randomMT(void)
151 | {
152 | uint32 y;
153 |
154 | if(--left < 0)
155 | return(reloadMT());
156 |
157 | y = *next++;
158 | y ^= (y >> 11);
159 | y ^= (y << 7) & 0x9D2C5680U;
160 | y ^= (y << 15) & 0xEFC60000U;
161 | y ^= (y >> 18);
162 | return(y);
163 | }
164 |
165 | /*
166 | #define uint32 unsigned long
167 | #define SMALL_INT char
168 | #define SMALL_INT_CLASS mxCHAR_CLASS
169 | void seedMT(uint32 seed);
170 | uint32 randomMT(void);
171 |
172 | #include "stdio.h"
173 | #include "math.h"
174 |
175 | int main(void)
176 | {
177 | int j;
178 |
179 | // you can seed with any uint32, but the best are odds in 0..(2^32 - 1)
180 |
181 | seedMT(4357U);
182 | uint32 MAX=pow(2,32)-1;
183 | // print the first 2,002 random numbers seven to a line as an example
184 |
185 | for(j=0; j<2002; j++)
186 | printf(" %10lu%s", (unsigned long) randomMT(), (j%7)==6 ? "\n" : "");
187 |
188 | for(j=0; j<2002; j++)
189 | printf(" %f%s", ((double)randomMT()/(double)MAX), (j%7)==6 ? "\n" : "");
190 |
191 |
192 | return(1);
193 | }
194 | */
195 |
196 |
197 |
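reloadMT() and randomMT() above each end with the same four tempering steps. A minimal sketch of that step as one function, plus one way to scale a draw into a double, follows. The names temper and to_unit_interval are hypothetical, and dividing by 2^32 (which gives a value in [0, 1)) is an assumption of this sketch; the commented-out test program in cokus.c divides by 2^32 - 1 instead.

```c
#include <stdint.h>

/* The MT19937 output tempering, written with an exact 32-bit type so the
   left shifts discard high bits without extra masking. */
uint32_t temper(uint32_t y)
{
    y ^= (y >> 11);
    y ^= (y << 7) & 0x9D2C5680U;
    y ^= (y << 15) & 0xEFC60000U;
    y ^= (y >> 18);
    return y;
}

/* Hypothetical helper: map a 32-bit draw to a double in [0, 1). */
double to_unit_interval(uint32_t r)
{
    return (double) r / 4294967296.0;   /* 2^32 */
}
```

With unsigned long as uint32, as in the file above, the type may be 64 bits wide on some platforms, so a fixed-width type is one way to keep the arithmetic identical everywhere.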
--------------------------------------------------------------------------------
/classification/get_args.c:
--------------------------------------------------------------------------------
1 | #include <stdio.h>
2 | #include <stdlib.h>
3 | #include <getopt.h>
4 |
5 | #include "classification.h"
6 | #include "utilities.h"
7 |
8 | /******************************************************************************
9 | MODULE: get_args
10 |
11 | PURPOSE: Gets the command-line arguments and validates that the required
12 | arguments were specified.
13 |
14 | RETURN VALUE:
15 | Type = int
16 | Value Description
17 | ----- -----------
18 | FAILURE Error getting the command-line arguments or a command-line
19 | argument and associated value were not specified
20 | SUCCESS No errors encountered
21 |
22 | HISTORY:
23 | Date Programmer Reason
24 | -------- --------------- -------------------------------------
25 | 8/5/2015 Song Guo Original Development
26 | ******************************************************************************/
27 | int get_args
28 | (
29 | int argc, /* I: number of cmd-line args */
30 | char *argv[], /* I: string of cmd-line args */
31 | int *rows, /* O: number of rows */
32 | int *cols, /* O: number of columns */
33 | int *ref_rows, /* O: number of rows for reference data */
34 | int *nclass, /* O: number of classification types */
35 | bool *verbose /* O: verbose flag */
36 | )
37 | {
38 | int c; /* current argument index */
39 | int option_index; /* index for the command-line option */
40 | static int verbose_flag = 0; /* verbose flag */
41 | static int cols_default = 71; /* Default buffer for number of columns */
42 | static int nclass_default = 11; /* Default buffer for number of classes */
43 | char errmsg[MAX_STR_LEN]; /* error message */
44 | char FUNC_NAME[] = "get_args"; /* function name */
45 | static struct option long_options[] = {
46 | {"verbose", no_argument, &verbose_flag, 1},
47 | {"rows", required_argument, 0, 'r'},
48 | {"cols", required_argument, 0, 'c'},
49 | {"ref_rows", required_argument, 0, 'f'},
50 | {"nclass", required_argument, 0, 'n'},
51 | {"help", no_argument, 0, 'h'},
52 | {0, 0, 0, 0}
53 | };
54 |
55 | /* Assign the default values */
56 | *cols = cols_default;
57 | *nclass = nclass_default;
58 |
59 | /* Loop through all the cmd-line options */
60 | opterr = 0; /* turn off getopt_long error msgs as we'll print our own */
61 | while (1)
62 | {
63 | /* optstring in call to getopt_long is empty since we will only
64 | support the long options */
65 | c = getopt_long (argc, argv, "", long_options, &option_index);
66 | if (c == -1)
67 | { /* Out of cmd-line options */
68 | break;
69 | }
70 |
71 | switch (c)
72 | {
73 | case 0:
74 | /* If this option set a flag, do nothing else now. */
75 | if (long_options[option_index].flag != 0)
76 | {
77 | break;
78 | }
79 | sprintf (errmsg, "option %s\n", long_options[option_index].name);
80 | if (optarg)
81 | {
82 | sprintf (errmsg, "option %s with arg %s\n",
83 | long_options[option_index].name, optarg);
84 | }
85 | RETURN_ERROR (errmsg, FUNC_NAME, ERROR);
86 | break;
87 |
88 | case 'h': /* help */
89 | usage ();
90 | return FAILURE;
91 | break;
92 |
93 | case 'r':
94 | *rows = atoi (optarg);
95 | break;
96 |
97 | case 'c':
98 | *cols = atoi (optarg);
99 | break;
100 |
101 | case 'f':
102 | *ref_rows = atoi (optarg);
103 | break;
104 |
105 | case 'n':
106 | *nclass = atoi (optarg);
107 | break;
108 |
109 | case '?':
110 | default:
111 | sprintf (errmsg, "Unknown option %s", argv[optind - 1]);
112 | usage ();
113 | RETURN_ERROR (errmsg, FUNC_NAME, ERROR);
114 | break;
115 | }
116 | }
117 |
118 | /* Check the input values */
119 | if (*rows <= 0)
120 | {
121 | sprintf (errmsg, "number of rows must be > 0");
122 | RETURN_ERROR(errmsg, FUNC_NAME, FAILURE);
123 | }
124 |
125 | if (*cols <= 0)
126 | {
127 | sprintf (errmsg, "number of columns must be > 0");
128 | RETURN_ERROR(errmsg, FUNC_NAME, FAILURE);
129 | }
130 |
131 | if (*ref_rows <= 0)
132 | {
133 | sprintf (errmsg, "number of reference rows must be > 0");
134 | RETURN_ERROR(errmsg, FUNC_NAME, FAILURE);
135 | }
136 |
137 | if (*nclass <= 0)
138 | {
139 | sprintf (errmsg, "number of classes must be > 0");
140 | RETURN_ERROR(errmsg, FUNC_NAME, FAILURE);
141 | }
142 |
143 |
144 | /* Check the verbose flag */
145 | if (verbose_flag)
146 | *verbose = true;
147 | else
148 | *verbose = false;
149 |
150 | if (*verbose)
151 | {
152 | printf ("rows = %d\n", *rows);
153 | printf ("cols = %d\n", *cols);
154 | printf ("ref_rows = %d\n", *ref_rows);
155 | printf ("nclass = %d\n", *nclass);
156 | printf ("verbose = %d\n", *verbose);
157 | }
158 |
159 | return SUCCESS;
160 | }
161 |
162 |
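get_args() above converts option values with atoi(), which returns 0 for non-numeric text and cannot report overflow. A stricter conversion based on strtol() might look like the sketch below. parse_positive_int is a hypothetical helper introduced only for illustration and is not part of the repository.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse text as a base-10 integer greater than zero.
   Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_positive_int(const char *text, int *out)
{
    char *end = NULL;
    long value;

    if (text == NULL || *text == '\0') {
        return -1;                       /* nothing to parse */
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0') {
        return -1;                       /* out of range or trailing characters */
    }
    if (value <= 0 || value > INT_MAX) {
        return -1;                       /* not a positive int */
    }
    *out = (int) value;
    return 0;
}
```

Each case in the option switch could call such a helper and return an error immediately, instead of deferring every check to the end of the function.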
--------------------------------------------------------------------------------
/classification/qsort.c:
--------------------------------------------------------------------------------
1 | /**************************************************************
2 | * Blatantly copied Quick sort algorithm from R's source code
3 | *************************************************************/
4 |
5 |
6 | #define qsort_Index
7 | #define NUMERIC double
8 | void R_qsort_I(double *v, int *I, int i, int j)
9 | /*====== BODY of R_qsort() and R_qsorti() functions ====================
10 | *
11 | * is included in ./qsort.c with and without ``qsort_Index'' defined
12 | *======================================================================
13 | */
14 | {
15 | /* Orders v[] increasingly. Puts into I[] the permutation vector:
16 | * new v[k] = old v[I[k]]
17 | * Only elements [i : j] (in 1-indexing !) are considered.
18 |
19 | * This is a modification of CACM algorithm #347 by R. C. Singleton,
20 | * which is a modified Hoare quicksort.
21 | * This version incorporates the modification in the remark by Peto.
22 | */
23 |
24 | int il[31], iu[31];
25 | /* Arrays iu[k] and il[k] permit sorting up to 2^(k+1)-1 elements;
26 | * originally k = 20 -> n_max = 2'097'151
27 | * now k = 31 -> n_max = 4294'967'295
28 | */
29 | NUMERIC vt, vtt;
30 | double R = 0.375;
31 | int ii, ij, k, l, m;
32 | #ifdef qsort_Index
33 | int it, tt;
34 | #endif
35 |
36 |
37 | /* 1-indexing for I[], v[] (and `i' and `j') : */
38 | --v;
39 | #ifdef qsort_Index
40 | --I;
41 | #endif
42 |
43 | ii = i;/* save */
44 | m = 1;
45 |
46 | L10:
47 | if (i < j) {
48 | if (R < 0.5898437) R += 0.0390625; else R -= 0.21875;
49 | L20:
50 | k = i;
51 | /* ij = (j + i) >> 1; midpoint */
52 | ij = i + (int)((j - i)*R);
53 | #ifdef qsort_Index
54 | it = I[ij];
55 | #endif
56 | vt = v[ij];
57 | if (v[i] > vt) {
58 | #ifdef qsort_Index
59 | I[ij] = I[i]; I[i] = it; it = I[ij];
60 | #endif
61 | v[ij] = v[i]; v[i] = vt; vt = v[ij];
62 | }
63 | /* L30:*/
64 | l = j;
65 | if (v[j] < vt) {
66 | #ifdef qsort_Index
67 | I[ij] = I[j]; I[j] = it; it = I[ij];
68 | #endif
69 | v[ij] = v[j]; v[j] = vt; vt = v[ij];
70 | if (v[i] > vt) {
71 | #ifdef qsort_Index
72 | I[ij] = I[i]; I[i] = it; it = I[ij];
73 | #endif
74 | v[ij] = v[i]; v[i] = vt; vt = v[ij];
75 | }
76 | }
77 |
78 | for(;;) { /*L50:*/
79 | //do l--; while (v[l] > vt);
80 | l--;for(;v[l]>vt;l--);
81 |
82 |
83 | #ifdef qsort_Index
84 | tt = I[l];
85 | #endif
86 | vtt = v[l];
87 | /*L60:*/
88 | //do k++; while (v[k] < vt);
89 | k=k+1;for(;v[k]<vt;k++);
90 |
91 | if (k > l) break;
92 |
93 | /* else (k <= l) : */
94 | #ifdef qsort_Index
95 | I[l] = I[k]; I[k] = tt;
96 | #endif
97 | v[l] = v[k]; v[k] = vtt;
98 | }
99 |
100 | m++;
101 | if (l - i <= j - k) {
102 | /*L70: */
103 | il[m] = k;
104 | iu[m] = j;
105 | j = l;
106 | }
107 | else {
108 | il[m] = i;
109 | iu[m] = l;
110 | i = k;
111 | }
112 | }else { /* i >= j : */
113 |
114 | L80:
115 | if (m == 1) return;
116 |
117 | /* else */
118 | i = il[m];
119 | j = iu[m];
120 | m--;
121 | }
122 |
123 | if (j - i > 10) goto L20;
124 |
125 | if (i == ii) goto L10;
126 |
127 | --i;
128 | L100:
129 | do {
130 | ++i;
131 | if (i == j) {
132 | goto L80;
133 | }
134 | #ifdef qsort_Index
135 | it = I[i + 1];
136 | #endif
137 | vt = v[i + 1];
138 | } while (v[i] <= vt);
139 |
140 | k = i;
141 |
142 | do { /*L110:*/
143 | #ifdef qsort_Index
144 | I[k + 1] = I[k];
145 | #endif
146 | v[k + 1] = v[k];
147 | --k;
148 | } while (vt < v[k]);
149 |
150 | #ifdef qsort_Index
151 | I[k + 1] = it;
152 | #endif
153 | v[k + 1] = vt;
154 | goto L100;
155 | } /* R_qsort{i} */
156 |
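The header comment of R_qsort_I() above states the contract new v[k] = old v[I[k]]: the index array is permuted alongside the values. The sketch below shows that contract with a plain insertion sort. sort_with_index is a hypothetical stand-in written for this example, not the quicksort above, and it uses 0-based array positions while leaving the index values themselves untouched.

```c
/* Hypothetical stand-in for R_qsort_I(): orders v[0..n-1] increasingly and
   applies the same moves to idx[], so idx[k] names the original slot of the
   value that ends up at position k. */
void sort_with_index(double *v, int *idx, int n)
{
    int i, k;

    for (i = 1; i < n; ++i) {
        double vt = v[i];
        int it = idx[i];

        /* Shift larger elements one slot to the right. */
        for (k = i - 1; k >= 0 && v[k] > vt; --k) {
            v[k + 1] = v[k];
            idx[k + 1] = idx[k];
        }
        v[k + 1] = vt;
        idx[k + 1] = it;
    }
}
```

catmax() in classTree.c relies on the same idea: it fills kcat[i] with i + 1 before sorting, so that after the call kcat lists the category numbers in order of increasing class proportion.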
--------------------------------------------------------------------------------
/classification/rf.h:
--------------------------------------------------------------------------------
1 | /**************************************************************
2 | * mex interface to Andy Liaw et al.'s C code (used in R package randomForest)
3 | * Added by Abhishek Jaiantilal ( abhishek.jaiantilal@colorado.edu )
4 | * License: GPLv2
5 | * Version: 0.02
6 | *
7 | * other than adding the macros for F77_* and adding this message
8 | * nothing changed .
9 | *************************************************************/
10 |
11 | /*******************************************************************
12 | Copyright (C) 2001-7 Leo Breiman, Adele Cutler and Merck & Co., Inc.
13 |
14 | This program is free software; you can redistribute it and/or
15 | modify it under the terms of the GNU General Public License
16 | as published by the Free Software Foundation; either version 2
17 | of the License, or (at your option) any later version.
18 |
19 | This program is distributed in the hope that it will be useful,
20 | but WITHOUT ANY WARRANTY; without even the implied warranty of
21 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
22 | GNU General Public License for more details.
23 | *******************************************************************/
24 | #ifndef RF_H
25 | #define RF_H
26 |
27 | /* test if the bit at position pos is turned on */
28 | #define isBitOn(x,pos) (((x) & (1 << (pos))) > 0)
29 | /* swap two integers */
30 | #define swapInt(a, b) ((a ^= b), (b ^= a), (a ^= b))
31 | /*
32 | void classRF(double *x, int *dimx, int *cl, int *ncl, int *cat, int *maxcat,
33 | int *sampsize, int *Options, int *ntree, int *nvar,
34 | int *ipi, double *pi, double *cut, int *nodesize,
35 | int *outcl, int *counttr, double *prox,
36 | double *imprt, double *, double *impmat, int *nrnodes, int *ndbigtree,
37 | int *nodestatus, int *bestvar, int *treemap, int *nodeclass,
38 | double *xbestsplit, double *pid, double *errtr,
39 | int *testdat, double *xts, int *clts, int *nts, double *countts,
40 | int *outclts, int *labelts, double *proxts, double *errts);
41 | */
42 |
43 | #define F77_CALL(x) x ## _
44 | #define F77_NAME(x) F77_CALL(x)
45 | #define F77_SUB(x) F77_CALL(x)
46 |
47 |
48 | void normClassWt(int *cl, const int nsample, const int nclass,
49 | const int useWt, double *classwt, int *classFreq);
50 |
51 | void classForest(int *mdim, int *ntest, int *nclass, int *maxcat,
52 | int *nrnodes, int *jbt, double *xts, double *xbestsplit,
53 | double *pid, double *cutoff, double *countts, int *treemap,
54 | int *nodestatus, int *cat, int *nodeclass, int *jts,
55 | int *jet, int *bestvar, int *nodexts, int *ndbigtree,
56 | int *keepPred, int *prox, double *proxmatrix, int *nodes);
57 |
58 | void regTree(double *x, double *y, int mdim, int nsample,
59 | int *lDaughter, int *rDaughter, double *upper, double *avnode,
60 | int *nodestatus, int nrnodes, int *treeSize, int nthsize,
61 | int mtry, int *mbest, int *cat, double *tgini, int *varUsed);
62 |
63 | void findBestSplit(double *x, int *jdex, double *y, int mdim, int nsample,
64 | int ndstart, int ndend, int *msplit, double *decsplit,
65 | double *ubest, int *ndendl, int *jstat, int mtry,
66 | double sumnode, int nodecnt, int *cat);
67 |
68 | void predictRegTree(double *x, int nsample, int mdim,
69 | int *lDaughter, int *rDaughter, int *nodestatus,
70 | double *ypred, double *split, double *nodepred,
71 | int *splitVar, int treeSize, int *cat, int maxcat,
72 | int *nodex);
73 |
74 | void predictClassTree(double *x, int n, int mdim, int *treemap,
75 | int *nodestatus, double *xbestsplit,
76 | int *bestvar, int *nodeclass,
77 | int ndbigtree, int *cat, int nclass,
78 | int *jts, int *nodex, int maxcat);
79 |
80 | int pack(int l, int *icat);
81 | void unpack(unsigned int npack, int *icat);
82 |
83 | void zeroInt(int *x, int length);
84 | void zeroDouble(double *x, int length);
85 | void createClass(double *x, int realN, int totalN, int mdim);
86 | void prepare(int *cl, const int nsample, const int nclass, const int ipi,
87 | double *pi, double *pid, int *nc, double *wtt);
88 | void makeA(double *x, const int mdim, const int nsample, int *cat, int *a,
89 | int *b);
90 | void modA(int *a, int *nuse, const int nsample, const int mdim, int *cat,
91 | const int maxcat, int *ncase, int *jin);
92 | void Xtranslate(double *x, int mdim, int nrnodes, int nsample,
93 | int *bestvar, int *bestsplit, int *bestsplitnext,
94 | double *xbestsplit, int *nodestatus, int *cat, int treeSize);
95 | void permuteOOB(int m, double *x, int *in, int nsample, int mdim);
96 | void computeProximity(double *prox, int oobprox, int *node, int *inbag,
97 | int *oobpair, int n);
98 |
99 | /* Template of Fortran subroutines to be called from the C wrapper */
100 | /*extern void F77_NAME(buildtree)(int *a, int *b, int *cl, int *cat,
101 | int *maxcat, int *mdim, int *nsample,
102 | int *nclass, int *treemap, int *bestvar,
103 | int *bestsplit, int *bestsplitnext,
104 | double *tgini, int *nodestatus, int *nodepop,
105 | int *nodestart, double *classpop,
106 | double *tclasspop, double *tclasscat,
107 | int *ta, int *nrnodes, int *,
108 | int *, int *, int *, int *, int *, int *,
109 | double *, double *, double *,
110 | int *, int *, int *);
111 | */
112 | /* Node status */
113 | #define NODE_TERMINAL -1
114 | #define NODE_TOSPLIT -2
115 | #define NODE_INTERIOR -3
116 |
117 | #endif /* RF_H */
118 |
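The isBitOn macro in rf.h shifts the signed constant 1, so with a 32-bit int the case pos == 31 shifts into the sign bit, which the C standard leaves undefined. An unsigned variant avoids that. IS_BIT_ON_U is a hypothetical name for this sketch, which assumes unsigned int is 32 bits wide.

```c
/* Hypothetical unsigned variant of isBitOn: well-defined for pos 0..31 when
   unsigned int is 32 bits wide, and always evaluates to 0 or 1. */
#define IS_BIT_ON_U(x, pos) ((((unsigned int) (x)) >> (pos)) & 1U)
```

Shifting the value right and masking with 1U also removes the need for the > 0 comparison, since the result is already 0 or 1.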
--------------------------------------------------------------------------------
/classification/rfsub.f:
--------------------------------------------------------------------------------
1 | c Copyright (C) 2001-7 Leo Breiman and Adele Cutler and Merck & Co, Inc.
2 | c This program is free software; you can redistribute it and/or
3 | c modify it under the terms of the GNU General Public License
4 | c as published by the Free Software Foundation; either version 2
5 | c of the License, or (at your option) any later version.
6 |
7 | c This program is distributed in the hope that it will be useful,
8 | c but WITHOUT ANY WARRANTY; without even the implied warranty of
9 | c MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
10 | c GNU General Public License for more details.
11 | c
12 | c Modified by Andy Liaw and Matt Wiener:
13 | c The main program is re-written as a C function to be called from R.
14 | c All calls to the uniform RNG are replaced with R's RNG. Some subroutines
15 | c not called are excluded. Variables and arrays declared as double as
16 | c needed. Unused variables are deleted.
17 | c
18 | c SUBROUTINE BUILDTREE
19 |
20 | subroutine buildtree(a, b, cl, cat, maxcat, mdim, nsample,
21 | 1 nclass, treemap, bestvar, bestsplit, bestsplitnext, tgini,
22 | 1 nodestatus,nodepop, nodestart, classpop, tclasspop,
23 | 1 tclasscat,ta,nrnodes, idmove, ndsize, ncase, mtry, iv,
24 | 1 nodeclass, ndbigtree, win, wr, wl, mred, nuse, mind)
25 |
26 | c Buildtree consists of repeated calls to two subroutines, Findbestsplit
27 | c and Movedata. Findbestsplit does just that--it finds the best split of
28 | c the current node. Movedata moves the data in the split node right and
29 | c left so that the data corresponding to each child node is contiguous.
30 | c The buildtree bookkeeping is different from that in Friedman's original
31 | c CART program. ncur is the total number of nodes to date.
32 | c nodestatus(k)=1 if the kth node has been split. nodestatus(k)=2 if the
33 | c node exists but has not yet been split, and =-1 if the node is terminal.
34 | c A node is terminal if its size is below a threshold value, or if it is
35 | c all one class, or if all the x-values are equal. If the current node k
36 | c is split, then its children are numbered ncur+1 (left), and
37 | c ncur+2(right), ncur increases to ncur+2 and the next node to be split is
38 | c numbered k+1. When no more nodes can be split, buildtree returns to the
39 | c main program.
40 |
41 | implicit double precision(a-h,o-z)
42 | integer a(mdim, nsample), cl(nsample), cat(mdim),
43 | 1 treemap(2,nrnodes), bestvar(nrnodes),
44 | 1 bestsplit(nrnodes), nodestatus(nrnodes), ta(nsample),
45 | 1 nodepop(nrnodes), nodestart(nrnodes),
46 | 1 bestsplitnext(nrnodes), idmove(nsample),
47 | 1 ncase(nsample), b(mdim,nsample),
48 | 1 iv(mred), nodeclass(nrnodes), mind(mred)
49 |
50 | double precision tclasspop(nclass), classpop(nclass, nrnodes),
51 | 1 tclasscat(nclass, 32), win(nsample), wr(nclass),
52 | 1 wl(nclass), tgini(mdim), xrand
53 | integer msplit, ntie
54 |
55 | msplit = 0
56 | call zerv(nodestatus,nrnodes)
57 | call zerv(nodestart,nrnodes)
58 | call zerv(nodepop,nrnodes)
59 | call zermr(classpop,nclass,nrnodes)
60 |
61 | do j=1,nclass
62 | classpop(j, 1) = tclasspop(j)
63 | end do
64 | ncur = 1
65 | nodestart(1) = 1
66 | nodepop(1) = nuse
67 | nodestatus(1) = 2
68 | c start main loop
69 | do 30 kbuild = 1, nrnodes
70 | if (kbuild .gt. ncur) goto 50
71 | if (nodestatus(kbuild) .ne. 2) goto 30
72 | c initialize for next call to findbestsplit
73 | ndstart = nodestart(kbuild)
74 | ndend = ndstart + nodepop(kbuild) - 1
75 | do j = 1, nclass
76 | tclasspop(j) = classpop(j,kbuild)
77 | end do
78 | jstat = 0
79 |
80 | call findbestsplit(a,b,cl,mdim,nsample,nclass,cat,maxcat,
81 | 1 ndstart, ndend,tclasspop,tclasscat,msplit, decsplit,
82 | 1 nbest,ncase, jstat,mtry,win,wr,wl,mred,mind)
83 | if (jstat .eq. -1) then
84 | nodestatus(kbuild) = -1
85 | goto 30
86 | else
87 | bestvar(kbuild) = msplit
88 | iv(msplit) = 1
89 | if (decsplit .lt. 0.0) decsplit = 0.0
90 | tgini(msplit) = tgini(msplit) + decsplit
91 | if (cat(msplit) .eq. 1) then
92 | bestsplit(kbuild) = a(msplit,nbest)
93 | bestsplitnext(kbuild) = a(msplit,nbest+1)
94 | else
95 | bestsplit(kbuild) = nbest
96 | bestsplitnext(kbuild) = 0
97 | endif
98 | endif
99 |
100 | call movedata(a,ta,mdim,nsample,ndstart,ndend,idmove,ncase,
101 | 1 msplit,cat,nbest,ndendl)
102 |
103 | c leftnode no.= ncur+1, rightnode no. = ncur+2.
104 | nodepop(ncur+1) = ndendl - ndstart + 1
105 | nodepop(ncur+2) = ndend - ndendl
106 | nodestart(ncur+1) = ndstart
107 | nodestart(ncur+2) = ndendl + 1
108 |
109 | c find class populations in both nodes
110 | do n = ndstart, ndendl
111 | nc = ncase(n)
112 | j=cl(nc)
113 | classpop(j,ncur+1) = classpop(j,ncur+1) + win(nc)
114 | end do
115 | do n = ndendl+1, ndend
116 | nc = ncase(n)
117 | j = cl(nc)
118 | classpop(j,ncur+2) = classpop(j,ncur+2) + win(nc)
119 | end do
120 | c check on nodestatus
121 | nodestatus(ncur+1) = 2
122 | nodestatus(ncur+2) = 2
123 | if (nodepop(ncur+1).le.ndsize) nodestatus(ncur+1) = -1
124 | if (nodepop(ncur+2).le.ndsize) nodestatus(ncur+2) = -1
125 | popt1 = 0
126 | popt2 = 0
127 | do j = 1, nclass
128 | popt1 = popt1 + classpop(j,ncur+1)
129 | popt2 = popt2 + classpop(j,ncur+2)
130 | end do
131 |
132 | do j=1,nclass
133 | if (classpop(j,ncur+1).eq.popt1) nodestatus(ncur+1) = -1
134 | if (classpop(j,ncur+2).eq.popt2) nodestatus(ncur+2) = -1
135 | end do
136 |
137 | treemap(1,kbuild) = ncur + 1
138 | treemap(2,kbuild) = ncur + 2
139 | nodestatus(kbuild) = 1
140 | ncur = ncur+2
141 | if (ncur.ge.nrnodes) goto 50
142 |
143 | 30 continue
144 | 50 continue
145 |
146 | ndbigtree = nrnodes
147 | do k=nrnodes, 1, -1
148 | if (nodestatus(k) .eq. 0) ndbigtree = ndbigtree - 1
149 | if (nodestatus(k) .eq. 2) nodestatus(k) = -1
150 | end do
151 |
152 | c form prediction in terminal nodes
153 | do kn = 1, ndbigtree
154 | if(nodestatus(kn) .eq. -1) then
155 | pp = 0
156 | ntie = 1
157 | do j = 1, nclass
158 | if (classpop(j,kn) .gt. pp) then
159 | nodeclass(kn) = j
160 | pp = classpop(j,kn)
161 | end if
162 | c Break ties at random:
163 | if (classpop(j,kn) .eq. pp) then
164 | ntie = ntie + 1
165 | call rrand(xrand)
166 | if (xrand .lt. 1.0 / ntie) then
167 | nodeclass(kn)=j
168 | pp=classpop(j,kn)
169 | end if
170 | end if
171 | end do
172 | end if
173 | end do
174 |
175 | end
176 |
177 | c SUBROUTINE FINDBESTSPLIT
178 | c For the best split, msplit is the variable split on. decsplit is the
179 | c dec. in impurity. If msplit is numerical, nsplit is the case number
180 | c of value of msplit split on, and nsplitnext is the case number of the
181 | c next larger value of msplit. If msplit is categorical, then nsplit is
182 | c the coding into an integer of the categories going left.
183 | subroutine findbestsplit(a, b, cl, mdim, nsample, nclass, cat,
184 | 1 maxcat, ndstart, ndend, tclasspop, tclasscat, msplit,
185 | 2 decsplit, nbest, ncase, jstat, mtry, win, wr, wl,
186 | 3 mred, mind)
187 | implicit double precision(a-h,o-z)
188 | integer a(mdim,nsample), cl(nsample), cat(mdim),
189 | 1 ncase(nsample), b(mdim,nsample), nn, j
190 | double precision tclasspop(nclass), tclasscat(nclass,32), dn(32),
191 | 1 win(nsample), wr(nclass), wl(nclass), xrand
192 | integer mind(mred), ncmax, ncsplit,nhit, ntie
193 | ncmax = 10
194 | ncsplit = 512
195 | c compute initial values of numerator and denominator of Gini
196 | pno = 0.0
197 | pdo = 0.0
198 | do j = 1, nclass
199 | pno = pno + tclasspop(j) * tclasspop(j)
200 | pdo = pdo + tclasspop(j)
201 | end do
202 | crit0 = pno / pdo
203 | jstat = 0
204 |
205 | c start main loop through variables to find best split
206 | critmax = -1.0e25
207 | do k = 1, mred
208 | mind(k) = k
209 | end do
210 | nn = mred
211 | c sampling mtry variables w/o replacement.
212 | do mt = 1, mtry
213 | call rrand(xrand)
214 | j = int(nn * xrand) + 1
215 | mvar = mind(j)
216 | mind(j) = mind(nn)
217 | mind(nn) = mvar
218 | nn = nn - 1
219 | lcat = cat(mvar)
220 | if (lcat .eq. 1) then
221 | c Split on a numerical predictor.
222 | rrn = pno
223 | rrd = pdo
224 | rln = 0
225 | rld = 0
226 | call zervr(wl, nclass)
227 | do j = 1, nclass
228 | wr(j) = tclasspop(j)
229 | end do
230 | ntie = 1
231 | do nsp = ndstart, ndend-1
232 | nc = a(mvar, nsp)
233 | u = win(nc)
234 | k = cl(nc)
235 | rln = rln + u * (2 * wl(k) + u)
236 | rrn = rrn + u * (-2 * wr(k) + u)
237 | rld = rld + u
238 | rrd = rrd - u
239 | wl(k) = wl(k) + u
240 | wr(k) = wr(k) - u
241 | if (b(mvar, nc) .lt. b(mvar, a(mvar, nsp + 1))) then
242 | c If neither nodes is empty, check the split.
243 | if (dmin1(rrd, rld) .gt. 1.0e-5) then
244 | crit = (rln / rld) + (rrn / rrd)
245 | if (crit .gt. critmax) then
246 | nbest = nsp
247 | critmax = crit
248 | msplit = mvar
249 | end if
250 | c Break ties at random:
251 | if (crit .eq. critmax) then
252 | ntie = ntie + 1
253 | call rrand(xrand)
254 | if (xrand .lt. 1.0 / ntie) then
255 | nbest = nsp
256 | critmax = crit
257 | msplit = mvar
258 | end if
259 | end if
260 | end if
261 | end if
262 | end do
263 | else
264 | c Split on a categorical predictor. Compute the decrease in impurity.
265 | call zermr(tclasscat, nclass, 32)
266 | do nsp = ndstart, ndend
267 | nc = ncase(nsp)
268 | l = a(mvar, ncase(nsp))
269 | tclasscat(cl(nc), l) = tclasscat(cl(nc), l) + win(nc)
270 | end do
271 | nnz = 0
272 | do i = 1, lcat
273 | su = 0
274 | do j = 1, nclass
275 | su = su + tclasscat(j, i)
276 | end do
277 | dn(i) = su
278 | if(su .gt. 0) nnz = nnz + 1
279 | end do
280 | nhit = 0
281 | if (nnz .gt. 1) then
282 | if (nclass .eq. 2 .and. lcat .gt. ncmax) then
283 | call catmaxb(pdo, tclasscat, tclasspop, nclass,
284 | & lcat, nbest, critmax, nhit, dn)
285 | else
286 | call catmax(pdo, tclasscat, tclasspop, nclass, lcat,
287 | & nbest, critmax, nhit, maxcat, ncmax, ncsplit)
288 | end if
289 | if (nhit .eq. 1) msplit = mvar
290 | c else
291 | c critmax = -1.0e25
292 | end if
293 | end if
294 | end do
295 | if (critmax .lt. -1.0e10 .or. msplit .eq. 0) jstat = -1
296 | decsplit = critmax - crit0
297 | return
298 | end
299 |
300 | C ==============================================================
301 | c SUBROUTINE MOVEDATA
302 | c This subroutine is the heart of the buildtree construction. Based on the
303 | c best split the data in the part of the a matrix corresponding to the
304 | c current node is moved to the left if it belongs to the left child and
305 | c right if it belongs to the right child.
306 |
307 | subroutine movedata(a,ta,mdim,nsample,ndstart,ndend,idmove,
308 | 1 ncase,msplit,cat,nbest,ndendl)
309 | implicit double precision(a-h,o-z)
310 | integer a(mdim,nsample),ta(nsample),idmove(nsample),
311 | 1 ncase(nsample),cat(mdim),icat(32)
312 |
313 | c compute idmove=indicator of case nos. going left
314 |
315 | if (cat(msplit).eq.1) then
316 | do nsp=ndstart,nbest
317 | nc=a(msplit,nsp)
318 | idmove(nc)=1
319 | end do
320 | do nsp=nbest+1, ndend
321 | nc=a(msplit,nsp)
322 | idmove(nc)=0
323 | end do
324 | ndendl=nbest
325 | else
326 | ndendl=ndstart-1
327 | l=cat(msplit)
328 | call myunpack(l,nbest,icat)
329 | do nsp=ndstart,ndend
330 | nc=ncase(nsp)
331 | if (icat(a(msplit,nc)).eq.1) then
332 | idmove(nc)=1
333 | ndendl=ndendl+1
334 | else
335 | idmove(nc)=0
336 | endif
337 | end do
338 | endif
339 |
340 | c shift case. nos. right and left for numerical variables.
341 |
342 | do 40 msh=1,mdim
343 | if (cat(msh).eq.1) then
344 | k=ndstart-1
345 | do 50 n=ndstart,ndend
346 | ih=a(msh,n)
347 | if (idmove(ih).eq.1) then
348 | k=k+1
349 | ta(k)=a(msh,n)
350 | endif
351 | 50 continue
352 | do 60 n=ndstart,ndend
353 | ih=a(msh,n)
354 | if (idmove(ih).eq.0) then
355 | k=k+1
356 | ta(k)=a(msh,n)
357 | endif
358 | 60 continue
359 |
360 | do 70 k=ndstart,ndend
361 | a(msh,k)=ta(k)
362 | 70 continue
363 | endif
364 |
365 | 40 continue
366 | ndo=0
367 | if (ndo.eq.1) then
368 | do 140 msh = 1, mdim
369 | if (cat(msh) .gt. 1) then
370 | k = ndstart - 1
371 | do 150 n = ndstart, ndend
372 | ih = ncase(n)
373 | if (idmove(ih) .eq. 1) then
374 | k = k + 1
375 | ta(k) = a(msh, ih)
376 | endif
377 | 150 continue
378 | do 160 n = ndstart, ndend
379 | ih = ncase(n)
380 | if (idmove(ih) .eq. 0) then
381 | k = k + 1
382 | ta(k) = a(msh,ih)
383 | endif
384 | 160 continue
385 |
386 | do 170 k = ndstart, ndend
387 | a(msh,k) = ta(k)
388 | 170 continue
389 | endif
390 |
391 | 140 continue
392 | end if
393 |
394 | c compute case nos. for right and left nodes.
395 |
396 | if (cat(msplit).eq.1) then
397 | do 80 n=ndstart,ndend
398 | ncase(n)=a(msplit,n)
399 | 80 continue
400 | else
401 | k=ndstart-1
402 | do 90 n=ndstart, ndend
403 | if (idmove(ncase(n)).eq.1) then
404 | k=k+1
405 | ta(k)=ncase(n)
406 | endif
407 | 90 continue
408 | do 100 n=ndstart,ndend
409 | if (idmove(ncase(n)).eq.0) then
410 | k=k+1
411 | ta(k)=ncase(n)
412 | endif
413 | 100 continue
414 | do 110 k=ndstart,ndend
415 | ncase(k)=ta(k)
416 | 110 continue
417 | endif
418 |
419 | end
420 |
421 | subroutine myunpack(l,npack,icat)
422 |
423 | c npack is a long integer. The sub. returns icat, an integer of zeroes and
424 | c ones corresponding to the coefficients in the binary expansion of npack.
425 |
426 | integer icat(32),npack
427 | do j=1,32
428 | icat(j)=0
429 | end do
430 | n=npack
431 | icat(1)=mod(n,2)
432 | do k=2,l
433 | n=(n-icat(k-1))/2
434 | icat(k)=mod(n,2)
435 | end do
436 | end
437 |
438 | subroutine zerv(ix,m1)
439 | integer ix(m1)
440 | do n=1,m1
441 | ix(n)=0
442 | end do
443 | end
444 |
445 | subroutine zervr(rx,m1)
446 | double precision rx(m1)
447 | do n=1,m1
448 | rx(n)=0.0d0
449 | end do
450 | end
451 |
452 | subroutine zerm(mx,m1,m2)
453 | integer mx(m1,m2)
454 | do i=1,m1
455 | do j=1,m2
456 | mx(i,j)=0
457 | end do
458 | end do
459 | end
460 |
461 | subroutine zermr(rx,m1,m2)
462 | double precision rx(m1,m2)
463 | do i=1,m1
464 | do j=1,m2
465 | rx(i,j)=0.0d0
466 | end do
467 | end do
468 | end
469 |
470 | subroutine zermd(rx,m1,m2)
471 | double precision rx(m1,m2)
472 | do i=1,m1
473 | do j=1,m2
474 | rx(i,j)=0.0d0
475 | end do
476 | end do
477 | end
478 |
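The numerical branch of findbestsplit above moves one case at a time from the right child to the left child and updates the numerators (sums of squared class weights) and denominators (total weights) incrementally, then scores each candidate split as rln/rld + rrn/rrd. A minimal C sketch of that scan follows. It assumes the cases are already sorted by the predictor, uses 0-based arrays, and keeps the first of several equally good splits instead of breaking ties at random. The function name and argument layout are hypothetical and do not appear in the source.

```c
#include <assert.h>

/* Hypothetical helper, not part of rfsub.f: scan cases sorted by one numeric
 * predictor and return the index of the last case sent left by the best
 * split, or -1 if no valid split exists.
 *   x[i]    predictor value of the i-th sorted case (ascending)
 *   cl[i]   class label in 0..nclass-1
 *   w[i]    case weight
 *   wl, wr  caller-provided scratch arrays of length nclass */
int best_numeric_split(const double *x, const int *cl, const double *w,
                       int n, int nclass, double *wl, double *wr,
                       double *best_crit)
{
    double rln = 0.0, rld = 0.0;  /* left: sum of squared class weights, total weight */
    double rrn = 0.0, rrd = 0.0;  /* right: same quantities */
    int best = -1;
    int i, k;

    assert(n > 0 && nclass > 0);

    /* Start with every case in the right child. */
    for (k = 0; k < nclass; ++k) { wl[k] = 0.0; wr[k] = 0.0; }
    for (i = 0; i < n; ++i) { wr[cl[i]] += w[i]; rrd += w[i]; }
    for (k = 0; k < nclass; ++k) rrn += wr[k] * wr[k];

    *best_crit = -1.0e25;
    for (i = 0; i < n - 1; ++i) {
        double u = w[i];
        k = cl[i];
        /* Move case i to the left: (wl+u)^2 - wl^2 = u*(2*wl + u). */
        rln += u * (2.0 * wl[k] + u);
        rrn += u * (-2.0 * wr[k] + u);
        rld += u;
        rrd -= u;
        wl[k] += u;
        wr[k] -= u;
        /* Only split between distinct values, and only if both sides are non-empty. */
        if (x[i] < x[i + 1] && rld > 1.0e-5 && rrd > 1.0e-5) {
            double crit = rln / rld + rrn / rrd;
            if (crit > *best_crit) {
                *best_crit = crit;
                best = i;
            }
        }
    }
    return best;
}
```

For four unit-weight cases with classes 0, 0, 1, 1 and increasing predictor values, the scan selects the split after the second case, where both children are pure.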
--------------------------------------------------------------------------------
/classification/rfutils.c:
--------------------------------------------------------------------------------
1 | /*******************************************************************
2 | Copyright (C) 2001-7 Leo Breiman, Adele Cutler and Merck & Co., Inc.
3 |
4 | This program is free software; you can redistribute it and/or
5 | modify it under the terms of the GNU General Public License
6 | as published by the Free Software Foundation; either version 2
7 | of the License, or (at your option) any later version.
8 |
9 | This program is distributed in the hope that it will be useful,
10 | but WITHOUT ANY WARRANTY; without even the implied warranty of
11 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
12 | GNU General Public License for more details.
13 | *******************************************************************/
14 | //#include
15 | #include "rf.h"
16 | #include "memory.h"
17 | #include "stdlib.h"
18 | #include "qsort.c"
19 |
20 | #define MAX_UINT_COKUS 4294967295 //basically 2^32-1
21 |
22 | typedef unsigned long uint32;
23 | extern void seedMT(uint32 seed);
24 | extern uint32 reloadMT(void);
25 | extern uint32 randomMT(void);
26 | extern double unif_rand();
27 |
28 | void zeroInt(int *x, int length) {
29 | memset(x, 0, length * sizeof(int));
30 | }
31 |
32 | void zeroDouble(double *x, int length) {
33 | memset(x, 0, length * sizeof(double));
34 | }
35 |
36 | void createClass(double *x, int realN, int totalN, int mdim) {
37 | /* Create the second class by bootstrapping each variable independently. */
38 | int i, j, k;
39 | for (i = realN; i < totalN; ++i) {
40 | for (j = 0; j < mdim; ++j) {
41 | k = (int) (unif_rand() * realN);
42 | x[j + i * mdim] = x[j + k * mdim];
43 | }
44 | }
45 | }
46 |
47 | #include "stdio.h"
48 | void normClassWt(int *cl, const int nsample, const int nclass,
49 | const int useWt, double *classwt, int *classFreq) {
50 | int i;
51 | double sumwt = 0.0;
52 | //printf("useWt %d",useWt);
53 | if (useWt) {
54 | //printf("User supplied via priors classwt");
55 | /* Normalize user-supplied weights so they sum to one. */
56 | for (i = 0; i < nclass; ++i) sumwt += classwt[i];
57 | //printf("\n sumwt %f",sumwt);
58 | for (i = 0; i < nclass; ++i) classwt[i] /= sumwt;
59 | } else {
60 | for (i = 0; i < nclass; ++i) {
61 | classwt[i] = ((double) classFreq[i]) / nsample;
62 | }
63 | }
64 | for (i = 0; i < nclass; ++i) {
65 | classwt[i] = classFreq[i] ? classwt[i] * nsample / classFreq[i] : 0.0;
66 | }
67 | }
68 |
69 | void makeA(double *x, const int mdim, const int nsample, int *cat, int *a,
70 | int *b) {
71 | /* makeA() constructs the mdim by nsample integer array a. For each
72 | numerical variable with values x(m, n), n=1, ...,nsample, the x-values
73 | are sorted from lowest to highest. Denote these by xs(m, n). Then
74 | a(m,n) is the case number in which xs(m, n) occurs. The b matrix is
75 | also constructed here. If the mth variable is categorical, then
76 | a(m, n) is the category of the nth case number. */
77 | int i, j, n1, n2;
78 | double *v= (double *) calloc(nsample, sizeof(double));
79 | int *index = (int *) calloc(nsample, sizeof(int));
80 |
81 | for (i = 0; i < mdim; ++i) {
82 | if (cat[i] == 1) { /* numerical predictor */
83 | for (j = 0; j < nsample; ++j) {
84 | v[j] = x[i + j * mdim];
85 | index[j] = j + 1;
86 | }
87 | R_qsort_I(v, index, 1, nsample);
88 |
89 | /* this sorts the v(n) in ascending order. index(n) is the case
90 | number of that v(n) nth from the lowest (assume the original
91 | case numbers are 1,2,...). */
92 | for (j = 0; j < nsample-1; ++j) {
93 | n1 = index[j];
94 | n2 = index[j + 1];
95 | a[i + j * mdim] = n1;
96 | if (j == 0) b[i + (n1-1) * mdim] = 1;
97 | b[i + (n2-1) * mdim] = (v[j] < v[j + 1]) ?
98 | b[i + (n1-1) * mdim] + 1 : b[i + (n1-1) * mdim];
99 | }
100 | a[i + (nsample-1) * mdim] = index[nsample-1];
101 | } else { /* categorical predictor */
102 | for (j = 0; j < nsample; ++j)
103 | a[i + j*mdim] = (int) x[i + j * mdim];
104 | }
105 | }
106 | free(index);
107 | free(v);
108 | }
109 |
110 |
111 | void modA(int *a, int *nuse, const int nsample, const int mdim,
112 | int *cat, const int maxcat, int *ncase, int *jin) {
113 | int i, j, k, m, nt;
114 |
115 | *nuse = 0;
116 | for (i = 0; i < nsample; ++i) if (jin[i]) (*nuse)++;
117 |
118 | for (i = 0; i < mdim; ++i) {
119 | k = 0;
120 | nt = 0;
121 | if (cat[i] == 1) {
122 | for (j = 0; j < nsample; ++j) {
123 | if (jin[a[i + k * mdim] - 1]) {
124 | a[i + nt * mdim] = a[i + k * mdim];
125 | k++;
126 | } else {
127 | for (m = 0; m < nsample - k; ++m) {
128 | if (jin[a[i + (k + m) * mdim] - 1]) {
129 | a[i + nt * mdim] = a[i + (k + m) * mdim];
130 | k += m + 1;
131 | break;
132 | }
133 | }
134 | }
135 | nt++;
136 | if (nt >= *nuse) break;
137 | }
138 | }
139 | }
140 | if (maxcat > 1) {
141 | k = 0;
142 | nt = 0;
143 | for (i = 0; i < nsample; ++i) {
144 | if (jin[k]) {
145 | k++;
146 | ncase[nt] = k;
147 | } else {
148 | for (j = 0; j < nsample - k; ++j) {
149 | if (jin[k + j]) {
150 | ncase[nt] = k + j + 1;
151 | k += j + 1;
152 | break;
153 | }
154 | }
155 | }
156 | nt++;
157 | if (nt >= *nuse) break;
158 | }
159 | }
160 | }
161 |
162 | void Xtranslate(double *x, int mdim, int nrnodes, int nsample,
163 | int *bestvar, int *bestsplit, int *bestsplitnext,
164 | double *xbestsplit, int *nodestatus, int *cat, int treeSize) {
165 | /*
166 | this subroutine takes the splits on numerical variables and translates them
167 | back into x-values. It also unpacks each categorical split into a
168 | 32-dimensional vector with components of zero or one--a one indicates that
169 | the corresponding category goes left in the split.
170 | */
171 |
172 | int i, m;
173 |
174 | for (i = 0; i < treeSize; ++i) {
175 | if (nodestatus[i] == 1) {
176 | m = bestvar[i] - 1;
177 | if (cat[m] == 1) {
178 | xbestsplit[i] = 0.5 * (x[m + (bestsplit[i] - 1) * mdim] +
179 | x[m + (bestsplitnext[i] - 1) * mdim]);
180 | } else {
181 | xbestsplit[i] = (double) bestsplit[i];
182 | }
183 | }
184 | }
185 | }
186 |
187 | void permuteOOB(int m, double *x, int *in, int nsample, int mdim) {
188 | /* Permute the OOB part of a variable in x.
189 | * Argument:
190 | * m: the variable to be permuted
191 | * x: the data matrix (variables in rows)
192 | * in: vector indicating which case is OOB
193 | * nsample: number of cases in the data
194 | * mdim: number of variables in the data
195 | */
196 | double *tp, tmp;
197 | int i, last, k, nOOB = 0;
198 |
199 | tp = (double *) calloc(nsample, sizeof(double));
200 |
201 | for (i = 0; i < nsample; ++i) {
202 | /* make a copy of the OOB part of the data into tp (for permuting) */
203 | if (in[i] == 0) {
204 | tp[nOOB] = x[m + i*mdim];
205 | nOOB++;
206 | }
207 | }
208 | /* Permute tp */
209 | last = nOOB;
210 | for (i = 0; i < nOOB; ++i) {
211 | k = (int) (last * unif_rand());
212 | tmp = tp[last - 1];
213 | tp[last - 1] = tp[k];
214 | tp[k] = tmp;
215 | last--;
216 | }
217 |
218 | /* Copy the permuted OOB data back into x. */
219 | nOOB = 0;
220 | for (i = 0; i < nsample; ++i) {
221 | if (in[i] == 0) {
222 | x[m + i*mdim] = tp[nOOB];
223 | nOOB++;
224 | }
225 | }
226 | free(tp);
227 | }
228 |
229 | /* Compute proximity. */
230 | void computeProximity(double *prox, int oobprox, int *node, int *inbag,
231 | int *oobpair, int n) {
232 | /* Accumulate the number of times a pair of points fall in the same node.
233 | prox: n x n proximity matrix
234 | oobprox: should the accumulation only count OOB cases? (0=no, 1=yes)
235 | node: vector of terminal node labels
236 | inbag: indicator of whether a case is in-bag
237 | oobpair: matrix to accumulate the number of times a pair is OOB together
238 | n: total number of cases
239 | */
240 | int i, j;
241 | for (i = 0; i < n; ++i) {
242 | for (j = i+1; j < n; ++j) {
243 | if (oobprox) {
244 | /* if (jin[k] == 0 && jin[n] == 0) { */
245 | if (!(inbag[i] > 0) && !(inbag[j] > 0)) {
246 | oobpair[j*n + i] ++;
247 | oobpair[i*n + j] ++;
248 | if (node[i] == node[j]) {
249 | prox[j*n + i] += 1.0;
250 | prox[i*n + j] += 1.0;
251 | }
252 | }
253 | } else {
254 | if (node[i] == node[j]) {
255 | prox[j*n + i] += 1.0;
256 | prox[i*n + j] += 1.0;
257 | }
258 | }
259 | }
260 | }
261 | }
262 |
263 | int pack(int nBits, int *bits) {
264 | int i = nBits, pack = 0;
265 | while (--i >= 0) pack += bits[i] << i;
266 | return(pack);
267 | }
268 |
269 | void unpack(unsigned int pack, int *bits) {
270 | /* pack is a 4-byte integer. Fills bits, an integer array, with the
271 | zeroes and ones of the binary expansion of pack. Entries above the
272 | highest set bit are left untouched, so the caller must zero bits first. */
273 | int i;
274 | for (i = 0; pack != 0; pack >>= 1, ++i) bits[i] = pack & 1;
275 | }
276 |
277 | #ifdef OLD
278 |
279 | double oldpack(int l, int *icat) {
280 | /* icat is a binary integer with ones for categories going left
281 | * and zeroes for those going right. The sub returns npack- the integer */
282 | int k;
283 | double pack = 0.0;
284 |
285 | for (k = 0; k < l; ++k) {
286 | if (icat[k]) pack += R_pow_di(2.0, k);
287 | }
288 | return(pack);
289 | }
290 |
291 |
292 | void oldunpack(int l, int npack, int *icat) {
293 | /*
294 | * npack is a long integer. The sub. returns icat, an integer of zeroes and
295 | * ones corresponding to the coefficients in the binary expansion of npack.
296 | */
297 | int i;
298 | zeroInt(icat, 32);
299 | icat[0] = npack % 2;
300 | for (i = 1; i < l; ++i) {
301 | npack = (npack - icat[i-1]) / 2;
302 | icat[i] = npack % 2;
303 | }
304 | }
305 |
306 |
307 |
308 | #endif /* OLD */
309 |
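Categorical splits in this file are stored as a single integer whose bits mark the categories that go left, as described in the Xtranslate() and unpack() comments. The sketch below shows the round trip under a few assumptions: unsigned int is at least 32 bits wide, at most 32 categories are used, and the decoder writes all 32 slots so the caller does not have to zero the array first. The names pack_categories, unpack_categories, and MAX_CAT are hypothetical and are not part of rfutils.c.

```c
#include <assert.h>

#define MAX_CAT 32  /* the source uses 32-element category vectors */

/* Encode a 0/1 vector (1 = category goes left) into the bits of an
 * unsigned int; bit i corresponds to category i. */
unsigned int pack_categories(int ncat, const int *goes_left)
{
    unsigned int packed = 0u;
    int i;

    assert(ncat >= 0 && ncat <= MAX_CAT);
    for (i = 0; i < ncat; ++i) {
        if (goes_left[i]) packed |= 1u << i;
    }
    return packed;
}

/* Decode into all MAX_CAT slots, so unused slots are explicitly zero. */
void unpack_categories(unsigned int packed, int *goes_left)
{
    int i;

    for (i = 0; i < MAX_CAT; ++i) {
        goes_left[i] = (int) ((packed >> i) & 1u);
    }
}
```

Writing every slot costs a few extra iterations but removes the dependency on the caller having cleared the output array.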
--------------------------------------------------------------------------------
/classification/utilities.c:
--------------------------------------------------------------------------------
1 |
2 | #include
3 | #include
4 | #include
5 | #include
6 | #include
7 | #include
8 | #include
9 |
10 |
11 | #include "utilities.h"
12 |
13 |
14 | /*****************************************************************************
15 | NAME: write_message
16 |
17 | PURPOSE: Writes a formatted log message to the specified file handle.
18 |
19 | RETURN VALUE: None
20 |
21 | NOTES:
22 | - Log Message Format:
23 | yyyy-mm-dd HH:mm:ss pid:module [filename]:line [type]:message
24 | *****************************************************************************/
25 | void write_message
26 | (
27 | const char *message, /* I: message to write to the log */
28 | const char *module, /* I: module the message is from */
29 | const char *type, /* I: type of the error */
30 | char *file, /* I: file the message was generated in */
31 | int line, /* I: line number in the file where the message was
32 | generated */
33 | FILE *fd /* I: where to write the log message */
34 | )
35 | {
36 | time_t current_time;
37 | struct tm *time_info;
38 | int year;
39 | pid_t pid;
40 |
41 | time (&current_time);
42 | time_info = localtime (&current_time);
43 | year = time_info->tm_year + 1900;
44 |
45 | pid = getpid ();
46 |
47 | fprintf (fd, "%04d-%02d-%02d %02d:%02d:%02d %d:%s [%s]:%d [%s]:%s\n",
48 | year,
49 | time_info->tm_mon + 1,
50 | time_info->tm_mday,
51 | time_info->tm_hour,
52 | time_info->tm_min,
53 | time_info->tm_sec,
54 | pid, module, basename (file), line, type, message);
55 | }
56 |
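The log format documented for write_message starts with a "yyyy-mm-dd HH:mm:ss" timestamp. One way to produce that prefix without handling the tm_year and tm_mon offsets by hand is the standard strftime function. The sketch below is an illustration only; format_log_time is a hypothetical helper name and is not part of utilities.c.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format a broken-down time as "yyyy-mm-dd HH:mm:ss".
 * Returns the number of characters written (excluding the terminating
 * null), or 0 if buf is too small. */
size_t format_log_time(const struct tm *t, char *buf, size_t len)
{
    assert(t != NULL && buf != NULL);
    /* strftime applies the +1900 (tm_year) and +1 (tm_mon) offsets itself. */
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", t);
}
```

A caller would pass the result of localtime() and a buffer of at least 20 bytes, then print the buffer ahead of the pid, module, and message fields.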
--------------------------------------------------------------------------------
/classification/utilities.h:
--------------------------------------------------------------------------------
1 |
2 | #ifndef UTILITIES_H
3 | #define UTILITIES_H
4 |
5 |
6 | #include <stdio.h>
7 |
8 |
9 | #define LOG_MESSAGE(message, module) \
10 | write_message((message), (module), "INFO", \
11 | __FILE__, __LINE__, stdout);
12 |
13 |
14 | #define WARNING_MESSAGE(message, module) \
15 | write_message((message), (module), "WARNING", \
16 | __FILE__, __LINE__, stdout);
17 |
18 |
19 | #define ERROR_MESSAGE(message, module) \
20 | write_message((message), (module), "ERROR", \
21 | __FILE__, __LINE__, stdout);
22 |
23 |
24 | #define RETURN_ERROR(message, module, status) \
25 | {write_message((message), (module), "ERROR", \
26 | __FILE__, __LINE__, stdout); \
27 | return (status);}
28 |
29 |
30 | void write_message
31 | (
32 | const char *message, /* I: message to write to the log */
33 | const char *module, /* I: module the message is from */
34 | const char *type, /* I: type of the error */
35 | char *file, /* I: file the message was generated in */
36 | int line, /* I: line number in the file where the message was
37 | generated */
38 | FILE * fd /* I: where to write the log message */
39 | );
40 |
41 |
42 | #endif /* UTILITIES_H */
43 |
--------------------------------------------------------------------------------
/docker/Makefile:
--------------------------------------------------------------------------------
1 | TAG_PREFIX = usgseros
2 | DOCKERHUB_ORG = $(TAG_PREFIX)
3 |
4 | all: debian-c-ccdc ubuntu-c-ccdc
5 |
6 | publish-docker: all debian-publish-c-ccdc ubuntu-publish-c-ccdc
7 |
8 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9 | # Common
10 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11 | #
12 | # Note that the format of these base/common targets is:
13 | # >-$ docker build -t
14 |
15 | base-c-ccdc:
16 | @docker build -t $(TAG_PREFIX)/$(SYSTEM)-c-ccdc $(SYSTEM)
17 |
18 | base-publish:
19 | @docker push $(DOCKERHUB_ORG)/$(REPO)
20 |
21 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
22 | # Debian
23 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
24 |
25 | debian-c-ccdc:
26 | @SYSTEM=debian make base-c-ccdc
27 |
28 | debian-publish-c-ccdc:
29 | @REPO=debian-c-ccdc make base-publish
30 |
31 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
32 | # Ubuntu
33 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
34 |
35 | ubuntu-c-ccdc:
36 | @SYSTEM=ubuntu make base-c-ccdc
37 |
38 | ubuntu-publish-c-ccdc:
39 | @REPO=ubuntu-c-ccdc make base-publish
40 |
41 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
42 | # CentOS
43 | #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
44 |
45 | # TBD
46 |
47 |
--------------------------------------------------------------------------------
/docker/debian/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM usgseros/debian-python
2 | MAINTAINER USGS LCMAP http://eros.usgs.gov
3 |
4 | RUN apt-get install -y libgsl0-dev libgsl0ldbl gsl-bin \
5 | libmatio-dev libmatio2 gfortran
6 |
7 | ENV CCDC_BIN /root/bin
8 | RUN mkdir $CCDC_BIN
9 |
10 | RUN git clone https://github.com/USGS-EROS/lcmap-change-detection-c.git
11 | RUN cd lcmap-change-detection-c && \
12 | BIN=$CCDC_BIN make
13 |
14 | ENTRYPOINT ["/root/bin/ccdc"]
15 |
16 |
--------------------------------------------------------------------------------
/docker/debian/run.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import subprocess
3 |
4 |
5 | parser = argparse.ArgumentParser()
6 | parser.add_argument("--arg-1", help="an arg")
7 | parser.add_argument("--arg-2", help="another arg")
8 | args = parser.parse_args()
9 | #print(subprocess.check_output(["echo", "Testing output; args:", args.arg_1, args.arg_2]))
10 | print("Testing output; args:", args.arg_1, args.arg_2)
11 |
12 |
--------------------------------------------------------------------------------
/docker/ubuntu/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM usgseros/ubuntu-python
2 | MAINTAINER USGS LCMAP http://eros.usgs.gov
3 |
4 | RUN apt-get install -y libgsl0-dev libgsl0ldbl gsl-bin \
5 | libmatio-dev libmatio2 gfortran
6 |
7 | ENV CCDC_BIN /root/bin
8 | RUN mkdir $CCDC_BIN
9 |
10 | RUN git clone https://github.com/USGS-EROS/lcmap-change-detection-c.git
11 | RUN cd lcmap-change-detection-c && \
12 | BIN=$CCDC_BIN make
13 |
14 | ENTRYPOINT ["/root/bin/ccdc"]
15 |
16 |
--------------------------------------------------------------------------------
/docker/ubuntu/run.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import subprocess
3 |
4 |
5 | parser = argparse.ArgumentParser()
6 | parser.add_argument("--arg-1", help="an arg")
7 | parser.add_argument("--arg-2", help="another arg")
8 | args = parser.parse_args()
9 | #print(subprocess.check_output(["echo", "Testing output; args:", args.arg_1, args.arg_2]))
10 | print("Testing output; args:", args.arg_1, args.arg_2)
11 |
12 |
--------------------------------------------------------------------------------
/docs/CCDC ADD CY5 V1.0.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/repository-preservation/lcmap-change-detection-c/fee9b1303719cd0e0045234a3f2a594303692ac0/docs/CCDC ADD CY5 V1.0.docx
--------------------------------------------------------------------------------
/docs/ccdc_work_flow.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/repository-preservation/lcmap-change-detection-c/fee9b1303719cd0e0045234a3f2a594303692ac0/docs/ccdc_work_flow.pdf
--------------------------------------------------------------------------------
/docs/flowchart_description.txt:
--------------------------------------------------------------------------------
1 | 1. Only Landsat scenes with more than 20% clear-sky pixels are used in CCDC analysis.
2 | 2. CCDC is run only for pixels where more than 50% of the images have data.
3 | 3. If more than 75% of the valid data show a pixel as snow, the pixel is considered permanent
4 | snow.
5 | 4. For a permanent snow pixel, get model coefficients and RMSEs from lasso regression if there are more than
6 | 12 valid data points, or from median values if there are fewer than 12 points.
7 | 5. At least 12 data points and 1 year of data are needed for clear-sky land/water pixel analysis.
8 | 6. Model initialization: further remove outliers (cloud/cloud shadow/snow) using robust fit.
9 | 7. Model fitting: do lasso regression and get the right starting point using the observed values and
10 | the lasso regression predicted values.
11 | 8. Stable start: are there 12 data points and 1 year of data? If no, add more data points; otherwise, go to the next step.
12 | 9. Model ready: find the previous break point and use the lasso model to detect change if there are enough points (6),
13 | or use median values if there are fewer than conse (6) points.
14 | 10. Real change detected (the minimum magnitude of change is larger than a threshold and the first point of change
15 | is less than another threshold)?
16 | 11. If yes, model coefficients, RMSEs, and magnitude of change are generated through lasso regression.
17 | 12. If no, the falsely detected point is removed.
18 | 13. Check whether there are enough points for model fitting and break point confirmation. If yes, do lasso regression;
19 | otherwise, use median values to calculate RMSEs.
20 | 14. Continuous monitoring: for the first 24 points, do dynamic model fitting for model coefficients, RMSE, and v_dif, calculate
21 | the norm of v_dif, and update IDs; otherwise, go to the next step.
22 | 15. Update data points as needed to use the closest 24 data points for model fitting.
23 | 16. Detect change based on RMSEs and the normalized value of v_dif.
24 | 17. If a break happens close to the end of the time series, the robust fit outlier removal process is applied first;
25 | then, if there are fewer than conse (6) data points after the detected change (break point), median
26 | values are used; otherwise, lasso model fitting is used.
27 | 18. Model outputs: model coefficients, RMSEs, magnitude of change, times of start, end, and break, change probability,
28 | change category.
29 |
30 |
31 |
32 |
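Steps 2 through 5 of the description above amount to a per-pixel decision about which fitting procedure to use. The following C sketch encodes those thresholds as a single function. It is an illustration under stated assumptions, not the project's implementation: the function name, the enum, and the argument list are hypothetical, one year is taken as 365.25 days, and a count of exactly 12 valid points is treated as sufficient for lasso regression because the description leaves that boundary open.

```c
#include <assert.h>

/* Thresholds quoted from the flowchart description. */
#define MIN_DATA_FRACTION 0.50    /* step 2: more than 50% of images have data */
#define SNOW_FRACTION     0.75    /* step 3: more than 75% of valid data is snow */
#define MIN_POINTS        12      /* steps 4 and 5 */
#define MIN_SPAN_DAYS     365.25  /* step 5: one year (day count is an assumption) */

typedef enum {
    PIXEL_SKIP,          /* too little data to run CCDC */
    PIXEL_SNOW_LASSO,    /* permanent snow, fit with lasso regression */
    PIXEL_SNOW_MEDIAN,   /* permanent snow, fall back to median values */
    PIXEL_STANDARD,      /* enough clear observations for the standard procedure */
    PIXEL_INSUFFICIENT   /* clear-sky pixel with too few points or too short a span */
} pixel_path;

/* Hypothetical helper: choose the processing path for one pixel.
 *   n_images         number of images in the stack
 *   n_valid          images with valid data at this pixel
 *   n_snow           valid observations flagged as snow
 *   n_clear          valid clear-sky land/water observations
 *   clear_span_days  time between first and last clear observation */
pixel_path choose_pixel_path(int n_images, int n_valid, int n_snow,
                             int n_clear, double clear_span_days)
{
    assert(n_images > 0 && n_valid >= 0 && n_valid <= n_images);
    assert(n_snow >= 0 && n_snow <= n_valid);

    if ((double) n_valid <= MIN_DATA_FRACTION * n_images)
        return PIXEL_SKIP;
    if ((double) n_snow > SNOW_FRACTION * n_valid)
        return (n_valid >= MIN_POINTS) ? PIXEL_SNOW_LASSO : PIXEL_SNOW_MEDIAN;
    if (n_clear >= MIN_POINTS && clear_span_days >= MIN_SPAN_DAYS)
        return PIXEL_STANDARD;
    return PIXEL_INSUFFICIENT;
}
```

Keeping the thresholds as named constants makes it easy to compare them against the values used in the actual ccdc sources.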
--------------------------------------------------------------------------------