├── README.md
├── run_analysis.R
├── run_analysis.md
└── CodeBook.md

/README.md:
--------------------------------------------------------------------------------
### Accelerometers' Wearable Data - Getting and Cleaning Project

The purpose of this project is to demonstrate my ability to collect, work with, and clean a data set.
The goal is to prepare tidy data that can be used for later analysis.

One of the most exciting areas in all of data science right now is wearable computing - see for example this article.
Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users.
The data linked to from the course website represent data collected from the accelerometers from the Samsung Galaxy S smartphone.
A full description is available at the site where the data was obtained:

http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Here are the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

I will create one R script called run_analysis.R that does the following.

1) Merges the training and the test sets to create one data set.
2) Extracts only the measurements on the mean and standard deviation for each measurement.
3) Uses descriptive activity names to name the activities in the data set.
4) Appropriately labels the data set with descriptive variable names.
5) From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
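
The five steps above can be sketched on a toy example. This is only a minimal illustration under stated assumptions, not the project's script: the tiny data frames and all of their values are made up, and base R's `aggregate` stands in for the dplyr pipeline so that the sketch needs no packages.

```r
# Toy stand-ins for the real files; every value here is made up for illustration.
activities <- data.frame(code = 1:2, activity = c("WALKING", "SITTING"))
feature_names <- c("tBodyAcc.mean...X", "tBodyAcc.std...X", "tBodyAcc.max...X")
x_train <- setNames(data.frame(c(0.25, 0.75), c(-0.5, -1), c(1, 1)), feature_names)
x_test  <- setNames(data.frame(c(0.5, 0), c(-0.25, -0.75), c(1, 1)), feature_names)
y_train <- data.frame(code = c(1, 1))
y_test  <- data.frame(code = c(2, 2))
subject_train <- data.frame(subject = c(1, 1))
subject_test  <- data.frame(subject = c(2, 2))

# 1) Merge training and test rows, then bind subjects, measurements and activity codes.
merged <- cbind(rbind(subject_train, subject_test),
                rbind(x_train, x_test),
                rbind(y_train, y_test))

# 2) Keep the identifier columns plus every column whose name mentions mean or std.
keep <- names(merged) %in% c("subject", "code") |
  grepl("mean|std", names(merged), ignore.case = TRUE)
selected <- merged[, keep]

# 3) Replace the numeric activity codes with descriptive labels.
selected$code <- activities$activity[match(selected$code, activities$code)]

# 4) Give the columns descriptive names.
names(selected)[names(selected) == "subject"] <- "Subject"
names(selected)[names(selected) == "code"] <- "Activity"
names(selected) <- gsub("^t", "Time", names(selected))
names(selected) <- gsub("Acc", "Accelerometer", names(selected))
names(selected) <- gsub("\\.+", ".", names(selected))

# 5) Average every variable for each subject and each activity.
tidy <- aggregate(. ~ Subject + Activity, data = selected, FUN = mean)
print(tidy)
```

With the toy values, `tidy` has one row per subject/activity pair and keeps only the mean and standard deviation columns; the `max` column is dropped in step 2.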

--------------------------------------------------------------------------------
/run_analysis.R:
--------------------------------------------------------------------------------
# Load the "dplyr" library (its pipe operator and select() function are used below)
library(dplyr)
# Download the zip file in binary mode, unless it has already been downloaded
filename <- "projectfiles_UCI_HAR_Dataset"
zipURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
if (!file.exists(filename)) {
  download.file(zipURL, filename, mode = "wb")
}
# Unzip the archive unless the extracted folder already exists
if (!dir.exists("UCI HAR Dataset")) {
  unzip(filename)
}
# List the directories available in the working directory
# in order to locate the extracted one (e.g. "UCI HAR Dataset")
list.dirs(getwd())

# ----- 1.
# Load all necessary files into corresponding data frames:
# - the complete list of variables of each feature vector
features <- read.table("UCI HAR Dataset/features.txt", col.names = c("n","functions"))
# - the complete list of activities and the corresponding class labels
activities <- read.table("UCI HAR Dataset/activity_labels.txt", col.names = c("code", "activity"))
# - the measurements (X) and the activity codes (y) of both sets (test and train)
x_test <- read.table("UCI HAR Dataset/test/X_test.txt", col.names = features$functions)
y_test <- read.table("UCI HAR Dataset/test/y_test.txt", col.names = "code")
x_train <- read.table("UCI HAR Dataset/train/X_train.txt", col.names = features$functions)
y_train <- read.table("UCI HAR Dataset/train/y_train.txt", col.names = "code")
# - the subject ids corresponding to each row of measurements for the training and the test sets
subject_test <- read.table("UCI HAR Dataset/test/subject_test.txt", col.names = "subject")
subject_train <- read.table("UCI HAR Dataset/train/subject_train.txt", col.names = "subject")

# Merge the rows of the training and the test sets of measurements (X)
x <- rbind(x_train, x_test)
# Merge the rows of the training and the test sets of activity codes (y)
y <- rbind(y_train, y_test)
# Merge the rows of subject ids
subject <- rbind(subject_train, subject_test)
# Merge the columns of subject ids, measurements (X) and activity codes (y)
# into one data set covering both sets (train, test)
unique_data <- cbind(subject, x, y)

# ----- 2.
# Extract only the mean and standard deviation variables for each measurement,
# that is, select only the columns whose name contains "mean" or "std"
# (contains() ignores case by default, so the angle variables built on "Mean" are kept too)
tidy_data1 <- unique_data %>% select(subject, code, contains("mean"), contains("std"))

# ----- 3.
# Change the names of the activities:
# replace the codes with descriptions according to the key data frame "activities"
tidy_data1$code <- activities[tidy_data1$code, 2]

# ----- 4.
# Replace the labels of the variables of the data set
# with descriptive variable names (according to the description of the researchers)
names(tidy_data1)[1] <- "Subject"
names(tidy_data1)[2] <- "Activity"
names(tidy_data1) <- gsub("^f", "Frequency", names(tidy_data1))
names(tidy_data1) <- gsub("^t", "Time", names(tidy_data1))
# Note: read.table() has already turned "-" and "()" into dots (check.names = TRUE),
# so the next three patterns match nothing and ".mean", ".meanFreq" and ".std" stay as they are
names(tidy_data1) <- gsub("-freq()", "Frequency", names(tidy_data1), ignore.case = TRUE)
names(tidy_data1) <- gsub("-mean()", "Mean", names(tidy_data1), ignore.case = TRUE)
names(tidy_data1) <- gsub("-std()", "StandardDeviation", names(tidy_data1), ignore.case = TRUE)
names(tidy_data1) <- gsub("Acc", "Accelerometer", names(tidy_data1))
names(tidy_data1) <- gsub("angle", "Angle", names(tidy_data1))
names(tidy_data1) <- gsub("BodyBody", "Body", names(tidy_data1))
names(tidy_data1) <- gsub("gravity", "Gravity", names(tidy_data1))
names(tidy_data1) <- gsub("Gyro", "Gyroscope", names(tidy_data1))
names(tidy_data1) <- gsub("Mag", "Magnitude", names(tidy_data1))
names(tidy_data1) <- gsub("tBody", "TimeBody", names(tidy_data1))
# Replace triple and double dots with a single dot, and drop a dot at the end of a name
names(tidy_data1) <- gsub("...", ".", names(tidy_data1), fixed = TRUE)
names(tidy_data1) <- gsub("..", ".", names(tidy_data1), fixed = TRUE)
names(tidy_data1) <- gsub("\\.$", "", names(tidy_data1))

# ----- 5.
# Create a new data set that groups the measurements of the previous one by subject and activity
# and holds the average of each variable for each activity and each subject
Data <- tidy_data1 %>%
  group_by(Subject, Activity) %>%
  summarise_all(mean)
# Write the final data frame to a corresponding .txt file
write.table(Data, "Data.txt", row.names = FALSE)

--------------------------------------------------------------------------------
/run_analysis.md:
--------------------------------------------------------------------------------

#### Load the "dplyr" library (its pipe operator and select() function are used below)
library(dplyr)
#### Download the zip file in binary mode, unless it has already been downloaded
filename <- "projectfiles_UCI_HAR_Dataset"
zipURL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
if (!file.exists(filename)) { download.file(zipURL, filename, mode = "wb") }
#### Unzip the archive unless the extracted folder already exists
if (!dir.exists("UCI HAR Dataset")) { unzip(filename) }
#### List the directories available in the working directory in order to locate the extracted one (e.g. "UCI HAR Dataset")
list.dirs(getwd())

---
### 1.
#### Load all necessary files into corresponding data frames:
##### - the complete list of variables of each feature vector
features <- read.table("UCI HAR Dataset/features.txt", col.names = c("n","functions"))
##### - the complete list of activities and the corresponding class labels
activities <- read.table("UCI HAR Dataset/activity_labels.txt", col.names = c("code", "activity"))
##### - the measurements (X) and the activity codes (y) of both sets (test and train)
x_test <- read.table("UCI HAR Dataset/test/X_test.txt", col.names = features$functions)
y_test <- read.table("UCI HAR Dataset/test/y_test.txt", col.names = "code")
x_train <- read.table("UCI HAR Dataset/train/X_train.txt", col.names = features$functions)
y_train <- read.table("UCI HAR Dataset/train/y_train.txt", col.names = "code")
##### - the subject ids corresponding to each row of measurements for the training and the test sets
subject_test <- read.table("UCI HAR Dataset/test/subject_test.txt", col.names = "subject")
subject_train <- read.table("UCI HAR Dataset/train/subject_train.txt", col.names = "subject")

#### Merge the rows of the training and the test sets of measurements (X)
x <- rbind(x_train, x_test)
#### Merge the rows of the training and the test sets of activity codes (y)
y <- rbind(y_train, y_test)
#### Merge the rows of subject ids
subject <- rbind(subject_train, subject_test)
#### Merge the columns of subject ids, measurements (X) and activity codes (y) into one data set covering both sets (train, test)
unique_data <- cbind(subject, x, y)

---
### 2.
#### Extract only the mean and standard deviation variables for each measurement, that is, select only the columns whose name contains "mean" or "std"
tidy_data1 <- unique_data %>% select(subject, code, contains("mean"), contains("std"))

---
### 3.
#### Change the names of the activities.
#### Replace the codes with descriptions according to the key data frame "activities"
tidy_data1$code <- activities[tidy_data1$code, 2]

---
### 4.
#### Replace the labels of the variables of the data set with descriptive variable names (according to the description of the researchers)
names(tidy_data1)[1] <- "Subject"
names(tidy_data1)[2] <- "Activity"
names(tidy_data1) <- gsub("^f", "Frequency", names(tidy_data1))
names(tidy_data1) <- gsub("^t", "Time", names(tidy_data1))
#### Note: read.table() has already turned "-" and "()" into dots, so the next three patterns match nothing
names(tidy_data1) <- gsub("-freq()", "Frequency", names(tidy_data1), ignore.case = TRUE)
names(tidy_data1) <- gsub("-mean()", "Mean", names(tidy_data1), ignore.case = TRUE)
names(tidy_data1) <- gsub("-std()", "StandardDeviation", names(tidy_data1), ignore.case = TRUE)
names(tidy_data1) <- gsub("Acc", "Accelerometer", names(tidy_data1))
names(tidy_data1) <- gsub("angle", "Angle", names(tidy_data1))
names(tidy_data1) <- gsub("BodyBody", "Body", names(tidy_data1))
names(tidy_data1) <- gsub("gravity", "Gravity", names(tidy_data1))
names(tidy_data1) <- gsub("Gyro", "Gyroscope", names(tidy_data1))
names(tidy_data1) <- gsub("Mag", "Magnitude", names(tidy_data1))
names(tidy_data1) <- gsub("tBody", "TimeBody", names(tidy_data1))

#### Replace triple and double dots with a single dot, and delete a dot at the end of a variable name (when present)
names(tidy_data1) <- gsub("...", ".", names(tidy_data1), fixed = TRUE)
names(tidy_data1) <- gsub("..", ".", names(tidy_data1), fixed = TRUE)
names(tidy_data1) <- gsub("\\.$", "", names(tidy_data1))

---
### 5.
#### Create a new data set that groups the measurements of the previous one by subject and activity and holds the average of each variable for each activity and each subject
Data <- tidy_data1 %>%
  group_by(Subject, Activity) %>%
  summarise_all(mean)
#### Write the final data frame to a corresponding .txt file
write.table(Data, "Data.txt", row.names = FALSE)

--------------------------------------------------------------------------------
/CodeBook.md:
--------------------------------------------------------------------------------
#### The descriptive names of the 88 variables of the data frame in the Data.txt file are based on the following brief dictionary:

- BodyGyroscope: Body Angular Velocity
- Accelerometer: Linear Acceleration
- BodyAccelerometer: Body Linear Acceleration
- GravityAccelerometer: Gravity Linear Acceleration
- Angle: Angle between two vectors
- Jerk: Jerk signal
- Frequency: frequency-domain signal (prefix "f" in the original names)
- Time: time-domain signal (prefix "t" in the original names)
- Magnitude: magnitude of the signal ("Mag" in the original names)
- X: axis X
- Y: axis Y
- Z: axis Z
- mean: Mean value
- std: Standard Deviation

---
Features are normalized and bounded within [-1,1].
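
As a rough illustration of how the dictionary maps the original feature names to the descriptive ones, the sketch below renames three sample names. It is a simplified stand-in for the renaming step, under these assumptions: `make.names()` is used to mimic the name sanitising that `read.table()` applies by default, the `rules` vector is a hypothetical helper, and runs of dots are collapsed with a single regular expression instead of two fixed replacements.

```r
# Three sample names as they appear in features.txt
raw <- c("tBodyAcc-mean()-X",
         "fBodyBodyGyroJerkMag-std()",
         "angle(tBodyAccMean,gravity)")

# Mimic read.table(check.names = TRUE): invalid characters become dots
nm <- make.names(raw)

# Hypothetical lookup of pattern -> replacement, applied in order
rules <- c("^f" = "Frequency", "^t" = "Time", "Acc" = "Accelerometer",
           "angle" = "Angle", "BodyBody" = "Body", "gravity" = "Gravity",
           "Gyro" = "Gyroscope", "Mag" = "Magnitude", "tBody" = "TimeBody")
for (pattern in names(rules)) {
  nm <- gsub(pattern, rules[[pattern]], nm)
}

# Collapse runs of dots into one and drop a trailing dot
nm <- gsub("\\.+", ".", nm)
nm <- gsub("\\.$", "", nm)
print(nm)
```

The three results are `TimeBodyAccelerometer.mean.X`, `FrequencyBodyGyroscopeJerkMagnitude.std` and `Angle.TimeBodyAccelerometerMean.Gravity`.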

---
#### The 88 variables of the data frame of the Data.txt file together with the corresponding number of their column

##### 1] Subject
Identifier of each of the 30 subjects (persons) who carried out the experiment
##### 2] Activity
- WALKING
- WALKING_UPSTAIRS
- WALKING_DOWNSTAIRS
- SITTING
- STANDING
- LAYING

##### 3] TimeBodyAccelerometer.mean.X

##### 4] TimeBodyAccelerometer.mean.Y

##### 5] TimeBodyAccelerometer.mean.Z

##### 6] TimeGravityAccelerometer.mean.X

##### 7] TimeGravityAccelerometer.mean.Y

##### 8] TimeGravityAccelerometer.mean.Z

##### 9] TimeBodyAccelerometerJerk.mean.X

##### 10] TimeBodyAccelerometerJerk.mean.Y

##### 11] TimeBodyAccelerometerJerk.mean.Z

##### 12] TimeBodyGyroscope.mean.X

##### 13] TimeBodyGyroscope.mean.Y

##### 14] TimeBodyGyroscope.mean.Z

##### 15] TimeBodyGyroscopeJerk.mean.X

##### 16] TimeBodyGyroscopeJerk.mean.Y

##### 17] TimeBodyGyroscopeJerk.mean.Z

##### 18] TimeBodyAccelerometerMagnitude.mean

##### 19] TimeGravityAccelerometerMagnitude.mean

##### 20] TimeBodyAccelerometerJerkMagnitude.mean

##### 21] TimeBodyGyroscopeMagnitude.mean

##### 22] TimeBodyGyroscopeJerkMagnitude.mean

##### 23] FrequencyBodyAccelerometer.mean.X

##### 24] FrequencyBodyAccelerometer.mean.Y

##### 25] FrequencyBodyAccelerometer.mean.Z

##### 26] FrequencyBodyAccelerometer.meanFreq.X

##### 27] FrequencyBodyAccelerometer.meanFreq.Y

##### 28] FrequencyBodyAccelerometer.meanFreq.Z

##### 29] FrequencyBodyAccelerometerJerk.mean.X

##### 30] FrequencyBodyAccelerometerJerk.mean.Y

##### 31] FrequencyBodyAccelerometerJerk.mean.Z

##### 32] FrequencyBodyAccelerometerJerk.meanFreq.X

##### 33] FrequencyBodyAccelerometerJerk.meanFreq.Y

##### 34] FrequencyBodyAccelerometerJerk.meanFreq.Z

##### 35] FrequencyBodyGyroscope.mean.X

##### 36] FrequencyBodyGyroscope.mean.Y

##### 37] FrequencyBodyGyroscope.mean.Z

##### 38] FrequencyBodyGyroscope.meanFreq.X

##### 39] FrequencyBodyGyroscope.meanFreq.Y

##### 40] FrequencyBodyGyroscope.meanFreq.Z

##### 41] FrequencyBodyAccelerometerMagnitude.mean

##### 42] FrequencyBodyAccelerometerMagnitude.meanFreq

##### 43] FrequencyBodyAccelerometerJerkMagnitude.mean

##### 44] FrequencyBodyAccelerometerJerkMagnitude.meanFreq

##### 45] FrequencyBodyGyroscopeMagnitude.mean

##### 46] FrequencyBodyGyroscopeMagnitude.meanFreq

##### 47] FrequencyBodyGyroscopeJerkMagnitude.mean

##### 48] FrequencyBodyGyroscopeJerkMagnitude.meanFreq

##### 49] Angle.TimeBodyAccelerometerMean.Gravity

##### 50] Angle.TimeBodyAccelerometerJerkMean.GravityMean

##### 51] Angle.TimeBodyGyroscopeMean.GravityMean

##### 52] Angle.TimeBodyGyroscopeJerkMean.GravityMean

##### 53] Angle.X.GravityMean

##### 54] Angle.Y.GravityMean

##### 55] Angle.Z.GravityMean

##### 56] TimeBodyAccelerometer.std.X

##### 57] TimeBodyAccelerometer.std.Y

##### 58] TimeBodyAccelerometer.std.Z

##### 59] TimeGravityAccelerometer.std.X

##### 60] TimeGravityAccelerometer.std.Y

##### 61] TimeGravityAccelerometer.std.Z

##### 62] TimeBodyAccelerometerJerk.std.X

##### 63] TimeBodyAccelerometerJerk.std.Y

##### 64] TimeBodyAccelerometerJerk.std.Z

##### 65] TimeBodyGyroscope.std.X

##### 66] TimeBodyGyroscope.std.Y

##### 67] TimeBodyGyroscope.std.Z

##### 68] TimeBodyGyroscopeJerk.std.X

##### 69] TimeBodyGyroscopeJerk.std.Y

##### 70] TimeBodyGyroscopeJerk.std.Z

##### 71] TimeBodyAccelerometerMagnitude.std

##### 72] TimeGravityAccelerometerMagnitude.std

##### 73] TimeBodyAccelerometerJerkMagnitude.std

##### 74] TimeBodyGyroscopeMagnitude.std

##### 75] TimeBodyGyroscopeJerkMagnitude.std

##### 76] FrequencyBodyAccelerometer.std.X

##### 77] FrequencyBodyAccelerometer.std.Y

##### 78] FrequencyBodyAccelerometer.std.Z

##### 79] FrequencyBodyAccelerometerJerk.std.X

##### 80] FrequencyBodyAccelerometerJerk.std.Y

##### 81] FrequencyBodyAccelerometerJerk.std.Z

##### 82] FrequencyBodyGyroscope.std.X

##### 83] FrequencyBodyGyroscope.std.Y

##### 84] FrequencyBodyGyroscope.std.Z

##### 85] FrequencyBodyAccelerometerMagnitude.std

##### 86] FrequencyBodyAccelerometerJerkMagnitude.std

##### 87] FrequencyBodyGyroscopeMagnitude.std

##### 88] FrequencyBodyGyroscopeJerkMagnitude.std
--------------------------------------------------------------------------------
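
To read a file laid out like Data.txt back into R, something like the following may help. It is a minimal sketch on a toy table: the values are made up, only two of the measurement variables are included, and a temporary file stands in for Data.txt.

```r
# Toy stand-in for the averaged data set (made-up values, two variables only)
toy <- data.frame(
  Subject = c(1, 1, 2),
  Activity = c("WALKING", "SITTING", "WALKING"),
  TimeBodyAccelerometer.mean.X = c(0.25, 0.5, 0.75),
  TimeBodyAccelerometer.std.X = c(-0.5, -0.25, -1)
)

# Write the table without row names, to a temporary file instead of Data.txt
path <- tempfile(fileext = ".txt")
write.table(toy, path, row.names = FALSE)

# header = TRUE restores the column names from the first line of the file
restored <- read.table(path, header = TRUE)

# Every measurement column is expected to stay within the normalized range [-1, 1]
measurements <- restored[, -(1:2)]
in_range <- all(measurements >= -1 & measurements <= 1)
print(dim(restored))
print(in_range)
```

For the real file, `read.table("Data.txt", header = TRUE)` should give 88 columns; assuming each of the 30 subjects performed all 6 activities, it would have 180 rows.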