I'm parsing a CSV file, converting the date column to a date-time type (POSIXct), and adding two columns derived from that date.
data <- read.csv(csvFile, na.strings = "")
## Convert the date column to a date-time (POSIXct) type
data[["date"]] <- as.POSIXct(data[["date"]])
## Add a time and day column
data[["time"]] <- strftime(data[["date"]], format="%H:%M")
data[["day"]] <- strftime(data[["date"]], format="%Y-%m-%d")
print(head(data))
This is what I get. Note that for a sample of ~2000 lines it's blazing fast and seems to work. For my complete file (~1M lines) it's quite slow and it doesn't work: the time part of the date is lost and every row gets 00:00.
> source('~/Development/git/BasisDataAnalysis/basisData.R')
> basisData("sample.csv")
date calories gsr heart.rate skin.temp steps time day
1 2013-07-28 00:00:00 1.206 3.40587 57 93.2000 0 00:00 2013-07-28
2 2013-07-28 00:01:00 1.200 3.61320 53 93.2000 0 00:01 2013-07-28
3 2013-07-28 00:02:00 1.200 3.60855 55 93.2000 0 00:02 2013-07-28
4 2013-07-28 00:03:00 1.381 4.38401 57 93.2375 0 00:03 2013-07-28
5 2013-07-28 00:04:00 1.264 4.07134 55 93.2000 0 00:04 2013-07-28
6 2013-07-28 00:05:00 1.200 3.29479 55 93.0125 0 00:05 2013-07-28
> basisData("bodymetrics.csv")
date calories gsr heart.rate skin.temp steps time day
1 2013-07-28 1.206 3.40587 57 93.2000 0 00:00 2013-07-28
2 2013-07-28 1.200 3.61320 53 93.2000 0 00:00 2013-07-28
3 2013-07-28 1.200 3.60855 55 93.2000 0 00:00 2013-07-28
4 2013-07-28 1.381 4.38401 57 93.2375 0 00:00 2013-07-28
5 2013-07-28 1.264 4.07134 55 93.2000 0 00:00 2013-07-28
6 2013-07-28 1.200 3.29479 55 93.0125 0 00:00 2013-07-28
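The date-only output for the full file suggests that the conversion fell back to a date-only format, which `as.POSIXct` can do when at least one value does not parse as a full date-time (for example a row without a time part, or a local time skipped by a daylight-saving change). A minimal sketch to look for such rows follows; `find_unparsed` and the sample values are hypothetical and not part of the original script.

```r
# Hypothetical helper: return the raw strings that do not parse as full
# date-times with the given format and time zone.
find_unparsed <- function(x, format = "%Y-%m-%d %H:%M:%S", tz = "") {
  x <- as.character(x)  # read.csv may have produced a factor
  parsed <- as.POSIXct(strptime(x, format = format, tz = tz))
  x[!is.na(x) & is.na(parsed)]
}

# Made-up sample: the second value has no time part.
sample_dates <- c("2013-07-28 00:00:00", "2013-07-28", "2013-07-28 00:02:00")
bad <- find_unparsed(sample_dates, tz = "UTC")
print(bad)
```

Running it on the real column with `tz = ""` (the local time zone) and again with `tz = "UTC"` should show whether the failures depend on the time zone.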
@alung suggested that I use dplyr:
library(dplyr)
data <- data %>% mutate(date = as.POSIXct(date), day = strftime(date, format="%Y-%m-%d"), time = strftime(date, format="%H:%M"))
But it still fails on the full file:
> basisData("sample.csv")
date calories gsr heart.rate skin.temp steps day time
1 2013-07-28 00:00:00 1.206 3.40587 57 93.2000 0 2013-07-28 00:00
2 2013-07-28 00:01:00 1.200 3.61320 53 93.2000 0 2013-07-28 00:01
3 2013-07-28 00:02:00 1.200 3.60855 55 93.2000 0 2013-07-28 00:02
4 2013-07-28 00:03:00 1.381 4.38401 57 93.2375 0 2013-07-28 00:03
5 2013-07-28 00:04:00 1.264 4.07134 55 93.2000 0 2013-07-28 00:04
6 2013-07-28 00:05:00 1.200 3.29479 55 93.0125 0 2013-07-28 00:05
> basisData("bodymetrics.csv")
date calories gsr heart.rate skin.temp steps day time
1 2013-07-28 1.206 3.40587 57 93.2000 0 BST 00:00
2 2013-07-28 1.200 3.61320 53 93.2000 0 BST 00:00
3 2013-07-28 1.200 3.60855 55 93.2000 0 BST 00:00
4 2013-07-28 1.381 4.38401 57 93.2375 0 BST 00:00
5 2013-07-28 1.264 4.07134 55 93.2000 0 BST 00:00
6 2013-07-28 1.200 3.29479 55 93.0125 0 BST 00:00
This did not work:
data <- data %>% mutate(fulldate = as.POSIXct(date))
But this seems to work well:
data <- data %>% mutate(fulldate = as.POSIXct(date, format = "%Y-%m-%d %H:%M", tz = "UTC"))
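For reference, here is a base R sketch of the whole step with an explicit format and time zone. `add_time_columns` and the two-row `demo` data frame are hypothetical, made up for illustration. It uses `format()` rather than `strftime()` for the derived columns because, as far as I can tell, `strftime()` defaults to the local time zone and would shift the UTC times.

```r
# Hypothetical helper mirroring the steps above, with explicit format and tz.
add_time_columns <- function(data, tz = "UTC") {
  # as.character() guards against the column having been read as a factor
  data[["date"]] <- as.POSIXct(as.character(data[["date"]]),
                               format = "%Y-%m-%d %H:%M", tz = tz)
  # format() on a POSIXct uses the time zone stored on the vector
  data[["time"]] <- format(data[["date"]], format = "%H:%M")
  data[["day"]]  <- format(data[["date"]], format = "%Y-%m-%d")
  data
}

# Made-up rows in the same shape as the file
demo <- data.frame(
  date = c("2013-07-28 00:00:00", "2013-07-28 00:01:00"),
  calories = c(1.206, 1.200),
  stringsAsFactors = FALSE
)
result <- add_time_columns(demo)
print(result)
```

Values that do not match the format come back as `NA` instead of silently changing how the whole column is parsed, which makes bad rows easier to spot.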