Skip to content

Instantly share code, notes, and snippets.

@MrFlick
Last active August 29, 2015 13:58
Show Gist options
  • Save MrFlick/10202714 to your computer and use it in GitHub Desktop.
Save MrFlick/10202714 to your computer and use it in GitHub Desktop.
table.shingle.R: allows for creating tables with shingles (lattice) and factors
table.shingle<-function(..., as.data.frame=F) {
dots<-list(...)
if(is.logical(as.data.frame) && as.data.frame) {
as.data.frame <- list(collapse=T)
}
stopifnot(all(sapply(dots, class) %in% c("shingle","factor")))
stopifnot(length(unique(sapply(dots, length)))==1)
if(is.list(as.data.frame) && !as.data.frame$collapse) {
for(i in which(sapply(dots, class)=="shingle")) {
pts<-unique(sort(unlist(levels(dots[[i]]))))
attr(dots[[i]], "origlevels") <- attr(dots[[i]], "levels")
attr(dots[[i]], "levels") <- unname(lapply(as.list(data.frame(t(embed(pts,2)))), sort))
class(attr(dots[[i]], "levels"))<-"shingleLevel"
}
}
dims<-sapply(dots, nlevels)
varnames<-tail(lapply(match.call(), deparse),-1)
varnames["as.data.frame"]<-NULL
res<-array(0, dims, `names<-`(lapply(dots, function(x) as.vector(sapply(levels(x), paste, collapse=":"))), varnames))
isinlevel<-function(z, ints) {
if(is.factor(z)) {
as.numeric(z)
} else {
which(sapply(ints, function(y) {y[1]<=z & z<=y[2]}))
}
}
for(i in seq_along(dots[[1]])) {
dx<-vector("list", length(dots))
for(h in seq_along(dots)) {
dx[[h]]<-isinlevel(dots[[h]][i], levels(dots[[h]]))
}
idx<-as.matrix(do.call("expand.grid", dx))
res[idx]<-res[idx]+1
}
res<-as.table(res)
if(is.list(as.data.frame)) {
#attempts to preserve the shingles
shingleunoverlap<-function(x) {
if (is.null(attr(x, "origlevels"))) {
n<-nlevels(x)*2
ints<-do.call(rbind, levels(x))
starts<-c(1, seq(n/2+1, n-1))
ends<-c(seq(2, n/2),n)
if(any(ints[ends]<=ints[starts])) {
stop("each shingle level must have a range of values unique to that level to make a data.frame")
}
return((ints[starts] + ints[ends])/2);
} else {
pts<-sapply(attr(x, "levels"), mean)
return(pts)
}
}
res<-base::as.data.frame(res)
for(i in seq_along(dots)) {
if(class(dots[[i]])=="shingle") {
nvals<-shingleunoverlap(dots[[i]])[as.numeric(res[[i]])]
res[[i]]<-nvals
if(is.null(attr(dots[[i]], "origlevels") )) {
attr(res[[i]], "levels")<-attr(dots[[i]], "levels")
attr(res[[i]], "class")<-"shingle"
} else {
attr(res[[i]], "levels")<-attr(dots[[i]], "origlevels")
attr(res[[i]], "class")<-"shingle"
attr(dots[[i]], "origlevels")<-NULL
}
}
}
}
res;
}
require(lattice)
x<-equal.count(rnorm(50))
y<-factor(sample(c("M","F"), 50, replace=T))
z<-equal.count(rnorm(50))
#Creates counts for each level combination. Since shingles
#may overlap the sum of Freq will be greater that the total
#length of input
a<-table.shingle(x,y,z);a
#The as.data.frame=T parameter will preseve the singles
#where as.data.frame(table.shingle()) will convert them to factors
#useful for passing onto barchart() or other functions
b<-table.shingle(x,y,z, as.data.frame=T);b
#creates values corresponding to mutually exclusive intervals
#so the Freq sums to the length of the input
#will work if as.data.frame throws an error about intervals
#not having a unique range
d<-table.shingle(x,y,z, as.data.frame=list(collapse=F));d
@mlt
Copy link

mlt commented Apr 9, 2014

Line 7 fails for me with

 Error in array(0, dims, `names<-`(lapply(dots, function(x) as.vector(sapply(levels(x),  : 
  'names' attribute [4] must be the same length as the vector [3] 

if actual as.data.frame is supplied. It should be removed from varnames if present perhaps with varnames['as.data.frame'] <- NULL.
But then it fails later (at barchart?) with

Error in if (any(x > 0)) range(tapply(x[x > 0], y[x > 0, drop = TRUE],  : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(x, 0) : > not meaningful for factors

@mlt
Copy link

mlt commented Apr 9, 2014

I forked it, and made changes. However produced barchart looks odd as it has something stacked, e.g. in 4th column
barchart plot

@MrFlick
Copy link
Author

MrFlick commented Apr 9, 2014

Thanks for the comments mlt. I updated the code. The "stacked-ness" was caused when the levels of a shingle had no values unique to each range. There simply is no way to convert that to a unique value per level so right now i check for that condition and throw an error if present. If that happens, perhaps changing the number of intervals for a given shingle will make more friendly levels.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment