Expand delimited columns in R

A postdoctoral researcher asked me the other day to help him expand a vector of comma-delimited values so he could do computations with them in R. I wrote an R function to solve the problem. Here is the before and after:

> data
  Name      Score1   Score2
1 Bill 1,3,4,3,6,9 F1,F3,F2
2  Bob       3,2,3 F2,F2,F4
3  Sam       2,5,3 F5,F2,F4
> expand.delimited(data)
   Name Score1
1  Bill      1
2  Bill      3
3  Bill      4
4  Bill      3
5  Bill      6
6  Bill      9
7   Bob      3
8   Bob      2
9   Bob      3
10  Sam      2
11  Sam      5
12  Sam      3
# Description
# Accepts a data.frame where col1 represents a factor and col2 represents
# comma or other delimited values to be expanded according to col1.
# Returns a data.frame.

# Usage
# expand.delimited(x, ...)

# Default
# expand.delimited(x, col1=1, col2=2, sep=",")

# Arguments
# x     A data.frame
# col1  Column in data.frame to act as factor
# col2  Column in data.frame that is delimited and will be expanded
# sep   Delimiter
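
One caveat about sep is worth a quick illustration. The splitting is done with base R's strsplit(), which reads its split argument as a regular expression unless fixed = TRUE is given. A comma is harmless, but a delimiter such as "|" is a regex metacharacter. A minimal sketch (the pipe-delimited string is made up for illustration and is not part of the example data):

```r
# Hypothetical pipe-delimited entry, made up for illustration
entry <- "1|3|4"

# Default: "|" is read as a regular expression, not as a literal character
regex_split <- strsplit(entry, "|")[[1]]

# fixed = TRUE: "|" is matched literally
literal_split <- strsplit(entry, "|", fixed = TRUE)[[1]]

# Escaping the metacharacter also gives a literal match
escaped_split <- strsplit(entry, "\\|")[[1]]

print(regex_split)
print(literal_split)
print(escaped_split)
```

For such a delimiter, passing the escaped form (for example sep="\\|") should give the intended split.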

# Download the data file (expand_delimited.txt), then read it in
data <- read.table("expand_delimited.txt", header=TRUE)

#Function to expand data
expand.delimited <- function(x, col1=1, col2=2, sep=",") {
  # Split each delimited entry of col2 into its individual values
  values <- strsplit(as.character(x[[col2]]), sep)
  # Repeat each col1 entry once for every value split from its row
  df <- data.frame(rep(x[[col1]], sapply(values, length)),
                   unlist(values),
                   stringsAsFactors=FALSE)
  # Carry over the original column names; expanded values stay as text
  names(df) <- c(names(x)[col1], names(x)[col2])
  return(df)
}

# Example
expand.delimited(data)
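
The expanded column holds text rather than numbers, so it needs converting before any arithmetic. The following is a minimal sketch of that follow-up step, not part of the function itself. It assumes the example table from the top of the post is rebuilt inline instead of being read from expand_delimited.txt, and it uses a compact base R stand-in for the expansion so the block runs on its own; the names parts, long and means are made up for illustration.

```r
# Rebuild the example data inline (assumption: mirrors the table shown above)
data <- data.frame(
  Name   = c("Bill", "Bob", "Sam"),
  Score1 = c("1,3,4,3,6,9", "3,2,3", "2,5,3"),
  Score2 = c("F1,F3,F2", "F2,F2,F4", "F5,F2,F4"),
  stringsAsFactors = FALSE
)

# Stand-in for the expansion step: split Score1 and repeat each name
parts <- strsplit(data$Score1, ",", fixed = TRUE)
long <- data.frame(
  Name   = rep(data$Name, lengths(parts)),
  Score1 = as.numeric(unlist(parts)),  # convert text to numbers
  stringsAsFactors = FALSE
)

# With numeric values, per-name summaries are straightforward
means <- tapply(long$Score1, long$Name, mean)
print(means)
```

If the values had been read in as factors, as.numeric(as.character(...)) would be the safer conversion.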

4 Responses to Expand delimited columns in R

  1. Carl Witthoft says:

    Just a thought: take the output of a dataframe:
    scoretable <- sapply(data$Score1, function(x) as.numeric(unlist(strsplit(x,','))))
    scoretable$names<-data$Name

    That should :-) give the same output.

    • Eldon says:

      Thanks for the thought. The only part of your code I needed to change was to make sure x was a character: scoretable <- sapply(data$Score1, function(x) as.numeric(unlist(strsplit(as.character(x),','))))

  2. seminym says:

    I think the “melt” function of Hadley's “reshape” package already provides exactly this functionality. http://www.statmethods.net/management/reshape.html

    • Eldon says:

      Thanks for the comment. From what I can tell after reading more about the melt function, in order to use it I would first need to parse the Score1 column and split it into columns. I think this would be problematic because the number of values in Score1 varies and the values are not in a meaningful order. I’m no expert on reshape2, so there may be a way to piece together multiple functions to achieve my goal, but this function gets the job done.
