Grouped Median Function for R – gmedian (), mQ()

The formula for the median of grouped frequency distributions has a nice feature for analysing ordinal data – even if the data isn’t really grouped. This forumla computes the median and adds a decimal fraction value to it. The decimal fraction value is an estimate of where the median would be, if we were using a more granular scale. Especially social science data is often not available in a metric format. So using this formula is really helpful for making i.e. more differentiated comparisons between subsets of social data.

In SPSS you simply check the ‘Grouped Data’ box and you are good to go. As I migrated to R I couldn’t find an easy way to compute the grouped median on ordinal data, actually I couldn’t find anyway to do it. So I had to come up with a little function to compute the grouped median for ordinal data, which is not actually grouped. I did that some time ago and since then I have several times debugged that function, because it failed if the data was matching unforeseen criteria. Now it works correctly under most circumstances. Problems occur if there are categories with a frequency of zero. gmedian() can also be included in other functions, as tapply(), for comparing medians of subsets.

See this page of the University of Alberta to find out more about the idea of applying the median on a grouped frequency distribution. Deriving from that concept you can calculate the Q1 and Q3 Quartiles and henceforth the interquartile range. The following code, once executed in R, allows you to use the the functions gmedian() and mQ(). For now the code is not beautiful, but working. If I find the time I will update.

This Function computes the median of a grouped frequency distribution.

gmedian <- function(grpVar, percentile = 0.5) {
  if(is.factor(grpVar)) grpVar <- as.numeric(grpVar)
  scale_step <- as.numeric(names(table(grpVar))[2]) - as.numeric(names(table(grpVar))[1])
  grp_detect <- ifelse(scale_step==1, 0.5 , 0)
  algeb_sign <- 1
  egs <- quantile(grpVar, percentile, na.rm=T)
  egs_pointer_abs <- length(table(subset(grpVar, grpVar <= egs)))
  cumU <- sum(table(grpVar)[1 : egs_pointer_abs-1])
  intpol <- (length(na.omit(grpVar)) / 2 - cumU) / table(grpVar)[egs_pointer_abs] * scale_step
  ifelse(intpol <= 0, algeb_sign <- -1, algeb_sign <- 1)
  val <- as.numeric(names(table(grpVar))[egs_pointer_abs]) - grp_detect + (intpol * algeb_sign)
  round(val, digits = 3)
}

mQ <- function(grpVar) {
  (gmedian(grpVar, 0.75)-gmedian(grpVar, 0.25)) / 2
}