R:pems.utils

The R package pems.utils was developed as part of work on a series of vehicle emissions measurement studies, and originally intended for use with datasets collected using Portable Emission Measurement Systems or PEMS.

This note assumes you already have R. If not and anything here is of interest, R is open-source and free-to-distribute, and you can get it from [>the R Project].

For those that are unfamiliar with them, PEMS are on-board systems that measure the emissions of in-use vehicles. PEMS data collection is one of a number of methods developed to provide real-world alternatives to laboratory-based emissions measurement methods traditionally used in regulatory practices. [>for further information]

The rationale behind pems.utils was two-fold:

To streamline mundane, laborious and often error-prone elements of routine data handling; and,
To provide mobile data users with higher-level tools for the analysis of in-journey data.

Although originally intended for use with PEMS data, pems.utils has also been used with datasets collected with other mobile systems. So, if you are working with time-series collected by any moving objects, pems.utils may also be of interest.

If you use pems.utils, I hope it makes your work easier, better or maybe even both.

If you have any suggestions how to make pems.utils better or you have any problems using it, please let me know. [>email me].

Return to [>the website index].

Getting pems.utils

Installing pems.utils from R:

#Release version from main archive CRAN:
install.packages("pems.utils") 
#Developer's version from r-forge:
install.packages("pems.utils", repos="http://R-Forge.R-project.org")

Loading pems.utils, once it is installed:

library(pems.utils)

PEMS Data Structures

pems.utils uses two data types, pems and pems.elements, to manage data structure like units and meta information. pems.utils includes pems.1 as an example pems object:

pems.1

## pems (1000x25)
##               time.stamp  local.time  conc.co  conc.co2  conc.hc  conc.nox
##        [Y-M-D H:M:S GMT]         [s]   [vol%]    [vol%]  [ppmC6]     [ppm]
##   1  2005-09-08 11:46:07           0        0         0        0    20.447
##   2  2005-09-08 11:46:08           1        0         0        0    21.973
##   3  2005-09-08 11:46:09           2        0         0        0    20.752
##   4  2005-09-08 11:46:10           3        0         0        0    22.583
##   5  2005-09-08 11:46:11           4        0         0        0    20.142
##   6  2005-09-08 11:46:12           5        0         0        0    20.142
##  ... not showing: 994 rows; 19 cols (elements) 
##  ... other cols: afr; exh.flow.rate[L/min]; exh.temp[degC]; exh.press[kPa];
##       amb.temp[degC]; amb.press[kPa]; amb.humidity[%]; velocity[km/h];
##       revolution[rpm]; option.1[V]; option2[V]; option.3[V];
##       latitude[d.degLat]; longitude[d.degLon]; altitude[m];
##       gps.velocity[km/h]; satellite; n.s; w.e

Like a data.frame, the pems object contains named data-series as columns. However, unlike a data.frame:

It associates units with each column. For example, the first column, time.stamp has units of Y-M-D H:M:S GMT, reported in the second row of the print output.
By default, it limits displayed data to a screen-friendly range and a summary report of other columns and rows. This is a nice idea from tibbles for the tidy handling of larger datasets [>for more about tibbles]
All the columns in a pems object are pems.elements.

pems.elements can be extract from pems using $ and [.] operators, for example:

pems.1$time.stamp

## pems.element [n=1000]
##    [1] "2005-09-08 11:46:07 GMT" "2005-09-08 11:46:08 GMT"
##    [3] "2005-09-08 11:46:09 GMT" "2005-09-08 11:46:10 GMT"
##    [5] "2005-09-08 11:46:11 GMT" "2005-09-08 11:46:12 GMT"
##    ... not showing: 497 rows
##    ... <POSIXct,POSIXt> time.stamp [Y-M-D H:M:S GMT]

pems.elements act like other vectors but, like pems, track units and limits the default print range to prevent long data-series filling the screen.

See [>pems.utils operators] for more about operators.

See also print in [>pems.utils generics] or R help documentation (?pems.generics) for more about print report management.

Importing PEMS Data

You can make pems and pems.elements objects using pems() and pems.element(), but you are more likely to import PEMS data from a file previously logged by a PEMS or other mobile measurement system.

A family of import2PEMS() functions are included in pems.utils for such work. These are typically used in the form:

my.pems <- import2PEMS()

See [>pems.utils input/output] or R help documentation (?import2PEMS) for further details.

PEMS Data Handling

pems and pems.element objects can be handled like data.frames and vectors:

#first three rows and two columns of pems.1 
pems.1[1:3, 1:2]

## pems (3x2)
##               time.stamp  local.time
##        [Y-M-D H:M:S GMT]         [s]
##   1  2005-09-08 11:46:07           0
##   2  2005-09-08 11:46:08           1
##   3  2005-09-08 11:46:09           2

#15th to 25th velocity records in pems.1 
pems.1[15:25, "velocity"]

## pems.element [n=11]
##  [1] 0.1 0.3 0.2 0.3 0.2 0.4 0.1 0.2 0.2 0.1 0.1
##  ... <numeric> velocity [km/h]

They are however a little stricter, for example:

Calling part of pems object that does not exist generates an error rather than a NULL.
An added element does not wrap, unless specifically told to.
- So, pems.1$new <- 1 would add a pems.element called new to pems.1 which contained a ‘1’ followed by row a ’NA’s rather than a row of ’1’s.

Why did we do this? PEMS and other mobile data sources are typically contiguous data-series, each record following the one before in time order. If we align two datasets we rarely want data logged in shorter datasets wrapped and repeated alongside more than one row in longer datasets.

You can modify and calculate new data-series manually using R, but functions are also included in pems.utils for many common mobile data conversions and parameter calculations, for example:

pems.1$velocity

## pems.element [n=1000]
##    [1]  0.1  0.1  0.3  0.3  0.2  0.4  0.3  0.7  0.1  0.2  0.2  0.2  0.1  0.2
##   [15]  0.1  0.3  0.2  0.3  0.2  0.4  0.1  0.2  0.2  0.1  0.1  0.1  0.3  0.2
##   [29]  0.1  0.1  0.2  0.3  0.4  0.1  0.3  0.4  0.1  0.4  0.2  0.2  0.1  0.2
##    ... not showing: 69 rows
##    ... <numeric> velocity [km/h]

#convert pems.1$velocity units to miles per hour
convertUnits(pems.1$velocity, to="mi/h")

## pems.element [n=1000]
##    [1]  0.06213712  0.06213712  0.18641136  0.18641136  0.12427424  0.24854848
##    [7]  0.18641136  0.43495983  0.06213712  0.12427424  0.12427424  0.12427424
##   [13]  0.06213712  0.12427424  0.06213712  0.18641136  0.12427424  0.18641136
##    ... not showing: 164 rows
##    ... <numeric> pems.1$velocity [mi/h]

#calculate acceleration and add to pems.1
pems.1$accel <- calcAccel(velocity, local.time, pems.1)
pems.1$accel

## pems.element [n=1000]
##    [1]          NA  0.00000000  0.05555556  0.00000000 -0.02777778  0.05555556
##    [7] -0.02777778  0.11111111 -0.16666667  0.02777778  0.00000000  0.00000000
##   [13] -0.02777778  0.02777778 -0.02777778  0.05555556 -0.02777778  0.02777778
##    ... not showing: 164 rows
##    ... <numeric> accel [m/s/s]

See [>pems.utils operators] and [>pems.utils generics] or R help documentation for more about general pems handling.

See [>pems.utils units] or R help documentation (?pems.units) for more about pems units handling.

PEMS Data Analysis

We can use such functions to quickly process and analysis PEMS data, for example:

#calculate VSP and CO2 emissions and add both to pems.1
pems.1$vsp <- calcVSP(velocity, time = local.time, data = pems.1)
pems.1$em.co2 <- calcEm(conc.co2, data = pems.1)

At this stage these are measurements recorded in the same instant and comparing them suggests a small degree of association:

#initial instantaneous comparison
p <- pemsPlot(vsp, em.co2, data=pems.1)
loa::add.XYLOESSFit(p, col="red")

Using loa package function add.XYLOESSFit to add a LOESS curve to the plot.

However, there are time offsets between the engine action, the vehicle activity and the associated emissions leaving the vehicle exhaust. Aligning vsp and em.co2 reduces the offset:

#align vsp and em.co2
pems.2 <- cAlign(vsp~em.co2, pems.1, output=c("pems", "offset"))

## [1] "offset = 2"

In this case the alignment moved em.co2 two rows toward relative to vsp, and this provides a significantly stronger relationship.

#linear aligned VSP and CO2 emissions
p <- pemsPlot(vsp, em.co2, data=pems.2)
loa::add.XYLOESSFit(p, col="red")

Please Note: This is an approximation because the offset is not fixed.

In many cases such data is often binned and reported on this basis:

#VSP binning using the Frey et al (2002) 14 mode bins method. 
pems.2$ncsu.14 <- refVSPBin(vsp, data=pems.2)
summary(pems.2$ncsu.14)

## MODE01 MODE02 MODE03 MODE04 MODE05 MODE06 MODE07 MODE08 MODE09 MODE10 MODE11 
##    150     58    443    123     64     58     35     23     22     11      5 
## MODE12 MODE13 MODE14   NA's 
##      3      2      2      3

#frequency plot
VSPBinPlot(ncsu.14, data=pems.2, xlab="")

#emission averages plot
VSPBinPlot(ncsu.14, em.co2, data=pems.2,
           plot.type=2, xlab="")

Note: In this case there are relatively few measurements for bins 10 and above, and averages calculated for these are more uncertain.

See [>pems.utils plots] or R help documentation (?pems.plots) for more about plotting pems.

PEMS Data Batching and Reporting

One of the great things about R is that it allows you to batch process data, for example running the same code on different sections of a single dataset or different datasets.

The pems.utils function summaryReport generates a summary table of some common vehicle activities parameters:

summaryReport(velocity, local.time, data=pems.1)

##   distance.travelled.km time.total.s avg.speed.km.h avg.running.speed.km.h
## 1              6.186056         1000        22.2698               28.78538
##   time.idle.s time.idle.pc avg.accel.m.s.s time.accel.s time.accel.pc
## 1          40            4       0.7921279          271          27.1
##   avg.decel.m.s.s time.decel.s time.decel.pc
## 1      -0.9039449          238          23.8

If we cut that data in several subsections, we can analysis them separately and compare results:

pems.1$ref <- refRow(pems.1, n=4, labels=1:4)
sapply(1:4, function(x){ 
  summaryReport(velocity, local.time, data=pems.1[pems.1$ref==x,])
  })

##                        [,1]       [,2]       [,3]      [,4]      
## distance.travelled.km  1.281306   1.751556   1.674111  1.479083  
## time.total.s           251        250        250       249       
## avg.speed.km.h         18.37729   25.2224    24.1072   21.38434  
## avg.running.speed.km.h 22.37136   31.82727   32.03138  29.38619  
## time.idle.s            3          14         13        10        
## time.idle.pc           1.195219   5.6        5.2       4.016064  
## avg.accel.m.s.s        0.7419128  0.7680976  0.8423318 0.8117284 
## time.accel.s           79         66         71        54        
## time.accel.pc          31.4741    26.4       28.4      21.68675  
## avg.decel.m.s.s        -0.8488248 -0.7945205 -1.246612 -0.8753968
## time.decel.s           52         73         41        70        
## time.decel.pc          20.71713   29.2       16.4      28.11245

Or if we wanted to compare several datasets:

temp <- list(pems.1, pems.1, pems.1, pems.1)
sapply(temp, function(x){ 
  summaryReport(velocity, local.time, data=x)
  })

##                        [,1]       [,2]       [,3]       [,4]      
## distance.travelled.km  6.186056   6.186056   6.186056   6.186056  
## time.total.s           1000       1000       1000       1000      
## avg.speed.km.h         22.2698    22.2698    22.2698    22.2698   
## avg.running.speed.km.h 28.78538   28.78538   28.78538   28.78538  
## time.idle.s            40         40         40         40        
## time.idle.pc           4          4          4          4         
## avg.accel.m.s.s        0.7921279  0.7921279  0.7921279  0.7921279 
## time.accel.s           271        271        271        271       
## time.accel.pc          27.1       27.1       27.1       27.1      
## avg.decel.m.s.s        -0.9039449 -0.9039449 -0.9039449 -0.9039449
## time.decel.s           238        238        238        238       
## time.decel.pc          23.8       23.8       23.8       23.8

These examples are relatively crude and there are lots of different ways to do similar in R but hopefully this helps you to start to see how R and pems.utils could help you to analysis data more quickly and easily…

For further information on these and other pems.utils functions, see in-package documentation.

Any thoughts, suggestions or problems, please let me know [>email me].