R:pems.utils Operators

Karl Ropkins

2024-12-23

Background

The R package pems.utils uses two main data types: pems.elements, data-series with assigned units, and pems, data sets of simultaneously logged pems.elements.

This document provides an overview of their handling using the operators $, […] and [[…]].

For a quick and more general introduction to pems.utils, see [>pems.utils introduction], or return to the [>website index].

If you have any suggestions for improving either pems.utils or this document, or you have any problems using either, please let me know [>email me].

Unless you have set up R to load pems.utils automatically, you will need to load it at the start of each R session, e.g. using library(pems.utils).

For those already familiar with R object classes, pems behave much like other (row and column) data set objects such as data.frames and data.tables, and pems.elements behave much like other vector objects such as numerics, characters and factors. There are some small differences, documented in the introduction and below, that are linked to object structure and PEMS data handling.

The $ Operator

The $ operator extracts pems.elements from pems in the form:

pems$pems.element 

So, for example, to get the velocity pems.element from the pems.1 example pems data set in pems.utils:

pems.1$velocity
## pems.element [n=1000]
##    [1]  0.1  0.1  0.3  0.3  0.2  0.4  0.3  0.7  0.1  0.2  0.2  0.2  0.1  0.2
##   [15]  0.1  0.3  0.2  0.3  0.2  0.4  0.1  0.2  0.2  0.1  0.1  0.1  0.3  0.2
##   [29]  0.1  0.1  0.2  0.3  0.4  0.1  0.3  0.4  0.1  0.4  0.2  0.2  0.1  0.2
##    ... not showing: 69 rows
##    ... <numeric> velocity [km/h]

pems.elements (or other vectors) can also be added to a pems in the form:

pems$name <- pems.element

Although the pems $ operator is very similar to the data.frame $ operator that many R users are likely to be more familiar with, one important difference should be noted:

When a pems.element (or other vector) is added to a pems and the lengths of the two do not match exactly, the two are bound row-by-row and the shorter is NA-padded, without any attempt to wrap (recycle) the data.

So, for example:

pems.1$new <- 1 #add a pems.element called new to pems.1
pems.1$new      #'1' followed by a row of 'NA's, NOT a row of '1's
## pems.element [n=1000]
##    [1]  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##   [25] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##   [49] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##    ... not showing: 39 rows
##    ... <numeric> new

If you want data.frame-like output, you need to specifically request it, for example:

pems.1$new <- rep(1, nrow(pems.1))
pems.1$new    
## pems.element [n=1000]
##    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##   [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##    ... not showing: 25 rows
##    ... <numeric> new

Why do we do this?

PEMS and other mobile data sources are typically contiguous data-series, each record following the one before in time order. Also, if two datasets are logged separately, they are rarely exactly the same length. So, if we want to add a vector to an existing dataset (or align two datasets), we rarely want R to try to wrap the shorter data and repeat any records alongside later records in longer datasets.
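The contrast described above can be sketched in base R alone. This is only an illustration of wrapping versus NA-padding using ordinary data.frames and vectors; the object names (logged, short, recycled, padded) are made up for the example, and the padding step here is done by hand rather than by pems.utils.

```r
# Base R only: no pems.utils objects are used in this sketch.
logged <- data.frame(time = 0:5)   # six records in time order
short  <- c(10, 20, 30)            # a shorter, separately logged series

# data.frame-style assignment wraps (recycles) the shorter vector,
# so early records are repeated alongside later ones
recycled <- logged
recycled$value <- short
recycled$value                     # 10 20 30 10 20 30

# NA-padding keeps the series contiguous instead
padded <- logged
pad <- short
length(pad) <- nrow(padded)        # extends the vector with NAs
padded$value <- pad
padded$value                       # 10 20 30 NA NA NA
```

The padded version leaves the later records visibly empty, which is usually easier to spot and correct than values that have been silently repeated.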

The […] Operator

The […] operator gets and sets data by indices within pems.elements in a similar fashion to other vectors:

pems.element[i]       #gets value at position i from pems.element
pems.element[i:j]     #gets values from i to j from pems.element
pems.element[i] <- k  #set value at i in pems.element to k 
#etc
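As a minimal sketch of the forms above, assuming pems.utils is installed and using the velocity pems.element from the pems.1 example data (the name v is just a working copy for this illustration):

```r
library(pems.utils)

v <- pems.1$velocity   # working copy of the velocity pems.element
v[1]                   # value at position 1
v[1:5]                 # values at positions 1 to 5
v[1] <- 0              # set the value at position 1 to 0
v[1:5]                 # same five positions, now starting with 0
```

Because v is a copy, the velocity data in pems.1 itself is left unchanged.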

The […] operator gets and sets data by indices within pems in a similar fashion to data.frames:

pems[i]               #gets column i of pems
pems["name"]          #gets column called name of pems
pems[i,j]             #gets value in row i and column j of pems
pems[i,j] <- k        #set value in row i and column j of pems to k 
#etc

For example, to get the conc.co and local.time columns of pems.1 (in that order):

pems.1[c("conc.co", "local.time")]      #by name
## pems (1000x2)
##      conc.co  local.time
##       [vol%]         [s]
##   1        0           0
##   2        0           1
##   3        0           2
##   4        0           3
##   5        0           4
##   6        0           5
##  ... not showing: 994 rows
pems.1[c(3,2)]                          #by column number
## pems (1000x2)
##      conc.co  local.time
##       [vol%]         [s]
##   1        0           0
##   2        0           1
##   3        0           2
##   4        0           3
##   5        0           4
##   6        0           5
##  ... not showing: 994 rows
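The row-and-column form listed earlier can be sketched in the same way. This assumes pems.utils is installed and that row and column selections can be combined as they can with a data.frame; the names sub and one are only for the illustration, and output is not shown.

```r
library(pems.utils)

# first three rows of two named columns
sub <- pems.1[1:3, c("local.time", "velocity")]
sub

# a single value: row 2 of the velocity column
one <- pems.1[2, "velocity"]
one
```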

Again, handling is slightly different to that with a data.frame.

Calling a part of a pems object that does not exist generates an error rather than returning NULL and, similarly, putting something into a pems somewhere it does not fit exactly generates an error.

For example, the call pems.1[1:4, "conc.hc"] <- 88 would generate an error because the target is four rows by one column (so four values), while the insert is a single value (here 88).
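If you want to see that error without stopping a script, one option is to wrap the call in tryCatch. This is a sketch only, assuming pems.utils is installed; test and res are names made up for the example, and the exact wording of the error message is not shown because it may differ between package versions.

```r
library(pems.utils)

test <- pems.1   # work on a copy so pems.1 is left unchanged

# attempt the mismatched insert and capture the error message, if any
res <- tryCatch({
  test[1:4, "conc.hc"] <- 88
  "no error"
}, error = function(e) conditionMessage(e))

res              # the error message from the failed insert
```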

The additional argument force can be used to specify how mismatching targets and inserts should be handled:

pems.1[1:4, "conc.hc", force="na.pad.insert"] <- 88
pems.1
## pems (1000x26)
##               time.stamp  local.time  conc.co  conc.co2  conc.hc  conc.nox
##        [Y-M-D H:M:S GMT]         [s]   [vol%]    [vol%]  [ppmC6]     [ppm]
##   1  2005-09-08 11:46:07           0        0         0       88    20.447
##   2  2005-09-08 11:46:08           1        0         0       NA    21.973
##   3  2005-09-08 11:46:09           2        0         0       NA    20.752
##   4  2005-09-08 11:46:10           3        0         0       NA    22.583
##   5  2005-09-08 11:46:11           4        0         0        0    20.142
##   6  2005-09-08 11:46:12           5        0         0        0    20.142
##  ... not showing: 994 rows; 20 cols (elements) 
##  ... other cols: afr; exh.flow.rate[L/min]; exh.temp[degC]; exh.press[kPa];
##       amb.temp[degC]; amb.press[kPa]; amb.humidity[%]; velocity[km/h];
##       revolution[rpm]; option.1[V]; option2[V]; option.3[V];
##       latitude[d.degLat]; longitude[d.degLon]; altitude[m];
##       gps.velocity[km/h]; satellite; n.s; w.e; new[NA]
pems.1[1:4, "conc.hc", force="fill.insert"] <- 88
pems.1
## pems (1000x26)
##               time.stamp  local.time  conc.co  conc.co2  conc.hc  conc.nox
##        [Y-M-D H:M:S GMT]         [s]   [vol%]    [vol%]  [ppmC6]     [ppm]
##   1  2005-09-08 11:46:07           0        0         0       88    20.447
##   2  2005-09-08 11:46:08           1        0         0       88    21.973
##   3  2005-09-08 11:46:09           2        0         0       88    20.752
##   4  2005-09-08 11:46:10           3        0         0       88    22.583
##   5  2005-09-08 11:46:11           4        0         0        0    20.142
##   6  2005-09-08 11:46:12           5        0         0        0    20.142
##  ... not showing: 994 rows; 20 cols (elements) 
##  ... other cols: afr; exh.flow.rate[L/min]; exh.temp[degC]; exh.press[kPa];
##       amb.temp[degC]; amb.press[kPa]; amb.humidity[%]; velocity[km/h];
##       revolution[rpm]; option.1[V]; option2[V]; option.3[V];
##       latitude[d.degLat]; longitude[d.degLon]; altitude[m];
##       gps.velocity[km/h]; satellite; n.s; w.e; new[NA]

Why do we do this?

NULLs do not always stop code, and wrap-fits are not always noticed or wanted; either can lead to later problems that are then harder to resolve. So, the preference is that anyone requesting data that does not exist, or putting data somewhere it does not fit exactly, should specify the handling they want at the time.
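Both behaviours can be seen with an ordinary data.frame. The sketch below uses base R only, with a small made-up data set (df) rather than any pems.utils object:

```r
# Base R only: a small made-up data.frame
df <- data.frame(speed = c(10, 20, 30, 40))

# asking for a column that does not exist returns NULL, not an error
missing.col <- df$velocity
is.null(missing.col)              # TRUE
length(missing.col)               # 0, so later code may carry on regardless

# putting two values into four rows wraps them without any warning
df[1:4, "speed"] <- c(1, 2)
df$speed                          # 1 2 1 2
```

Neither step stops the code, so a mistyped name or a mismatched insert can go unnoticed until much later in an analysis.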

The [[…]] Operator

The [[…]] operator accesses the pems structure and any meta data associated with the pems.

Structural components include, for example, the unit assignments of the pems.elements (units).

Meta data entries are additional information that has been added to the pems file, for example, instrumentation, driver, vehicle or routine details.

Meta data components are not strictly defined, so users are free to use or not use this option at their own discretion.

Both types of pems components can be accessed in the form:

pems[[n]]          #get pems component by indices 
pems[["name"]]     #get pems component by name
#etc

So, for example, to extract the full set of unit assignments of the pems.elements in pems.1:

pems.1[["units"]]
##        time.stamp local.time conc.co conc.co2 conc.hc conc.nox afr
## 1 Y-M-D H:M:S GMT          s    vol%     vol%   ppmC6      ppm    
##   exh.flow.rate exh.temp exh.press amb.temp amb.press amb.humidity velocity
## 1         L/min     degC       kPa     degC       kPa            %     km/h
##   revolution option.1 option2 option.3 latitude longitude altitude gps.velocity
## 1        rpm        V       V        V d.degLat  d.degLon        m         km/h
##   satellite n.s w.e new
## 1                    NA

And new components can be added or accessed:

pems.1[["new.meta"]] <- "new.record"
pems.1[["new.meta"]]
## [1] "new.record"

Added meta data can also be extracted using:

pems.1[["extra.pems.tags"]]
## $pems
## [1] "Horiba OBS"
## 
## $new.meta
## [1] "new.record"

Although the [[…]] operator provides direct access to the inner workings of pems objects and is a useful shortcut for those developing new code, other options are arguably more convenient and are therefore recommended for routine use.

See also [>pems.utils generics] or R help documentation (?pems.generics and ?pems.element.generics) for more on as.data.frame and other generic pems functions.

See also [>pems.utils units] or R help documentation (?pems.units) for more on pems units handling.

Other Operators

There are numerous operators in R, and pems and pems.element versions are only written for those where a need was identified.

If you think any other operators would be useful, please let me know. [>email me]

Likewise, if you have any suggestions for improving either pems.utils or this document, or you have any problems using either, please let me know. [>email me]

Return to the [>website index] or [>introduction].