The R package pems.utils uses two main data types: pems.elements, data-series with assigned units, and pems, data sets of simultaneously logged pems.elements.
This document provides an overview of their handling using the operators $, […] and [[…]].
For a quick and more general introduction to pems.utils, see [>pems.utils introduction], or return to the [>website index].
If you have any suggestions on how to make either pems.utils or this document better, or you have any problems using either, please let me know [>email me].
Unless you have set up R to automatically load pems.utils, you will need to load it at the start of each R session, e.g. using:
library(pems.utils)
For those already familiar with R object classes: pems behave in a similar fashion to other (row and column) data set objects like data.frames and data.tables, and pems.elements behave in a similar fashion to other vector objects like numerics, characters and factors. There are some small differences, documented in the introduction and below, that are linked to object structure and PEMS data handling.
The $ operator extracts pems.elements from pems in the form:
pems$name #gets the pems.element called name from pems
So, for example, to get the velocity pems.element from the pems.1 example pems data set in pems.utils:
pems.1$velocity
## pems.element [n=1000]
## [1] 0.1 0.1 0.3 0.3 0.2 0.4 0.3 0.7 0.1 0.2 0.2 0.2 0.1 0.2
## [15] 0.1 0.3 0.2 0.3 0.2 0.4 0.1 0.2 0.2 0.1 0.1 0.1 0.3 0.2
## [29] 0.1 0.1 0.2 0.3 0.4 0.1 0.3 0.4 0.1 0.4 0.2 0.2 0.1 0.2
## ... not showing: 69 rows
## ... <numeric> velocity [km/h]
pems.elements (or other vectors) can also be added to a pems in the form:
pems$name <- pems.element #adds pems.element to pems as name
Although the pems $ operator is very similar to the data.frame $ operator that many R users are likely to be more familiar with, there is one important difference:
When a pems.element (or other vector) is added to a pems and the lengths of the two do not match exactly, the two are bound row-by-row and the shorter is NA-padded, without any attempt to wrap (recycle) the data.
So, for example:
pems.1$new <- 1 #add a pems.element called new to pems.1
pems.1$new #'1' followed by a row of 'NA's, NOT a row of '1's
## pems.element [n=1000]
## [1] 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [25] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [49] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## ... not showing: 39 rows
## ... <numeric> new
If you want data.frame-like output, you need to specifically request it, for example:
## pems.element [n=1000]
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## ... not showing: 25 rows
## ... <numeric> new
Why do we do this?
PEMS and other mobile data sources are typically contiguous data-series, each record following the one before in time order. Also, if two datasets are logged separately, they are rarely exactly the same length. So, if we want to add a vector to an existing dataset (or align two datasets), we rarely want R to try to wrap the shorter data and repeat any records alongside later records in longer datasets.
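The contrast described above can be sketched in base R alone. This is only an illustration of the two behaviours, not the pems.utils implementation: na_pad() is a hypothetical helper written for this example, and the six-row data.frame is made up.

```r
# Base R only; no pems.utils objects are used here.
df <- data.frame(local.time = 0:5)

# data.frame behaviour: the single value is wrapped (recycled) to every row
df$new <- 1
recycled <- df$new

# NA-padding behaviour, via a hypothetical helper (not a pems.utils function):
# keep the values in order and fill the remainder with NAs
na_pad <- function(x, n) {
  stopifnot(length(x) <= n)
  c(x, rep(NA, n - length(x)))
}
padded <- na_pad(1, nrow(df))

print(recycled)
print(padded)
```

With time-ordered records, the padded version keeps the one known value at the first time step and leaves later steps visibly empty, rather than repeating it alongside unrelated records.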
The […] operator gets and sets data by indices within pems.elements in a similar fashion to other vectors:
pems.element[i] #gets value at position i from pems.element
pems.element[i:j] #gets values from i to j from pems.element
pems.element[i] <- k #set value at i in pems.element to k
#etc
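As a rough guide to the "other vectors" behaviour referred to above, the same calls on a plain numeric vector look like this. The vector is an assumption for illustration: it holds the first six velocity values printed earlier but is not a pems.element, so units are not carried.

```r
# Plain numeric vector standing in for a pems.element (no units attached)
velocity <- c(0.1, 0.1, 0.3, 0.3, 0.2, 0.4)

third <- velocity[3]     # value at position 3
middle <- velocity[2:4]  # values from position 2 to 4
velocity[1] <- 0         # set the value at position 1 to 0

print(third)
print(middle)
print(velocity)
```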
The […] operator gets and sets data by indices within pems in a similar fashion to data.frames:
pems[i] #gets column i of pems
pems["name"] #gets column called name of pems
pems[i,j] #gets value in row i and column j of pems
pems[i,j] <- k #set value in row i and column j of pems to k
#etc
For example, to get the conc.co and local.time columns of pems.1 (in that order):
## pems (1000x2)
## conc.co local.time
## [vol%] [s]
## 1 0 0
## 2 0 1
## 3 0 2
## 4 0 3
## 5 0 4
## 6 0 5
## ... not showing: 994 rows
## pems (1000x2)
## conc.co local.time
## [vol%] [s]
## 1 0 0
## 2 0 1
## 3 0 2
## 4 0 3
## 5 0 4
## 6 0 5
## ... not showing: 994 rows
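For comparison, column selection by name and by index on a plain data.frame can be sketched as follows. The small data.frame is made up for this example and only borrows three column names from pems.1; it is not a pems object.

```r
# Made-up data.frame using three pems.1 column names (not a pems object)
df <- data.frame(time.stamp = 1:3, local.time = 0:2, conc.co = c(0, 0, 0))

by_name <- df[c("conc.co", "local.time")]  # columns selected by name, in that order
by_index <- df[c(3, 2)]                    # the same columns selected by position
one_value <- df[2, "local.time"]           # value in row 2 of the local.time column

print(by_name)
print(one_value)
```

Both selections return the columns in the order requested rather than the order stored.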
Again, handling is slightly different to that with a data.frame.
Calling part of a pems object that does not exist generates an error rather than a NULL and, similarly, putting something into a pems somewhere it does not fit exactly generates an error.
For example, the call
pems.1[1:4, "conc.hc"] <- 88
would generate an error because the target is four rows by one column (so four values), while the insert is a single value (here 88).
The additional argument force can be used to specify how mismatching targets and inserts should be handled:
## pems (1000x26)
## time.stamp local.time conc.co conc.co2 conc.hc conc.nox
## [Y-M-D H:M:S GMT] [s] [vol%] [vol%] [ppmC6] [ppm]
## 1 2005-09-08 11:46:07 0 0 0 88 20.447
## 2 2005-09-08 11:46:08 1 0 0 NA 21.973
## 3 2005-09-08 11:46:09 2 0 0 NA 20.752
## 4 2005-09-08 11:46:10 3 0 0 NA 22.583
## 5 2005-09-08 11:46:11 4 0 0 0 20.142
## 6 2005-09-08 11:46:12 5 0 0 0 20.142
## ... not showing: 994 rows; 20 cols (elements)
## ... other cols: afr; exh.flow.rate[L/min]; exh.temp[degC]; exh.press[kPa];
## amb.temp[degC]; amb.press[kPa]; amb.humidity[%]; velocity[km/h];
## revolution[rpm]; option.1[V]; option2[V]; option.3[V];
## latitude[d.degLat]; longitude[d.degLon]; altitude[m];
## gps.velocity[km/h]; satellite; n.s; w.e; new[NA]
## pems (1000x26)
## time.stamp local.time conc.co conc.co2 conc.hc conc.nox
## [Y-M-D H:M:S GMT] [s] [vol%] [vol%] [ppmC6] [ppm]
## 1 2005-09-08 11:46:07 0 0 0 88 20.447
## 2 2005-09-08 11:46:08 1 0 0 88 21.973
## 3 2005-09-08 11:46:09 2 0 0 88 20.752
## 4 2005-09-08 11:46:10 3 0 0 88 22.583
## 5 2005-09-08 11:46:11 4 0 0 0 20.142
## 6 2005-09-08 11:46:12 5 0 0 0 20.142
## ... not showing: 994 rows; 20 cols (elements)
## ... other cols: afr; exh.flow.rate[L/min]; exh.temp[degC]; exh.press[kPa];
## amb.temp[degC]; amb.press[kPa]; amb.humidity[%]; velocity[km/h];
## revolution[rpm]; option.1[V]; option2[V]; option.3[V];
## latitude[d.degLat]; longitude[d.degLon]; altitude[m];
## gps.velocity[km/h]; satellite; n.s; w.e; new[NA]
Why do we do this?
NULLs do not always stop code, and wrap-fits are not always noticed or wanted; either can lead to later problems that are then less easily resolved. So, the preference is that anyone requesting data that does not exist, or putting data somewhere it does not fit exactly, should specify the correct handling at the time.
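Both issues can be seen with a plain data.frame in base R. This is a minimal sketch using a made-up six-row data.frame, not pems.1; it shows only the data.frame side of the comparison.

```r
# Made-up data.frame; conc.nox deliberately does not exist in it
df <- data.frame(conc.hc = rep(0, 6))

# Requesting a column that does not exist returns NULL, without an error
missing_col <- df$conc.nox

# Later code can carry on regardless: the sum of NULL is 0
total <- sum(missing_col)

# A single value is silently wrapped to fill the four-row target
df[1:4, "conc.hc"] <- 88

print(is.null(missing_col))
print(total)
print(df$conc.hc)
```

Neither step stops or warns, which is the kind of quiet behaviour the stricter pems handling is intended to surface.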
The [[…]] operator accesses the pems structure and any meta data associated with the pems.
Structural components include:
Meta data entries are additional information that has been added to the pems file, for example, instrumentation, driver, vehicle or routine details.
Meta data components are not strictly defined, so users are free to use or not use this option at their own discretion.
Both types of pems components can be accessed in the form:
So, for example, to get the full set of unit assignments of the pems.elements in pems.1:
pems.1[["units"]]
## time.stamp local.time conc.co conc.co2 conc.hc conc.nox afr
## 1 Y-M-D H:M:S GMT s vol% vol% ppmC6 ppm
## exh.flow.rate exh.temp exh.press amb.temp amb.press amb.humidity velocity
## 1 L/min degC kPa degC kPa % km/h
## revolution option.1 option2 option.3 latitude longitude altitude gps.velocity
## 1 rpm V V V d.degLat d.degLon m km/h
## satellite n.s w.e new
## 1 NA
And new components can be added or accessed:
## [1] "new.record"
Added meta data can also be extracted using:
## $pems
## [1] "Horiba OBS"
##
## $new.meta
## [1] "new.record"
Although the [[…]] operator provides direct access to the inner workings of pems objects and is a useful shortcut for those developing new code, other options are arguably more convenient and are therefore recommended for routine use, for example:
instead of pems[["data"]], use as.data.frame(pems)
instead of pems[["units"]], use units(pems)
See also [>pems.utils generics] or R help documentation (?pems.generics and ?pems.element.generics) for more on as.data.frame and other generic pems functions.
See also [>pems.utils units] or R help documentation (?pems.units) for more on pems units handling.
There are numerous operators in R, and pems and pems.element versions are only written for those where a need was identified.
If you think any other operators would be useful, please let me know. [>email me]
Likewise, if you have any suggestions on how to make either pems.utils or this document better, or you have any problems using either, please let me know. [>email me]
Return to the [>website index] or [>introduction].