Title: | Soil and plant spectroscopic model building and prediction |
---|---|
Description: | Functions that cover reading of spectral data, outlier removal, spectral preprocessing, calibration sampling, PLS regression using caret, and model diagnostic statistics and plots. |
Authors: | Philipp Baumann [aut, cre] |
Maintainer: | Philipp Baumann <[email protected]> |
License: | GPL-2 |
Version: | 0.2.1 |
Built: | 2025-02-18 05:12:33 UTC |
Source: | https://github.com/philipp-baumann/simplerspec |
Return performance metrics for test set predictions and measured values, e.g. for different model outcome variables.
assess_multimodels( data, ..., .metrics = c("simplerspec", "yardstick"), .model_name = "model" )
data |
Data frame with all measured (observed) and predicted variables. |
... |
Multiple arguments with observed (measured)-predicted pairs,
specified with |
.metrics |
Character vector with package used for metrics calculation.
Default is |
.model_name |
String with name for the new column that specifies the
model or the outcome variable. Default is "model". |
Data frame with summary statistics for measured values and performance metrics for the pairs of measured and predicted values.
Average spectra in list-column of spectra tibble (spc_tbl
) by
groups given in group column.
average_spc(spc_tbl, by = "sample_id", column_in = "spc_rs")
spc_tbl |
Tibble data frame containing at least the grouping column
given in argument |
by |
Character vector of length 1L or name/symbol that specifies the
column by which groups of spectra are averaged. Default is "sample_id". |
column_in |
Character vector of length 1L or name/symbol that
specifies the list-column that contains the input spectra to be averaged.
Default is "spc_rs". |
For memory efficiency and subsequent modeling, consider slicing the
extra row copies of spc_mean
resulting from average_spc()
for example by either of
split(x = spc_tbl, f = spc_tbl$<by>) %>% lapply(., function(x) x[1, ]) %>% do.call(rbind, .)
dplyr::group_by(spc_tbl, <by>) %>% dplyr::slice(1L)
Spectra tibble data frame (class "tbl_df"
, "tbl"
, "data.frame"
)
with a new list-column of column name "spc_mean"
at the last position,
containing mean spectra with identical row replicates within the same
by
-group.
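The group-wise averaging described above can be pictured with a small base R sketch. It is only an illustration on a toy matrix, not the package implementation; the object names (spc, sample_id, spc_mean, first_rows) are assumptions made for this example.

```r
# Toy data: two replicate spectra for sample "A", one for sample "B"
spc <- rbind(c(0.10, 0.20, 0.30),
             c(0.30, 0.40, 0.50),
             c(1.00, 1.00, 1.00))
sample_id <- c("A", "A", "B")

# Column-wise mean spectrum per group (rows = groups, columns = x positions)
spc_mean <- rowsum(spc, group = sample_id) / as.vector(table(sample_id))
print(spc_mean)

# Keep only the first row of each group, analogous to slicing row copies
first_rows <- !duplicated(sample_id)
print(sample_id[first_rows])
```

In this sketch each group ends up with exactly one mean spectrum, which mirrors the advice to slice the replicated rows before modeling.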
Bind one to many list-columns in spectral tibble into a list of data.tables.
bind_lcols_dts(spc_tbl, lcols, spc_id = "unique_id", group_id = "sample_id")
spc_tbl |
Spectral data in a tibble data frame (classes "tbl_df", "tbl" and "data.frame"). |
lcols |
Character vector of column names of list-columns to be bound into a list of data.tables |
spc_id |
Character vector denoting column name for a unique spectrum ID.
Default is "unique_id". |
group_id |
Character vector denoting column name for the spectrum group
ID. Default is "sample_id". |
A list of data.tables. Elements contain data from list-columns
specified in lcols
argument as data.tables. All data.tables contain in
addition spc_id
and group_id
columns.
Given a data frame with VIP outputs (wavenumber and vip columns), start and end values denoting spectral regions where VIP > 1 are returned as a data frame. This function can be used as a helper function for plotting VIP.
create_vip_rects(df_vip)
df_vip |
Data frame containing |
Data.frame containing vectors start
(numeric; wavenumber),
end
(numeric; wavenumber) and group (integer; values are
1:length(start))
.
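One way to derive such start and end values is run-length encoding of the condition VIP > 1. The following base R sketch illustrates the idea; vip_regions() is a hypothetical helper written for this example and is not claimed to be how create_vip_rects() works internally.

```r
# Hypothetical helper (not the simplerspec implementation):
# find contiguous regions where vip > 1 via run-length encoding
vip_regions <- function(df_vip) {
  above <- df_vip$vip > 1
  runs <- rle(above)
  ends <- cumsum(runs$lengths)        # last index of each run
  starts <- ends - runs$lengths + 1   # first index of each run
  keep <- runs$values                 # TRUE for runs with vip > 1
  data.frame(
    start = df_vip$wavenumber[starts[keep]],
    end   = df_vip$wavenumber[ends[keep]],
    group = seq_len(sum(keep))
  )
}

df_vip <- data.frame(
  wavenumber = seq(4000, 3984, by = -2),
  vip = c(0.5, 1.2, 1.5, 0.8, 0.9, 1.1, 1.3, 1.4, 0.7)
)
rects <- vip_regions(df_vip)
print(rects)
```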
Calculate summary statistics of observed values and model
evaluation statistics for assessing agreement between observed (obs
) and
predicted (pred
) values.
evaluate_model(data, obs, pred) summary_df(df, x, y)
data |
|
obs |
Column that contains observed values, |
pred |
Column that contains predicted values, |
df |
|
x |
Column that contains observed values, |
y |
Column that contains predicted values, |
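As a rough illustration of the kind of agreement statistics such functions report, the base R sketch below computes a few common metrics from observed and predicted values. The helper eval_sketch() and the choice of metrics (RMSE, bias, squared correlation, RPD) are assumptions for this example; the statistics actually returned by evaluate_model() may differ.

```r
# Hypothetical helper (not part of simplerspec): common agreement statistics
eval_sketch <- function(obs, pred) {
  stopifnot(length(obs) == length(pred), length(obs) > 1)
  resid <- pred - obs
  rmse <- sqrt(mean(resid^2))
  data.frame(
    n    = length(obs),
    rmse = rmse,                 # root mean squared error
    bias = mean(resid),          # mean of predicted minus observed
    r2   = cor(obs, pred)^2,     # squared Pearson correlation
    rpd  = sd(obs) / rmse        # ratio of performance to deviation
  )
}

obs  <- c(1.2, 2.4, 3.1, 4.8, 5.5)
pred <- c(1.0, 2.6, 3.0, 5.0, 5.4)
stats <- eval_sketch(obs, pred)
print(stats)
```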
Extract multiple tibble list columns, row bind them separately into single data tables and return a list of data.tables.
extract_lcols2dts(spc_tbl, lcols)
spc_tbl |
Spectral tibble (data frame) with spectral data contained in list-columns |
lcols |
Character vector containing names of list-columns to be extracted into a list of data.tables |
List of data.tables. Each element is a data.table derived from a
list-column specified in the lcols
argument.
simplerspec::fit_pls()
VIPs are extracted based on the finalModel
sublist
in the caret::train
output contained in the model
element
of the simplerspec::fit_pls()
model output list. The VIPs for the
derived number of PLS components in the finalModel
are computed.
extract_pls_vip(mout)
mout |
Model output list returned from |
A tibble data frame with columns wavenumber
and corresponding
VIP values in the column vip
for the finally chosen PLS regression
model at the final number of PLS components.
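For orientation, the commonly used VIP definition for a single-response PLS model can be sketched in base R as follows. The function vip_sketch() and its inputs (a weight matrix W, a score matrix scores, and y loadings q) are assumptions for this example; it is not claimed to be the code used by extract_pls_vip().

```r
# Hypothetical helper: VIP for a single-response PLS model.
# W: p x A matrix of X weights, scores: n x A matrix of X scores,
# q: length-A vector of y loadings.
vip_sketch <- function(W, scores, q) {
  p <- nrow(W)
  # Sum of squares of y explained by each component
  ss <- as.vector(q)^2 * colSums(scores^2)
  # Squared weights, normalised so that each column sums to 1
  w_norm2 <- sweep(W^2, 2, colSums(W^2), "/")
  as.vector(sqrt(p * (w_norm2 %*% ss) / sum(ss)))
}

set.seed(1)
W <- matrix(rnorm(10 * 3), nrow = 10)
scores <- matrix(rnorm(20 * 3), nrow = 20)
q <- c(0.9, 0.4, 0.1)
vip <- vip_sketch(W, scores, q)
print(round(vip, 3))
```

With this definition the mean of the squared VIP values equals 1, which is why VIP > 1 is a common threshold for flagging influential variables.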
Perform calibration sampling and use selected calibration set for model tuning
fit_pls( spec_chem, response, variable = NULL, center = TRUE, scale = TRUE, evaluation_method = "test_set", validation = TRUE, split_method = "ken_stone", ratio_val = 1/3, ken_sto_pc = 2, pc, invert = TRUE, tuning_method = "resampling", resampling_method = "kfold_cv", cv = NULL, resampling_seed = 123, pls_ncomp_max = 20, ncomp_fixed = 5, print = TRUE, env = parent.frame() ) pls_ken_stone( spec_chem, response, variable = NULL, center = TRUE, scale = TRUE, evaluation_method = "test_set", validation = TRUE, split_method = "ken_stone", ratio_val = 1/3, ken_sto_pc = 2, pc, invert = TRUE, tuning_method = "resampling", resampling_method = "kfold_cv", cv = NULL, resampling_seed = 123, pls_ncomp_max = 20, ncomp_fixed = 5, print = TRUE, env = parent.frame() )
spec_chem |
Tibble that contains spectra, metadata and chemical
reference as list-columns. The tibble to be supplied to |
response |
Response variable as symbol or name
(without quotes, no character string). The provided response symbol needs to be
a column name in the |
variable |
Deprecated and replaced by |
center |
Logical whether to perform mean centering of each spectrum column
(e.g. wavenumber or wavelength) after common spectrum preprocessing. Default is
TRUE. |
scale |
Logical whether to perform standard deviation scaling
of each spectrum column (e.g. wavenumber or wavelength) after common
spectrum preprocessing. Default is TRUE. |
evaluation_method |
Character string stating evaluation method.
Either |
validation |
Deprecated and replaced by |
split_method |
Method for splitting the data into an independent test
set. Default is "ken_stone". |
ratio_val |
Ratio of validation (test) samples to total number of samples (calibration (training) and validation (test)). |
ken_sto_pc |
Number of components used
for calculating the Mahalanobis distance on PCA scores for computing the
Kennard-Stone algorithm.
Default is 2. |
pc |
Deprecated; renamed argument is |
invert |
Logical |
tuning_method |
Character specifying tuning method. Tuning method
affects how caret selects a final tuning value set from a list of candidate
values. Possible values are |
resampling_method |
Character specifying resampling method. Currently,
|
cv |
Deprecated. Use |
resampling_seed |
Random seed (integer) that will be used for generating
resampling indices, which will be supplied to |
pls_ncomp_max |
Maximum number of PLS components that are evaluated
by caret::train. Caret will aggregate a performance profile using resampling
for an integer sequence from 1 to |
ncomp_fixed |
Integer of fixed number of PLS components. Will only be
used when |
print |
Logical expression whether model evaluation graphs shall be printed |
env |
Environment where function is evaluated. Default is
parent.frame(). |
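The Kennard-Stone split on PCA scores mentioned for split_method and ken_sto_pc can be illustrated with a base R sketch. The function kennard_stone_sketch() is hypothetical and written only for this example. It relies on the fact that a Mahalanobis distance on principal component scores equals a Euclidean distance on scores scaled to unit variance. Treating the selected samples as the test set here is an assumption for illustration, not a statement about how fit_pls() assigns the sets.

```r
# Hypothetical helper: Kennard-Stone selection of k rows from matrix X,
# using the first n_pc principal component scores scaled to unit variance.
kennard_stone_sketch <- function(X, k, n_pc = 2) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  scores <- scale(pca$x[, seq_len(n_pc), drop = FALSE])
  d <- as.matrix(dist(scores))
  # Start with the two most distant samples
  selected <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ])
  while (length(selected) < k) {
    remaining <- setdiff(seq_len(nrow(X)), selected)
    # Distance of each remaining sample to its nearest selected sample
    min_d <- apply(d[remaining, selected, drop = FALSE], 1, min)
    # Add the sample that is farthest from the already selected set
    selected <- c(selected, remaining[which.max(min_d)])
  }
  selected
}

set.seed(42)
X <- matrix(rnorm(30 * 5), nrow = 30)
test_idx <- kennard_stone_sketch(X, k = 10)   # one third of 30 samples
calib_idx <- setdiff(seq_len(nrow(X)), test_idx)
print(test_idx)
```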
Perform calibration sampling and use selected calibration set for model tuning
fit_rf( spec_chem, response, variable = NULL, evaluation_method = "test_set", validation = NULL, split_method = "ken_stone", ratio_val, ken_sto_pc = 2, pc = NULL, invert = TRUE, tuning_method = "resampling", resampling_seed = 123, cv = NULL, ntree_max = 500, print = TRUE, env = parent.frame() )
spec_chem |
Tibble that contains spectra, metadata and chemical
reference as list-columns. The tibble to be supplied to |
response |
Response variable as symbol or name
(without quotes, no character string). The provided response symbol needs to be
a column name in the |
variable |
Deprecated and replaced by |
evaluation_method |
Character string stating evaluation method.
Either |
validation |
Deprecated and replaced by |
split_method |
Method for splitting the data into an independent test
set. Default is "ken_stone". |
ratio_val |
Ratio of validation (test) samples to total number of samples (calibration (training) and validation (test)). |
ken_sto_pc |
Number of components used
for calculating the Mahalanobis distance on PCA scores for computing the
Kennard-Stone algorithm.
Default is 2. |
pc |
Deprecated; renamed argument is |
invert |
Logical |
tuning_method |
Character specifying tuning method. Tuning method
affects how caret selects a final tuning value set from a list of candidate
values. Possible values are |
resampling_seed |
Random seed (integer) that will be used for generating
resampling indices, which will be supplied to |
cv |
Deprecated. Use |
ntree_max |
Maximum number of random forest trees
by caret::train. Caret will aggregate a performance profile using resampling
for an integer sequence from 1 to |
print |
Logical expression whether model evaluation graphs shall be printed |
env |
Environment where function is evaluated. Default is
parent.frame(). |
Gather spectra, corresponding x-axis values, and device and
measurement metadata from a nested list into a spectra tibble, so that one
row represents one spectral measurement. Spectra, x-axis values and metadata
are mapped from the individual list elements (named after file name including
the extension) and transformed into (list-)columns of a spectra tibble,
which is an extended data frame. For each measurement, spectral data and
metadata are combined into one row of the tidy data frame. In addition, the ID
columns unique_id
, file_id
, and sample_id
are extracted from
"metadata"
(data frame) list entries and returned as identifier columns of
the spectra tibble. List-columns facilitate keeping related data together in
a rectangular data structure. They can be manipulated easily during
subsequent transformations, for example using the standardized functions of
the simplerspec data processing pipeline.
gather_spc(data, spc_types = "spc")
data |
Recursive list named with filename ( |
spc_types |
Character vector with the spectra types to be extracted
from
|
Spectra tibble (spc_tbl
with classes "tbl_df"
, "tbl"
, and
"data.frame"
) with the following (list-)columns:
"unique_id"
: Character vector with unique measurement identifier, likely
a string with file names in combination with date and time (extracted from
each "metadata"
data frame column).
"file_id"
: Character vector with file name including the extension
(extracted from each "metadata"
data frame column).
"sample_id"
: Character vector with sample identifier. For Bruker OPUS
binary files, this corresponds to the file name without the file extension
in integer increments of sample replicate measurements.
One or multiple of "spc"
, "spc_nocomp"
, "sc_sm"
, or "sc_rf"
:
List(s) of data.tables containing spectra type(s).
One or multiple of "wavenumbers"
, "wavelengths"
, "x_values"
,
"wavenumbers_sc_sm"
, "wavelengths_sc_sm"
, "x_values_sc_sm"
,
"wavenumbers_sc_rf"
, "wavelengths_sc_rf"
, or "x_values_sc_rf"
:
List(s) of numeric vectors with matched x-axis values (see "Details on
spectra data checks and matching" below).
gather_spc()
checks whether these conditions are met for each measurement
in the list data
:
Make sure that the first level data
elements are named (assumed to be
the file name the data originate from), and remove missing measurements with
an informative message.
Remove any duplicated file names and raise a message if there are name duplicates at first level.
Check whether spc_types
inputs are supported (see argument spc_types
)
and present at the second level of the data
list. If not, remove
all data elements for incomplete spectral measurements.
Match spectra types and possible corresponding x-axis types from
a lookup list. For each selected spectrum type (left), at least one of
the element names of the x-axis type (right) needs to be present for each
measurement in the list data
:
"spc"
: "wavenumbers"
, "wavelengths"
, or "x_values"
"spc_nocomp"
: "wavenumbers"
, "wavelengths"
, or "x_values"
"sc_sm"
: "wavenumbers_sc_sm"
, "wavelengths_sc_sm"
, or
"x_values_sc_sm"
"sc_rf"
: "wavenumbers_sc_rf"
, "wavelengths_sc_rf"
, or
"x_values_sc_rf"
Check if "metadata"
elements are present and remove data elements for
measurements with missing or incorrectly named metadata elements
(message).
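The completeness checks listed above can be pictured with a small base R sketch that uses the lookup of spectrum types and x-axis element names given in the list. The helper is_complete() and the toy list data are assumptions for this example; the sketch is not the package code and omits the messages that gather_spc() raises.

```r
# Lookup of spectrum types and their possible x-axis element names
xaxis_lookup <- list(
  spc        = c("wavenumbers", "wavelengths", "x_values"),
  spc_nocomp = c("wavenumbers", "wavelengths", "x_values"),
  sc_sm      = c("wavenumbers_sc_sm", "wavelengths_sc_sm", "x_values_sc_sm"),
  sc_rf      = c("wavenumbers_sc_rf", "wavelengths_sc_rf", "x_values_sc_rf")
)

# Hypothetical helper: TRUE if a measurement has the spectrum type,
# at least one matching x-axis element, and a "metadata" element
is_complete <- function(measurement, spc_type = "spc") {
  nms <- names(measurement)
  spc_type %in% nms &&
    any(xaxis_lookup[[spc_type]] %in% nms) &&
    "metadata" %in% nms
}

# Toy nested list named by file name
data <- list(
  "a.0" = list(spc = 1:3, wavenumbers = c(4000, 3998, 3996),
               metadata = data.frame(id = "a")),
  "b.0" = list(spc = 1:3, metadata = data.frame(id = "b"))  # x-axis values missing
)
complete <- vapply(data, is_complete, logical(1))
data_ok <- data[complete]   # keep only complete measurements
print(names(data_ok))
```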
Combines spectral data (data.frame) and chemical data (data.frame).
join_chem_spec(dat_chem, dat_spec, by = "sample_ID")
dat_chem |
data.frame that contains chemical values of the sample |
dat_spec |
List that contains spectral data |
by |
character of column name that defines sample_ID |
List: xxx
Combines spectral data (tibble class) and chemical data (tibble class).
join_spc_chem(spc_tbl, chem_tbl, by = "sample_id")
spc_tbl |
Tibble that contains spectral data |
chem_tbl |
Tibble that contains chemical reference values of the samples |
by |
character of column name that defines sample_ID |
Tibble joined by sample_id
Helper function that merges all spectra and related data into a single long form data.table that can subsequently be used for plotting.
merge_dts( spc_tbl, lcols_spc = c("spc", "spc_pre"), lcol_measure = NULL, spc_id = "unique_id", group_id = "sample_id" )
spc_tbl |
Tibble data frame containing spectra, x-axis values, metadata and eventual measured variables as list-columns. |
lcols_spc |
Character vector of spectral list-columns to be extracted.
Default is c("spc", "spc_pre"). |
lcol_measure |
Character vector of length 1 denoting the column name
of the measure columns. This argument is optional. Default is NULL. |
spc_id |
Character vector of column that contains a unique spectral
identifier for all spectra. Default is "unique_id". |
group_id |
Character vector of the column that is used for assigning spectra
into groups. Default is "sample_id". |
A single data.table containing long form aggregated data of spectra, x-axis values, metadata and an additionally measured variable.
merge_dts()
for list of tibbles to
aggregate data for plotting. Instead of a single spectral tibble (data frame) multiple
spectral tibbles can be merged into a long-form data.table for plotting
spectra and related data. For details, see
merge_dts
.
merge_dts_l( spc_tbl_l, lcols_spc = c("spc", "spc_pre"), lcol_measure = NULL, spc_id = "unique_id", group_id = "sample_id" )
spc_tbl_l |
List of spectral tibbles (data frames). |
lcols_spc |
Character vector of spectral list-columns to be extracted.
Default is c("spc", "spc_pre"). |
lcol_measure |
Character vector of length 1 denoting the column name
of the measure columns. This argument is optional. Default is NULL. |
spc_id |
Character vector of column that contains a unique spectral
identifier for all spectra. Default is "unique_id". |
group_id |
Character vector of the column that is used for assigning spectra
into groups. Default is "sample_id". |
A single data.table containing long form aggregated data of
spectra, x-axis values, metadata and an additionally measured variable.
An additional column called group_id_tbl
is appended. It denotes
the name of the spectral tibble supplied with the list spc_tbl_l
.
Plot stacked ggplot2 graphs of VIP for the final PLS regression model output of the calibration (training) data set for the final number of components, raw (replicate mean) spectra, and preprocessed spectra. Regions with VIP > 1 are highlighted across the stacked graphs in beige colour rectangles. VIP calculation is implemented as described in Chong, I.-G., and Jun, C.-H. (2005). Performance of some variable selection methods when multicollinearity is present. Chemometrics and Intelligent Laboratory Systems, 78(1–2), 103–112. https://doi.org/10.1016/j.chemolab.2004.12.011
plot_pls_vip(mout, y1 = "spc_mean", y2 = "spc_pre", by = "sample_id", xlab = expression(paste("Wavenumber [", cm^-1, "]")), ylab1 = "Absorbance", ylab2 = "Preprocessed Abs.", alpha = 0.2)
mout |
Model output list that is returned from
|
y1 |
Character vector of list-column name in
|
y2 |
Character string of list-column name in
|
by |
Character string that is used to assign spectra to the same group
and therefore ensures that all spectra are plotted with the same colour.
Default is "sample_id". |
xlab |
Character string of X axis title for shared x axis of stacked
graphs. Default is expression(paste("Wavenumber [", cm^-1, "]")). |
ylab1 |
Y axis title of the graph showing the spectra given in y1. Default is "Absorbance". |
ylab2 |
Y axis title of the graph showing the spectra given in y2. Default is
"Preprocessed Abs.". |
alpha |
Double between 0 and 1 that defines transparency of spectra lines in returned graph (ggplot plot object). |
Plot spectra from tibble spectra objects.
plot_spc(spc_tbl, spc_tbl_2 = NULL, x_unit = "wavenumber", y = "spc", by = "unique_id", graph_id_1 = "Set 1", graph_id_2 = "Set 2", graph_id_1_col = "black", graph_id_2_col = "red", xlab = expression(paste("Wavenumber [", cm^-1, "]")), ylab = "Absorbance", alpha = 0.2, legend = TRUE)
spc_tbl |
Tibble that contains the first set of spectra to plot as list-column |
spc_tbl_2 |
Tibble that contains the second set of spectra (optional) to plot as list-column. |
x_unit |
Character string describing the x axis unit. Default is
"wavenumber". |
y |
Character string of list-column name in tibble where spectra of desired type are extracted to plot. |
by |
Character string of column that is used to group the spectra.
Default is "unique_id". |
graph_id_1 |
Character string used for grouping the first spectra set
( |
graph_id_2 |
Character string used for grouping the second spectra set
( |
graph_id_1_col |
Character string for the colour of the first spectra
set. Default is "black". |
graph_id_2_col |
Character string for the colour of the second spectra
set. Default is "red". |
xlab |
Character string or mathematical expression
(use |
ylab |
Character string or mathematical expression
(use |
alpha |
Double in between 0 and 1. Sets the transparency for the plotted spectra lines. |
legend |
Logical whether to plot a legend for the spectra describing
its name selected in arguments |
plot_spc_ext
is a custom plotting function developed
within the simplerspec framework. Returns plots based on ggplot2
(class "ggplot"). Different spectra types such as raw or preprocessed spectra
and groups can be differentiated by different colors or by using panels
(so called facets). Additionally, spectra can be colored based on an
additional measure variable, e.g. determined by chemical reference analysis.
plot_spc_ext( spc_tbl, spc_tbl_l = NULL, lcols_spc = "spc", lcol_measure = NULL, lcol_measure_col_palette = "Spectral", lcol_measure_col_direction = -1, spc_id = "unique_id", group_id = "sample_id", group_id_order = TRUE, group_color = TRUE, group_color_palette = NULL, group_panel = TRUE, group_legend = FALSE, ncol = NULL, relabel_spc = TRUE, ylab = "Spectrum value", alpha = 0.5, line_width = 0.2, ... )
spc_tbl |
Tibble data frame containing spectra, x-axis values, metadata and eventual measured variables as list-columns. |
spc_tbl_l |
List of spectral tibbles (data frames). Default is
NULL. |
lcols_spc |
Character vector of spectral list-columns to be extracted.
Default is "spc". |
lcol_measure |
Character vector of length 1 denoting the column name
of the measure columns. This argument is optional. Default is NULL. |
lcol_measure_col_palette |
Palette value supplied to
|
lcol_measure_col_direction |
Sets the order of colours in the scale
that is based on a measure column. Default is -1. |
spc_id |
Character vector denoting column name for a unique spectrum ID.
Default is "unique_id". |
group_id |
Character vector denoting column name for the spectrum group
ID. Default is "sample_id". |
group_id_order |
Logical that specifies whether the panel names
derived from a numeric |
group_color |
Logical defining whether spectra are colored by the column
specified by |
group_color_palette |
Character (1L) defining the diverging colour
scales from colorbrewer.org; see |
group_panel |
Logical defining whether spectra are arranged into panels
by groups specified in |
group_legend |
Logical defining whether a legend for the |
ncol |
Integer vector of length 1. Defines number of columns when
plotting panels (facets). Default is NULL. |
relabel_spc |
Logical defining whether panels are relabeled with custom
names for spectra types. Default is TRUE. When |
ylab |
Character vector or vector of type |
alpha |
Numeric of length 1, from 0 to 1. Defines transparency of
spectral lines. Default is 0.5. |
line_width |
Numeric vector of length 1 specifying the width of the
spectral lines. Default is 0.2. |
... |
Further arguments to be passed to |
Object of class "ggplot"
(ggplot2 graph).
Append predictions for a set of responses specified by a list of calibration models and a tibble containing preprocessed spectra as list-columns.
predict_from_spc(model_list, spc_tbl, slice = TRUE)
model_list |
List of model output generated from calibration step
( |
spc_tbl |
Tibble of spectra after preprocessing
( |
slice |
Logical expression whether only one row per sample_id is returned. |
tibble with new columns model
, and predicted values with
column names of model list.
Preprocesses spectra in tibble column by sample_id after
averaging spectra by simplerspec::average_spc()
.
preprocess_spc(spc_tbl, select, column_in = "spc_mean", custom_function = NULL)
spc_tbl |
Tibble that contains spectra to be preprocessed within a list-column. |
select |
Character vector of predefined preprocessing options to be
applied to the spectra list-column specified in |
column_in |
Character vector of single list-column in |
custom_function |
A character string of a custom processing function
that is later parsed (produces expression in a list) and evaluated within
the function |
Read tab delimited text (.txt) files exported from ASD field spectrometer into simplerspec spectra tibble. ASD Fieldspec data files are expected in .txt tab delimited file format. The first row should contain the name 'Wavelength' for the first column and the file names for the remaining columns.
read_asd(file)
file |
Tab delimited file from ASD software export where the first
column called |
Spectra data in tibble data frame (class tbl_df
) that contains
columns sample_id
(derived from 2nd and following column names of
tab delimited ASD exported text file),
spc
(list-column of spectral matrices)
and wavelengths
(list-column containing wavelength vectors).
Read multiple ASD binary files and gather spectra and metadata into a simplerspec spectral tibble (data frame). The resulting spectral tibble is compatible with the simplerspec spectra processing and modeling framework.
read_asd_bin(fnames)
fnames |
Character vector containing full paths of ASD binary files to be read |
A spectral tibble (data frame) containing the following columns:
unique_id |
Character vector. Unique identifier containing file name pasted with date and time. |
file_id |
Character vector containing file names and extension |
sample_id |
Character vector containing file names without extension |
metadata |
List-column. List of data frames containing spectral metadata |
wavelengths |
List-column. List of wavelengths vectors (numeric). |
spc_radiance |
List-column. List of data.tables containing radiance sample spectra. |
spc_reference |
List-column. List of data.tables containing reference reflectance spectra. |
spc |
List-column. List of data.tables containing final reflectance spectra. |
Read a single binary file acquired with a Bruker Vertex FTIR instrument
read_opus_bin_univ(file_path, extract = c("spc"), print_progress = TRUE, atm_comp_minus4offset = FALSE)
file_path |
Character vector with path to file |
extract |
Character vector of spectra types to extract from OPUS binary
file. Default is c("spc"). |
print_progress |
Logical (default |
atm_comp_minus4offset |
Logical whether spectra after atmospheric
compensation are read with an offset of |
Read multiple spectral files measured with a Bruker FTIR Instrument. Files
containing spectra are in OPUS binary format.
read_opus_univ
is a wrapper for read_opus_bin_univ()
.
read_opus_univ(fnames, extract = c("spc"), parallel = FALSE, atm_comp_minus4offset = FALSE)
fnames |
List of character vectors containing full path names of spectra |
extract |
Character vector of spectra types to extract from file.
Possible values are: "spc" (AB block in Bruker Opus software), "spc_nocomp"
(Spectra before final atmospheric compensation; only present if background
correction has been set in Opus), "ScSm" (Single channel spectrum of the
sample), "ScRf" (Single channel spectrum of the reference), "IgSm" (Interferogram
of the sample), "IgRf" (Interferogram of the reference). Default is
c("spc"). |
parallel |
Logical ( |
atm_comp_minus4offset |
Logical whether spectra after atmospheric
compensation are read with an offset of |
out: List of spectra and metadata (parameters) extracted from Bruker OPUS spectrometer files. List names are the names of the OPUS files whose spectral data were extracted.
Remove outlier spectra based on the
pcout()
function of the mvoutlier
package.
remove_outliers(list_spectra, remove = TRUE)
list_spectra |
List that contains averaged
spectral information
in list element |
remove |
logical expression ( |
This is an optional function if one wants to remove outliers.
Returns list spectra_out
that contains:
MIR_mean
: Outlier removed MIR spectra as
data.frame object. If remove = FALSE
,
the function will
return a list almost identical to list_spectra
,
except that the first indices
column of the spectral
data frame MIR_mean
is removed
(This is done for both options
remove = TRUE
and remove = FALSE
).
data_meta
: metadata data.frame, identical
as in the list_spectra
input list.
plot_out
: (optional) ggplot2 graph
that shows all spectra (wavenumber on x-axis and absorbance
on y-axis) with outliers marked, if
remove = TRUE
.
Resamples (interpolates) different spectra types with
corresponding x-axis values that are both stored in list-columns of a spectra
tibble. A spectra tibble hosts spectra, x-axis vectors, metadata, and
further linked data with standardized naming conventions. Data input for
resampling can for example be generated with simplerspec::gather_spc()
.
Resampling is a key harmonizing step to process and later model spectra
measured at different resolutions and spectral ranges (i.e., different
spectrometer devices and/or measurement settings).
resample_spc( spc_tbl, column_in = "spc", x_unit = c("wavenumber", "wavelength"), wn_lower = 500, wn_upper = 4000, wn_interval = 2, wl_lower = 350, wl_upper = 2500, wl_interval = 1, interpol_method = c("linear", "spline") )
spc_tbl |
Spectra data embedded in a tibble object (classes
|
column_in |
Character vector of length 1L or symbol/name specifying the name of list-column that contains the spectra to be resampled. |
x_unit |
Character vector of length 1L specifying the measurement unit
of the x-axis values (list-column) of the input spectra in |
wn_lower |
Numeric value of lowest wavenumber. This argument will only
be used if |
wn_upper |
Numeric value of highest wavenumber. This argument will only
be used if |
wn_interval |
Numeric value of the wavenumber increment for the new wavenumber sequence that the spectra will be resampled upon. Default value is 2 (i.e., in reciprocal centimeters). |
wl_lower |
Numeric value of lowest wavelength. This argument will only
be used if |
wl_upper |
Numeric value of highest wavelength. This argument will only
be used if |
wl_interval |
Numeric value of the wavelength increment for the new
wavelength sequence that the spectra will be resampled upon. This argument
will only be used if |
interpol_method |
Character of |
A spectra tibble (spc_tbl
) containing two added list-columns:
spc_rs:
Resampled spectra as list of data.table
s
wavenumbers_rs
or wavelengths_rs
: Resampled x-axis values as list of
numeric vectors
The combinations of input spectrum types (column_in
) and
corresponding x-axis types are generated from a simple lookup list. The
following key-value(s) pairs can be matched at given key, which is the column
name from column_in
containing the spectra.
"spc"
: "wavenumbers"
or "wavelengths"
(raw spectra)
"spc_rs"
: "wavenumbers_rs"
or "wavelengths_rs"
(resampled spectra)
"spc_mean"
: "wavenumbers_rs"
or "wavelengths_rs"
(mean spectra)
"spc_nocomp"
: "wavenumbers"
or "wavelengths"
(spectra prior
atmospheric compensation)
"sc_sm" : c("wavenumbers_sc_sm", "wavelengths_sc_sm")
(single channel
sample spectra)
"sc_rf" : c("wavenumbers_sc_rf", "wavelengths_sc_rf")
(single channel
reference spectra)
"spc_pre" : "xvalues_pre"
(preprocessed spectra)
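The resampling step itself amounts to interpolating each spectrum onto a new regular x-axis grid. A minimal base R sketch, assuming a single toy spectrum and using stats::approx() and stats::spline() as stand-ins for the "linear" and "spline" options, could look like this. The variable names and the toy values are assumptions for this example and do not show the package internals.

```r
# Toy spectrum measured at irregular wavenumbers
wavenumbers <- c(3999.1, 3997.2, 3995.0, 3993.3, 3991.1)
absorbance  <- c(0.10, 0.12, 0.15, 0.13, 0.11)

# New regular grid within the measured range, 2 cm^-1 interval
wn_new <- seq(from = 3992, to = 3998, by = 2)

# Linear interpolation onto the new grid
spc_rs <- approx(x = wavenumbers, y = absorbance,
                 xout = wn_new, method = "linear")$y

# Spline interpolation as an alternative
spc_rs_spline <- spline(x = wavenumbers, y = absorbance, xout = wn_new)$y

print(data.frame(wavenumber = wn_new, linear = spc_rs, spline = spc_rs_spline))
```

Keeping the new grid inside the measured range avoids missing values at the edges, which is one reason to choose the lower and upper limits with the instrument range in mind.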
Select a set of calibration spectra to develop spectral models. Samples in this list will be analyzed using laboratory reference methods.
select_ref_spc(spc_tbl, ratio_ref, pc, print = TRUE)
spc_tbl |
Spectra as tibble objects that contain preprocessed spectra |
ratio_ref |
Ratio of desired reference samples to total sample number |
pc |
Number of principal components (numeric). If pc < 1, the number of principal components kept corresponds to the number of components explaining at least (pc * 100) percent of the total variance. |
print |
logical expression whether a plot (ggplot2) of sample selection
for reference analysis is shown in PCA space
( |
Select every n-th spectral variable for all spectra and x-values in spectral
tibble (spc_tbl
)
select_spc_vars( spc_tbl, lcol_spc = "spc_pre", lcol_xvalues = "xvalues_pre", every = NULL )
spc_tbl |
Tibble data.frame containing spectra in list-column |
lcol_spc |
List-column containing spectra, specified with column name as symbols or 1L character vector. |
lcol_xvalues |
List-column containing x-values, specified with column name as symbols or 1L character vector. |
every |
Keep every n-th spectral position, given as 1L integer vector. |
a spectral tibble
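Selecting every n-th variable can be sketched in base R as indexing with a regular sequence. The example below uses a toy matrix and assumes that the selection starts at the first position; whether select_spc_vars() starts there is an assumption of this sketch.

```r
# Toy data: 2 spectra with 10 spectral variables each
spc <- matrix(seq_len(20), nrow = 2, byrow = TRUE)
xvalues <- seq(4000, 3982, by = -2)

every <- 3L
keep <- seq(1L, ncol(spc), by = every)   # positions 1, 4, 7, 10

# Apply the same positions to spectra and x-values so they stay aligned
spc_sel <- spc[, keep, drop = FALSE]
xvalues_sel <- xvalues[keep]
print(xvalues_sel)
```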
Slice spectra contained in list-column of spectral tibble (data frame). A list of x-axis value ranges can be specified. Spectra are cut based on these ranges.
slice_xvalues( spc_tbl, xunit_lcol = "wavenumbers", spc_lcol = "spc", xvalues_cut = NULL )
spc_tbl |
Spectral data in a tibble object (classes "tbl_df", "tbl"
and "data.frame"). The spectra tibble is expected to contain at least
the column |
xunit_lcol |
Character vector that specifies column name where x-axis
units are stored within |
spc_lcol |
Character vector that specifies which column (list-column)
contains spectra to be sliced. Default is "spc". |
xvalues_cut |
List of numeric vectors that contains upper and lower bounds of respective regions to keep in spectra. The spectral regions outside
the |
Spectral tibble (data frame with list-columns) with sliced x-axis
column and spectral column. Both the x-axis list-column and the spectral
tibble list-column only contain data specified within the xvalues_cut
argument (list of numeric vectors).
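Cutting spectra to a list of x-axis ranges can be illustrated in base R with a logical mask built from the ranges. The toy objects below are assumptions for this example, and treating the bounds as inclusive is also an assumption rather than documented behaviour of slice_xvalues().

```r
# Toy x-axis and a single spectrum stored as a one-row matrix
wavenumbers <- seq(4000, 500, by = -500)   # 4000, 3500, ..., 500
spc <- matrix(seq_along(wavenumbers), nrow = 1)

# Regions to keep, each given as two bounds
xvalues_cut <- list(c(3500, 2500), c(1500, 1000))

# TRUE where a wavenumber falls into any of the regions (bounds inclusive)
in_range <- Reduce(`|`, lapply(xvalues_cut, function(r) {
  wavenumbers >= min(r) & wavenumbers <= max(r)
}))

wavenumbers_cut <- wavenumbers[in_range]
spc_cut <- spc[, in_range, drop = FALSE]
print(wavenumbers_cut)
```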
Data from "Estimation of soil properties with mid-infrared soil spectroscopy across yam production landscapes in West Africa".
soilspec_yamsys
A tibble data frame with 284 rows and 40 columns. The spectra are in the
spc
list-column.
https://soil.copernicus.org/articles/7/717/2021/
Helper function that calls split
on a tibble using a
grouping column within tibble.
split_df2l(tbl_df, group)
tbl_df |
Tibble data frame |
group |
Character vector with name of column based on which tibble is split into a list of tibbles |
List of tibbles. Each tibble contains data split by
a group column within tbl_df
.
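The underlying base R call can be sketched as follows, using a plain data frame in place of a tibble. The object names are assumptions for this example.

```r
# Toy data frame with a grouping column
tbl <- data.frame(sample_id = c("A", "A", "B"), value = 1:3)
group <- "sample_id"

# Split into a named list with one data frame per group
tbl_l <- split(tbl, tbl[[group]])
print(names(tbl_l))
```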