Tidy photometer data with `tidy_plates()`
Silvia Eckert
2024-05-25
Source: vignettes/articles/tidy_photometer_data.Rmd
During my work as a postdoc in various laboratories, I noticed that old photometers are often still in use even when the associated proprietary software is no longer available (e.g. because it depends on an outdated operating system). This is a good sign from an environmental point of view (e.g. avoiding electronic waste), but it is often more a question of the laboratory’s financial resources. The absorption values provided by these devices (e.g. as an 8 x 12 table in a plain text file for 96-well plates) are usually analysed with a custom Excel spreadsheet that someone created a long time ago and that everyone relies on. This may work, since each type of photometer has its own file structure, but Excel is itself proprietary software and therefore not easily accessible to everyone. The steps are also tedious in most cases, as only some of the analyses can be performed in the spreadsheet and only a limited range of experimental designs is supported. In contrast, R is established open-source statistical software. So why not also use it to organize photometer plate data, flexibly add experimental designs as metadata and even perform at least some simple visualizations and statistical analyses, all in one go?
The microdiluteR
package is designed to help researchers
tidy up data from photometer plates and provides functions to easily add
metadata. Regardless of whether the user is processing a single plate or
multiple plates with complex metadata structures, the
tidy_plates()
function provides flexibility and ease of use
to optimize the data processing workflow. This vignette guides the user
through a general workflow. For more specific use cases, please refer to
upcoming vignettes of this package. The microdiluteR
package was developed to support the analysis of broth
microdilution assays, but may be extended for other types of assays
in the future.
Installation
After installing R, install
the microdiluteR
package either via CRAN or install the
development version via GitHub using the following commands:
# via CRAN
install.packages("microdiluteR")
# via GitHub
install.packages("devtools")
library(devtools)
install_github("silvia-eckert/microdiluteR")
The microdiluteR
package can be loaded as follows:
library(microdiluteR)
Usage
To show the general workflow, we will use example data shipped with
the microdiluteR
package. For use cases relying on
real-world data, please check out upcoming vignettes. The example data
used here is a list of multiple photometer measurements on 96-well
plates.
data("bma")
names(bma) # check file names
#> [1] "bma_grp1_exp2_T0" "bma_grp1_exp2_T3" "bma_grp2_exp1_T0" "bma_grp2_exp1_T3"
bma[[1]] # absorption values from first plate
#> 1 2 3 4 5 6 7 8 9 10 11 12
#> A 0.342 0.354 0.360 0.360 0.352 0.363 0.361 0.352 0.356 0.351 0.366 0.375
#> B 0.362 0.391 0.375 0.363 0.383 0.366 0.380 0.378 0.339 0.387 0.377 0.362
#> C 0.344 0.346 0.345 0.347 0.350 0.356 0.348 0.343 0.348 0.351 0.351 0.353
#> D 0.361 0.367 0.351 0.364 0.353 0.362 0.361 0.367 0.363 0.356 0.357 0.355
#> E 0.388 0.473 0.400 0.358 0.388 0.340 0.335 0.396 0.411 0.404 0.397 0.407
#> F 0.456 0.465 0.469 0.469 0.462 0.468 0.455 0.477 0.487 0.488 0.498 0.471
#> G 0.334 0.340 0.357 0.332 0.329 0.342 0.333 0.317 0.360 0.332 0.335 0.328
#> H 0.334 0.332 0.339 0.333 0.339 0.334 0.342 0.335 0.361 0.327 0.330 0.341
The data also contains information on the experimental setup, which can be retrieved using an attribute:
attr(bma, "metadata") # check out the experimental setup
#> plate_axis treatment concentration
#> 1 A 10% 100 ppm
#> 2 B 10% 200 ppm
#> 3 C 30% 100 ppm
#> 4 D 30% 200 ppm
#> 5 E 100% 100 ppm
#> 6 F 100% 200 ppm
#> 7 G Control 100 ppm
#> 8 H Control 200 ppm
We can see that the plate has been loaded in a horizontal direction (rows starting with A-H) denoted in the ‘plate_axis’ column. We can also see that there are four treatment levels (10%, 30%, 100%, and a negative control level), each being tested at two concentrations (100 ppm and 200 ppm). In the next step, we will try to add this metadata to the absorption values measured. You can also get details of the experimental setup using the help page of the data set:
?bma # check out details on the experimental setup
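To make the link between this metadata table and the arguments used later more explicit, here is a minimal base R sketch. It only rebuilds the setup table shown above from two vectors; the object name `setup` is chosen for illustration and is not part of the package.

```r
# One entry per plate row (A-H), in the order the rows appear on the plate
plate_axis <- LETTERS[1:8]

# Each treatment occupies two neighbouring rows ...
treatment_labels <- rep(c("10%", "30%", "100%", "Control"), each = 2)

# ... and the two concentrations alternate within each treatment
concentration_levels <- rep(c(100, 200), times = 4)

# Rebuild the experimental setup as a data frame (illustrative name)
setup <- data.frame(
  plate_axis    = plate_axis,
  treatment     = treatment_labels,
  concentration = paste(concentration_levels, "ppm"),
  stringsAsFactors = FALSE
)
setup
```

The same two `rep()` calls are used further below as the treatment and concentration arguments, so it can help to check them against the metadata table first.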
Read plates
Before we dive into the magic of the tidy_plates()
function, let’s first have a look at the read_plate()
and
read_plates()
functions. If adding metadata is not desired
and the photometer data should only be loaded into R for inspection or
for custom analyses, this can be achieved with the
read_plate()
and read_plates()
functions.
Let’s first create a temporary mock file:
data <- "Line with additional information, e.g. wavelength
1 2 3 4 5 6 7 8 9 10 11 12
A 1 2 3 4 5 6 7 8 9 10 11 12
B 13 14 15 16 17 18 19 20 21 22 23 24
C 25 26 27 28 29 30 31 32 33 34 35 36
D 37 38 39 40 41 42 43 44 45 46 47 48
E 49 50 51 52 53 54 55 56 57 58 59 60
F 61 62 63 64 65 66 67 68 69 70 71 72
G 73 74 75 76 77 78 79 80 81 82 83 84
H 85 86 87 88 89 90 91 92 93 94 95 96"
file_path <- tempfile()
writeLines(data, file_path, sep = "\n")
Now we’ll read this file using read_plate()
as
follows:
temp_file <- read_plate(file_path,
skip_lines = 2) # skip the first two lines
temp_file
#> 1 2 3 4 5 6 7 8 9 10 11 12
#> A 1 2 3 4 5 6 7 8 9 10 11 12
#> B 13 14 15 16 17 18 19 20 21 22 23 24
#> C 25 26 27 28 29 30 31 32 33 34 35 36
#> D 37 38 39 40 41 42 43 44 45 46 47 48
#> E 49 50 51 52 53 54 55 56 57 58 59 60
#> F 61 62 63 64 65 66 67 68 69 70 71 72
#> G 73 74 75 76 77 78 79 80 81 82 83 84
#> H 85 86 87 88 89 90 91 92 93 94 95 96
# The skipped lines are stored here
attr(temp_file, "info")
#> [1] "Line with additional information, e.g. wavelength"
# Remove the temporary file
unlink(file_path)
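For comparison, a similar result can be obtained with base R alone. The following is only a sketch under simplifying assumptions (a small 2 x 3 mock plate with one leading info line) and is not how read_plate() is implemented internally:

```r
# Mock photometer file: one info line, a column header, two plate rows
plate_text <- c(
  "Line with additional information, e.g. wavelength",
  "  1 2 3",
  "A 1 2 3",
  "B 4 5 6"
)
path <- tempfile(fileext = ".txt")
writeLines(plate_text, path)

# Keep the info line, then read the table that follows it
info  <- readLines(path, n = 1)
plate <- read.table(path, skip = 1, header = TRUE, check.names = FALSE)

# The header has one field fewer than the data rows, so read.table()
# uses the first column (A, B) as row names
attr(plate, "info") <- info
plate

# Remove the temporary file
unlink(path)
```

Setting `check.names = FALSE` keeps the column names as "1", "2", "3" instead of converting them to syntactic names such as "X1".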
We could also do this with multiple files using the
read_plates()
function. The content of the files will be
concatenated into a single list that is ready for further processing,
e.g. with the tidy_plates()
function. Let’s again create
two temporary files. This time, however, we will use a custom pattern
called “Assay_” to start the file names. This will make sure that only
these two files are considered from the temporary directory.
# File 1
data_T0 <- "Line with additional information, e.g. wavelength
1 2 3 4 5 6 7 8 9 10 11 12
A 1 2 3 4 5 6 7 8 9 10 11 12
B 13 14 15 16 17 18 19 20 21 22 23 24
C 25 26 27 28 29 30 31 32 33 34 35 36
D 37 38 39 40 41 42 43 44 45 46 47 48
E 49 50 51 52 53 54 55 56 57 58 59 60
F 61 62 63 64 65 66 67 68 69 70 71 72
G 73 74 75 76 77 78 79 80 81 82 83 84
H 85 86 87 88 89 90 91 92 93 94 95 96"
file_path_T0 <- tempfile(pattern = "Assay_T0_",
fileext = ".txt")
writeLines(data_T0, file_path_T0, sep = "\n")
# File 2
data_T1 <- "Line with additional information, e.g. wavelength
1 2 3 4 5 6 7 8 9 10 11 12
A 1 2 3 4 5 6 7 8 9 10 11 12
B 13 14 15 16 17 18 19 20 21 22 23 24
C 25 26 27 28 29 30 31 32 33 34 35 36
D 37 38 39 40 41 42 43 44 45 46 47 48
E 49 50 51 52 53 54 55 56 57 58 59 60
F 61 62 63 64 65 66 67 68 69 70 71 72
G 73 74 75 76 77 78 79 80 81 82 83 84
H 85 86 87 88 89 90 91 92 93 94 95 96"
file_path_T1 <- tempfile(pattern = "Assay_T1_",
fileext = ".txt")
writeLines(data_T1, file_path_T1, sep = "\n")
file_dir <- dirname(file_path_T1)
Now we’ll apply the read_plates()
function similarly to
read_plate()
as follows:
temp_files <- read_plates(file_dir,
pattern = "Assay_T", # our custom pattern
skip_lines = 2) # skip the first two lines
temp_files
#> $Assay_T0_17ec5fd80156
#> 1 2 3 4 5 6 7 8 9 10 11 12
#> A 1 2 3 4 5 6 7 8 9 10 11 12
#> B 13 14 15 16 17 18 19 20 21 22 23 24
#> C 25 26 27 28 29 30 31 32 33 34 35 36
#> D 37 38 39 40 41 42 43 44 45 46 47 48
#> E 49 50 51 52 53 54 55 56 57 58 59 60
#> F 61 62 63 64 65 66 67 68 69 70 71 72
#> G 73 74 75 76 77 78 79 80 81 82 83 84
#> H 85 86 87 88 89 90 91 92 93 94 95 96
#>
#> $Assay_T1_17ec2a755ec
#> 1 2 3 4 5 6 7 8 9 10 11 12
#> A 1 2 3 4 5 6 7 8 9 10 11 12
#> B 13 14 15 16 17 18 19 20 21 22 23 24
#> C 25 26 27 28 29 30 31 32 33 34 35 36
#> D 37 38 39 40 41 42 43 44 45 46 47 48
#> E 49 50 51 52 53 54 55 56 57 58 59 60
#> F 61 62 63 64 65 66 67 68 69 70 71 72
#> G 73 74 75 76 77 78 79 80 81 82 83 84
#> H 85 86 87 88 89 90 91 92 93 94 95 96
#>
#> attr(,"info")
#> File_name Attribute
#> 1 Assay_T0_17ec5fd80156 Line with additional information, e.g. wavelength
#> 2 Assay_T1_17ec2a755ec Line with additional information, e.g. wavelength
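The idea of selecting files by a pattern and collecting them in a named list can also be sketched in base R. The folder name, file names and the tiny 2 x 2 plates below are made up for illustration, and the code is not the package's implementation of read_plates():

```r
# Create a dedicated folder with two mock plate files and one unrelated file
plate_dir <- file.path(tempdir(), "plates_demo")
dir.create(plate_dir, showWarnings = FALSE)
plate_text <- c("Info line", "  1 2", "A 1 2", "B 3 4")
for (tp in c("T0", "T1")) {
  writeLines(plate_text, file.path(plate_dir, paste0("Assay_", tp, ".txt")))
}
writeLines("unrelated", file.path(plate_dir, "notes.txt"))

# Only files whose names start with the custom pattern are considered
paths <- list.files(plate_dir, pattern = "^Assay_T", full.names = TRUE)

# Read each file, skipping the info line, and name the list by file name
plates <- lapply(paths, read.table,
                 skip = 1, header = TRUE, check.names = FALSE)
names(plates) <- tools::file_path_sans_ext(basename(paths))
names(plates)

# Clean up
unlink(plate_dir, recursive = TRUE)
```

Using a separate folder keeps the pattern match predictable, because the session's temporary directory may contain other files.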
These functions are purely for inspection or for custom use. To add
metadata, these functions are not necessary and users can directly apply
the tidy_plates()
function to either a single file, a folder
containing one or more files (using custom patterns if
necessary) or a list of photometer data already loaded into R (with
read_plate(), read_plates() or any other
function, as long as a list structure is preserved).
Tidy data from a single plate
Tidying data from a single photometer plate is straightforward.
Simply provide your input data to tidy_plates(), set the
desired metadata and it will handle the rest. Your input data can be a
file name, a folder containing this file (provided there are no other
interfering files; otherwise use a pattern) or a list element that has
already been read into R. Here, we will use the first data set from
bma
and apply the metadata provided in the attributes:
single_plate_data <- bma[1]; single_plate_data
#> $bma_grp1_exp2_T0
#> 1 2 3 4 5 6 7 8 9 10 11 12
#> A 0.342 0.354 0.360 0.360 0.352 0.363 0.361 0.352 0.356 0.351 0.366 0.375
#> B 0.362 0.391 0.375 0.363 0.383 0.366 0.380 0.378 0.339 0.387 0.377 0.362
#> C 0.344 0.346 0.345 0.347 0.350 0.356 0.348 0.343 0.348 0.351 0.351 0.353
#> D 0.361 0.367 0.351 0.364 0.353 0.362 0.361 0.367 0.363 0.356 0.357 0.355
#> E 0.388 0.473 0.400 0.358 0.388 0.340 0.335 0.396 0.411 0.404 0.397 0.407
#> F 0.456 0.465 0.469 0.469 0.462 0.468 0.455 0.477 0.487 0.488 0.498 0.471
#> G 0.334 0.340 0.357 0.332 0.329 0.342 0.333 0.317 0.360 0.332 0.335 0.328
#> H 0.334 0.332 0.339 0.333 0.339 0.334 0.342 0.335 0.361 0.327 0.330 0.341
tidy_df <- tidy_plates(single_plate_data,
how_many = "single", # tidy single plate
direction = "horizontal",
group_ID = NA,
experiment_name = NA,
validity_method = "threshold",
threshold = 1,
treatment_labels = rep(c("10%", "30%", "100%", "Control"), each = 2),
concentration_levels = rep(c(100, 200), 4)) # numeric
We see that we could have also provided a group identifier or an experiment name. This was not necessary here, but we could update our code to denote e.g. the pathogen used as the test organism (here Botrytis cinerea) in the group identifier or give the plate measurement an experiment name (e.g. “Assay 1”). Let’s do this:
tidy_df <- tidy_plates(single_plate_data,
how_many = "single",
direction = "horizontal",
group_ID = "Botrytis cinerea", # add test organism
experiment_name = "Assay 1", # add plate name
validity_method = "threshold",
threshold = 1,
treatment_labels = rep(c("10%", "30%", "100%", "Control"), each = 2),
concentration_levels = rep(c(100, 200), 4))
The resulting tidy table looks as follows:
Position | Value | Validity | Treatment | Concentration | Timepoint | File | Group | Experiment |
---|---|---|---|---|---|---|---|---|
A-1 | 0.342 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-2 | 0.354 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-3 | 0.360 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-4 | 0.360 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-5 | 0.352 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-6 | 0.363 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-7 | 0.361 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-8 | 0.352 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-9 | 0.356 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
A-10 | 0.351 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 1 |
We see that the timepoint was added automatically and we didn’t have to specify it explicitly. That’s because the timepoint was extracted from the file name ‘bma_grp1_exp2_T0’. The file name also contains ‘bma’, ‘grp1’ and ‘exp2’ as identifiers. This will become important in the next section when several plates are read.
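To illustrate what "tidying" means here, the reshaping from a wide plate to a long table can be sketched in base R. This is a simplified illustration on a 2 x 3 mock plate with made-up values, not the code used by tidy_plates(). In particular, treating values below the threshold as valid is an assumption made for this sketch.

```r
# Small mock plate (hypothetical values); rows are the plate axis
plate <- matrix(c(0.342, 0.354, 0.360,
                  0.362, 1.391, 0.375),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("A", "B"), as.character(1:3)))

treatment_labels     <- c("10%", "Control")  # one label per plate row
concentration_levels <- c(100, 200)          # one level per plate row
threshold            <- 1                    # assumption: values < threshold are valid

# Walk through the plate row by row: A-1, A-2, A-3, B-1, ...
rows   <- rep(rownames(plate), each = ncol(plate))
cols   <- rep(colnames(plate), times = nrow(plate))
values <- as.vector(t(plate))

long <- data.frame(
  Position      = paste(rows, cols, sep = "-"),
  Value         = values,
  Validity      = ifelse(values < threshold, "valid", "invalid"),
  Treatment     = rep(treatment_labels, each = ncol(plate)),
  Concentration = rep(concentration_levels, each = ncol(plate)),
  stringsAsFactors = FALSE
)
long
```

Each well becomes one row, and the row-wise metadata is repeated for every well in that plate row, which is what a horizontal plate direction implies.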
Prepare file names
The easiest way to prepare your files for use with the
tidy_plates()
function is to add a pattern at the beginning
of the file names. Simply use the beginning of your file name and leave
the rest of the file name unchanged to make sure you still know what
your file is about. If we have a look at the file names of the
bma
dataset, we see that they have a specific pattern
including ‘grp1’, ‘grp2’, ‘exp1’, ‘exp2’, as well as timepoints (‘T0’
and ‘T3’). These patterns are necessary for tidy_plates()
to correctly assign the group and experiment names. For example, we
could have repeated the experiment from the above-mentioned plate called
‘Assay 1’ but used other concentrations or even other chemicals or
antibiotics, now using ‘Assay 2’ as the experiment name for this plate.
Additionally, we could also have repeated these experiments with another
test organism, e.g. Penicillium digitatum instead of B.
cinerea. Adding plate data from measurements with this second test
organism, we need to specify it in the group name.
We would assign the identifier “grp1” to the files from experiments with B. cinerea and the identifier “grp2” to the files from experiments with P. digitatum (and so on if there are experiments with other test organisms). The same applies if there are multiple experiments with each of the test organisms, where “exp1” stands for the first set-up and “exp2” for the second set-up. As the plates are measured multiple times, a timepoint identifier is required, which is either T0 (T1, …) or t0 (t1, …). All file names should start with “BMA” or “bma” (which stands for ‘broth microdilution assay’; a user-defined pattern such as “Assay_” is also fine) to distinguish them from other files located in the same folder.
This may sound a bit tedious, as you have to partially rename your files, but it helps a lot to identify the different groups and experiments. Photometers often only provide cryptic file names, consisting of a combination of letters or a sequence of numbers or just the date of the measurement, so renaming is often necessary for the sake of clarity anyway. The following table illustrates an example of a renaming strategy:
Original names | Renamed files | Group | Experiment |
---|---|---|---|
F0001.txt | BMA_grp1_exp1_T0_F0001.txt | B. cinerea | Fungicide A |
F0002.txt | BMA_grp1_exp1_T1_F0002.txt | B. cinerea | Fungicide A |
F0003.txt | BMA_grp1_exp1_T2_F0003.txt | B. cinerea | Fungicide A |
F0004.txt | BMA_grp1_exp2_T0_F0004.txt | B. cinerea | Fungicide B |
F0005.txt | BMA_grp1_exp2_T1_F0005.txt | B. cinerea | Fungicide B |
F0006.txt | BMA_grp1_exp2_T2_F0006.txt | B. cinerea | Fungicide B |
F0007.txt | BMA_grp2_exp1_T0_F0007.txt | P. digitatum | Fungicide A |
F0008.txt | BMA_grp2_exp1_T1_F0008.txt | P. digitatum | Fungicide A |
F0009.txt | BMA_grp2_exp1_T2_F0009.txt | P. digitatum | Fungicide A |
F0010.txt | BMA_grp2_exp3_T0_F0010.txt | P. digitatum | Fungicide C |
F0011.txt | BMA_grp2_exp3_T1_F0011.txt | P. digitatum | Fungicide C |
F0012.txt | BMA_grp2_exp3_T2_F0012.txt | P. digitatum | Fungicide C |
In our bma
example data, we have the following file
names:
names(bma)
#> [1] "bma_grp1_exp2_T0" "bma_grp1_exp2_T3" "bma_grp2_exp1_T0" "bma_grp2_exp1_T3"
This means that we have two groups and one experiment per group, with each experiment measured at timepoints T0 and T3.
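To see which identifiers such file names carry, they can be pulled out with regular expressions. The helper `extract_id()` below is a hypothetical name used only for this base R sketch. It does not show how tidy_plates() parses file names internally.

```r
file_names <- c("bma_grp1_exp2_T0", "bma_grp1_exp2_T3",
                "bma_grp2_exp1_T0", "bma_grp2_exp1_T3")

# Hypothetical helper: return the first match of `pattern` in each name
# (assumes every name contains the pattern, as is the case here)
extract_id <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x))
}

ids <- data.frame(
  file       = file_names,
  group      = extract_id(file_names, "grp[0-9]+"),
  experiment = extract_id(file_names, "exp[0-9]+"),
  timepoint  = extract_id(file_names, "[Tt][0-9]+"),
  stringsAsFactors = FALSE
)
ids

# Number of distinct groups, experiments and timepoints
sapply(ids[c("group", "experiment", "timepoint")],
       function(x) length(unique(x)))
```

A quick check like this before tidying can reveal misspelled identifiers in renamed files.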
Tidy data from multiple plates with a common experimental setup
Since all measurements in bma
were performed under a
common treatment and concentration setting, we will apply this metadata
all at once. In this case, we need to specify that we want to clean
multiple plates.
Tidy via function parameters
The function tidy_plates()
offers two options here to add
metadata, and it depends on the user’s preference which option feels
more convenient. Users are free to add metadata either via parameters or
via user prompts. First, we will use parameters:
multiple_plates_data <- bma
tidy_dfs <- tidy_plates(
multiple_plates_data,
how_many = "multiple", # changed from "single"
direction = "horizontal",
group_ID = c("Botrytis cinerea", "P. digitatum"), # additional group added
experiment_name = c("Assay 1", "Assay 2"), # additional experiment added
validity_method = "threshold",
threshold = 1,
treatment_labels = rep(c("10%", "30%", "100%", "Control"), each = 2),
concentration_levels = rep(c(100, 200), 4)
)
Let’s have a look at the resulting table:
Position | Value | Validity | Treatment | Concentration | Timepoint | File | Group | Experiment |
---|---|---|---|---|---|---|---|---|
A-1 | 0.342 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-2 | 0.354 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-3 | 0.360 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-4 | 0.360 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-5 | 0.352 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-6 | 0.363 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-7 | 0.361 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-8 | 0.352 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-9 | 0.356 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
A-10 | 0.351 | valid | 10% | 100 | T0 | bma_grp1_exp2_T0 | Botrytis cinerea | Assay 2 |
Keep in mind that the naming strategy will be performed sequentially. This means that, for example, ‘Assay 1’ will be applied to identifier ‘exp1’. But since only files of the second group, P. digitatum, contain ‘exp1’ as identifier, all experiments with P. digitatum will be assigned to ‘Assay 1’ and all experiments with B. cinerea will be assigned to ‘Assay 2’.
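The sequential assignment described above can be mimicked with a small lookup. This is only a sketch of the described behaviour under the assumption that the names are matched to the sorted identifiers, not the package's actual code:

```r
# Experiment identifiers as they occur in the four bma file names
experiment_ids  <- c("exp2", "exp2", "exp1", "exp1")
experiment_name <- c("Assay 1", "Assay 2")

# Names are matched to the sorted unique identifiers:
# exp1 -> "Assay 1", exp2 -> "Assay 2"
lookup <- setNames(experiment_name, sort(unique(experiment_ids)))
lookup

# Resulting experiment name for each file
unname(lookup[experiment_ids])
#> [1] "Assay 2" "Assay 2" "Assay 1" "Assay 1"
```

If the names should follow a different order, the simplest fix is to reorder the vector passed to `experiment_name` accordingly.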
Tidy via user prompts
Another way to add metadata is to prompt the user. In this way, each part, from the group name to the concentration level, is queried in turn.
tidy_dfs_prompts <- tidy_plates(
multiple_plates_data,
how_many = "multiple", # changed from "single"
user_prompt = TRUE, # set user prompt option to TRUE
direction = "horizontal"
)
After processing with user prompts, you can check whether both options give the same result, provided the same settings are used as input:
identical(tidy_dfs, tidy_dfs_prompts) # should be TRUE
Tidy data from multiple plates with different experimental setups
What if it gets more complicated and each group or experiment (or
both) has its own individual experimental set-up? But you still want
them in a common table for further processing? Don’t worry,
tidy_plates()
can handle that too. In that case, the user
is prompted for the experimental setup as before but for every plate
separately.
Further data processing options
This vignette shows how to add metadata to photometer files in plain
text format. However, this is certainly not where the analysis pipeline
ends. The microdiluteR
package has more to offer, but this
will be covered in further vignettes in this R package:
- Validation of samples using either thresholds or manually specifying
invalid samples with the validate_cells() and update_validity()
functions, which include visual inspection.
- Data cleaning with the subtract_T0() function.
- Calculate and summarize growth performance across multiple
timepoints via the calculate_growth_performance() and
summarize_growth_performance() functions.
- Apply the sign test to assess whether growth performance
significantly differs from the baseline (usually the control) using the
apply_sign_test() function.
- Visualize your results, with or without sign test results in
asterisk notation, using the plot_growth_performance() function.
Citation
Creating this package was a lot of work, and I made it available for
free. If you use this package for your publication, be fair and cite it,
e.g. using the following biblatex
entry:
@software{eckert_microdiluteR_2024,
author = {Eckert, Silvia},
title = {microdiluteR},
version = {1.0.1},
date = {2024-05-13},
year = {2024},
note = {R package version 1.0.1, available from CRAN},
url = {https://cran.r-project.org/package=microdiluteR},
doi = {10.5281/zenodo.11186926}
}
Or you can cite as follows using the APA citation style:
Eckert, S. (2024). microdiluteR (Version 1.0.1) [Software]. DOI: 10.5281/zenodo.11186926. Retrieved from: https://CRAN.R-project.org/package=microdiluteR
Or use citation("microdiluteR")
to retrieve citation
information after installing the R package.
Acknowledgements
The microdiluteR
package would not have been possible without the
great usethis and testthat packages, and
answers from the amazing Stack Overflow
community, which has helped me to constantly improve my code (and spared me a
lot of nerves, especially with tidy evaluation).