Tidy photometer data with `tidy_plates()`

During my work as a postdoc in various laboratories, I experienced that old photometer devices are still in use, even if the associated proprietary software (e.g. depending on an outdated operating system) is no longer available. This is a good sign from an environmental point of view (e.g. avoiding electronic waste), but is often more a question of the laboratory’s financial resources. However, the absorption values provided by these devices (e.g. in an 8 x 12 table format in simple text files for 96-well plates) are usually analysed via a custom Excel spreadsheet that someone created a long time ago and that everyone relies on. This may work, as each type of photometer has its own file structure, but Excel is itself proprietary software, i.e. not easily accessible to everyone. Also, the steps are tedious in most cases as only some of the analyses can be performed in it and only limited experimental designs are supported. In contrast, R is an established open source statistics software. So why not also use it as a tool to organize photometer plate data, flexibly add experimental designs as metadata and even perform at least some simple visualizations and statistical analyses, all in one go?

A 96-well photometer plate is often used for broth microdilution assays.

The microdiluteR package is designed to help researchers tidy up data from photometer plates and provides functions to easily add metadata. Regardless of whether the user is processing a single plate or multiple plates with complex metadata structures, the tidy_plates() function provides flexibility and ease of use to optimize the data processing workflow. This vignette guides the user through a general workflow. For more specific use cases, please refer to upcoming vignettes of this package. The microdiluteR package was developed to support the analysis of broth microdilution assays, but may be extended for other types of assays in the future.

Installation

After installing R, install the microdiluteR package either via CRAN or install the development version via GitHub using the following commands:

# via CRAN
install.packages("microdiluteR")
# via GitHub
install.packages("devtools")
library(devtools)
install_github("silvia-eckert/microdiluteR")

The microdiluteR package can be loaded as follows:

library(microdiluteR)

Usage

To show the general workflow, we will use example data shipped with the microdiluteR package. For use cases relying on real-world data, please check out upcoming vignettes. The example data used here is a list of multiple photometer measurements on 96-well plates.

data("bma")
names(bma) # check file names
#> [1] "bma_grp1_exp2_T0" "bma_grp1_exp2_T3" "bma_grp2_exp1_T0" "bma_grp2_exp1_T3"

bma[[1]] # absorption values from first plate
#>       1     2     3     4     5     6     7     8     9    10    11    12
#> A 0.342 0.354 0.360 0.360 0.352 0.363 0.361 0.352 0.356 0.351 0.366 0.375
#> B 0.362 0.391 0.375 0.363 0.383 0.366 0.380 0.378 0.339 0.387 0.377 0.362
#> C 0.344 0.346 0.345 0.347 0.350 0.356 0.348 0.343 0.348 0.351 0.351 0.353
#> D 0.361 0.367 0.351 0.364 0.353 0.362 0.361 0.367 0.363 0.356 0.357 0.355
#> E 0.388 0.473 0.400 0.358 0.388 0.340 0.335 0.396 0.411 0.404 0.397 0.407
#> F 0.456 0.465 0.469 0.469 0.462 0.468 0.455 0.477 0.487 0.488 0.498 0.471
#> G 0.334 0.340 0.357 0.332 0.329 0.342 0.333 0.317 0.360 0.332 0.335 0.328
#> H 0.334 0.332 0.339 0.333 0.339 0.334 0.342 0.335 0.361 0.327 0.330 0.341

The data also contains information on the experimental setup, which can be retrieved using an attribute:

attr(bma, "metadata") # check out the experimental setup
#>   plate_axis treatment concentration
#> 1          A       10%       100 ppm
#> 2          B       10%       200 ppm
#> 3          C       30%       100 ppm
#> 4          D       30%       200 ppm
#> 5          E      100%       100 ppm
#> 6          F      100%       200 ppm
#> 7          G   Control       100 ppm
#> 8          H   Control       200 ppm

We can see that the plate has been loaded in a horizontal direction (rows starting with A-H) denoted in the ‘plate_axis’ column. We can also see that there are four treatment levels (10%, 30%, 100%, and a negative control level), each being tested at two concentrations (100 ppm and 200 ppm). In the next step, we will try to add this metadata to the absorption values measured. You can also get details of the experimental setup using the help page of the data set:

?bma # check out details on the experimental setup

Read plates

Before we dive into the magic of the tidy_plates() function, let’s first have a look at the read_plate() and read_plates() functions. If adding metadata is not desired and the photometer data should only be loaded into R for inspection or for custom analyses, this can be achieved as with the read_plate() and read_plates() functions. Let’s first create a temporary mock file:

data <- "Line with additional information, e.g. wavelength

   1  2  3  4  5  6  7  8  9 10 11 12
A  1  2  3  4  5  6  7  8  9 10 11 12
B 13 14 15 16 17 18 19 20 21 22 23 24
C 25 26 27 28 29 30 31 32 33 34 35 36
D 37 38 39 40 41 42 43 44 45 46 47 48
E 49 50 51 52 53 54 55 56 57 58 59 60
F 61 62 63 64 65 66 67 68 69 70 71 72
G 73 74 75 76 77 78 79 80 81 82 83 84
H 85 86 87 88 89 90 91 92 93 94 95 96"
file_path <- tempfile()
writeLines(data, file_path, sep = "\n")

Now we’ll read this file using read_plate() as follows:

temp_file <- read_plate(file_path,
                        skip_lines = 2) # skip the first two lines
temp_file
#>    1  2  3  4  5  6  7  8  9 10 11 12
#> A  1  2  3  4  5  6  7  8  9 10 11 12
#> B 13 14 15 16 17 18 19 20 21 22 23 24
#> C 25 26 27 28 29 30 31 32 33 34 35 36
#> D 37 38 39 40 41 42 43 44 45 46 47 48
#> E 49 50 51 52 53 54 55 56 57 58 59 60
#> F 61 62 63 64 65 66 67 68 69 70 71 72
#> G 73 74 75 76 77 78 79 80 81 82 83 84
#> H 85 86 87 88 89 90 91 92 93 94 95 96

# The skipped lines are stored here
attr(temp_file, "info")
#> [1] "Line with additional information, e.g. wavelength"

# Remove the temporary file
unlink(file_path)

We could also do this with multiple files using the read_plates() function. The content of the files will be concatenated into a single list that is ready for further processing, e.g. with the tidy_plates() function. Let’s again create two temporary files. This time, however, we will use a custom pattern called “Assay_” to start the file names. This will make sure that only these two files are considered from the temporary directory.

# File 1
data_T0 <- "Line with additional information, e.g. wavelength

   1  2  3  4  5  6  7  8  9 10 11 12
A  1  2  3  4  5  6  7  8  9 10 11 12
B 13 14 15 16 17 18 19 20 21 22 23 24
C 25 26 27 28 29 30 31 32 33 34 35 36
D 37 38 39 40 41 42 43 44 45 46 47 48
E 49 50 51 52 53 54 55 56 57 58 59 60
F 61 62 63 64 65 66 67 68 69 70 71 72
G 73 74 75 76 77 78 79 80 81 82 83 84
H 85 86 87 88 89 90 91 92 93 94 95 96"
file_path_T0 <- tempfile(pattern = "Assay_T0_",
                         fileext = ".txt")
writeLines(data_T0, file_path_T0, sep = "\n")
# File 2
data_T1 <- "Line with additional information, e.g. wavelength

   1  2  3  4  5  6  7  8  9 10 11 12
A  1  2  3  4  5  6  7  8  9 10 11 12
B 13 14 15 16 17 18 19 20 21 22 23 24
C 25 26 27 28 29 30 31 32 33 34 35 36
D 37 38 39 40 41 42 43 44 45 46 47 48
E 49 50 51 52 53 54 55 56 57 58 59 60
F 61 62 63 64 65 66 67 68 69 70 71 72
G 73 74 75 76 77 78 79 80 81 82 83 84
H 85 86 87 88 89 90 91 92 93 94 95 96"
file_path_T1 <- tempfile(pattern = "Assay_T1_",
                         fileext = ".txt")
writeLines(data_T1, file_path_T1, sep = "\n")
file_dir <- dirname(file_path_T1)

Now we’ll apply the read_plates() function similarly to read_plate() as follows:

temp_files <- read_plates(file_dir,
                        pattern = "Assay_T", # our custom pattern
                        skip_lines = 2) # skip the first two lines
temp_files
#> $Assay_T0_17ec5fd80156
#>    1  2  3  4  5  6  7  8  9 10 11 12
#> A  1  2  3  4  5  6  7  8  9 10 11 12
#> B 13 14 15 16 17 18 19 20 21 22 23 24
#> C 25 26 27 28 29 30 31 32 33 34 35 36
#> D 37 38 39 40 41 42 43 44 45 46 47 48
#> E 49 50 51 52 53 54 55 56 57 58 59 60
#> F 61 62 63 64 65 66 67 68 69 70 71 72
#> G 73 74 75 76 77 78 79 80 81 82 83 84
#> H 85 86 87 88 89 90 91 92 93 94 95 96
#> 
#> $Assay_T1_17ec2a755ec
#>    1  2  3  4  5  6  7  8  9 10 11 12
#> A  1  2  3  4  5  6  7  8  9 10 11 12
#> B 13 14 15 16 17 18 19 20 21 22 23 24
#> C 25 26 27 28 29 30 31 32 33 34 35 36
#> D 37 38 39 40 41 42 43 44 45 46 47 48
#> E 49 50 51 52 53 54 55 56 57 58 59 60
#> F 61 62 63 64 65 66 67 68 69 70 71 72
#> G 73 74 75 76 77 78 79 80 81 82 83 84
#> H 85 86 87 88 89 90 91 92 93 94 95 96
#> 
#> attr(,"info")
#>               File_name                                         Attribute
#> 1 Assay_T0_17ec5fd80156 Line with additional information, e.g. wavelength
#> 2  Assay_T1_17ec2a755ec Line with additional information, e.g. wavelength

# Remove temporary files
unlink(file_path_T0) 
unlink(file_path_T1)

These functions are purely for inspection or for custom use. To add metadata, these function are not necessary and users can directly apply the tidy_plates() function on either single files, a folder pointing to single or multiple files (using custom patterns if necessary) or lists of photometer data already loaded into R (with read_plate(), read_plates() or any other function as long as as a list structure is preserved).

Tidy data from a single plate

Tidying data from a single photometer plate is straightforward. Simply provide your input data to tidy_plates(), set the desired metadata and it will handle the rest. Your input data can be a file name, a folder pointing to this file (given there are no other interfering files, or else use a pattern) or a list element that has already been read into R. Here, we will use the first data set from bma and apply the metadata provided in the attributes:

single_plate_data <- bma[1]; single_plate_data
#> $bma_grp1_exp2_T0
#>       1     2     3     4     5     6     7     8     9    10    11    12
#> A 0.342 0.354 0.360 0.360 0.352 0.363 0.361 0.352 0.356 0.351 0.366 0.375
#> B 0.362 0.391 0.375 0.363 0.383 0.366 0.380 0.378 0.339 0.387 0.377 0.362
#> C 0.344 0.346 0.345 0.347 0.350 0.356 0.348 0.343 0.348 0.351 0.351 0.353
#> D 0.361 0.367 0.351 0.364 0.353 0.362 0.361 0.367 0.363 0.356 0.357 0.355
#> E 0.388 0.473 0.400 0.358 0.388 0.340 0.335 0.396 0.411 0.404 0.397 0.407
#> F 0.456 0.465 0.469 0.469 0.462 0.468 0.455 0.477 0.487 0.488 0.498 0.471
#> G 0.334 0.340 0.357 0.332 0.329 0.342 0.333 0.317 0.360 0.332 0.335 0.328
#> H 0.334 0.332 0.339 0.333 0.339 0.334 0.342 0.335 0.361 0.327 0.330 0.341

tidy_df <- tidy_plates(single_plate_data,
                       how_many = "single", # tidy single plate
                       direction = "horizontal",
                       group_ID = NA,
                       experiment_name = NA,
                       validity_method = "threshold",
                       threshold = 1,
                       treatment_labels = rep(c("10%", "30%", "100%", "Control"), each = 2),
                       concentration_levels = rep(c(100, 200), 4)) # numeric

We see that we could have also provided a group identifier or an experiment name. This was not necessary here, but we could update our code to denote e.g. the pathogen used as the test organism (here Botrytis cinerea) in the group identifier or give the plate measurement an experiment name (e.g. “Assay 1”). Let’s do this:

tidy_df <- tidy_plates(single_plate_data,
                       how_many = "single",
                       direction = "horizontal",
                       group_ID = "Botrytis cinerea", # add test organism
                       experiment_name = "Assay 1", # add plate name
                       validity_method = "threshold",
                       threshold = 1,
                       treatment_labels = rep(c("10%", "30%", "100%", "Control"), each = 2),
                       concentration_levels = rep(c(100, 200), 4))

The resulting tidy table looks as follows:

knitr::kable(head(tidy_df, 10))

Position	Value	Validity	Treatment	Concentration	Timepoint	File	Group	Experiment
A-1	0.342	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-2	0.354	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-3	0.360	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-4	0.360	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-5	0.352	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-6	0.363	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-7	0.361	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-8	0.352	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-9	0.356	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1
A-10	0.351	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 1

We see that the timepoint was added automatically and we didn’t have to specify it explicitely. That’s because the timepoint was extracted from the file name ‘bma_grp1_exp1_T0”. The file name also contains ’bma’, ‘grp1’ and ‘exp1’ as identifiers. This will become important in the next section when several plates are read.

Prepare file names

The easiest way to prepare your files for use with the tidy_plates() function is to add a pattern at the beginning of the file names. Simply use the beginning of your file name and leave the rest of the file name unchanged to make sure you still know what your file is about. If we have a look at the file names of the bma dataset, we see that they have a specific pattern including ‘grp1’, ‘grp2’, ‘exp1’, ‘exp2’, as well as timepoints (‘T0’ and ‘T1’). These patterns are necessary for tidy_plates() to correctly assign the group and experiment names. For example, we could have repeated the experiment from the above-mentioned plate called ‘Assay 1’ but used other concentrations or even other chemicals or antibiotics, now using ‘Assay 2’ as the experiment name for this plate. Additionally, we could also have repeated these experiments with another test organism, e.g. Penicillium digitatum instead of B. cinerea. Adding plate data from measurements with this second test organism, we need to specify it in the group name.

We would assign the identifier “grp1” to the files from experiments with B. cinerea, the identifier “grp2” to the files from experiments with P. digitatum (and so on if there are experiments with other test organisms). The same applies, if there are multiple experiments with each of the test organisms, where “exp1” stands for the first set-up and “exp2” for the second set-up. As the plates are measured multiple times, a time point identifier is required, which is either T0 (T1, …) or t0 (t1, …). All files should have “BMA” or “bma” (which stands for ‘broth microdilution assay’ but a user-defined pattern such as “Assay_” is also fine) at the beginning of their name to distinguish them from other files located in the same folder.

This may sound a bit tedious, as you have to partially rename your files, but it helps a lot to identify the different groups and experiments. Photometers often only provide cryptic file names, consisting of a combination of letters or a sequence of numbers or just the date of the measurement, so renaming is often necessary for the sake of clarity anyway. The following table illustrates an example of a renaming strategy:

Original names	Renamed files	Group	Experiment
F0001.txt	BMA_grp1_exp1_T0_F0001.txt	B. cinerea	Fungizide A
F0002.txt	BMA_grp1_exp1_T1_F0002.txt	B. cinerea	Fungizide A
F0003.txt	BMA_grp1_exp1_T2_F0003.txt	B. cinerea	Fungizide A
F0004.txt	BMA_grp1_exp2_T0_F0004.txt	B. cinerea	Fungizide B
F0005.txt	BMA_grp1_exp2_T1_F0005.txt	B. cinerea	Fungizide B
F0006.txt	BMA_grp1_exp2_T2_F0006.txt	B. cinerea	Fungizide B
F0007.txt	BMA_grp2_exp1_T0_F0007.txt	P. digitatum	Fungizide A
F0008.txt	BMA_grp2_exp1_T1_F0008.txt	P. digitatum	Fungizide A
F0009.txt	BMA_grp2_exp1_T2_F0009.txt	P. digitatum	Fungizide A
F0010.txt	BMA_grp2_exp3_T0_F0010.txt	P. digitatum	Fungizide C
F0011.txt	BMA_grp2_exp3_T1_F0011.txt	P. digitatum	Fungizide C
F0012.txt	BMA_grp2_exp3_T2_F0012.txt	P. digitatum	Fungizide C

In our bma example data, we have the following file names:

names(bma)
#> [1] "bma_grp1_exp2_T0" "bma_grp1_exp2_T3" "bma_grp2_exp1_T0" "bma_grp2_exp1_T3"

This means, we have two groups, and one experiment per group, but each experiment measured at timepoint T0 and T3.

Tidy data from multiple plates with a common experimental setup

Since all measurements in bma were performed under a common treatment and concentration setting, we will apply this metadata all at once. In this case, we need to specify that we want to clean multiple plates.

Tidy via function parameters

The function tidy_plates() offers to options here to add metadata and it depends on the preference of the user which option feels more convenient. Users are free to add metadata either by parameters or via user prompts. At first, we will use parameters to do so:

multiple_plates_data <- bma
tidy_dfs <- tidy_plates(
  multiple_plates_data,
  how_many = "multiple", # changed from "single"
  direction = "horizontal",
  group_ID = c("Botrytis cinerea", "P. digitatum"), # additional group added
  experiment_name = c("Assay 1", "Assay 2"), # additional experiment added
  validity_method = "threshold",
  threshold = 1,
  treatment_labels = rep(c("10%", "30%", "100%", "Control"), each = 2),
  concentration_levels = rep(c(100, 200), 4)
  )

Let’s have a look at the resulting table:

knitr::kable(head(tidy_dfs, 10))

Position	Value	Validity	Treatment	Concentration	Timepoint	File	Group	Experiment
A-1	0.342	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-2	0.354	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-3	0.360	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-4	0.360	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-5	0.352	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-6	0.363	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-7	0.361	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-8	0.352	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-9	0.356	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2
A-10	0.351	valid	10%	100	T0	bma_grp1_exp2_T0	Botrytis cinerea	Assay 2

Keep in mind that the naming strategy will be performed sequentially. This means that, for example, ‘Assay 1’ will be applied to identifier ‘exp1’. But since only files of the second group, P. digitatum, contain ‘exp1’ as identifier, all experiments with P. digitatum will be assigned to ‘Assay 1’ and all experiments with B. cinerea will be assigned to ‘Assay 2’.

Tidy via user prompts

Another way to add metadata is to prompt the user. In this way, each part, from the group name to the concentration level, is queried in turn.

tidy_dfs_prompts <- tidy_plates(
  multiple_plates_data,
  how_many = "multiple", # changed from "single"
  user_prompt = T, # set user prompt option to TRUE
  direction = "horizontal",
  )

After processing with user prompts, you can check if both options give the same result once the same settings are used as input:

identical(tidy_dfs, tidy_dfs_prompts) # should be TRUE

Tidy data from multiple plates with different experimental setups

What if it gets more complicated and each group or experiment (or both) has its own individual experimental set-up? But you still want them in a common table for further processing? Don’t worry, tidy_plates() can handle that too. In that case, the user is prompted for the experimental setup as before but for every plate separately.

tidy_dfs_individually <- tidy_plates(
  multiple_plates_data,
  how_many = "multiple",
  user_prompt = T,
  multiple_structures = T # set to TRUE to handle plates individually
  direction = "horizontal",
  )

Further data processing options

This vignette shows how to add metadata to photometer files in plain text format. However, this is certainly not where the analysis pipeline ends. The microdiluteR package has more to offer, but this will be covered in further vignettes in this R package:

Validation of samples using either thresholds or manually specifying invalid samples with the validate_cells() and update_validity() functions, which include visual inspection.
Usage of data cleaning with the subtract_T0() function.
Calculate and summarize growth performance across multiple timepoints via the calculate_growth_performance() and summarize_growth_performance() functions.
Applying the sign test to assess whether growth performance significantly differs from the baseline (usually the control) using the apply_sign_test() function.
Visualize your results, with or without sign test results in asterisk notation, using the plot_growth_performance() function.

Citation

Creating this package was a lot of work, and I made it available for free. If you use this package for your publication, be fair and cite it, e.g. using the following biblatex entry:

@software(eckert_micodiluteR_2024,
  author  = {Eckert, Silvia},
  title   = {microdiluteR},
  version = {1.0.1.},
  date    = {2024-05-13}
  year    = {2024},
  note    = {R package version 1.0.1, available from CRAN},
  url     = {https://cran.r-project.org/package=microdiluteR}
  doi     = {10.5281/zenodo.11186926}
)

Or you can cite as follows using the APA citation style:

Eckert, S. (2024). microdiluteR (Version 1.0.1) [Software]. DOI: 10.5281/zenodo.11186926. Retrieved from: https://CRAN.R-project.org/package=microdiluteR

Or use citation("microdiluteR") to retrieve citation information after installing the R package.

Acknowledgements

The microdiluteR¹ package would not be possible without the great usethis and testthat packages, and answers from the amazing stackoverflow community that has helped me to constantly improve my code (and save a lot of nerves, especially with tidy evaluation).

Silvia Eckert

2024-05-25