1 Load packages

library(Rega)

2 Setting Up Secure Credentials

The Rega package follows security best practices by storing sensitive information (such as API keys or passwords) in a credential store or in environment variables rather than hard-coding it into your scripts.

To keep your credentials secure, we offer two options (see below for details):

  • Using operating system credential store
  • Using environment variables with a secret key to encrypt/decrypt data

2.1 Using operating system credential store

You can add an entry to your operating system credential store using the keyring package. By default, Rega looks for the service name REGA_EGA. You should also specify your username so that you do not have to type it every time you connect to the API. Avoid storing more than one user for this service: for simplicity, Rega retrieves only the first username.

# You will be prompted for password
keyring::key_set(
    service = "REGA_EGA",
    username = "<your-ega-username>"
)
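
To confirm that the entry was stored, and that only one username is registered under the service, you could list the matching entries. This is a minimal sketch: keyring::key_list() is part of keyring, while count_usernames is a hypothetical helper introduced here for illustration. The tryCatch guards against systems where keyring is missing or no credential store backend is available.

```r
# Hypothetical helper: counts the distinct usernames in a key_list() result.
count_usernames <- function(entries) {
    length(unique(entries$username))
}

# List the entries stored under the service name Rega looks for by default.
entries <- tryCatch(
    keyring::key_list(service = "REGA_EGA"),
    error = function(e) NULL  # e.g. keyring not installed or no backend
)

if (is.null(entries)) {
    message("Could not query the credential store.")
} else if (count_usernames(entries) != 1) {
    message(
        "Expected exactly one username for REGA_EGA, found ",
        count_usernames(entries), "."
    )
} else {
    message("Found a single REGA_EGA entry for user: ", entries$username[1])
}
```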

2.2 Using environment variables with httr2 secret

2.2.1 Create and Store a Master Secret Key

# Run this in your R console to generate a key
httr2::secret_make_key()
[1] "FPAc6dQWJ3FTXrblIcFW7Q"

To make this key available every time you open R, you must store it in your user-level .Renviron file.

  • Run usethis::edit_r_environ() to open the file.
  • Add the following line (replace the string with the key you just generated): REGA_KEY="<your-generated-key>"
  • Save and close the file.

Important: Restart R after saving to ensure the variable is loaded into your environment.

2.2.2 Encrypt your EGA password

Now, use your master key (REGA_KEY) to encrypt your actual EGA password. This ensures that even if someone sees your .Renviron file, they cannot read your password.

Run httr2::secret_encrypt("<your-ega-password>", "REGA_KEY") and copy the encrypted password.
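
Before storing the encrypted string, it can be reassuring to see the encrypt/decrypt round trip work. The sketch below uses a throwaway key and a dummy password; DEMO_KEY is an environment variable name made up for this example (your real key lives in REGA_KEY), and only httr2's own secret_make_key(), secret_encrypt() and secret_decrypt() are used.

```r
library(httr2)

# Throwaway key held in a demo environment variable for this session only.
# DEMO_KEY is an assumption of this example, not a name used by Rega.
Sys.setenv(DEMO_KEY = secret_make_key())

# Encrypt a dummy password; the second argument names the environment
# variable that holds the key.
encrypted <- secret_encrypt("not-my-real-password", "DEMO_KEY")

# Decrypting with the same key should recover the original string.
decrypted <- secret_decrypt(encrypted, "DEMO_KEY")
stopifnot(identical(decrypted, "not-my-real-password"))

# Clean up the demo key.
Sys.unsetenv("DEMO_KEY")
```

Only the encrypted value is meant to be stored; the plain-text password never needs to be written to a file.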

2.2.3 Store the encrypted password

Finally, store the encrypted string (not your plain-text password) in your .Renviron file.

  • Open your .Renviron again: usethis::edit_r_environ().
  • Add the encrypted string as a new variable: REGA_EGA_PASSWORD="<your-encrypted-string>"
  • Save and close.

2.2.4 Store your username

  • Run usethis::edit_r_environ() to open the file.
  • Add the following line: REGA_EGA_USERNAME="<your-ega-username>"
  • Save and close the file.

2.2.5 Restart R
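
After restarting, you may want to check that all three variables were picked up from .Renviron. The following base R sketch reports which of them are missing without printing their values; missing_env_vars is a hypothetical helper name used only for this illustration.

```r
# Hypothetical helper: returns the names of environment variables that are
# unset or empty. It never prints the values themselves.
missing_env_vars <- function(vars) {
    is_set <- vapply(vars, function(v) nzchar(Sys.getenv(v)), logical(1))
    vars[!is_set]
}

# Variable names as defined in the steps above.
required <- c("REGA_KEY", "REGA_EGA_PASSWORD", "REGA_EGA_USERNAME")
missing_vars <- missing_env_vars(required)

if (length(missing_vars) > 0) {
    message("Not set: ", paste(missing_vars, collapse = ", "))
} else {
    message("All Rega environment variables are set.")
}
```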

3 Fill in the submission template

Download the empty MS Excel template from inst/extdata/ega_full_template_v3.xlsx and fill it in according to the instructions in the ‘Instructions’ tab.
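
If Rega is already installed, the template can also be copied out of the installed package, because the contents of inst/ are placed at the package root on installation. A minimal sketch follows; copy_bundled_file is a hypothetical helper written for this example and is not part of Rega.

```r
# Hypothetical helper: copies a file bundled with an installed package into
# dest_dir and returns the path of the copy.
copy_bundled_file <- function(package, ..., dest_dir = getwd()) {
    src <- system.file(..., package = package)
    if (!nzchar(src)) {
        stop("File not found in package '", package, "'.")
    }
    dest <- file.path(dest_dir, basename(src))
    # overwrite = FALSE protects a template you have already filled in.
    if (!file.copy(src, dest, overwrite = FALSE)) {
        stop("Could not copy to '", dest, "' (does it already exist?).")
    }
    dest
}

# Usage (requires Rega to be installed):
# template <- copy_bundled_file("Rega", "extdata", "ega_full_template_v3.xlsx")
```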

4 Data submission

4.1 Metadata parsing

The default parser is pre-configured to handle the bundled xlsx template (inst/extdata/ega_full_template_v3.xlsx) automatically. As long as the template is filled out according to the provided instructions, the default parameters will work seamlessly, and no manual adjustments are required.

If you need to customize the parser’s behavior, for example to toggle the c4gh file extension, you can modify the settings via the YAML configuration. To do this, create a local copy of inst/extdata/default_parser_params.yaml, adjust the values as needed, and pass the path of your new file to the param_file argument of the default_parser function.
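
The customization steps above could look roughly like the sketch below. It assumes Rega is installed and that the YAML file is found under extdata/ in the installed package; the file name my_parser_params.yaml and the use of tempdir() are choices made for this example (in practice you would keep the copy in your project directory). No YAML keys are shown, because they should be taken from the copied file itself.

```r
if (requireNamespace("Rega", quietly = TRUE)) {
    # Locate the default parameters shipped with the package.
    default_params <- system.file(
        "extdata", "default_parser_params.yaml",
        package = "Rega"
    )

    # Make a local copy to edit (example location only).
    my_params <- file.path(tempdir(), "my_parser_params.yaml")
    file.copy(default_params, my_params, overwrite = TRUE)

    # ... edit my_params in a text editor before parsing ...

    # Parse the bundled example with the custom parameter file.
    metadata_file <- system.file(
        "extdata", "submission_example.xlsx",
        package = "Rega"
    )
    parsed_custom <- Rega::default_parser(metadata_file, param_file = my_params)
}
```

Run unedited, the copy is identical to the defaults, so the result should match the default parser output.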

metadata_file <- system.file(
    "extdata/submission_example.xlsx",
    package = "Rega"
)

parsed_metadata <- default_parser(metadata_file)
head(parsed_metadata)
$aliases
$aliases$studies
[1] "Study1"

$aliases$experiments
[1] "Experiment1"

$aliases$datasets
[1] "Dataset1"

$aliases$samples
[1] "Sample1"

$aliases$runs
[1] "Run1"

$aliases$analyses
[1] "Analysis1"


$files
# A tibble: 1 × 2
  file             ega_file              
  <chr>            <chr>                 
1 example.fastq.gz /example.fastq.gz.c4gh

$submission
# A tibble: 1 × 1
  title               
  <chr>               
1 Your submission name

$studies
# A tibble: 1 × 4
  study  title         description               study_type            
  <chr>  <chr>         <chr>                     <chr>                 
1 Study1 Example Study Example Study Description Transcriptome Analysis

$samples
# A tibble: 1 × 4
  alias   phenotype biological_sex subject_id
  <chr>   <chr>     <chr>          <chr>     
1 Sample1 Control   male           S1        

$experiments
# A tibble: 1 × 8
  study  experiment  design_description library_selection instrument_model_id
  <chr>  <chr>       <chr>              <chr>                           <int>
1 Study1 Experiment1 Expermient Design  RANDOM                              1
# ℹ 3 more variables: library_layout <chr>, library_strategy <chr>,
#   library_source <chr>
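
The parsed metadata prints as a named list of alias vectors and tables, so individual components can be reached with the usual list accessors. The sketch below uses a small hand-built stand-in that mirrors part of the structure printed above (with a plain data frame in place of a tibble); in practice the object comes from default_parser().

```r
# Hand-built stand-in mirroring part of the printed structure; it is an
# illustration only and not produced by Rega.
parsed_metadata <- list(
    aliases = list(studies = "Study1", samples = "Sample1", runs = "Run1"),
    files = data.frame(
        file = "example.fastq.gz",
        ega_file = "/example.fastq.gz.c4gh"
    )
)

# Top-level components of the parsed object.
names(parsed_metadata)

# Aliases declared for a single table.
parsed_metadata$aliases$samples

# Check that every EGA file path carries the c4gh extension.
all(endsWith(parsed_metadata$files$ega_file, ".c4gh"))
```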

4.2 Metadata validation

To catch problems before submission, the package includes a client-side validation layer. It checks your metadata against the schema requirements of both the EGA API and the underlying target database. Address all flagged validation failures and errors before you submit.

validation_summary <- default_validator(parsed_metadata)
validation_summary
                            name items passes fails nNA error warning
1                    study_is_na     1      1     0   0 FALSE   FALSE
2                study_is_unique     1      1     0   0 FALSE   FALSE
3               study_in_aliases     1      1     0   0 FALSE   FALSE
4              study_all_aliases     1      1     0   0 FALSE   FALSE
5               experiment_is_na     1      1     0   0 FALSE   FALSE
6           experiment_is_unique     1      1     0   0 FALSE   FALSE
7          experiment_in_aliases     1      1     0   0 FALSE   FALSE
8         experiment_all_aliases     1      1     0   0 FALSE   FALSE
9                    alias_is_na     1      1     0   0 FALSE   FALSE
10               alias_is_unique     1      1     0   0 FALSE   FALSE
11              alias_in_aliases     1      1     0   0 FALSE   FALSE
12             alias_all_aliases     1      1     0   0 FALSE   FALSE
13                     run_is_na     1      1     0   0 FALSE   FALSE
14                 run_is_unique     1      1     0   0 FALSE   FALSE
15                run_in_aliases     1      1     0   0 FALSE   FALSE
16               run_all_aliases     1      1     0   0 FALSE   FALSE
17                 dataset_is_na     1      1     0   0 FALSE   FALSE
18             dataset_is_unique     1      1     0   0 FALSE   FALSE
19            dataset_in_aliases     1      1     0   0 FALSE   FALSE
20           dataset_all_aliases     1      1     0   0 FALSE   FALSE
21        submission_title_is_na     1      1     0   0 FALSE   FALSE
22          run_experiment_is_na     1      1     0   0 FALSE   FALSE
23              run_sample_is_na     1      1     0   0 FALSE   FALSE
24           run_file_type_is_na     1      1     0   0 FALSE   FALSE
25                run_file_is_na     1      1     0   0 FALSE   FALSE
26            run_file_is_unique     1      1     0   0 FALSE   FALSE
27     run_experiment_in_aliases     1      1     0   0 FALSE   FALSE
28         run_sample_in_aliases     1      1     0   0 FALSE   FALSE
29       studies_title_is_unique     1      1     0   0 FALSE   FALSE
30 studies_description_is_unique     1      1     0   0 FALSE   FALSE
31          studies_title_length     1      0     1   0 FALSE   FALSE
32    studies_description_length     1      0     1   0 FALSE   FALSE
33       dataset_title_is_unique     1      1     0   0 FALSE   FALSE
34 dataset_description_is_unique     1      1     0   0 FALSE   FALSE
35        dataset_run_in_aliases     1      1     0   0 FALSE   FALSE
36    dataset_all_aliases_in_run     1      1     0   0 FALSE   FALSE
37          dataset_title_length     1      0     1   0 FALSE   FALSE
38    dataset_description_length     1      0     1   0 FALSE   FALSE
                                                                          expression
1                                                                      !is.na(study)
2                                                                   is_unique(study)
3                                                   study %vin% aliases[["studies"]]
4                                                   aliases[["studies"]] %vin% study
5                                                                 !is.na(experiment)
6                                                              is_unique(experiment)
7                                          experiment %vin% aliases[["experiments"]]
8                                          aliases[["experiments"]] %vin% experiment
9                                                                      !is.na(alias)
10                                                                  is_unique(alias)
11                                                  alias %vin% aliases[["samples"]]
12                                                  aliases[["samples"]] %vin% alias
13                                                                       !is.na(run)
14                                                                    is_unique(run)
15                                                       run %vin% aliases[["runs"]]
16                                                       aliases[["runs"]] %vin% run
17                                                                   !is.na(dataset)
18                                                                is_unique(dataset)
19                                               dataset %vin% aliases[["datasets"]]
20                                               aliases[["datasets"]] %vin% dataset
21                                                                     !is.na(title)
22                                                                !is.na(experiment)
23                                                                     !is.na(alias)
24                                                             !is.na(run_file_type)
25                                                                     !is.na(files)
26                                                          is_unique(unlist(files))
27                                         experiment %vin% aliases[["experiments"]]
28                                                  alias %vin% aliases[["samples"]]
29                                                                  is_unique(title)
30                                                            is_unique(description)
31                        get_word_number(title) >= 3 & get_word_number(title) <= 20
32     get_sentence_number(description) >= 3 & get_sentence_number(description) <= 5
33                                                                  is_unique(title)
34                                                            is_unique(description)
35                                              unlist(runs) %vin% aliases[["runs"]]
36                                              aliases[["runs"]] %vin% unlist(runs)
37                    (get_word_number(title) >= 3) & (get_word_number(title) <= 20)
38 (get_sentence_number(description) >= 3) & (get_sentence_number(description) <= 5)
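
With this many rules, it helps to look only at the ones that did not pass. The sketch below assumes the summary behaves like a data frame, as the printed output suggests, and uses a hand-built excerpt of it. The word count is a base R stand-in written for this example, not Rega's get_word_number.

```r
# Hand-built excerpt of the summary above (illustration only).
validation_summary <- data.frame(
    name = c(
        "study_is_na", "studies_title_length",
        "studies_description_length", "dataset_title_length"
    ),
    items = 1,
    passes = c(1, 0, 0, 0),
    fails = c(0, 1, 1, 1),
    error = FALSE,
    warning = FALSE
)

# Keep only rules with failures, errors or warnings.
failing <- subset(validation_summary, fails > 0 | error | warning)
failing$name

# Base R stand-in for a word count, to see why a title rule fails:
# the example study title has 2 words, but the rule requires 3 to 20.
count_words <- function(x) lengths(strsplit(trimws(x), "\\s+"))
count_words("Example Study")
```

In the example output, the four length rules fail because the placeholder titles and descriptions are shorter than the rules allow.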

4.3 Running new_submission workflow

responses <- new_submission(parsed_metadata, logfile = "log.yaml")

5 Manual client creation

If you encounter errors during metadata submission and would like more details, you can create a client with verbose logging.

Extract the EGA API from the bundled YAML specification and create a client that uses the embedded httr2 OAuth authentication (the default), with increased verbosity.

api <- extract_api()
ega <- create_client(api, verbosity = 3)

Run the new_submission workflow with the custom client.

responses <- new_submission(parsed_metadata, client = ega)

This will create your metadata submission in EGA and fill in all provided information. However, this workflow does not finalize your submission. To finalize it, either use the EGA Submitter Portal GUI or run finalise_submission("<returned_submission_id>", "<release_date>"). Note that the release date should ideally be about 2 weeks after the metadata submission to allow for review by the EGA team.
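
A release date two weeks out can be computed rather than typed by hand. In this sketch the ISO 8601 format (YYYY-MM-DD) is an assumption; check the finalise_submission help page for the format it expects. The submission ID is left as the placeholder from the text above.

```r
# Suggested release date: two weeks from today.
# The YYYY-MM-DD format is an assumption, not confirmed by the package docs.
release_date <- format(Sys.Date() + 14, "%Y-%m-%d")

# Placeholder; use the submission ID returned by new_submission().
submission_id <- "<returned_submission_id>"

# Uncomment once the submission is ready to be finalised:
# finalise_submission(submission_id, release_date)
```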

6 Other workflows

Several other workflows are available:

  • get_submission
  • get_entry_by_title
  • delete_submission_contents
  • delete_submission
  • rollback_submission

Please see the corresponding help pages for more details.

6.1 Examples

You can retrieve detailed data from the individual tables (submissions, studies, samples, experiments, runs, analyses and datasets) whose title column contains a specific string using the get_entry_by_title function.

# checks all tables
resp <- get_entry_by_title("RNASeq")
# checks only samples and studies, logs responses
resp <- get_entry_by_title(
    "RNASeq", type = c("samples", "studies"), logfile = "log.yaml"
)

You can also delete the entire contents of the current submission via the delete_submission_contents workflow, or delete the entire submission via the delete_submission workflow.

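# 00001 is a placeholder submission ID (R parses it as the number 1);
# ega is the client created above
resp <- delete_submission_contents(00001, ega)
resp <- delete_submission(00001, ega)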

7 Utilities

If you wish to create your own templates for EGA submissions, we provide a few functions that retrieve properties and enums through the API and save them in text files. These functions use the API and the client created above.

Relevant functions include:

  • get_schemas()
  • get_properties()

8 Notes

8.1 Bearer token authentication

For testing, debugging, and prototyping purposes, you can pass a generated bearer token directly when creating the client. It is then the user’s responsibility to track the token’s validity and refresh it as necessary.

bt <- ega_token()
ega <- create_client(api, bt$access_token)

ega$get__enums()

9 Issues

A workflow for updating submission metadata via the PUT method is not available. For this use case, create the client with ega <- create_client(extract_api()) and use the individual functions prefixed with put__, e.g. ega$put__samples__accession_id, to update the submission metadata.

10 Session Info

sessionInfo()
R version 4.6.0 alpha (2026-04-05 r89794)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rega_0.99.6      knitr_1.51       BiocStyle_2.39.0

loaded via a namespace (and not attached):
 [1] jsonlite_2.0.0      dplyr_1.2.1         compiler_4.6.0     
 [4] BiocManager_1.30.27 tidyselect_1.2.1    validate_1.1.7     
 [7] stringr_1.6.0       tidyr_1.3.2         jquerylib_0.1.4    
[10] yaml_2.3.12         fastmap_1.2.0       readxl_1.4.5       
[13] jsonvalidate_1.5.0  R6_2.6.1            generics_0.1.4     
[16] httr2_1.2.2         tibble_3.3.1        bookdown_0.46      
[19] openssl_2.3.5       bslib_0.10.0        pillar_1.11.1      
[22] rlang_1.2.0         utf8_1.2.6          cachem_1.1.0       
[25] stringi_1.8.7       xfun_0.57           sass_0.4.10        
[28] otel_0.2.0          cli_3.6.6           magrittr_2.0.5     
[31] grid_4.6.0          digest_0.6.39       settings_0.2.7     
[34] keyring_1.4.1       rappdirs_0.3.4      askpass_1.2.1      
[37] lifecycle_1.0.5     vctrs_0.7.3         evaluate_1.0.5     
[40] glue_1.8.0          cellranger_1.1.0    rmarkdown_2.31     
[43] purrr_1.2.2         tools_4.6.0         pkgconfig_2.0.3    
[46] htmltools_0.5.9