Introduction
thoth is an R package designed to streamline the setup and management of reproducible analytics projects. It provides an integrated framework that combines modern data science tools and best practices, making it easier to create and maintain professional analytical workflows.
The package integrates with Data Version Control (DVC), Docker, and Quarto, while keeping the tidyverse-style syntax that R users already know.
Installation
You can install the development version of thoth from GitHub:
# install.packages("devtools")
devtools::install_github("sebrauschert/thoth")
The package requires several system dependencies to unlock its full potential:
- R (>= 4.1.0)
- DVC for data version control
- Docker for containerization
- Quarto for report generation
- git for version control
You can check if your system meets all requirements using:
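A minimal base-R sketch of such a check, independent of thoth's own helper: it assumes the tools are on the `PATH` under the executable names `dvc`, `docker`, `quarto`, and `git`, and the function name `check_system_tools()` is hypothetical.

```r
# Hypothetical helper (not part of thoth): report which command-line
# tools are discoverable on the PATH and whether R is recent enough.
check_system_tools <- function(tools = c("dvc", "docker", "quarto", "git"),
                               min_r = "4.1.0") {
  paths <- Sys.which(tools)  # typically "" for tools that are not found
  tool_found <- !is.na(paths) & nzchar(paths)
  data.frame(
    tool = c("R", tools),
    found = c(getRversion() >= min_r, unname(tool_found)),
    location = c(R.home("bin"), unname(paths)),
    stringsAsFactors = FALSE
  )
}

status <- check_system_tools()
print(status)

# Summarise anything that still needs installing
missing_tools <- status$tool[!status$found]
if (length(missing_tools) > 0) {
  message("Not found: ", paste(missing_tools, collapse = ", "))
}
```

A tool reported as missing may still be installed but not visible on the `PATH` of the R session, so the result is a first hint and not a definitive verdict.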
Project Setup
Creating a new analytics project with thoth is straightforward:
create_analytics_project(
  "my_project",
  use_dvc = TRUE,
  use_docker = TRUE,
  use_quarto = TRUE
)
This command sets up a standardized project structure with integrated version control, data tracking, and reporting capabilities. The created project includes:
- A standardized directory structure for data, code, and documentation
- Git initialization with appropriate .gitignore settings
- DVC configuration for data version control
- Docker setup for reproducible environments
- Quarto integration for professional reporting
- A comprehensive README file
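The exact layout is not listed here. As a rough illustration only, the sketch below builds a comparable skeleton in a temporary directory with base R. The directory names are assumptions borrowed from the file paths used in this document's other examples (`data/processed`, `models`, `reports`), not the package's actual output.

```r
# Illustrative skeleton only; create_analytics_project() may lay things
# out differently.
project_root <- file.path(tempdir(), "my_project")
dirs <- c("data/processed", "models", "reports")  # assumed names

for (d in dirs) {
  dir.create(file.path(project_root, d), recursive = TRUE, showWarnings = FALSE)
}

# A README placeholder at the project root
writeLines("# my_project", file.path(project_root, "README.md"))

# Show what was created, relative to the project root
created <- list.files(project_root, recursive = TRUE, include.dirs = TRUE)
print(created)
```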
Core Features
Data Version Control
thoth provides tidyverse-style functions for data version control:
# Track and version your data
processed_data |>
  write_csv_dvc(
    "data/processed/results.csv",
    "Added processed analysis results"
  )
# Track R objects
model |>
  write_rds_dvc(
    "models/final_model.rds",
    "Trained final model"
  )
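To make the idea concrete, here is a hedged sketch of the kind of steps a wrapper like this typically chains together: write the file, register it with `dvc add`, and commit the resulting `.dvc` pointer file to Git. It is not thoth's implementation. The function `write_csv_tracked()` and its `dry_run` argument are hypothetical, and by default it only reports the commands instead of running them.

```r
# Hypothetical sketch, not thoth's code: write a CSV, then track it
# with DVC and Git.
write_csv_tracked <- function(data, path, message, dry_run = TRUE) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  utils::write.csv(data, path, row.names = FALSE)

  # `dvc add` writes a small pointer file (<path>.dvc) that Git tracks
  # in place of the data file itself.
  steps <- list(
    c("dvc", "add", path),
    c("git", "add", paste0(path, ".dvc")),
    c("git", "commit", "-m", message)
  )

  for (step in steps) {
    if (dry_run) {
      cat("would run:", paste(shQuote(step), collapse = " "), "\n")
    } else {
      status <- system2(step[1], shQuote(step[-1]))
      if (status != 0) stop("Command failed: ", paste(step, collapse = " "))
    }
  }

  invisible(data)  # return the input so the call can sit inside a pipeline
}

# Dry run against a temporary location, using a built-in dataset
out_file <- file.path(tempdir(), "data", "processed", "results.csv")
result <- head(mtcars) |>
  write_csv_tracked(out_file, "Added processed analysis results")
```

Returning the data invisibly is a design choice of this sketch: it lets the call end a `|>` pipeline without printing, while still allowing further steps to be chained.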
Decision Tracking
Document and track analytical decisions throughout your project:
# Initialize decision tracking
initialize_decision_tree()
# Record important decisions
record_decision(
  "Data Preprocessing",
  "Removed outliers beyond 3 standard deviations",
  "Statistical validity",
  consequences = "Reduced dataset by 2%"
)
# Generate methods section for reporting
generate_methods_section()
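One way to picture what such a log holds is a plain table with one row per decision, which can then be rendered into prose. The base-R sketch below is an illustration under that assumption. The helper names (`new_decision_log()`, `add_decision()`, `methods_text()`) and the column names are hypothetical and do not describe thoth's internals.

```r
# Hypothetical sketch: a decision log kept as a data frame.
new_decision_log <- function() {
  data.frame(
    topic = character(), decision = character(),
    rationale = character(), consequences = character(),
    recorded_at = character(),
    stringsAsFactors = FALSE
  )
}

# Append one decision as a new row, stamped with the current time
add_decision <- function(log, topic, decision, rationale,
                         consequences = NA_character_) {
  entry <- data.frame(
    topic = topic, decision = decision, rationale = rationale,
    consequences = consequences,
    recorded_at = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    stringsAsFactors = FALSE
  )
  rbind(log, entry)
}

# Render the log as one sentence per decision
methods_text <- function(log) {
  lines <- sprintf("%s: %s (rationale: %s).",
                   log$topic, log$decision, log$rationale)
  has_conseq <- !is.na(log$consequences)
  lines[has_conseq] <- paste(lines[has_conseq], "Consequence:",
                             paste0(log$consequences[has_conseq], "."))
  paste(lines, collapse = "\n")
}

decisions <- new_decision_log() |>
  add_decision(
    "Data Preprocessing",
    "Removed outliers beyond 3 standard deviations",
    "Statistical validity",
    consequences = "Reduced dataset by 2%"
  )

cat(methods_text(decisions), "\n")
```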
Quarto Integration
Create and apply custom report templates:
# Create a custom template
create_quarto_template(
  "company_template",
  primary_color = "#0054AD",
  secondary_color = "#00B4E0"
)
# Apply template to a report
apply_template_to_report(
  "reports/analysis.qmd",
  "company_template"
)
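For orientation, Quarto's HTML format can pick up brand colors from a custom SCSS theme file that sets Bootstrap's `$primary` and `$secondary` variables. The sketch below writes such a file with base R. It illustrates the general mechanism only: the file name `company_template.scss` and the helper `write_color_theme()` are assumptions, not necessarily how thoth stores its templates.

```r
# Hypothetical helper: write a minimal Quarto SCSS theme file that
# overrides the two Bootstrap color variables.
write_color_theme <- function(path, primary_color, secondary_color) {
  scss <- c(
    "/*-- scss:defaults --*/",
    sprintf("$primary: %s;", primary_color),
    sprintf("$secondary: %s;", secondary_color)
  )
  writeLines(scss, path)
  invisible(path)
}

theme_file <- write_color_theme(
  file.path(tempdir(), "company_template.scss"),
  primary_color = "#0054AD",
  secondary_color = "#00B4E0"
)

# Inspect the generated theme file
cat(readLines(theme_file), sep = "\n")
```

A report would then reference the file in its YAML header, for example `theme: [default, company_template.scss]` under `format: html`.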
Next Steps
For more detailed information about specific features, check out these articles:
- Data Version Control: vignette("dvc-tracking")
- Git Integration: vignette("git-integration")
- Docker Setup: vignette("docker-setup")
- Custom Templates: vignette("custom-templates")
- Decision Tracking: vignette("decision-tracking")
For practical examples:
- End-to-End Example: vignette("end-to-end-example")
- RNA-seq Analysis: vignette("bioinformatics-example")
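To see which of these vignettes are actually available in a local installation, base R can list them. The helper below is a small sketch with a hypothetical name, `list_vignettes()`; it relies only on `requireNamespace()` and `utils::vignette()`.

```r
# Hypothetical helper: list the vignettes installed with a package,
# or return NULL (with a message) when the package is not installed.
list_vignettes <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed.")
    return(NULL)
  }
  res <- utils::vignette(package = pkg)$results
  data.frame(
    name = unname(res[, "Item"]),
    title = unname(res[, "Title"]),
    stringsAsFactors = FALSE
  )
}

print(list_vignettes("thoth"))
```

An empty result after a GitHub install may simply mean the vignettes were not built: `devtools::install_github()` skips them unless `build_vignettes = TRUE` is passed.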
Getting Help
If you encounter issues or have questions:
- Check the package documentation
- File issues on GitHub
- Read the function documentation with ?function_name