Skip to contents

Introduction

thoth is an R package designed to streamline the setup and management of reproducible analytics projects. It provides an integrated framework that combines modern data science tools and best practices, making it easier to create and maintain professional analytical workflows.

The package offers seamless integration with essential tools like Data Version Control (DVC), Docker containerization, and Quarto reporting, while maintaining a tidyverse-friendly syntax that R users are familiar with.

Installation

You can install the development version of thoth from GitHub:

# install.packages("devtools")
devtools::install_github("sebrauschert/thoth")

The package requires several system dependencies to unlock its full potential:

  • R (>= 4.1.0)
  • DVC for data version control
  • Docker for containerization
  • Quarto for report generation
  • git for version control

You can check if your system meets all requirements using:

Project Setup

Creating a new analytics project with thoth is straightforward:

create_analytics_project(
  "my_project",
  use_dvc = TRUE,
  use_docker = TRUE,
  use_quarto = TRUE
)

This command sets up a standardized project structure with integrated version control, data tracking, and reporting capabilities. The created project includes:

  • A standardized directory structure for data, code, and documentation
  • Git initialization with appropriate .gitignore settings
  • DVC configuration for data version control
  • Docker setup for reproducible environments
  • Quarto integration for professional reporting
  • A comprehensive README file

Core Features

Data Version Control

thoth provides tidyverse-style functions for data version control:

# Track and version your data
processed_data |>
  write_csv_dvc(
    "data/processed/results.csv",
    "Added processed analysis results"
  )

# Track R objects
model |>
  write_rds_dvc(
    "models/final_model.rds",
    "Trained final model"
  )

Decision Tracking

Document and track analytical decisions throughout your project:

# Initialize decision tracking
initialize_decision_tree()

# Record important decisions
record_decision(
  "Data Preprocessing",
  "Removed outliers beyond 3 standard deviations",
  "Statistical validity",
  consequences = "Reduced dataset by 2%"
)

# Generate methods section for reporting
generate_methods_section()

Quarto Integration

Create and apply custom report templates:

# Create a custom template
create_quarto_template(
  "company_template",
  primary_color = "#0054AD",
  secondary_color = "#00B4E0"
)

# Apply template to a report
apply_template_to_report(
  "reports/analysis.qmd",
  "company_template"
)

Next Steps

For more detailed information about specific features, check out these articles:

For practical examples:
- End-to-End Example: vignette("end-to-end-example")
- RNA-seq Analysis: vignette("bioinformatics-example")

Getting Help

If you encounter issues or have questions:

  1. Check the package documentation
  2. File issues on GitHub
  3. Read the function documentation with ?function_name