
thoth: Reproducible Analytics Framework with Data Version Control
Source: R/thoth-package.R
thoth-package.Rd
Provides a framework for setting up reproducible analytics projects with integrated version control for data using DVC (Data Version Control), containerization using Docker, dependency management using renv, and customizable reporting using Quarto. Implements best practices for project organization, workflow management, and reproducible research.
System Requirements
This package requires several external tools to be installed:
DVC (Data Version Control) >= 2.0.0
Required for data version control features
Installation: Visit https://dvc.org/doc/install
Note: The package works without DVC installed, but in that case it creates mock .dvc files instead of performing actual version control.
Python >= 3.7
Required for DVC
Installation: Visit https://www.python.org/downloads/
Docker >= 20.10.0
Required for containerization features
Installation: Visit https://docs.docker.com/get-docker/
Note: Docker features are optional
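Before creating a project, it can help to confirm which of these tools are visible to R. The following is a minimal sketch using only base R; `check_tools()` and `tool_version()` are hypothetical helpers written for illustration and are not part of thoth. The executable names are assumptions (for example, Python may be installed as `python` rather than `python3` on some systems).

```r
# Hypothetical helper (not part of thoth): report which external tools
# are visible on the PATH of the current R session.
check_tools <- function(tools = c("dvc", "python3", "docker")) {
  paths <- Sys.which(tools)  # "" for any tool that is not found
  data.frame(
    tool  = tools,
    found = unname(nzchar(paths)),
    path  = unname(paths),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: first line printed by `<tool> --version`,
# or NA if the tool is missing or cannot be run.
tool_version <- function(tool) {
  if (!nzchar(Sys.which(tool))) return(NA_character_)
  out <- tryCatch(
    suppressWarnings(system2(tool, "--version", stdout = TRUE, stderr = TRUE)),
    error = function(e) character(0)
  )
  if (length(out) == 0) NA_character_ else out[1]
}

status <- check_tools()
print(status)
```

The reported version strings still need to be compared by eye against the minimum versions listed above; this sketch does not parse them.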
Package Features
Project Setup
Create standardized project structures
Initialize version control
Set up dependency management
Data Version Control
Track large data files
Create reproducible pipelines
Track metrics and plots
Containerization
Create reproducible environments
Package analyses for distribution
Reporting
Customizable report templates
Decision tracking
Methods section generation
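To illustrate what tracking a large data file involves at the command-line level, here is a minimal sketch that calls the DVC CLI command `dvc add` from R. `dvc_add()` is a hypothetical helper, not a thoth function, and this is not necessarily how thoth performs tracking internally. It assumes the working directory is inside a repository already initialized with `dvc init`.

```r
# Hypothetical helper (not part of thoth): track a file with `dvc add`.
# Returns TRUE if the command exits successfully, FALSE otherwise.
dvc_add <- function(path, dvc = "dvc") {
  if (!file.exists(path)) stop("No such file: ", path)
  if (!nzchar(Sys.which(dvc))) {
    message("DVC not found on PATH; nothing tracked.")
    return(invisible(FALSE))
  }
  # system2() returns the exit status when output is not captured
  status <- system2(dvc, c("add", shQuote(path)), stdout = FALSE, stderr = FALSE)
  invisible(status == 0L)
}
```

A call such as `dvc_add("data/raw/measurements.csv")` (an illustrative path) would, on success, leave a `.dvc` file next to the data file that can then be committed to Git.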
Getting Started
To get started with thoth:
Install system requirements (DVC, Python, Docker)
Create a new project:
library(thoth)
create_analytics_project("my_analysis")
Read the vignettes:
browseVignettes("thoth")
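The steps above can be tried in a scratch location before committing to a real project directory. This is a sketch under stated assumptions: `try_thoth()` is a hypothetical wrapper, and it assumes that `create_analytics_project()` creates a folder named after its argument inside the current working directory. Only the single-argument call shown above is taken from this page.

```r
# Hypothetical wrapper (not part of thoth): create a project in a
# temporary directory and list the files it contains.
try_thoth <- function(name = "my_analysis", root = tempdir()) {
  if (!requireNamespace("thoth", quietly = TRUE)) {
    message("thoth is not installed; skipping.")
    return(invisible(character(0)))
  }
  old <- setwd(root)
  on.exit(setwd(old), add = TRUE)  # restore the working directory on exit
  thoth::create_analytics_project(name)
  # Assumption: the project lives in <root>/<name>
  list.files(file.path(root, name), recursive = TRUE)
}

files <- try_thoth()
```

Inspecting `files` shows the generated structure without touching an existing working directory.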
Author
Maintainer: Sebastian Rauschert <sebastian.rauschert@telethonkids.org.au>