
Provides a framework for setting up reproducible analytics projects with integrated version control for data using DVC (Data Version Control), containerization using Docker, dependency management using renv, and customizable reporting using Quarto. Implements best practices for project organization, workflow management, and reproducible research.

System Requirements

This package relies on several external tools, some of which are optional:

  • DVC (Data Version Control) >= 2.0.0

    • Required for data version control features

    • Installation: Visit https://dvc.org/doc/install

    • Note: The package works without DVC installed, but it then creates mock .dvc files instead of placing data under actual version control

  • Python >= 3.7

    • Required for DVC

    • Installation: Visit https://www.python.org/downloads/

  • Docker >= 20.10.0

    • Required for containerization features

    • Installation: Visit https://docs.docker.com/get-docker/

    • Note: Docker features are optional
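
As a quick way to see which of the tools listed above are available, the following shell sketch looks each one up on the PATH and prints the first line of its `--version` output. The helper name `check_tool` is hypothetical and not part of thoth, and the script assumes the Python interpreter is installed as `python3` (on some systems it is `python`). It does not compare the reported versions against the minimums above; that check is left to the reader.

```shell
#!/bin/sh
# Hypothetical helper (not part of thoth): report whether a command-line
# tool is on PATH and, if it is, print the first line of its version output.
check_tool() {
    tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        # dvc, python3 and docker all accept --version.
        printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
        printf '%s: not found\n' "$tool"
        return 1
    fi
}

# Count how many of the three tools are missing.
missing=0
for tool in dvc python3 docker; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```

A missing DVC or Docker installation is not fatal here, since the notes above describe both as optional for the package.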

Package Features

  • Project Setup

    • Create standardized project structures

    • Initialize version control

    • Set up dependency management

  • Data Version Control

    • Track large data files

    • Create reproducible pipelines

    • Track metrics and plots

  • Containerization

    • Create reproducible environments

    • Package analyses for distribution

  • Reporting

    • Customizable report templates

    • Decision tracking

    • Methods section generation

Getting Started

To get started with thoth:

  1. Install system requirements (DVC, Python, Docker)

  2. Create a new project:

  3. Read the vignettes:

Author

Maintainer: Sebastian Rauschert <sebastian.rauschert@telethonkids.org.au>