Introduction
thoth
provides a structured way to document and track
human decisions made during data analysis. While code and data can be
version controlled, the reasoning behind analytical choices often
remains undocumented. The decision tracking functionality helps make
these choices explicit and traceable.
Core Functions
Initialize Tracking
Start tracking decisions in your analysis:
library(thoth)
# Create a new decision tree
tree <- initialize_decision_tree(
analysis_id = "RNA_seq_2024",
analyst = "Jane Smith",
description = "Differential expression analysis"
)
Record Decisions
Document key analytical choices as you make them:
# Quality control decision
record_decision(
file_path = tree,
check = "Sample QC",
observation = "Sample T3 clusters with controls in PCA",
decision = "Remove sample T3",
reasoning = "Likely sample swap or contamination",
evidence = "plots/pca_plot.pdf"
)
# Analysis parameter decision
record_decision(
file_path = tree,
check = "Differential Expression",
observation = "High biological variability in controls",
decision = "Use DESeq2 with LRT test",
reasoning = "Better handles overdispersion",
evidence = "tables/variance_analysis.csv"
)
Generate Documentation
Automatically create documentation from your decisions:
# Generate methods section
methods_text <- generate_methods_section(tree)
# Export decision tree
export_decision_tree(tree, format = "md") # Markdown
export_decision_tree(tree, format = "html") # HTML report
Best Practices
1. Recording Decisions
- Document decisions as they happen
- Include clear observations and reasoning
- Link to supporting evidence
- Use consistent terminology
2. Organization
- One decision tree per analysis
- Keep trees under version control
- Review decisions with team members
- Update as analysis evolves
3. Documentation
# Example workflow
tree <- initialize_decision_tree(
analysis_id = "project_2024",
description = "Analysis workflow"
)
# Record initial choices
record_decision(
file_path = tree,
check = "Data Processing",
observation = "Raw data contains outliers",
decision = "Apply robust normalization",
reasoning = "Reduce impact of extreme values",
evidence = "scripts/normalize.R"
)
# Generate documentation
methods <- generate_methods_section(tree)
Common Patterns
Quality Control Decisions
# Document QC steps
record_decision(
file_path = tree,
check = "Data Quality",
observation = "10% missing values in variable X",
decision = "Impute using KNN",
reasoning = "Data appears MCAR",
evidence = "reports/missing_analysis.html"
)
Parameter Selection
# Document parameter choices
record_decision(
file_path = tree,
check = "Model Parameters",
observation = "Cross-validation shows overfitting",
decision = "Increase regularization to 0.1",
reasoning = "Reduces test error by 15%",
evidence = "plots/cv_results.pdf"
)
Next Steps
- Try the end-to-end example:
vignette("end-to-end-example")
- Learn about Git integration:
vignette("git-integration")
- Check out DVC tracking:
vignette("dvc-tracking")