dvc
DVC (Data Version Control) is an open-source command-line tool that helps data scientists and machine learning engineers manage and version control their data and machine learning models. It is designed to work alongside Git and is often referred to as "Git for data."
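To make "Git for data" concrete: rather than committing a large file to Git, DVC hashes the file's contents, keeps the file in a cache keyed by that hash, and commits only a small metafile that records the hash. The snippet below is a plain-shell model of that idea, not DVC's actual implementation. It assumes GNU coreutils (`md5sum`), and the names `data.csv`, `.demo-cache/` and `data.csv.meta` are hypothetical; the metafile is only loosely modeled on a `.dvc` file, and the two-character directory split is similar in spirit to DVC's cache layout.

```shell
#!/bin/sh
# Plain-shell model of content-addressed caching, the idea behind `dvc add`.
# NOT DVC itself. Hypothetical names: data.csv, .demo-cache/, data.csv.meta.
set -eu

printf 'id,label\n1,cat\n2,dog\n' > data.csv

# 1. Hash the contents; the hash, not the filename, identifies this version.
hash=$(md5sum data.csv | cut -d ' ' -f 1)

# 2. Copy the data into the cache at a path derived from the hash
#    (first two hex characters as a directory, the rest as the filename).
prefix=$(echo "$hash" | cut -c 1-2)
rest=$(echo "$hash" | cut -c 3-)
mkdir -p ".demo-cache/$prefix"
cp data.csv ".demo-cache/$prefix/$rest"

# 3. Write a small metafile recording the hash. This tiny text file is what
#    Git would version; the data itself stays out of Git.
cat > data.csv.meta <<EOF
outs:
- md5: $hash
  path: data.csv
EOF

cat data.csv.meta
```

Because identical contents always produce the same hash, storing an unchanged file again reuses the existing cache entry, which is part of why this approach keeps versioning large files cheap.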
Here are some key features and functionalities of DVC:
- Data and model versioning: DVC lets you track, version, and share data sets and ML models. Instead of committing large files to Git, it keeps them in its own cache and records each file's content hash in small metafiles (`.dvc` files, or `dvc.lock` for pipeline outputs) that are committed to Git, so every Git revision points to an exact version of the data.
- Reproducible pipelines: DVC lets you define multi-stage data pipelines (in `dvc.yaml`) and run them, which makes experiments and workflows easy to reproduce. It tracks the dependencies between stages, runs stages in dependency order, and caches intermediate outputs so that unchanged stages can be skipped.
- Large file handling: DVC is built to handle large data files efficiently. Data lives in a local cache and can be pushed to and pulled from DVC remotes, which can be local directories, cloud storage services, or networked storage.
- Collaboration: Team members share data and models through a common DVC remote: `dvc push` uploads cached data, and `dvc pull` downloads it and updates the workspace. Because the metafiles live in Git, changes to which data version a branch uses are reviewed and merged through the usual Git workflow.
- Data versioning in the cloud: DVC remotes can be backed by popular cloud services such as Amazon S3, Google Cloud Storage, and Microsoft Azure Blob Storage, so versioned data and models can be stored in the cloud.
- Experiment tracking: DVC can track experiments, including logging and comparing metrics, hyperparameters, and plots (for example with `dvc metrics diff`, `dvc params diff`, and `dvc exp show`), which helps you compare the performance of different models and experiments.
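The dependency ordering behind reproducible pipelines can be modeled with the POSIX `tsort` utility: stages and their dependencies form a directed acyclic graph (the graph `dvc dag` visualizes), and a valid run order lists every stage after the stages it depends on. This is a sketch of the ordering idea only, not how DVC computes it; the four stage names and the `stages.edges` file are hypothetical.

```shell
#!/bin/sh
# Plain-shell model of pipeline ordering (NOT how DVC is implemented).
# Hypothetical pipeline: each line of stages.edges reads
# "<upstream stage> <downstream stage>", i.e. one edge of the DAG.
set -eu

cat > stages.edges <<'EOF'
prepare featurize
featurize train
train evaluate
prepare evaluate
EOF

# tsort prints every stage after all the stages it depends on and
# reports an error if the graph contains a cycle.
tsort stages.edges > stages.order
cat stages.order
# Only one valid order exists for this graph: prepare, featurize, train, evaluate.
```

The redundant edge `prepare evaluate` does not change the result, because `evaluate` already comes after `prepare` through `featurize` and `train`.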
Overall, DVC simplifies the management, versioning, and collaboration of data and ML models, ensuring reproducibility and scalability in data science projects.
List of commands for dvc:
- dvc-add: Add a target file with a custom `.dvc` filename.
  $ dvc add --file ${custom_name}.dvc ${filename}
- dvc-add: Recursively add all the files in a given target directory.
  $ dvc add --recursive ${path-to-directory}
- dvc-add: Start tracking a single target file with DVC (this creates a `.dvc` metafile for it).
  $ dvc add ${filename}
- dvc-checkout: Restore a specified target in the workspace to the version recorded in its `.dvc` file or `dvc.lock`.
  $ dvc checkout ${target}
- dvc-checkout: Restore all tracked files and directories to the versions recorded in the current `.dvc` files and `dvc.lock`.
  $ dvc checkout
- dvc-commit: Recursively commit all DVC-tracked files in a directory.
  $ dvc commit --recursive ${path-to-directory}
- dvc-commit: Commit changes to all DVC-tracked files and directories.
  $ dvc commit
- dvc-commit: Commit changes to a specified DVC-tracked target.
  $ dvc commit ${target}
- dvc-config: Unset a project-level config value for a given key.
  $ dvc config --unset ${key}
- dvc-config: Get the config value for a specified key for the current project.
  $ dvc config ${key}
- dvc-config: Unset the project's default remote.
  $ dvc config --unset core.remote
- dvc-config: Set the project's default remote.
  $ dvc config core.remote ${remote_name}
- dvc-config: Set a local, global, or system level config value (use `--local`, `--global`, or `--system`).
  $ dvc config --local ${key} ${value}
- dvc-config: Set the config value for a key on a project level.
  $ dvc config ${key} ${value}
- dvc-config: Get the name of the default remote.
  $ dvc config core.remote
- dvc-dag: Visualize the pipeline stages up to a specified target stage.
  $ dvc dag ${target}
- dvc-dag: Export the pipeline in Graphviz DOT format.
  $ dvc dag --dot > ${path-to-pipeline}.dot
- dvc-dag: Visualize the entire pipeline.
  $ dvc dag
- dvc-destroy: Destroy the current project, removing all DVC files and directories from it.
  $ dvc destroy
- dvc-destroy: Force destroy the current project without asking for confirmation.
  $ dvc destroy --force
- dvc-diff: Compare DVC-tracked files in a given Git commit, tag, or branch with the current workspace.
  $ dvc diff ${revision}
- dvc-diff: Compare DVC-tracked files, displaying the output as Markdown.
  $ dvc diff --show-md --show-hash ${commit}
- dvc-diff: Compare DVC-tracked files, displaying the output as JSON.
  $ dvc diff --show-json --show-hash ${commit}
- dvc-diff: Compare the changes in DVC-tracked files from one Git commit to another.
  $ dvc diff ${old_revision} ${new_revision}
- dvc-diff: Compare DVC-tracked files, also showing their hash values.
  $ dvc diff --show-hash ${commit}
- dvc-fetch: Fetch the data referenced by all commits.
  $ dvc fetch --all-commits
- dvc-fetch: Fetch the data referenced by all branches and tags.
  $ dvc fetch --all-branches --all-tags
- dvc-fetch: Download data from the default remote (if set) into the cache, without updating the workspace.
  $ dvc fetch
- dvc-fetch: Fetch the data for one or more specific targets.
  $ dvc fetch ${target_a} [${target_b} ...]
- dvc-fetch: Fetch data from a specific remote.
  $ dvc fetch --remote ${remote_name}
- dvc-freeze: Freeze one or more specified stages, so that `dvc repro` does not re-execute them.
  $ dvc freeze ${stage_name_a} [${stage_name_b} ...]
- dvc-gc: Garbage collect from the cache, keeping only versions referenced by all branches, tags, and commits.
  $ dvc gc --all-branches --all-tags --all-commits
- dvc-gc: Garbage collect from the cache, including a specific cloud remote storage.
  $ dvc gc --all-commits --cloud --remote ${remote_name}
- dvc-gc: Garbage collect from the cache, keeping only versions referenced by the current workspace.
  $ dvc gc --workspace
- dvc-gc: Garbage collect from the cache, including the default cloud remote storage (if set).
  $ dvc gc --all-commits --cloud
- dvc-init: Initialize DVC in the current directory (by default, this must be a Git repository).
  $ dvc init
- dvc-init: Initialize DVC without Git.
  $ dvc init --no-scm
- dvc-unfreeze: Unfreeze one or more specified stages.
  $ dvc unfreeze ${stage_name_a} [${stage_name_b} ...]
- dvc: Execute a DVC subcommand.
  $ dvc ${subcommand}
- dvc: Display help about a specific subcommand.
  $ dvc ${subcommand} --help
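The `dvc gc` entries differ only in which versions count as "in use": DVC collects the hashes referenced by the chosen scope (the workspace and/or the revisions selected with `--all-branches`, `--all-tags`, or `--all-commits`) and removes cached objects outside that set. The snippet below is a plain-shell model of that reachability rule, not DVC's implementation; the short hash values are hypothetical placeholders (real DVC hashes are 32-character MD5 hex strings).

```shell
#!/bin/sh
# Plain-shell model of garbage collection by reachability (NOT DVC itself).
# Hypothetical short hashes stand in for real cache entries.
set -eu
export LC_ALL=C  # byte-wise ordering so sort and comm agree

# Everything currently stored in the (hypothetical) cache.
printf '%s\n' ccc333 aaa111 ddd444 bbb222 | sort > cached.txt

# Everything referenced by the metafiles in the scope being kept.
printf '%s\n' ddd444 bbb222 | sort > referenced.txt

# comm -23 keeps lines found only in the first file:
# objects that are cached but unreferenced, i.e. candidates for deletion.
comm -23 cached.txt referenced.txt > unreferenced.txt
cat unreferenced.txt
```

Widening the scope (for example, adding `--all-commits`) grows the referenced set and therefore shrinks what gets deleted, which is why choosing the flags carefully matters before running `dvc gc`.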