duperemove
Duperemove is a command-line tool that finds duplicated extents in files and, optionally, submits them to the kernel for deduplication. It does not delete files. Instead, it reads file contents block by block, hashes each block, and compares the hashes to identify regions that hold identical data. When run with the -d option on a filesystem that supports it (Btrfs, or XFS experimentally), duperemove asks the kernel to make the matching extents share the same storage, so every file stays in place with its contents unchanged while the redundant copies stop consuming space. Without -d, it only reports the duplicates it finds. Because the comparison works on blocks rather than whole files, duperemove can reclaim space from files that are only partly identical, and it never needs to load an entire file into memory. The hashes can be kept in a hash file, which lowers memory use and lets later runs rescan only the files that have changed, so repeated runs over large filesystems stay practical. Duperemove does not replace duplicates with hard links or symbolic links; the sharing happens at the extent level inside the filesystem. It is free, open-source software for Linux.
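The block-by-block hashing idea can be illustrated with standard shell utilities. The sketch below is not duperemove's code and does not use its hash function. The function name count_duplicate_blocks and the 128 KiB default are assumptions made for illustration, and cksum (a CRC) stands in for a real content hash.

```shell
#!/bin/sh
# Illustrative sketch only: count block checksums that occur more than once
# in a single file. duperemove does its own hashing internally; this merely
# shows the idea of comparing fixed-size blocks by checksum.

count_duplicate_blocks() {
    file=$1
    blocksize=${2:-131072}   # 128 KiB, an illustrative default
    tmpdir=$(mktemp -d) || return 1

    # Cut the file into pieces of at most $blocksize bytes.
    split -b "$blocksize" -a 6 -- "$file" "$tmpdir/blk."

    # Checksum every piece, keep "checksum-size" as the key, and count
    # the keys that appear more than once.
    # cksum is a CRC, so collisions are possible; a real tool would use
    # a stronger hash and verify the data before sharing it.
    cksum "$tmpdir"/blk.* | awk '{print $1 "-" $2}' | sort | uniq -d | wc -l | tr -d ' '

    rm -rf "$tmpdir"
}
```

Calling count_duplicate_blocks with a file path and a block size in bytes prints how many distinct block checksums are repeated within that file. Nothing is deduplicated; the sketch only detects repetition.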
List of commands for duperemove:
- Use a hash file to store extent hashes (lower memory usage, and the file can be reused on subsequent runs):
  $ duperemove -r -d --hashfile=${path-to-hashfile} ${path-to-directory}
- Deduplicate duplicate extents on a Btrfs or XFS (experimental) filesystem:
  $ duperemove -r -d ${path-to-directory}
- Search for duplicate extents in a directory and show them:
  $ duperemove -r ${path-to-directory}
- Limit I/O threads (for the hashing and dedupe stages) and CPU threads (for the duplicate extent finding stage):
  $ duperemove -r -d --hashfile=${path-to-hashfile} --io-threads=${N} --cpu-threads=${N} ${path-to-directory}
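The commands above can be combined into a cautious two-pass routine: report first, then deduplicate while reusing the same hash file. The helper below is a minimal sketch under that assumption. The name dedupe_cmd and its argument order are hypothetical, and it only prints the command line so it can be reviewed before anything is run. It uses no flags beyond -r, -d and --hashfile from the list above.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a duperemove command line.
#   $1 = directory to scan
#   $2 = hash file to create or reuse
#   $3 = "report" (list duplicates only) or "dedupe" (adds -d)
# Paths are printed unquoted, so quote them yourself if they contain spaces.

dedupe_cmd() {
    if [ "$3" = "dedupe" ]; then
        printf 'duperemove -r -d --hashfile=%s %s\n' "$2" "$1"
    else
        printf 'duperemove -r --hashfile=%s %s\n' "$2" "$1"
    fi
}

# Example: review the report command first, then the dedupe command.
dedupe_cmd /mnt/data /var/tmp/data.hash report
dedupe_cmd /mnt/data /var/tmp/data.hash dedupe
```

Because both passes name the same hash file, the second pass can reuse the hashes computed by the first. The paths /mnt/data and /var/tmp/data.hash are placeholders.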