While compressing a large number of files using `gzip`, I realised that it is quite slow, especially if you use the `--best` flag for maximum compression. While searching the web, I came across a tool named `pigz`, which is quite fast because it compresses in parallel across multiple cores. The website describes it as:
pigz, which stands for parallel implementation of gzip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data. pigz was written by Mark Adler, and uses the zlib and pthread libraries.
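To make the idea of parallel compression concrete, here is a minimal Python sketch. It is a simplification, not how `pigz` is actually written: it cuts the input into fixed-size chunks (`CHUNK_SIZE` of 128 KiB, chosen to mirror `pigz`'s default block size), compresses each chunk as a separate gzip member on a thread pool, and concatenates the members. A file made of concatenated gzip members is still valid gzip, and CPython's `zlib` releases the GIL while compressing, so the threads can run on several cores at once. `pigz` itself writes a single gzip stream and, by default, primes each block with the last 32 KiB of the previous one, which helps keep its output close in size to `gzip`'s. The function names here are hypothetical.

```python
import gzip
import os
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Chunk size for this sketch; 128 KiB mirrors pigz's default block size.
CHUNK_SIZE = 128 * 1024


def compress_chunk(chunk: bytes, level: int = 9) -> bytes:
    """Compress one chunk into a complete, independent gzip member."""
    # wbits=31 (16 + 15) makes zlib write a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(chunk) + compressor.flush()


def parallel_gzip(data: bytes, level: int = 9, workers: Optional[int] = None) -> bytes:
    """Compress chunks concurrently and join them into one multi-member gzip stream.

    level=9 is the same compression level that gzip's --best flag selects.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]
    workers = workers or os.cpu_count() or 1
    # zlib releases the GIL while compressing, so threads can run on separate cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the members stay in sequence.
        members = list(pool.map(lambda chunk: compress_chunk(chunk, level), chunks))
    return b"".join(members)


if __name__ == "__main__":
    payload = b"pigz is a parallel implementation of gzip.\n" * 200_000
    packed = parallel_gzip(payload)
    # The standard gzip module reads multi-member files, so this round-trips.
    assert gzip.decompress(packed) == payload
    print(f"{len(payload)} bytes -> {len(packed)} bytes")
```

Because each chunk here is compressed without any context from the previous one, this sketch loses a little compression at every chunk boundary; that is exactly the cost `pigz`'s preset dictionary is meant to reduce.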
Let me try compressing files with `gzip` and `pigz` and see how fast each one is. CPU details of the test machine:

| Property | Value |
|---|---|
| Vendor ID | GenuineIntel |
| CPU family | 6 |
| Model | 142 |
| Model name | Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz |
| Stepping | 9 |
| CPU MHz | 1102.124 |
| CPU max MHz | 2300.0000 |
| CPU min MHz | 400.0000 |
| BogoMIPS | 4599.93 |
| Virtualization | VT-x |
| L1d cache | 64 KiB |
| L1i cache | 64 KiB |
| L2 cache | 512 KiB |
| L3 cache | 3 MiB |
Memory status while running the tools (output of `free -h`):

| | total | used | free | shared | buff/cache | available |
|---|---|---|---|---|---|---|
| Mem | 11Gi | 2.9Gi | 2.6Gi | 359Mi | 5.9Gi | 7.9Gi |
| Swap | 2.0Gi | 710Mi | 1.3Gi | | | |
1) For a single file
Original size: `1649192 VID_20200413_193959.mp4`
Compressing this video file using `pigz`:

`time pigz --best -k VID_20200413_193959.mp4`

Result: `real 0m31.855s`, `user 1m56.362s`, `sys 0m2.726s`
Because `pigz` compresses in parallel, all the cores are used simultaneously.
The output compressed file (`du VID_20200413_193959.mp4.gz`):

`1648308 VID_20200413_193959.mp4.gz`
Compressing the same file using `gzip`:

`time gzip --best -k VID_20200413_193959.mp4`

Result: `real 1m16.566s`, `user 1m14.458s`, `sys 0m1.748s`
As you can see, because `gzip` is single-threaded, only one core is used (and at 100%) while the other cores stay mostly idle, which is not good!
The compressed output file (`du VID_20200413_193959.mp4.gz`):

`1648068 VID_20200413_193959.mp4.gz`
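Even after compressing the same file multiple times, we see that `gzip` compresses slightly more than `pigz` (1648068 vs 1648308). `pigz` still finished sooner in wall-clock time (`real` 0m31.855s vs 1m16.566s), but it consumed more total CPU time (`user` 1m56.362s vs 1m14.458s). Hence, parallel execution does not guarantee a better result: it finishes sooner, but does more total work and compresses slightly less. For both tools the saving is tiny, since an MP4 video is already compressed.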
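The gap between `real` and `user` above comes from what `time` measures: `real` is elapsed wall-clock time, while `user` adds up the CPU time spent by every thread. The minimal Python sketch below reproduces that effect, using `time.perf_counter()` for wall-clock time and `time.process_time()` for the CPU time of the whole process. It is not a benchmark of `gzip` or `pigz` themselves; the helper names and the synthetic test data are hypothetical, and `zlib` level 9 stands in for `--best`.

```python
import os
import time
import zlib
from concurrent.futures import ThreadPoolExecutor


def deflate(chunk: bytes) -> bytes:
    """Compress one chunk at level 9, the level that --best selects."""
    return zlib.compress(chunk, 9)


def sequential(chunks):
    """gzip-like: a single thread handles every chunk in turn."""
    return [deflate(c) for c in chunks]


def parallel(chunks):
    """pigz-like: chunks are spread over a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
        return list(pool.map(deflate, chunks))


def measure(label, func, chunks):
    """Report wall-clock time (like `real`) and process CPU time (roughly `user` + `sys`)."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    result = func(chunks)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    print(f"{label:>10}: wall {wall:.2f}s, cpu {cpu:.2f}s")
    return result, wall, cpu


if __name__ == "__main__":
    # Hypothetical test data: 16 chunks of about 1.2 MB, each built from a random pattern.
    chunks = [(os.urandom(64) + b"gzip vs pigz\n") * 16384 for _ in range(16)]
    seq, _, _ = measure("sequential", sequential, chunks)
    par, _, _ = measure("parallel", parallel, chunks)
    # Same chunks at the same level give identical compressed output either way.
    assert seq == par
```

On a multi-core machine you would expect the parallel run to show a shorter wall time but a similar or larger CPU time, mirroring the `pigz` timings above.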
2) For a directory of files.
Original size: `253M Me/`
Compressed sizes (`du`):

`249408 meGzip.tar.gz` (`gzip`)

`249512 mePigz.tar.gz` (`pigz`)
Result: when dealing with multiple files, parallel execution did give a higher compression speed.
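As the `.tar.gz` names suggest, the directory is first archived with `tar` and the archive is then compressed, so the compressor sees one continuous stream rather than separate files. As a sketch of that pipeline, Python's standard `tarfile` module can do both steps at once with gzip level 9 (the level `--best` selects); like `gzip`, it compresses on a single thread. The function name and the demo directory are hypothetical.

```python
import os
import tarfile
import tempfile


def tar_gz_best(src_dir: str, out_path: str) -> int:
    """Archive src_dir into out_path using gzip level 9; return the archive size in bytes.

    A rough single-threaded equivalent of: tar -c src_dir | gzip --best > out_path
    """
    # "w:gz" writes a gzip-compressed tar; compresslevel=9 matches gzip --best.
    with tarfile.open(out_path, "w:gz", compresslevel=9) as tar:
        # add() recurses into the directory; arcname stores entries as "<dir>/..."
        # instead of the full source path.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return os.path.getsize(out_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a small hypothetical directory to archive.
        src = os.path.join(tmp, "Me")
        os.makedirs(src)
        for i in range(5):
            with open(os.path.join(src, f"note{i}.txt"), "w") as fh:
                fh.write(f"file {i}\n" * 10_000)
        out = os.path.join(tmp, "meGzip.tar.gz")
        print("compressed size:", tar_gz_best(src, out), "bytes")
```

Note that `os.path.getsize` returns the exact byte count, whereas `du` reports disk usage in blocks, so the two are not directly interchangeable.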
3) Compressing a directory of larger size:
Original size: `4.0G retire/`
With `pigz`:

`time tar -cv retire/ | pigz --best > retirePigz.tar.gz`

Result: `real 2m5.427s`, `user 5m25.956s`, `sys 0m14.496s`

With `gzip`:

`time tar -cv retire/ | gzip --best > retireGzip.tar.gz`

Result: `real 3m26.640s`, `user 3m8.151s`, `sys 0m10.939s`
Result: for larger directories, the difference in speed is more noticeable when compressing in parallel (`real` 2m5.427s with `pigz` vs 3m26.640s with `gzip`).
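Putting the measured figures side by side makes the trade-off easier to see. The short Python snippet below only does arithmetic on numbers copied from the runs above: it converts the `time` output to seconds, then computes how much faster `pigz` was, roughly how many cores it kept busy on average (`user` divided by `real`), and how much larger its output was than `gzip`'s. No new measurements are involved; since the 253M directory has no timings and the 4.0G directory has no sizes above, each appears in only one table. The helper names are hypothetical.

```python
# Timings copied from the `time` output above: (real, user).
TIMINGS = {
    "single file": {"gzip": ("1m16.566s", "1m14.458s"), "pigz": ("0m31.855s", "1m56.362s")},
    "retire/ (4.0G)": {"gzip": ("3m26.640s", "3m8.151s"), "pigz": ("2m5.427s", "5m25.956s")},
}
# Compressed sizes copied from the `du` output above (same unit for both tools).
SIZES = {
    "single file": {"gzip": 1648068, "pigz": 1648308},
    "Me/ (253M)": {"gzip": 249408, "pigz": 249512},
}


def to_seconds(value: str) -> float:
    """Convert a bash `time` value such as '3m26.640s' into seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def speedup(case: str) -> float:
    """Wall-clock time of gzip divided by wall-clock time of pigz."""
    return to_seconds(TIMINGS[case]["gzip"][0]) / to_seconds(TIMINGS[case]["pigz"][0])


def busy_cores(case: str) -> float:
    """pigz user time divided by its real time: roughly the average number of busy cores."""
    real, user = TIMINGS[case]["pigz"]
    return to_seconds(user) / to_seconds(real)


def size_penalty_pct(case: str) -> float:
    """How much larger, in percent, the pigz output is than the gzip output."""
    sizes = SIZES[case]
    return (sizes["pigz"] - sizes["gzip"]) / sizes["gzip"] * 100


if __name__ == "__main__":
    for case in TIMINGS:
        print(f"{case}: {speedup(case):.2f}x faster, {busy_cores(case):.2f} cores busy")
    for case in SIZES:
        print(f"{case}: pigz output {size_penalty_pct(case):.3f}% larger")
    # single file: 2.40x faster, 3.65 cores busy
    # retire/ (4.0G): 1.65x faster, 2.60 cores busy
    # single file: pigz output 0.015% larger
    # Me/ (253M): pigz output 0.042% larger
```

In these runs, the size difference stays below 0.05% while the wall-clock gain ranges from about 1.6x to 2.4x.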
Here we can see that `pigz` is much faster than `gzip` when compressing multiple files, but `gzip` was able to compress more in every case! Hence, when compressing multiple files, use `pigz`, but for a single file use