zlib is a free library for compressing and decompressing blocks of data (text, files, images), built on the DEFLATE algorithm. It is freely available to the programming community and is not encumbered by patents or other legal claims.
Python ships a zlib module (its use differs slightly between versions), and the solution is remarkably simple:
#
# zlib compress/decompress on python 2 and 3
#
import sys
import zlib

py = sys.version_info
py3k = py >= (3, 0, 0)


def test_zlib(teststr):
    # NOTE: the value printed after '=' is the ratio compressed/original,
    # not a true percentage, despite the '%' label.
    if py3k:
        compressed = zlib.compress(teststr.encode('utf-8'))
        decompressed = zlib.decompress(compressed)
        rate = float(len(compressed)) / float(len(teststr))
        print('Compress rate:{}/{}={:.3f}% - {}'.format(
            len(compressed), len(teststr), rate,
            teststr.encode('utf-8') == decompressed))
    else:
        # variant 1: the Python 2 string codec
        compressed1 = teststr.encode('zlib')  # 'zlib' or 'zip' gives the same result
        decompressed = compressed1.decode('zlib')
        rate = float(len(compressed1)) / float(len(teststr))
        print('Compress rate:{}/{}={:.3f}% - {}'.format(
            len(compressed1), len(teststr), rate,
            teststr.encode('utf-8') == decompressed))
        # variant 2: the zlib module, as in Python 3
        compressed2 = zlib.compress(teststr.encode('utf-8'))
        decompressed = zlib.decompress(compressed2)
        # fixed: the rate must be computed from compressed2, not compressed1
        rate = float(len(compressed2)) / float(len(teststr))
        print('Compress rate:{}/{}={:.3f}% - {}'.format(
            len(compressed2), len(teststr), rate,
            teststr.encode('utf-8') == decompressed))
        print('Same: {}'.format(compressed1 == compressed2))


if __name__ == "__main__":
    test_zlib('123')
    test_zlib('ax'*1000)
    test_zlib('a'*1000)
    test_zlib('12345678'*10000)
    # keep the console window open until the user presses Enter
    if py3k:
        input('Enter')
    else:
        raw_input('Enter')
The zlib module is supported by both Python 2 and Python 3, but Python 2 also exposes it transparently as a string codec: str.encode('zlib') compresses and str.decode('zlib') decompresses.
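Python 3 strings no longer have this codec shortcut, but a close equivalent appears to be available through the codecs module. The following is a minimal sketch, assuming Python 3.4 or later (where the 'zlib' alias is accepted by codecs.encode and codecs.decode); the variable names are only illustrative.

```python
import codecs
import zlib

# Sample input; the codec works on bytes, so the text is encoded first.
data = ('ax' * 1000).encode('utf-8')

# Bytes-to-bytes 'zlib' codec: the Python 3 counterpart of
# Python 2's str.encode('zlib') / str.decode('zlib').
via_codec = codecs.encode(data, 'zlib')
via_module = zlib.compress(data)

# Round trip back to the original bytes.
restored = codecs.decode(via_codec, 'zlib')

print('codec == module:', via_codec == via_module)
print('round trip ok:  ', restored == data)
```

In new code, calling zlib.compress and zlib.decompress directly is probably the clearer choice; the codec form is mainly useful when porting Python 2 code that relied on encode('zlib').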
Output of the test script above when run under Python 3:
Compress rate:11/3=3.667% - True
Compress rate:24/2000=0.012% - True
Compress rate:17/1000=0.017% - True
Compress rate:148/80000=0.002% - True
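Compression is not efficient for short sequences (the example '123' grows from 3 to 11 bytes) or for data with few repeating blocks. Repetitive input compresses very well: 1000 repetitions of 'ax' take 24 bytes, 1000 repetitions of 'a' take 17 bytes, and 10000 repetitions of '12345678' take 148 bytes. Note that the figure printed with a % sign is really the ratio of compressed size to original size, so 24/2000 = 0.012 corresponds to 1.2%.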
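The effect of repetition can be checked directly. The sketch below targets Python 3 and uses a hypothetical helper, compression_ratio, that is not part of the original script. It compares a short input, a highly repetitive input, and random bytes from os.urandom, which have essentially no repeating blocks and therefore should not shrink.

```python
import os
import zlib


def compression_ratio(data):
    """Return len(compressed) / len(original) for a non-empty bytes object."""
    return len(zlib.compress(data)) / len(data)


# Illustrative inputs: short, highly repetitive, and random (incompressible).
samples = [
    ('short', b'123'),
    ('repetitive', b'a' * 1000),
    ('random', os.urandom(1000)),
]

for name, data in samples:
    print('{:<12}{:>6} bytes -> ratio {:.3f}'.format(
        name, len(data), compression_ratio(data)))
```

A ratio above 1 means the compressed form is larger than the input, which is what happens when the fixed overhead of the zlib header and checksum outweighs any savings.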