Comments


TheSwordlessNinja t1_j1yfh70 wrote

A very simple form of compression removes repeating bytes. Imagine that behind the scenes your file contains a long series of 0x00 (zero represented in hex). That run can be condensed to a single zero plus a marker saying how long the run actually is, so the decompressor knows how many zeros to restore later.
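For example, here is a rough sketch of that idea in Python (the run format here is made up for illustration, not any real file format):

```python
# Run-length encoding sketch: collapse each run of identical bytes into a
# (byte value, run length) pair, and expand it again on decompression.

def rle_encode(data: bytes):
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        runs.append((data[i], j - i))   # one byte value plus how often it repeats
        i = j
    return runs

def rle_decode(runs):
    return b"".join(bytes([value]) * count for value, count in runs)

data = b"\x00" * 1000 + b"\x01\x02"
runs = rle_encode(data)
print(len(data), "bytes collapse into", len(runs), "runs:", runs)
assert rle_decode(runs) == data
```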

6

Jason_Peterson t1_j1yftv9 wrote

This is only possible in exceptional cases where the input data is highly repetitive, such as a simple digital drawing consisting of a few colors. The algorithm then records how many times to repeat a byte instead of writing the bytes out one after the other. This is one of the most basic compression methods.

Other methods include keeping a dictionary of sequences that have been encountered recently, with the aim of emitting references into that table which are shorter than the data they describe, and predicting a continuous signal, for example by subtracting the previous pixel in a row.
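The prediction idea can be sketched in a few lines of Python (the pixel values below are made up for illustration):

```python
# Delta coding: store each pixel as its difference from the previous one.
# A smooth row turns into many small, similar numbers, which a later
# entropy-coding stage can store much more compactly.
row = [100, 101, 103, 104, 104, 105, 107, 110]

deltas = [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]
print(deltas)   # [100, 1, 2, 1, 0, 1, 2, 3]

# Decoding just adds the deltas back up.
decoded = []
for d in deltas:
    decoded.append(d if not decoded else decoded[-1] + d)
assert decoded == row
```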

Most real-world data that carries meaning is not very repetitive. It contains variation and noise, which make exact matches unlikely to occur. Typically, compression achieves a ratio of 25% to 50%.

An "online" algorithm is not a meaningful classification. Perhaps you want to clarify what specific program you mean by online.

5

vatbub t1_j1ykyod wrote

In general, there are two types of compression: lossy compression and lossless compression.

With lossy compression, imperceptible details are thrown away; JPEG and MP3 are examples. You don't need every visual detail in a picture to see its beauty, much like most people don't need to hear the highest frequencies to appreciate their music. And if this information is thrown away, it doesn't need to be stored, resulting in smaller files.
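A very rough sketch of the idea in Python, assuming we coarsely quantize pixel brightness (real codecs like JPEG and MP3 are far more sophisticated; the data here is made up):

```python
import random
import zlib

random.seed(0)
# A fake "photo row": a smooth gradient plus a little sensor noise.
pixels = bytes(min(255, (i // 16) + random.randrange(8)) for i in range(4096))

# Throw away the fine detail by keeping only 16 brightness levels.
quantized = bytes((p // 16) * 16 for p in pixels)

# The quantized version is no longer exactly the original (information is lost),
# but it compresses far better because many nearby values became identical.
print(len(zlib.compress(pixels)), "vs", len(zlib.compress(quantized)), "bytes")
```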

Lossless compression, on the other hand, simply stores the information in a more clever and efficient way. As an example, I might want to store the text "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", which takes up 30 characters. A compression algorithm might notice that this is the same letter repeated 30 times and store it as "30*A", which requires only 4 characters. When reading the file, a decompression algorithm simply does the inverse and outputs my 30 As again, so no information is lost. More advanced algorithms recognize more patterns, and some specialized compression algorithms (such as PNG for images) exploit particular characteristics of the data to store it in a smarter way.
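That "30*A" example can be written out directly in Python (the "count*letter" notation is just for illustration, not a real format):

```python
from itertools import groupby

text = "A" * 30
encoded = ",".join(f"{len(list(group))}*{ch}" for ch, group in groupby(text))
print(encoded)   # 30*A  -- 4 characters instead of 30

# Decompression is the inverse: repeat each letter the stated number of times.
decoded = "".join(ch * int(count) for count, ch in
                  (part.split("*") for part in encoded.split(",")))
assert decoded == text
```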

1

jak0b345 t1_j1yl1bo wrote

It depends. There are two types of compression.

Lossless compression has the goal of restoring each and every bit exactly. This is needed for things like program code, Word documents, or generally any file where you don't know what the content represents. This type of compression works by finding bit patterns that appear many times in the file (e.g. a string of 8 zeros) and assigning them a shorter pattern (e.g. only two zeros).
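One textbook instance of the "frequent things get shorter codes" idea is Huffman coding; here is a hedged Python sketch (not how any particular archiver actually does it or stores its tables):

```python
import heapq
from collections import Counter

def huffman_codes(data: bytes):
    """Map each byte to a bit string; frequent bytes get the shortest strings.
    Assumes the input contains at least two distinct byte values."""
    freq = Counter(data)
    # Heap entries: (total frequency, tie-breaker, {byte: code so far}).
    heap = [(f, i, {b: ""}) for i, (b, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {b: "0" + code for b, code in left.items()}
        merged.update({b: "1" + code for b, code in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = b"this sentence stands in for the file being compressed"
codes = huffman_codes(text)
bits = sum(len(codes[b]) for b in text)
print(len(text) * 8, "bits raw ->", bits, "bits of codes (code table not counted)")
```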

The other type, lossy compression, is used e.g. for images, video, and sound files (like .jpg or .mp3). It works by keeping only the information that is perceptible to humans. In an image that could mean smoothing edges so they are slightly less sharp, but only to a degree that's barely noticeable to the human eye. This is also where the weird artifacts next to black-and-white text in memes that have been reposted a lot come from; see e.g. this website.

Generally, lossy compression is much more powerful than lossless compression, since it does not need to reproduce each bit exactly. It only needs to produce a result that is close enough to the original given the limits of human perception.

1

maw32 t1_j1ylkbk wrote

In sorting algorithms there is a classification into online and offline. Offline algorithms need all of the data before they can work. Online algorithms can work with part of the data and incorporate more data as it arrives. I guess he is looking for an algorithm that can compress data arriving from a stream without buffering the whole stream first.
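For example, Python's built-in zlib can be used in exactly that online fashion (the chunks below just stand in for data arriving over a network):

```python
import zlib

def compress_stream(chunks):
    """Compress byte chunks as they arrive, without buffering the whole stream."""
    co = zlib.compressobj()
    for chunk in chunks:
        out = co.compress(chunk)   # may return b"" while data is held internally
        if out:
            yield out
    yield co.flush()               # emit whatever is still pending at the end

pieces = [b"hello " * 100, b"world " * 100, b"streaming compression"]
compressed = b"".join(compress_stream(pieces))
print(len(b"".join(pieces)), "->", len(compressed), "bytes")
assert zlib.decompress(compressed) == b"".join(pieces)
```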

Or he wants to compress data over the internet because he has no local admin rights to install compression software or a library.

2

Fred2718 t1_j21b6p0 wrote

Here are a couple of techniques which may not be in use anymore.

"Rook move encoding" for outlines of shapes, in particular digitized text characters. Instead of recording all the black and white pixels inside the character's body, record the outline of the character, in xy pixel space, as if you were moving a chess rook. E.g. 5 up, 2 left, 4 up, 3 left, ..... I worked with laser printers which did this.

"Patchified memory". Instead of treating a page as a set of raster lines, treat it as a set of small squares, like a quilt. Printed pages of text + black/white graphics tend to have lots of such patches which are all-white or all-black. Those patches can be represented in a very compact way.

These techniques are mathematically unsophisticated, but are easy to understand. Oh, also, they are both "lossless".

1