I don’t think that’s what’s happening. There’s no hard requirement for cat to read everything into memory first. It can send data as soon as it’s available, and the receiving process can read it as fast as it wants. There are cases where this is clearer: let’s say you have a big video file that you want to convert with an encoder that only accepts y4m input and isn’t built into ffmpeg. A common way is something like
ffmpeg -i infile -f yuv4mpegpipe - | encoder --y4m outfile -
I’m pretty sure ffmpeg won’t read the whole infile into memory, nor will it store the whole y4m representation in memory. Instead, it decodes infile as needed and pushes data into the pipe only as fast as the encoder can consume it.
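If you want to see that behaviour for yourself, here is a minimal sketch using a plain OS pipe, assuming a Unix-like system. The names (`producer`, `run_demo`, `CHUNK`, `TOTAL`) are made up for illustration, and `os.pipe()` only stands in for the shell's `|`; it is not how cat or ffmpeg are implemented. The producer wants to write 8 MiB, but as long as nobody reads, it stalls once the kernel's pipe buffer is full (the buffer size depends on the platform).

```python
import os
import threading

CHUNK = 64 * 1024          # bytes per write (arbitrary choice)
TOTAL = 8 * 1024 * 1024    # 8 MiB, far more than a pipe buffer holds


def producer(write_fd, progress):
    """Write TOTAL bytes into the pipe, recording how far we got."""
    chunk = b"\0" * CHUNK
    try:
        while progress["written"] < TOTAL:
            remaining = TOTAL - progress["written"]
            # os.write blocks while the kernel's pipe buffer is full.
            progress["written"] += os.write(write_fd, chunk[:remaining])
    finally:
        os.close(write_fd)  # closing the write end signals EOF to the reader


def run_demo():
    read_fd, write_fd = os.pipe()
    progress = {"written": 0}
    worker = threading.Thread(target=producer, args=(write_fd, progress))
    worker.start()

    # Nobody is reading yet: the producer fills the buffer and then stalls.
    worker.join(timeout=0.5)
    stalled_at = progress["written"]
    still_running = worker.is_alive()

    # Drain the pipe; the producer resumes as space frees up.
    received = 0
    while True:
        data = os.read(read_fd, CHUNK)
        if not data:
            break
        received += len(data)
    os.close(read_fd)
    worker.join()
    return stalled_at, still_running, received


if __name__ == "__main__":
    stalled_at, still_running, received = run_demo()
    print(f"producer stalled after {stalled_at} bytes (still running: {still_running})")
    print(f"consumer received {received} bytes in total")
```

The producer never holds more than one chunk of its own, and the kernel never holds more than one pipe buffer, so the amount of data in flight stays small no matter how large the input is.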
But yeah, I remember something about tar using compression libraries directly being more efficient than piping its output to a compressor. So it’s still the better route, but probably not as much better as you think.
It’s not that far-fetched. PDFs are, in my opinion, closer to vector graphics than to document formats like odt and docx. Unless advanced features are used, they have no notion of document structure: a table in a PDF is just spaced text with lines drawn between the cells, and text is just independently placed letters. In fact, the space character doesn’t even exist in most PDFs; two letters are simply placed further apart. So a PDF is basically a set of canvases that get painted on with letters, lines, filled areas and even bitmap graphics.
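To make the point about missing spaces concrete, here is a rough sketch of the kind of guesswork a text extractor has to do when all it gets is positioned glyphs. Everything in it is hypothetical: the `Glyph` class, the `glyphs_to_text` function and the `gap_ratio` threshold are invented for illustration and are not taken from any real PDF library, which would use more careful heuristics.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge on the page, in points
    width: float  # advance width, in points


def glyphs_to_text(glyphs, gap_ratio=0.3):
    """Rebuild one line of text, guessing spaces from horizontal gaps."""
    ordered = sorted(glyphs, key=lambda g: g.x)
    out = []
    prev = None
    for g in ordered:
        if prev is not None:
            gap = g.x - (prev.x + prev.width)
            # Assumed heuristic: a gap wider than a fraction of the
            # previous glyph's width counts as a word boundary.
            if gap > gap_ratio * prev.width:
                out.append(" ")
        out.append(g.char)
        prev = g
    return "".join(out)


# Five glyphs and no space character anywhere, only a wider gap after "i".
sample = [
    Glyph("H", 0, 7), Glyph("i", 7, 3),
    Glyph("y", 14, 6), Glyph("o", 20, 6), Glyph("u", 26, 6),
]
print(glyphs_to_text(sample))  # prints "Hi you"
```

The space in the output is inferred purely from geometry, which is also why copying text out of some PDFs gives you words that are glued together or split apart.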
Modern PDF actually goes further in the direction of a document format by providing the content in a structured way, mostly for accessibility, but also to make the format suitable for automatic processing of the contained data.