I want to have a mirror of my local music collection on my server, and a script that periodically updates the server to, well, mirror my local collection.
But crucially, I want to convert all lossless files to lossy, preferably before uploading them.
That’s the one reason why I can’t just use git, or so I believe.
I also want locally deleted files to be deleted on the server.
Sometimes I even move files around (I believe in directory structure) and again, git deals with this perfectly. If it weren’t for the lossless-to-lossy caveat.
It would be perfect if my script could recognize that just like git does, instead of deleting and reuploading the same file to a different location.
My head is spinning round and round, and before I continue messing around with find and scp, it’s time to ask the community.
I am writing in bash but if some python module could help with it I’m sure I could find my way around it.
TIA
I’m not sure if syncthing will do everything you want, could be worth taking a look
Not sure what you’re asking, but can you use git hooks? What is the purpose of the mirror: for backup, for remote listening, or what? If the mirror is the permanent home for the files, you should keep the lossless version there. Is the lossy conversion just to reduce upload bandwidth? How did you get the lossless files onto the client to begin with?
If I imagine this setup, the lossless versions would live on the server, lossy compression would also be done on the server, and then the client could download either version.
I think version control isn’t really what you want, since you normally won’t have multiple revisions of the same file.
Maybe you could look at git-annex for handling the large binaries in your git repo.
Git is for text files. Your git repo might get very big after some time. Especially if you move files. But it’s your choice. Sounds like your problem can be solved with a pre-commit hook.
Your git repo might get very big after some time. Especially if you move files.
Moving files does not noticeably increase git repo size. The files are stored as blob objects. Changing their path does not duplicate them.
Is there a reason not to have the lossless/original files on the server? What I mean is, you could set up one of the myriad self-hosted music streaming apps here, and the vast majority will transcode on the fly to lossy, appropriately compressed files for streaming or even for downloading to remote devices for offline listening.
I also want locally deleted files to be deleted on the server.
Sometimes I even move files around (I believe in directory structure) and again, git deals with this perfectly. If it weren’t for the lossless-to-lossy caveat.
It would be perfect if my script could recognize that just like git does, instead of deleting and reuploading the same file to a different location.
If you were to use Git, deleted files get deleted in the working copy, but not in history. They’re still there, taking up disk space, although they don’t have to be transmitted again.
I’d look at existing backup and file sync solutions. They may have what you want.
For an implementation, I would work with an index. If you store paths + file size + content checksum you can match files under different paths. If you compare local index and remote you could identify file moves and do the move on the remote site too.
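To make the index idea a bit more concrete, here is a rough Python sketch. The function names (file_digest, build_index, diff_indexes) are made up for illustration, and it assumes you compare the index saved from the previous run against a fresh index of the same local tree, since the lossy copies on the server will not share checksums with the lossless originals.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(root):
    """Map each relative path under root to a (size, checksum) pair."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            index[rel] = (os.path.getsize(full), file_digest(full))
    return index


def diff_indexes(old, new):
    """Compare two indexes and return (moves, deletions, uploads).

    A move is a path that vanished from `old` while a path with the
    same (size, checksum) appeared in `new`.
    """
    gone = {p: k for p, k in old.items() if p not in new}
    appeared = {p: k for p, k in new.items() if p not in old}

    # Group vanished paths by content so they can be matched to new paths.
    by_content = {}
    for path, key in sorted(gone.items()):
        by_content.setdefault(key, []).append(path)

    moves, uploads = [], []
    for path, key in sorted(appeared.items()):
        candidates = by_content.get(key)
        if candidates:
            moves.append((candidates.pop(0), path))
        else:
            uploads.append(path)

    # Vanished paths that were not matched to any new path are real deletions.
    deletions = sorted(p for paths in by_content.values() for p in paths)
    # Files that kept their path but changed content need a fresh upload too.
    uploads += sorted(p for p in new if p in old and old[p] != new[p])
    return moves, deletions, uploads
```

Each (old_path, new_path) pair in moves could then become a rename on the server, applied to the lossy file name where the source was lossless, instead of a delete plus re-upload.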
Git is for text files and retaining a history of every change and every state that has ever existed. It is the wrong tool for what you want, because it would be wasteful of resources.
I suggest automating lossy encodings locally (there are quite a few approaches you could use here, such as a cron job with the encoder of your choice), and automating an rsync job to keep your server updated.
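The local encoding step could look something like the sketch below, which a cron job might call. Everything specific in it is an assumption rather than something from this thread: the extension list, the choice of Opus at 128k, a separate staging directory for the lossy copies, and ffmpeg being installed with libopus support.

```python
import os
import subprocess

# Assumption: which extensions count as lossless in this collection.
LOSSLESS_EXTS = {".flac", ".wav"}


def pending_transcodes(src_root, staging_root, out_ext=".opus"):
    """Yield (source, target) pairs for lossless files whose lossy copy
    in the staging tree is missing or older than the source."""
    for dirpath, _dirs, files in os.walk(src_root):
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() not in LOSSLESS_EXTS:
                continue
            src = os.path.join(dirpath, name)
            rel_dir = os.path.relpath(dirpath, src_root)
            dst = os.path.normpath(
                os.path.join(staging_root, rel_dir, stem + out_ext)
            )
            if not os.path.exists(dst) or os.path.getmtime(dst) < os.path.getmtime(src):
                yield src, dst


def transcode(src, dst, bitrate="128k"):
    """Encode one file with ffmpeg, overwriting any stale target."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y", "-loglevel", "error", "-i", src,
         "-vn", "-c:a", "libopus", "-b:a", bitrate, dst],
        check=True,
    )
```

A cron-driven script would then loop over pending_transcodes(...) and call transcode(src, dst) for each pair, so only new or changed lossless files get re-encoded.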
It is the wrong tool for what you want, because it would be wasteful of resources.
I’m actually coming round to this.
I guess rsync can be told to remove removed files on the destination, too?
Then I’d just exclude the lossless file extensions, and deal with them differently.
I guess rsync can be told to remove removed files on the destination, too?
Yes. The --delete family of options is relevant here.
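As a minimal sketch of how --delete and the extension excludes might be combined, here is a small Python helper that only builds the rsync command line. The helper name, the pattern list, and the example paths are assumptions for illustration, not anything specific to your setup.

```python
import subprocess

# Assumption: glob patterns for the lossless files to leave out of the sync.
LOSSLESS_PATTERNS = ["*.flac", "*.wav"]


def rsync_command(src, dest, dry_run=False):
    """Build an rsync invocation that mirrors src to dest, removes files
    on dest that no longer exist in src, and skips lossless files."""
    cmd = ["rsync", "-a", "--delete"]
    cmd += [f"--exclude={pattern}" for pattern in LOSSLESS_PATTERNS]
    if dry_run:
        cmd.append("--dry-run")
    # Trailing slash on the source: copy its contents, not the directory itself.
    cmd += [src.rstrip("/") + "/", dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; print the command instead of running it.
    print(" ".join(rsync_command("music", "user@server:music/", dry_run=True)))
```

Passing the list to subprocess.run(cmd, check=True) would execute it. It is probably worth starting with dry_run=True, because --delete removes anything on the destination that is absent from the source and not covered by an exclude, which would include transcoded copies placed there by a separate step. Files matching an exclude pattern are left alone on the destination unless --delete-excluded is given.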