Linus Torvalds Expresses His Hatred For Case-Insensitive File-Systems

cm0002@lemmy.world · 3 months ago


MudMan@fedia.io · 3 months ago

Case insensitive is more intuitive and MUCH safer.

You do not want every Windows user to live in a world where Office.exe, office.exe, Offlce.exe and 0fflce.exe are all different files.

OSs and filesystems aren’t built for programmers, they’re built for grandmas. Programmers just happen to use them. It’s much more sensible to give programmers a harder time fixing bugs and incompatibilities than it is to make the user experience even marginally worse.

I mean, all due respect for the guy, but that is an absolutely terrible opinion and I will die on this hill.

pelya@lemmy.world · 3 months ago

Your grandma will never type file names in shell, she’ll use Open File dialog, where case sensitivity does not matter.

MudMan@fedia.io · 3 months ago

Hah. Second absolutely deadpan Average Familiarity instance in a Linux forum I’ve seen this week.

I mean, no offense to grandma. Plenty of grandmas are computer literate. But the idea of this hypothetical normie Windows user doing anything but double click on an icon (too slowly, with a bit too much pressure on the left mouse button, as if that made a difference, probably having single clicked to select first, just in case) is absurd.

File names are icon names first and foremost. File paths are a UI element to breadcrumb the location of the currently open file manager/explorer window unless proven otherwise.

And that is the right answer and how the whole thing should be designed.

ulterno@programming.dev · 3 months ago

double click on an icon (too slowly, with a bit too much pressure on the left mouse button, as if that made a difference, probably having single clicked to select first, just in case

I do that.
I use KDE.
I am a programmer.

Also, I make directories with the correct capitalisations for the project names before going inside them and running git clone, which makes another directory in small letters.

Also, when I make header files matching class names, I capitalise them same as the class name. That messes up stuff for some others, sometimes. I like it.

masterspace@lemmy.ca · 3 months ago

OSs and filesystems aren’t built for programmers, they’re built for grandmas.

You’re just flat out and completely wrong.

No grandma is typing out file URLs. This is not a point.
OSes literally do nothing useful on their own. Their explicit purpose is to allow developers to write applications for them for users to use.
Case insensitivity can be handled at the application level, there is no necessity to handle it at the OS level.
Case insensitivity isn’t even clearly defined as Linus outlined, but you know what is clearly defined? Different character byte codes.
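To make the "handle it at the application level" idea concrete, here is a minimal sketch, assuming the application gets a plain directory listing from a case-sensitive filesystem. `resolve` and `listing` are hypothetical names for illustration, and `str.casefold()` is just one possible definition of "same name ignoring case".

```python
def resolve(names, wanted):
    """Return the names that match `wanted` when case is ignored.

    `names` stands in for a directory listing on a case-sensitive filesystem.
    """
    key = wanted.casefold()
    return [name for name in names if name.casefold() == key]


listing = ["Office.exe", "office.exe", "notes.txt"]

# Two candidates come back: the application has to pick one or ask the user.
print(resolve(listing, "OFFICE.EXE"))

# A single match is unambiguous.
print(resolve(listing, "Notes.TXT"))
```

The lookup itself is easy; what the application does when more than one name matches is a policy decision that this sketch leaves open.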

MudMan@fedia.io · 3 months ago

The entire issue is that grandmas don’t type out filepaths.

When you’re typing filenames, case is easy, because a) you have to press something different, and b) typically terminal monospace fonts look very different in caps and non caps.

But in a GUI where you aren’t typing the names out? For a human reading human text caps and non caps are interchangeable. So as the name of an icon case sensitivity is confusing and prone to human error.

I mean, it’s that in typing, too, because it’s a very easy typo to make and all sorts of mixed case choices can be hard to remember, but it’s MORE confusing if you end up with just an icon with a name and the exact same icon with the exact same name just one character is a different case.

OSs don’t do anything by themselves, but they come bundled with all sorts of standardized applications built on top of them. If case sensitivity is baked into the filesystem, it’s baked into the filesystem. And absolutely no, you can’t put it in at the application level. I mean, congratulations for finding the absolute worst of both worlds, but how would that even work? If I tell an app to use a file and there are two of them with different cases how would that play out? You can build it into indexing and search queries and so on when they will display more than one result (and that, by the way, is typically extra EXTRA confusing), but you can’t possibly override the case sensitive filesystem.

Now, character byte codes are a different thing, and it’s true that the gripe in this particular rant seems to be almost more focused into weird unicode quirks and the case sensitivity thing seems to be mostly a pet peeve he rolls into it, I suspect somewhat facetiously.

But still, that’s for the OS, the filesystem and the applications to sort out. It’s an edge case to handle and it can be sorted out via arbitrary convention regardless of whether you do case sensitivity for filenames. “Case insensitive means insensitive to other things, too” is not a given at all.

masterspace@lemmy.ca · edit-2 3 months ago

Now, character byte codes are a different thing, and it’s true that the gripe in this particular rant seems to be almost more focused into weird unicode quirks and the case sensitivity thing seems to be mostly a pet peeve he rolls into it, I suspect somewhat facetiously.

No, it has nothing to do with “weird Unicode quirks”.

It has everything to do with there being a universal standard for representing different characters, and the file system deciding to then apply its own random additional standard on top that arbitrarily decides some things are probably the same as others.

This is just like JavaScript’s early choice of fuzzy equality with ==. It was done to be helpful, but it was a fuzzy system that doesn’t cover enough edge cases to be implemented at that low of a level.

MudMan@fedia.io · 3 months ago

Arbitrary is the word.

Arbitrary means you can implement it however you want. The limits to it are by convention. There is no need to go any further than case insensitive filenames. At all. Rolling case insensitive filenames into the same issue is entirely an attempt to make a case against a pet peeve for unrelated reasons.

You want it to handle the edge cases? Have it handle the edge cases. You want to restrict it to the minimum feature just for alphabet characters? Do that.

But you do NOT give up on the functionality or user experience because of the edge cases. You don’t design a user interface (and that’s what an OS with a GUI is, ultimately) for consistency or code elegance, you design it for usability. Everything else works around that.

I can feel this conversation slipping towards the black hole that is the argument about the mainstream readiness of Linux and I think we should make a suicide pact to not go there, but man, is it starting to form a narrative and am I finding it hard to avoid it.

masterspace@lemmy.ca · edit-2 3 months ago

There is no need to go any further than case insensitive filenames. At all. Rolling case insensitive filenames into the same issue is entirely an attempt to make a case against a pet peeve for unrelated reasons.

This is literally just the same issue. I cannot see what two issues you are separating this into.

All of this stems from case insensitive file names.

But you do NOT give up on the functionality or user experience because of the edge cases. You don’t design a user interface (and that’s what an OS with a GUI is, ultimately) for consistency or code elegance, you design it for usability. Everything else works around that.

The OS is not the GUI. Every GUI you see in the OS is an application running on top of the actual OS.

The OS should not arbitrarily decide that some characters are the same as others, it should respect the unified standards for what bytes represent what characters. Unless there is an internationally agreed upon standard for EXACTLY what case insensitive means for every character byte code, then you are building a flawed system that will ruin the user experience when massive bugs and stability issues come up because you didn’t actually plan out your system properly to cover edge cases.

You know, as Linus is pointing out given his multi decade history of running Linux.

MudMan@fedia.io · 3 months ago

No, hold on, this is not about the OS.

This is about whether the filesystem in the OS supports case insensitive names.

That determines whether the GUI supports case insensitive names down the line, so the choices made by the filesystem and by the OS support of the filesystem must be done with the usability of the GUI in mind.

So absolutely yes, the OS should decide that some characters are the same as others, not arbitrarily but because the characters are hard to read distinctly by humans and that is the first consideration.

Or hey, we can go back to making all filenames all caps. That works, too and fully solves the problem.

masterspace@lemmy.ca · edit-2 3 months ago

No, hold on, this is not about the OS.

Holding on.

This is about whether the filesystem in the OS supports case insensitive names.

K, now that we’re done being pedantic…

That determines whether the GUI supports case insensitive names down the line, so the choices made by the filesystem and by the OS support of the filesystem must be done with the usability of the GUI in mind.

Oh yes, let’s prioritize making sure that when grandmas are using the raw filesystem they’re not confused by case sensitivity, totally worth it over stable, bug-free, secure, software.

Definitely couldn’t have just built grandmas a case insensitive option on the user portion of the file system instead of introducing bugs and edge cases into literally every single piece of software they might use…

MudMan@fedia.io · 3 months ago

OK, no, but yes, do that.

Yes, prioritize making sure that grandmas are not confused by case sensitivity over bug-free secure software. That’s correct.

Also do that robustly in the user layer. Why not? That’s cool as well.

I am a bit confused about how you suggest implementing a file system where two files can have the same user-facing name in document names, file manager paths, shortcuts/symlinks, file selectors and everywhere else exposed to the user without having the file system prevent two files with the same case-insensitive name existing next to each other. That seems literally worse in every way and not how filenames are implemented in any filesystem I’ve ever used or known about. I could be wrong, though.

Point is, I don’t care. If you figure out a good implementation go nuts.

But whatever it is, it NEEDS to make sure grandma will never see Office.exe and office.exe next to each other in the same directory. Deal?
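As a rough sketch of the rule being asked for here (names keep the case they were created with, but no two names in a directory may differ only by case), assuming `str.casefold()` as the comparison. `CaseInsensitiveDir` is a toy class made up for illustration, not how any real filesystem is implemented.

```python
class CaseInsensitiveDir:
    """Toy directory: names keep their original case but must be unique when case-folded."""

    def __init__(self):
        self._entries = {}  # case-folded name -> name as first created

    def create(self, name):
        key = name.casefold()
        if key in self._entries:
            raise FileExistsError(f"{name!r} collides with {self._entries[key]!r}")
        self._entries[key] = name

    def listing(self):
        return sorted(self._entries.values())


folder = CaseInsensitiveDir()
folder.create("Office.exe")

try:
    folder.create("office.exe")  # differs only by case, so it is rejected
except FileExistsError as err:
    print("rejected:", err)

# A lookalike that swaps a letter is a different name under case folding
# and is still accepted.
folder.create("Offlce.exe")
print(folder.listing())
```

Under this rule Office.exe and office.exe can never sit next to each other, while names that merely look similar are outside what case folding covers.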

Deebster@programming.dev · edit-2 3 months ago

Case insensitive is more intuitive

Are these the same filename?

ΑΓΑΘΉ.txt
αγαθή.txt

What about these?

MY-NOTES-ON-Δ.txt
μυ-notes-on-δ.txt

Databases have different case-insensitive collations - these control what letters are equivalent to each other. The fact that there’s multiple options should tell you that there’s no one-size-fits-all solution to case insensitivity.
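A small sketch of how those two pairs behave under one particular rule, Python's `str.casefold()`, which follows Unicode's default case folding. The text of the post doesn't reveal whether the capital "MY" is Latin or Greek, so both readings are spelled out with explicit code points. `same_ignoring_case` is a hypothetical helper.

```python
def same_ignoring_case(a, b):
    """Compare two names using Unicode default case folding."""
    return a.casefold() == b.casefold()


# First pair: Greek capitals vs. Greek small letters.
agathi_upper = "\u0391\u0393\u0391\u0398\u0389.txt"
agathi_lower = "\u03b1\u03b3\u03b1\u03b8\u03ae.txt"

# Second pair: the capitals look the same but can be different code points.
latin_upper = "MY-NOTES-ON-\u0394.txt"            # Latin M, Latin Y, Greek capital delta
greek_upper = "\u039c\u03a5-NOTES-ON-\u0394.txt"  # Greek capital mu, Greek capital upsilon
greek_lower = "\u03bc\u03c5-notes-on-\u03b4.txt"  # Greek small mu, Greek small upsilon

print(same_ignoring_case(agathi_upper, agathi_lower))
print(same_ignoring_case(greek_upper, greek_lower))
print(same_ignoring_case(latin_upper, greek_lower))
```

With this rule the first pair matches, and the second pair matches only if the capitals were typed as Greek letters. A different collation could draw those lines elsewhere.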

This issue is only simple and obvious if you don’t know enough about it.

MudMan@fedia.io · 3 months ago

I mean, cases in non-latin alphabets are cases as long as they function like cases, equivalences between alphabets are not cases, they’re equivalences between alphabets and a different issue altogether. At least that’d be my starting point for implementation.

But you’re misrepresenting my argument. I don’t give a crap if it’s simple and obvious to implement and it’s not my claim that it is. If it’s simple and obvious to the user it’s still the right call, even if the implementation is complicated and has to deal with edge cases.

My last caveat there would be that nobody claimed that a one-size-fits-all is necessary. Ultimately you’re not deciding the case sensitiveness of databases, just of one database, and that’s the filesystem’s naming rules. The rules are arbitrary and conventional. Short of raw “any character code will always be different from any other character code regardless of how visually similar or grammatically interchangeable the user-facing glyphs may be”, every other solution is just as arbitrary as the next. You’re always making a decision about it.

My contention is the decision shouldn’t be based on what is comfortable or more straightforward to implement, debug or use for the OS developers, it should be what is more usable by the lowest common denominator GUI-only users. And that’s case insensitive (but otherwise long and flexible) filenames.

Deebster@programming.dev · edit-2 3 months ago

But you’re misrepresenting my argument.

Hardly, I’m directly addressing your statement that case insensitive is intuitive to users, grandmas or otherwise - I give examples where it’s not intuitive or obvious which filenames match. I didn’t mention ease of implementation at all.

The principle of least surprise is an important UX consideration, and your idea of effectively introducing collation and localising which files conflict is just trading one problem for another set of problems and surprises (e.g. copying directories between drives with different settings).

MudMan@fedia.io · 3 months ago

No, it’s not. You’re substituting a base use case for an edge case and pretending they are on the same order when it comes to UX. They are not. File localization and mixing and matching alphabets in filenames is NOT the same as case sensitivity and using cases (or spaces, if we want to roll this conversation back a couple decades and talk about an actual implementation mess) in filesystems. Security and stability care about edge cases. It’s weird that you try to flex by name dropping “principle of least surprise” and then pretend that a problem impacting every single user who types a filename has the same impact as a user mixing and matching alphabets in multiple cases. ESPECIALLY when your example requires making the conscious decision that equivalence between characters across alphabets is the same thing as case sensitivity, which is not a given at all.

Oh, and it’s not my idea. Default Windows and Mac FSs are case insensitive, legacy FAT systems are case insensitive. If the issue is standardization across systems, case sensitivity is the odd one out. If you’re having issues mixing and matching drives in older supported case-insensitive FSs the blanket fix for that is not having a case sensitive system elsewhere for no particularly good reason. I mean, speaking of minimizing surprise…

SwingingTheLamp@midwest.social · 3 months ago

I don’t have Windows here to test, so I keep wondering, are all of these forms the same?

facture-février.pdf
FACTURE-FÉVRIER.PDF
FACTURE-FEVRIER.PDF

MudMan@fedia.io · 3 months ago

On a NTFS drive on Windows with default settings the first two are the same, the third one is not.

Caps and non-caps are matched, accented/unaccented characters are not, which is probably what you’d expect.
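The two possible rules can be sketched side by side. This assumes Python's `str.casefold()` and `unicodedata` as stand-ins; NTFS uses its own case table, so this illustrates the distinction rather than reproducing Windows' behaviour. `fold_case` and `fold_case_and_accents` are hypothetical helper names.

```python
import unicodedata


def fold_case(name):
    """Ignore case only."""
    return name.casefold()


def fold_case_and_accents(name):
    """Ignore case and also drop combining accent marks."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = [
    "facture-f\u00e9vrier.pdf",   # small e with acute
    "FACTURE-F\u00c9VRIER.PDF",   # capital E with acute
    "FACTURE-FEVRIER.PDF",        # no accent
]

# Case only: the first two collapse to one key, the third stays separate.
print(len({fold_case(n) for n in names}))

# Case and accents: all three collapse to one key.
print(len({fold_case_and_accents(n) for n in names}))
```

Which of the two rules counts as "the same filename" is exactly the kind of choice a filesystem has to make once and apply everywhere.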

SwingingTheLamp@midwest.social · 3 months ago

Thanks. That is what I’d expect, and highlights the disconnect I saw in this comment chain: I think what some other folks were trying (less-than-artfully) to say is that there’s a difference between what one might expect case-insensitive means as a computer programmer, and what one might expect case-insensitive to mean in human language. All three of those should be the same filename in fr_FR locale, since some French speakers consider diacritical marks to be optional in upper case. While that might be an edge case, it does exist. English is even worse, with a number of diacritical marks that are completely optional, but may be used to aid legibility, e.g. café, naïve, coöperation. (Whether that quirk is obvious or not, or whether it outweighs any utility of case-insensitivity is not something that I have a strong opinion on, though.)

MudMan@fedia.io · 3 months ago

Whaaaat? You’re telling me someone in the Linux community chooses to be deliberately obscure based on a technicality no end user cares about in a patronizing, elitist manner?

Naah. Impossible.

The issue with the special characters for accent marks and diacritics is their importance fluctuates per language, so you have to keep them separate unless you want to make different rules per locale instead of per character.

They do it the other way for number formatting and that’s already a mess. If you’ve ever tried to work with spreadsheets across locale formats it’s absolutely bonkers. Excel outright changes the separators in formulas.

fubo@lemmy.world · 3 months ago

But if someone creates a file called HEAD, should it overwrite a file called head?

That shouldn’t matter to the “nontechnical” end-user at all. To the nontechnical user, even the abstraction of “creating a file” has largely gone away. You create a document, and changes you make to it are automatically persisted to storage, either local or cloud.

Only the technical command-line user cares about whether /usr/bin/HEAD and /usr/bin/head are the same path. And only in a specific circumstance — such as the early days of Mac OS X, where the Macintosh and Unix cultures collided — could the bug that I described emerge.

MudMan@fedia.io · 3 months ago

I found this post confusing because on the face of it, it sounds like you agree with me.

I mean, yeah, HEAD and head should overwrite each other.

As you say, only technical command-line users care about the case sensitivity. So no, it shouldn’t matter to the nontechnical user. And because the nontechnical user doesn’t care about the distinction, if something is called “head” in any permutation it shares a name with anything else called “head”. And the rule is that items within a directory have unique filenames. So “head” and “HEAD” aren’t unique.

The issue isn’t that the names are case insensitive, the issue is that two applications are using the same name in the same path.

If we’re not careful that’ll lead to a question about whether consolidating things in the Unix-style directory structure is a bad idea. I normally tend to be neutral on that choice, but you make a case for how the DOS/Windows structure that keeps all binaries, libraries and dependencies under the same directory at the cost of redundancy doesn’t have this problem to begin with.

But either way, if two pieces of software happen to choose the same name they will step over each other. The problem there is neither with case sensitivity nor with case insensitivity. The problem there is going back and forth between the two in a directory structure that doesn’t fence optional packages under per-application directories. As you say, this is only possible in a very particular scenario (and not what the post in question is about anyway).

soc@programming.dev · 3 months ago

So every time grandma picks a file name she needs to specify the locale?

What a stupid hill to die on.

MudMan@fedia.io · 3 months ago

Why would grandma need to specify the locale? The locale has been an environment variable set on install as far back as MS-DOS and it is still that on Linux, Windows and Mac OS to this day, to my knowledge. You can tell grandma she can edit her config.sys if things aren’t working as expected, I suppose.

This is the second time someone spontaneously brings up non-latin alphabets as if they are equivalent to case sensitivity for no good reason. Who made a nerdy blog post about this and poisoned the well?

soc@programming.dev · edit-2 3 months ago

Why do you have such a strong opinion on things you –rather obviously– understand very poorly?

Why would grandma need to specify the locale?

Yeah, maybe figure this one out for yourself to get you on track for learning something.

someone spontaneously brings up non-latin alphabets as if they are equivalent to case sensitivity for no good reason

That sentence makes no sense. You don’t have to agree on things, but at least be coherent in your objection.

MudMan@fedia.io · 3 months ago

Seriously, I will find that blog.

But to your question, case sensitivity is user-facing. That’s been my argument from the beginning.

You don’t need to care about the implementation side. You just need to care about how it’s used. If it makes more sense for the user to have File.txt and file.txt be the same, then that’s as much as you need to “understand” to have an opinion on this. A correct one, at that. The rest of it serves the user and the usability first.

soc@programming.dev · 3 months ago

What blog?

To spell it out for you, very slowly: Casing is locale-sensitive.

You cannot determine whether file A and file B have the same case without taking the language the filename was written in into account.

Which means you need to somehow attach the locale to every file (name). Your browser could implement something to add that (semi-)automatically, but if grandma is creating a file from scratch, there is only so much you can do.
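The usual textbook illustration of locale-sensitive casing is Turkish, where capital I lowercases to dotless ı (U+0131) and capital İ (U+0130) lowercases to i. Below is a minimal sketch, assuming Python's built-in `str.lower()` (which ignores the locale) and a hypothetical `turkish_lower` helper that applies only that one Turkish rule.

```python
def turkish_lower(text):
    """Hypothetical helper: lowercase using the Turkish dotted/dotless i rule."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


name = "ISPARTA.TXT"

default_result = name.lower()         # locale-independent rule: I -> i
turkish_result = turkish_lower(name)  # Turkish rule: I -> dotless i (U+0131)

print(default_result)
print(turkish_result)
print(default_result == turkish_result)
```

So whether ISPARTA.TXT and isparta.txt are the same name depends on which rule is applied. A filesystem either fixes one rule for everyone or has to know which language a name was written in.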

I hope this helps you understand why the thing you propose is stupid.

MudMan@fedia.io · 3 months ago

What thing that I propose? Case insensitive FSs aren’t a new thing. They are, in fact, the norm.

Why are you talking about this as if it’s a weird thing some rando on the Internet came up with? Yes, you need to attach the locale to the filename. No, I have no idea off the top of my head of how different file systems encode or store that.

I do know that the information is available, that it is handled in many commonly used FSs already… AND that you need to handle that anyway for a whole bunch of other reasons as well, from special characters to alternate alphabets. And yes, there are edge cases (hey, to this day Windows won’t parse Japanese filenames out of the box half the time). That’s no excuse.

Once again, this is not about implementation, this is about the user-facing feature working as expected. “Oh, you can’t do case insensitivity because now you have to store additional information” is not an excuse to not support the feature.

Or, if it is, then let’s go back to eight characters from the English alphabet in all caps. 8.3 filenames. Why not? If we don’t want to have to implement features into the FS and convenience doesn’t trump having to deal with additional requirements and edge cases we may as well keep going. Why are spaces, cyrillic, special characters and long names worth doing but case insensitivity isn’t?

soc@programming.dev · 3 months ago

Case insensitive FSs aren’t a new thing.

More precisely, they came up in a time where Unicode was not a thing.

Yes, you need to attach the locale to the filename. No, I have no idea off the top of my head of how different file systems encode or store that.

They don’t. None of them.

Or, if it is, then let’s go back to eight characters from the English alphabet in all caps. 8.3 filenames. Why not? […] Why are spaces, cyrillic, special characters and long names worth doing but case insensitivity isn’t?

Because you cannot have both.

It is either “spaces, cyrillic, special characters and long names” or case insensitivity.

MudMan@fedia.io · 3 months ago

OK, you’re going to have to be specific about your technical claims here. Which part of unicode support on filenames prevents case insensitive filenames?

I mean, I don’t know why you made me try, because I do use non-English characters day to day. But you made me try, and sure enough, you can absolutely mix and match alphabets and cases in NTFS and still have it behave case-insensitively. i.e. делоinsensitive.txt and делоINSENSITIVE.txt can’t be placed in the same folder and are parsed as the same name.

So if you have some bit of nuance about why you “cannot have” the feature that clearly works right now in front of me in my computer and has for ages I’m all ears. For a completely impossible implementation of a clearly useful feature it sure seems like it is actively supported in most of the FSs currently in use.

EDIT: Made me go check because now I’m curious about the edge cases you guys keep claiming are catastrophic. Putting those two files in a case sensitive Samba share and opening it in Windows keeps the mix of characters but a suffix gets unceremoniously appended to the filename of one of the files. That’s probably a bit of a mess in some circumstances if you don’t know it’s coming, but there you go.
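For anyone who wants to know ahead of time which names would clash when a case-sensitive directory is viewed through a case-insensitive system, here is a minimal sketch. It assumes `str.casefold()` as the comparison, which is not necessarily the rule Windows or Samba apply, and `find_collisions` is a hypothetical helper.

```python
from collections import defaultdict


def find_collisions(names):
    """Group names that become identical once case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


share = [
    "\u0434\u0435\u043b\u043einsensitive.txt",  # Cyrillic prefix + Latin lowercase
    "\u0434\u0435\u043b\u043eINSENSITIVE.txt",  # same prefix + Latin uppercase
    "readme.md",
]

for group in find_collisions(share):
    print("would collide:", group)
```

This only reports the clash. How the receiving side renames or hides one of the files is up to that system.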
