I’m a tech-interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That, combined with not having a practical use for it, leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing).
With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.
Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.
He could also refer to the mere possibility of having duplicates which does not mean there are duplicates. And even then it could be by accident. Of course db design could prevent this. But I guess he is inflating the importance of this issue.
Because a simple query would have shown that SSN was a compound key with another column (birth date, I think), and not the identifier he thinks it is.
Why would one person, one SSN ever have two different birth dates? That sounds like an issue all unto itself.
I think what he means is that the unique identifier for a database record is a composite of two fields: SSN + birth date. That doesn’t mean that SSN to birth date is a one-to-many relation.
But they are implying SSN to SSN+Birthdate is a one-to-many relationship. Since SSN to SSN should be one-to-one, you can conclude the SSN to Birthdate is one-to-many, right?
No, who said there was a relationship?
A compound key is a composite key where one or both sides can be foreign keys to other tables themselves; it’s a safe assumption this is probably true in a large data set like social security. A composite key is a candidate key (a uniquely identified key) made up of more than one column.
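For anyone who wants to see the composite-key idea run, here is a minimal sketch using Python's built-in sqlite3 module. The table name `person`, its columns, and the sample values are all made up for illustration; nobody in this thread knows the real schema. It only shows that with a key on the pair (SSN, birth date), the same SSN can legally appear in more than one row while an exact repeat of the pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        ssn        TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        name       TEXT,
        PRIMARY KEY (ssn, birth_date)  -- uniqueness is enforced on the pair
    )
    """
)

# Same SSN, different birth dates: allowed, because the pair differs.
conn.execute("INSERT INTO person VALUES ('000-00-0001', '1940-03-01', 'A')")
conn.execute("INSERT INTO person VALUES ('000-00-0001', '1938-02-27', 'B')")

# Same SSN and same birth date: rejected by the primary key constraint.
try:
    conn.execute("INSERT INTO person VALUES ('000-00-0001', '1940-03-01', 'C')")
    duplicate_pair_rejected = False
except sqlite3.IntegrityError:
    duplicate_pair_rejected = True

rows_with_ssn = conn.execute(
    "SELECT COUNT(*) FROM person WHERE ssn = '000-00-0001'"
).fetchone()[0]
print(rows_with_ssn, duplicate_pair_rejected)  # 2 True
```

So a count on the SSN column alone can report a "duplicate" even though no key is duplicated. Whether the real data ever contains such rows is a separate question that this sketch cannot answer.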
This basically means that there is a finite number of available SSNs because they’re only 9 digits long, and someone intends to recycle SSNs after the current user of one dies. Linking it to birthday is “unique enough” as to never recur.
I think I was getting some wires crossed and/or misunderstood what geoff (parent commenter to my last comment) was saying, so my comment may be somewhat misdirected.
But according to the Social Security FAQ page, SSNs are not recycled, so that data (especially when compounded and hashed with other data) should be able to establish a one-to-one relationship between each primary key and an SSN. Thus, having SSNs appear associated with multiple primary keys is a concern.
Other comments have pointed to other explanations for why SSNs could appear to occur multiple times, but those amount to “it appeared in a different field associated with the same primary key”. I think that’s the most likely explanation of things.
Note that it being only part of a key is a technology choice that does not require reality to map to it. It may seem like overkill, but someone may not trust the political process to preserve that promise, so they add the birthdate just in case something goes sideways in the future. Lots of technical choices are made anticipating likely changes and problems, designing things to be extra robust in the face of those.
Yeah this strikes me as safeguarding against a possible bad decision.
A weak example would be my grandma. She was born before Social Security and was told as a kid she was born in 1938. Because I guess in the olden days you just didn’t need to pass your birth certificate around for anything, it wasn’t until she went to get married at ~age 25 that she found out her birth certificate actually said she was born in 1940 (I forget the actual years, but I remember it was a two year and two day gap between dates).
It’s a weak example that should apply to only a microscopic portion of the population, but I could see her having some weird records in the databases as a result.
Because SQL is everywhere. If Musk knew what it was, he would know that the government absolutely does use it.
This explanation makes no sense in the context of OP’s question, given the order of comments…
Yeah, a better explanation is that Deduplicating Databases are an absolutely terrible idea for every use case, as it means deleting history from the database.
The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
If he knew the domain, he would know this isn’t an issue. If he knew the technology, he would be able to see the constraint and, following investigation, reach the conclusion that it’s not an issue.
The man continues to be a malignant moron
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.
Note: if duplicate SSNs are accidentally issued, my understanding is that they issue a new one to one of the people. Also, I don’t know how to find the start of the thread on Twitter, since I only use it when I accidentally click on a link to it.
https://www.ssa.gov/history/hfaq.html
Q20: Are Social Security numbers reused after a person dies?
A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.
Take this with a grain of salt as I’m not a dev, but do work on CMS reporting for a health information tech company. Depending on how the database is designed an SSN could appear in multiple tables.
In my experience deduplication happens as part of generating a report so that all relevant data related to a key and scope of the report can be gathered from the various tables.
It is common for long lived databases with a rotating cast of devs to use different formats in different tables as well! One might have it as a string, one might have it as a number, and the other might have it with hyphens in the same database.
Hell, I work in a state agency and one of our older databases has a dozen tables with phone numbers.
- One has the whole thing as a long int: 2223334444
- One has the whole thing as a string: 2223334444 (which of course can’t be directly compared to the one that is a long int…)
- One has separate fields for area code and the rest with a hyphen: 222 and 333-4444
- One has the whole thing with parentheses, a space, and a hyphen as a string: (222) 333-4444
The main reason for the discrepancy is not looking at what was used before, or not understanding that they can always change the formatting when displayed, so they don’t need to include the parentheses or hyphens in the database itself.
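To make the mixed-format problem concrete, here is a small Python sketch. The helper name `normalize_digits` and the sample values are invented for this example; the idea is just that the four storage styles listed above only compare equal once everything except the digits is stripped.

```python
import re

def normalize_digits(value) -> str:
    """Return only the digits of a stored value, as a string."""
    return re.sub(r"\D", "", str(value))

# The same number as it might be stored in four different tables.
stored = [
    2223334444,           # long int
    "2223334444",         # plain string
    "222" + "333-4444",   # separate area code field joined with the rest
    "(222) 333-4444",     # fully formatted string
]

# Compared raw, these are four different values.
raw_distinct = len({repr(v) for v in stored})

# After normalization they collapse to a single value.
normalized = {normalize_digits(v) for v in stored}
print(raw_distinct, normalized)  # 4 {'2223334444'}
```

A naive comparison across such tables would make one value look like several different ones, or hide real matches, depending on which way the query is written.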
Okay, but if that happens, Musk is right that that’s a bit of a denormalization issue that maybe needs resolving.
SSNs should be stored as strings without any hyphen or additional markup, nothing else.
- Storing as a number can cause issues if you ever want to support leading zeros
- Any “styling” like hyphens should be handled by a consuming front-end system; you want only the important data in the DB to minimize query times
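A quick Python sketch of why plain digit strings are the safer storage format. The value `012345678` and the helper `format_for_display` are made up for illustration: a numeric type silently drops a leading zero, while hyphens can be added at display time without ever being stored.

```python
ssn_text = "012345678"        # made-up 9-digit value that starts with a zero

as_number = int(ssn_text)     # what a numeric column would hold
from_number = str(as_number)  # naive conversion back to text

def format_for_display(digits: str) -> str:
    """Add hyphens at display time; the stored value stays plain digits."""
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

print(from_number)                   # 12345678 (only 8 digits now)
print(format_for_display(ssn_text))  # 012-34-5678
```

The zero can be restored with padding if the width is known, but then every consumer of the column has to remember to do that.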
It’s more likely though it’s just a composite key…
This is not what he is actively doing though. He isn’t trying to improve databases.
He is tearing down entire departments and agencies and using shit like this to justify it.
Sure but my point is, if it was the scenario you described, then Elon would be talking about the right kind of denormalization problem.
Denormalization due to multiple different tables storing their own copies of the same data, in different formats worse yet, would actually be the kind of problem he’s tweeting about.
As opposed to a composite key on one table which means him being an ultracrepidarian, as usual.
Musk canceled the support for the long running Common Education Data Standards (CEDS) which is an initiative to promote better database standards and normalization for the states to address this kind of thing.
It does not fucking matter if he is technically correct about one tiny detail because he is only using it to destroy, not to improve efficiency.
A given SSN appearing in multiple tables actually makes sense. To someone not familiar with SQL (i.e. at about my level of understanding), I could see that being misinterpreted as having multiple SSN repeated “in the database”.
Of all the comments so far, I find yours the most compelling.
Theoretically, yeah, that’s one solution. The more reasonable thing to do would be to use the foreign key though. So, for example:
SSN_Table
ID | SSN | Other info
Other_Table
ID | SSN_ID | Other info
When you want to connect them to have both sets of info, it’d be the following:
SELECT * FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
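If you want to try that join yourself, here is a runnable sketch with Python's built-in sqlite3 module. The table and column names follow the layout above; the `Other_Info` column name, the sample rows, and the added ORDER BY are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when enabled
conn.executescript(
    """
    CREATE TABLE SSN_Table (
        ID         INTEGER PRIMARY KEY,
        SSN        TEXT NOT NULL UNIQUE,
        Other_Info TEXT
    );
    CREATE TABLE Other_Table (
        ID         INTEGER PRIMARY KEY,
        SSN_ID     INTEGER NOT NULL REFERENCES SSN_Table(ID),
        Other_Info TEXT
    );
    INSERT INTO SSN_Table VALUES (1, '000-00-0001', 'person A');
    INSERT INTO Other_Table VALUES (10, 1, 'address record');
    INSERT INTO Other_Table VALUES (11, 1, 'benefit record');
    """
)

# Same join as above, with explicit columns and a stable ordering.
joined = conn.execute(
    "SELECT SSN_Table.SSN, Other_Table.Other_Info "
    "FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID "
    "ORDER BY Other_Table.ID"
).fetchall()

# The SSN itself is stored exactly once; only the small integer ID repeats.
ssn_copies = conn.execute("SELECT COUNT(*) FROM SSN_Table").fetchone()[0]
print(joined, ssn_copies)
```

The join output shows the SSN on two rows even though it is stored once, which is one way a correct design can look like it has "duplicates" to someone reading query results.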
Yeah, databases are complicated and make my head hurt. Glancing through resources from other comments, I’m realizing I know next to nothing about database optimization. Like, my gut reaction to your comment is that it seems like unnecessary overhead to have that data across two tables - but if one sub-dept didn’t need access to the raw SSN, but did need access to less personal data, I could see those stored in separate tables.
But anyway, you’re helping clear things up for me. I really appreciate the pseudo code level example.
It’s necessary to split it out into different tables if you have a one-to-many relationship. Let’s say you have a list of driver licenses the person has had over the years, for example. Then you’d need the second table. So something like this:
SSN_Table
ID | SSN | Other info
Driver_License_Table
ID | SSN_ID | Issue_Date | Expiry_Date | Other_Info
Then you could do something like pull up a person’s latest driver’s license, or list all the ones they had, or pull up the SSN associated with that license.
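Here is a rough, runnable version of those three lookups using Python's sqlite3 module. The table and column names come from the layout above; the sample rows and dates are invented, and dates are stored as ISO text so that sorting them as strings sorts them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SSN_Table (
        ID  INTEGER PRIMARY KEY,
        SSN TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Driver_License_Table (
        ID          INTEGER PRIMARY KEY,
        SSN_ID      INTEGER NOT NULL REFERENCES SSN_Table(ID),
        Issue_Date  TEXT NOT NULL,
        Expiry_Date TEXT NOT NULL
    );
    INSERT INTO SSN_Table VALUES (1, '000-00-0001');
    INSERT INTO Driver_License_Table VALUES (100, 1, '2010-05-01', '2015-05-01');
    INSERT INTO Driver_License_Table VALUES (101, 1, '2015-05-01', '2020-05-01');
    INSERT INTO Driver_License_Table VALUES (102, 1, '2020-05-01', '2025-05-01');
    """
)

# All licenses one person has had, oldest first.
all_licenses = conn.execute(
    "SELECT ID FROM Driver_License_Table WHERE SSN_ID = ? ORDER BY Issue_Date",
    (1,),
).fetchall()

# Only the most recently issued license.
latest = conn.execute(
    "SELECT ID, Issue_Date FROM Driver_License_Table "
    "WHERE SSN_ID = ? ORDER BY Issue_Date DESC LIMIT 1",
    (1,),
).fetchone()

# The SSN associated with a given license.
ssn_for_license = conn.execute(
    "SELECT s.SSN FROM Driver_License_Table d "
    "JOIN SSN_Table s ON s.ID = d.SSN_ID WHERE d.ID = ?",
    (101,),
).fetchone()[0]

print(all_licenses, latest, ssn_for_license)
```

Three license rows point at one person row, and nothing about that is a data quality problem.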
I think a likely scenario would be for name changes, such as taking your partner’s surname after marriage.
This is true, but there are many instances where denormalization makes sense and is frequently used.
A common example is a table that is frequently read. Instead of going to the “central” table the data is denormalized for faster access. This is completely standard practice for every large system.
There’s nothing inherently wrong with it, but it can be easily misused. With SSN, I’d think the most stupid thing to do is to use it as the primary key. The second one would be to ignore the security risks that are ingrained in an SSN. The federal government, being as large as it is, I’m sure has instances of both. However, since Musky is using his posse of young, arrogant brogrammers, I’m positively certain they’re completely ignoring the security aspect.
Yeah, I work daily with a database with a very important non-ID field that is denormalized throughout most of the database. It’s not a common design pattern, but it is done from time to time.
Yeah, no one appreciates security.
I probably overused that saying to explain it: ‘if there are no break-ins, why do we pay for security? Oh, there was a break-in - what do we even pay security for?’
To be a bit more generic here, when you’re at government scale you’re generally deep in trade-off territory. Time and space are frequently opposed values and you have to choose which one is most important, and consider the expenses of both.
E.g. caching is duplicating data to save time. Without it we’d have lower storage costs, but longer wait times and more network traffic.
The SSN is likely to appear in multiple tables, because they will reference a central table that ties it all together. This central table will likely only contain the SSN, the birth date (from what others have been saying), as well as potentially first and last name. In this table, the entries have to be unique.
But then you might have another table, like a table listing all the physical exams, which has the SSN to be able to link it to the person’s name, but ultimately just adds more information to this one person. It does not duplicate the SSN in a way that would be bad.
Beat me to asking this follow up, though you linking additional resources is probably more effort than I would have put in. Thanks for that!
If he doesn’t think the government uses sql after having his goons break into multiple government servers he is an idiot.
If he is lying to cover his ass for fucking up so many things (the more likely explanation) then saying “he never used sql” is basically a dig at how technically inept he really is despite bragging about being a tech bro.
To oversimplify, there are two basic kinds of databases: SQL and NoSQL (“Not Only SQL”).
SQL databases work as you’d imagine, with tables of rows and columns like a spreadsheet that are structured according to a fixed schema.
NoSQL includes all other forms of databases, document-based, graph-based, key-value pairs, etc.
The former are highly consistent and efficient at processing complicated queries or recording transactions, while the latter are flexible and fast at reads/writes but not necessarily consistent.
All large orgs will have both types in use for different purposes; SQL is better for banking purposes where consistency is paramount, NoSQL better for real-time web apps that need minimal response times and scalable capacity.
Just so I’m clear, you’re implying that a given SSN could appear associated to multiple “keys” because the key-value pair in a NoSQL database could have complex data.
An example I can imagine is a widow collecting her dead husband’s Social Security. Her SSN could appear in her own entry and also in her dead husband’s as a payee of that benefit, thus appearing as a “duplicate” SSN.
Is that in line with what you’re saying?
Indeed, that’s a possibility, but I’m not privy to the structure of the social security administration’s databases so I couldn’t say if it was indeed the case.
That’s how I feel too.
Lol, I’d love to see the data he’s trying to speak about (not that that’d be any kind of concerning for privacy /s). I don’t think he’s outright lying, but it definitely feels like a misrepresentation / wrong conclusion from the data.
But thanks for your part in helping me understand all this!
I didn’t read it like that. What I take from it is that he’s implying that the government uses something much stupider than SQL, like Lotus 1-2-3 or plain txt files or Excel. I really wouldn’t be surprised if there’s some government department that had their IT done during the first Bush administration and hasn’t really upgraded from it since.
There are also probably some departments that don’t get much funding, so they organise part of their work into some shared Excel files.
Nothing really wrong with that. Unless he’s implying that the entire federal government works like that, which is preposterously stupid.
Rows in a SQL table have a primary key which works as the unique identifier for that row. The primary key can be as simple as an incrementing number.
Right, but if there were multiple entries with the same SSN, wouldn’t that be a concern?
Not unless the data associated with that SSN is itself inconsistent.
For example, when multiple people are fraudulently using the same SSN, the fraud monitoring DB would necessarily need to record several entries with the same SSN.
Ah the old “malware detectors have the selectors for malware and so they show up as malware to other malware detection systems” problem.
Yeah, that seems like a reasonable case to have duplicate SSNs.
It’s because the comments he made are inconsistent with common conventions in data engineering.
- It is very common not to deduplicate data and instead just append rows. The current value is the most recent and all the old ones are simply historical. That way you don’t risk losing data and you have an entire history.
- Whilst you could do some trickery to deduplicate the data, it does create more complexity. There’s an old saying with ZFS: “Friends don’t let friends dedupe”, and it’s much the same here.
- compression is usually good enough. It will catch duplicated data and deal with it in a fairly efficient way, not as efficient as deduplication but it’s probably fine and it’s definitely a lot simpler
- Claiming the government does not use SQL
- It’s possible they have rolled their own solution or they are using MongoDB or something, but this would be unlikely and wouldn’t really refute the initial claim
- I believe many other commenters noted that it probably is MySQL anyway.
Basically what he said is incoherent to anybody who has worked with larger data.
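The append-only pattern from the first bullet can be sketched like this with Python's sqlite3 module. The `address_history` table, its columns, and the sample rows are hypothetical; the point is only that the same SSN appears on every row by design, and the "current" value is simply the newest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address_history (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        ssn         TEXT NOT NULL,
        address     TEXT NOT NULL,
        recorded_at TEXT NOT NULL   -- ISO date, so text order is date order
    );
    -- Changes are appended as new rows; nothing is updated or deleted.
    INSERT INTO address_history (ssn, address, recorded_at) VALUES
        ('000-00-0001', '1 Old Street',  '2001-01-15'),
        ('000-00-0001', '2 Newer Road',  '2012-06-30'),
        ('000-00-0001', '3 Current Ave', '2021-09-01');
    """
)

# Current value: the most recently recorded row for that SSN.
current = conn.execute(
    "SELECT address FROM address_history WHERE ssn = ? "
    "ORDER BY recorded_at DESC, id DESC LIMIT 1",
    ("000-00-0001",),
).fetchone()[0]

# Full history: every row is still there.
history_len = conn.execute(
    "SELECT COUNT(*) FROM address_history WHERE ssn = ?",
    ("000-00-0001",),
).fetchone()[0]

print(current, history_len)  # 3 Current Ave 3
```

"Deduplicating" a table like this would mean throwing away the older rows, which is exactly the history the design exists to keep.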
In terms of using SQL, it’s basically just a more reliable and better Excel that doesn’t come with a default GUI.
If you need to store data, it’s almost always best to throw it into a SQLite database because it keeps it structured. It’s standardised and it can be used from any programming language.
However, many people use Excel because they don’t have experience with programming languages.
Get ChatGPT to help you write a PyQt GUI for a SQLite database and I think you would develop a high-level understanding of how the pieces fit together.
There’s an old saying with ZFS: “Friends don’t let friends dedupe”
That’s a bad example to reference. The ZFS implementation of deduplication is poorly thought out, and I say that even though I like and run ZFS on my own Linux server(s). I understand that the BTRFS implementation of dedupe works well (no first-hand experience), and the Windows one works great (first-hand experience).
I’ve had a poor experience with btrfs dedupe tbh (and a terrible experience with qgroups), however, this was years ago. Btrfs snapshots I prefer though, much easier not to have that dependence.
What distro are you using for ZFS, void?
Great explanation, but I have a tiny, tiny, minor nit-pick
Basically what he said is incoherent to anybody who has worked with larger data.
I’m being pedantic, but I disagree with your wording. As a backend dev, I work with relational databases a ton, and what Musk said wasn’t incomprehensible to me, it just sounded like something a first year engineer fresh out of college would say.
Again, the rest of your explanation is spot on, absolutely no notes, but I do think the distinction between “adult making up incomprehensible bullshit” and “adult cosplaying as a baby engineer who thinks he’s hot shit but doesn’t know anything beyond surface level stuff” is important.
Fair point, I’ve edited the answer to be clearer for future readers.
It was a great answer until the very last sentence. ChatGPT is never a reference for anything ever if you have any fraction of a brain.
I disagree, it’s just a tool. It’s a fantastic way to template applications very quickly, particularly for those who are not already familiar with technologies and may not have the time or opportunity to play around with things otherwise.
An LLM is not a search engine and it can produce awful code. This is not production code, it’s for tinkering. As a sandbox tool, LLMs are fantastic.
On the ethical side of things, yeah, OpenAI sucks. Qwen2.5 would be up to this task, and one can run that locally.
It’s a disinformation machine which completely lacks all context. If it’s about 85% accurate to average internet denizens and 15% hallucination, then it’s an absolutely atrocious source to learn from. You’re literally lying to yourself, that is what the tool does.
Well, I’ve had a great time using LLMs to sandbox a dozen implementations and then investigate the shortcomings and advantages of different implementations.
Mistakes happen a lot but they can be managed on a small MWE with a couple of tests.
It’s how the tool is used more than any given tool being bad.
I understand your point and you’re not wrong. However, I’m not wrong either and you should take a second look at how you might use these tools in a way that makes your life easier and addresses the valid limitations you’ve described.
I have a fraction of a brain, I think, and use ChatGPT as a guide so that I have something to start with. Even if it’s slightly off, my two brain cells can pick it out and go from there. It’s not so bad.
And you know, I get it if you don’t like AI, but let’s be honest about it at the very least.
To be honest it’s a shit solution that makes you worse by merely using it.
I mostly ask it things I don’t know, though. I’m not exporting my thinking to it.
I ask it difficult translations, how to code something I’m unfamiliar with, help with grammar, I use it as an OCR for other languages, to help me remember things I can’t directly search, etc. I have a hard time believing all use is detrimental, especially when you’re filling in the gaps of your knowledge and a best guess will do. It’s surely better than a web search for things you don’t even know how to write in a search box.
You sound like common sense and the other person sounds like they have an axe to grind.
I mostly ask it things I don’t know, though. I’m not exporting my thinking to it.
Exhibit A
Which are then obviously confirmed with a web search. Jesus, spare me the cynicism.
And I’m just going to say this as a general observation, but the user base of the fediverse is pretty sophisticated at this time to be assuming shit like this. You make this place hostile by not giving the benefit of the doubt, you know. And even then. How hard is it to not think the worst of everyone you come across online? So ridiculous and petty.
- It is very common not to deduplicate data and instead just append rows. The current value is the most recent and all the old ones are simply historical. That way you don’t risk losing data and you have an entire history.
If SSNs are used as a primary key (a unique identifier for a row of data) then they’d have to be duplicated to be able to merge data together.
However, even if they aren’t using SSN as an identifier because it’s sensitive information, it’s not uncommon to repeat data, whether for speed/performance, for simplicity in table design, because it’s in a lookup table, or because you have disconnected tables.
Having a value repeated doesn’t tell you anything about fraud risk, efficiency, or really anything. Using it as the primary piece of evidence for a claim isn’t a strong argument.
This sounds like a reasonable argument.
Can you pass any resources with examples on when having duplicate values would be useful/best practices?
Sure, basically any time you have a many-to-many relationship you’ll have to repeat keys multiple times. Think students taking courses. You’d have a students table and a courses table, but the relationship is many students take many courses. So you’d want a third table for lookups where each row is [student_id, course_id].
This stackoverflow post has a similar example with authors and books - https://stackoverflow.com/questions/13970628/how-do-i-model-a-many-to-many-relation-in-sql-server#13970688
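The students-and-courses case can be sketched as follows with Python's sqlite3 module. The table names, column names, and sample rows are invented for the example. Note how `student_id` 1 and `course_id` 10 each appear on more than one row of the lookup table, and that this is required for the model to work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Lookup (junction) table: one row per (student, course) pairing.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course_id  INTEGER NOT NULL REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO courses  VALUES (10, 'Databases'), (20, 'Compilers');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Courses taken by one student.
courses_for_ada = [
    title
    for (title,) in conn.execute(
        "SELECT c.title FROM enrollments e "
        "JOIN courses c ON c.course_id = e.course_id "
        "WHERE e.student_id = 1 ORDER BY c.title"
    )
]

# Students taking one course.
students_in_databases = [
    name
    for (name,) in conn.execute(
        "SELECT s.name FROM enrollments e "
        "JOIN students s ON s.student_id = e.student_id "
        "WHERE e.course_id = 10 ORDER BY s.name"
    )
]

print(courses_for_ada, students_in_databases)
```

Each individual ID repeats, but the pair is the key, so no row in `enrollments` is actually a duplicate.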
This is the answer… it seems few on lemmy have ever normalized a database. But they do know how to give answers!
Thanks, OP seemed more curious about the technical aspects than just the absurdity of the comment (since pretty much every business uses SQL) so hoped a more technical explanation might be appreciated.
TIL Elon doesn’t know SQL or have any basic human decency.
J/K, I already knew he doesn’t have basic human decency.
If he knew anything about SQL, he could have run a quick search to see whether any SSNs are actually duplicated. (spoiler alert: they’re not, he’s just stupid).
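For what that "quick search" might look like, here is a sketch against a made-up `records` table using Python's sqlite3 module. The table, columns, and rows are hypothetical, and this says nothing about what the real data contains. It shows the difference between "an SSN appears on more than one row" and "an SSN is attached to conflicting data".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE records (
        ssn        TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        note       TEXT
    );
    INSERT INTO records VALUES
        ('000-00-0001', '1950-01-01', 'original entry'),
        ('000-00-0001', '1950-01-01', 'later update'),
        ('000-00-0002', '1960-02-02', 'only entry');
    """
)

# SSNs that appear on more than one row.
repeated = conn.execute(
    "SELECT ssn, COUNT(*) FROM records "
    "GROUP BY ssn HAVING COUNT(*) > 1 ORDER BY ssn"
).fetchall()

# SSNs attached to more than one distinct birth date (actually inconsistent).
conflicting = conn.execute(
    "SELECT ssn FROM records "
    "GROUP BY ssn HAVING COUNT(DISTINCT birth_date) > 1"
).fetchall()

print(repeated, conflicting)  # [('000-00-0001', 2)] []
```

In this toy data the first query finds a repeated SSN and the second finds nothing wrong with it, which is why a repeat count on its own is weak evidence.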
I saw a comment about this in the last couple of days that was really interesting and educational. Unfortunately I can’t seem to find it again to link it, but the gist of it was that there would be two things wrong with using SSNs as primary keys in a SQL database:
- You should not use externally generated data as primary keys
- You should not use personally identifying data as primary keys
Using SSNs as keys would violate both.
I went looking for best practices regarding SQL primary keys and found this really interesting post and discussion on Stack Overflow:
https://stackoverflow.com/questions/337503/whats-the-best-practice-for-primary-keys-in-tables
My first thought was that people’s SSNs can and do change, and sometimes (rarely?) people may have more than one SSN. Like someone mentions in that link, human error would be another reason why you would not want to use external data and particularly SSNs as primary keys.
It may be bad practice to use SSN as a primary key, but that won’t deter thousands of companies from doing exactly that.
Oh, I hear you!
From what I’m seeing in other comments, it seems SSNs aren’t used as primary keys, but they are part of generating the primary key. I haven’t seen anyone directly say it, but it sounds like the primary key is a hash of SSN + DOB (I hope with more data to add entropy, because that’s still a tiny bit of data to build a rainbow table from).
Still, assuming we haven’t begun re-using SSNs, it seems concerning to me that a SSN is appearing multiple times in the database. It seems a safe assumption that the uniqueness of a SSN should make the resultant hash unique, so a SSN appearing as associated to multiple primary keys should be a concern, right?
Other comments have led me to believe the “duplicate SSNs” are probably appearing in “different fields” (e.g. a dead man’s SSN would appear directly associated to him, but also as a sort of “collecting payments from” entry in his living wife’s entry). That would be a misrepresentation of the facts (which we know Vice Bro, Elon Musk the Wise and Honest would never do). Occam’s Razor, though, has me leaning in that direction.
That all makes sense, except if someone’s SSN changes (which happens under certain circumstances), doesn’t that invalidate their primary key or require a much more complicated operation of issuing a new record and relinking all the existing relationships?
I can imagine an SSN existing in more than one primary key due to errors. If they use SSNs in the primary key at all, but combined with something else, that leads me to believe that the designers felt that SSNs weren’t reliable enough to be a pure primary key.
I agree with you about Occam’s Razor. The guy has demonstrated multiple times that he’s a dishonest moron.
That all makes sense, except if someone’s SSN changes (which happens under certain circumstances), doesn’t that invalidate their primary key or require a much more complicated operation of issuing a new record and relinking all the existing relationships?
Yes, in the case of duplicate SSN assignments for two people (rare) you would need to change their records to align with the new SSN while not changing the records that go to the person who keeps the SSN. We do it with state identifiers and it is a gigantic pain in the ass.
If two numbers are assigned to the same person merging them to one of the two is far easier.
I can definitely imagine all that. Thanks!
I’m not familiar with cases where someone’s SSN could change. Could you link to resources on when that would happen?
I don’t have any resources handy, but I do know someone who this happened to: they were an immigrant who got an SSN the first time they migrated to the US, went back to live in their country for a number of years, then returned to the US and I guess applied for an SSN again. Voilà, two SSNs and a mess.
Yeah, I can imagine that’d be an administrative headache. I do not envy them the opportunity of sorting that out.
Thanks for the example though. That makes sense.
I don’t envy either party either. You’re welcome!
I think the thing that’s catching you up the most is that you’re assuming Elon has the slightest clue what he’s talking about. In your mind, you’ve read the words “the social security database” from his post and have made assumptions about what that means.
I’ve worked with databases for 20+ years, several of those being years working on federal government systems. Each agency has dozens or possibly hundreds of databases all used for different purposes. Saying “the social security database” is so fucking general that it’s basically nonsensical. It’d be like saying “Ford’s car database”.
Elon clearly heard someone technical talking about something, then misinterpreted it for his own purposes to justify what he is doing by destroying our government institutions. His follow up of saying the government doesn’t use SQL just reinforces that point.
Trying to logically backtrack into what he actually meant - and what the primary keys should be - is just sane washing an insane statement.
Musk is the walking Dunning-Kruger, he is too stupid to realize how terrible he sounds.
Dedup is about saving storage and has literally nothing to do with primary keys.
It’s a terminology thing really yes. I mean a database (SQL or not) shouldn’t need de-duplication by nature of how the record index/keys work.
If they’re not using a form of SQL though, I’d be very interested in what they are using. Back in the 90s I was messing around with things like Btrieve and other even more antiquated database engines. But all the software I used that utilised such things was converted to use a form of SQL (even if in some cases there were internal wrappers to allow access in the older way too via legacy code) over 20 years ago.
If I were an American though my biggest concern would be that Musk is able to know the structure AND content of the social security database. His post (if we believe it) demonstrates he must have access to both pieces of information.
His post (if we believe it) demonstrates he must have access to both pieces of information.
At best he is referring to an older mainframe he is aware of not being sql while being completely oblivious of all the government systems that are in sql.
Which isn’t giving him any credit, because in that case he is still running his mouth based on being ignorant about other government systems.
I submitted data to a government database yesterday that I know for a fact is sql because we have had an ongoing years long relationship that involves improving that system and aligning our state level sql database. The government absolutely uses sql frequently, even if they still have older mainframes with some other database architecture.
The government absolutely uses sql frequently, even if they still have older mainframes with some other database architecture.
This makes more sense. But even then they would surely transfer data from the old system over.
I mean I’m liking the idea that they went down into the basement, started up an old mini computer, with “superman 3” magnetic tapes with data from the 1980s to force them to try to integrate with that and only after transferring the data at 1000cps, find out it’s entirely out of date.
I mean, it won’t be the case, but I’d really like it to be. 😛
This makes more sense. But even then they would surely transfer data from the old system over.
All you gotta do is snap your fingers!
Moving data from system to system is a massive undertaking. It probably needs to be restructured, and decisions made during the process will be found to be imperfect and adjustments will need to be made along the way.
Then you have to change all the connections to other systems and recreate the existing reports, and by the way the changed structure impacts all of that, and you need to revisit why you have all this stuff, and why don’t we just leave it alone after all.
There is a reason that legacy systems stick around. I’m sure they have legacy mainframes with financial data. At my state office we have a financial mainframe we have been wanting to get rid of for over a decade and while we have peeled off what processes we can there is still a ton left to do. Nothing about it is easy compared to creating something new from scratch, in fact transitioning to a new system to replace an old system is probably ten times as much work. Not to mention you still have to use and maintain the old system the entire time!
Hanlon’s razor. He’s obviously referring to himself lol.
I think a lot of comments here miss the mark, it’s not really just about stating the gov does not use SQL.
Deduplication is generally part of a compression strategy and has nothing to do with SQL. If we’re being generous he may have been talking about normalization, but no one I have ever met has confused the two terms (they are distinctly different from an engineering perspective).
There are degrees of normalization too, so it may make total sense to normalize to 3NF (third normal form) rather than, say, 6NF.
This is it. Relational databases are normalized under normal forms; “deduplicate” is usually a term used when talking about a concrete data set pulled from data sources like a database, not the relational data model in the database itself.
That’s interesting. I didn’t know anything about normal forms, but a quick glance at G4G has some interesting information. I don’t have the time to go through their full article at the moment, but it’s been added to my to-do list.
Link for the lazy: https://www.geeksforgeeks.org/types-of-normal-forms-in-dbms/