Explanation: Python is a programming language. Numpy is a library for python that makes it possible to run large computations much faster than in native python. In order to make that possible, it needs to keep its own set of data types that are different from python’s native datatypes, which means you now have two different bool
types and two different sets of True
and False
. Lovely.
Mypy is a type checker for python (python supports static typing, but doesn’t actually enforce it). Mypy treats numpy’s bool_
and python’s native bool
as incompatible types, leading to the asinine error message above. Mypy is “technically” correct, since they are two completely different classes. But in practice, there is little functional difference between bool
and bool_
. So you have to do dumb workarounds like declaring every bool values as bool | np.bool_
or casting bool_
down to bool
. Ugh. Both numpy and mypy declared this issue a WONTFIX. Lovely.
So many people here explaining why Python works that way, but what’s the reason for numpy to introduce its own boolean? Is the Python boolean somehow insufficient?
From numpy’s docs:
and likewise:
Someone else points out that Python’s native
bool
is a subtype ofint
, so adding abool
to anint
(or performing other mixed operations) is not an error, which might then go on to cause of a hard-to-catch semantic/mathematical error.I am assuming that trying to add a NumPy
bool_
to anint
causes a compilation error at best and a run-time warning, or traceable program crash at worst.here’s a good question answer on this topic
https://stackoverflow.com/questions/18922407/boolean-and-type-checking-in-python-vs-numpy
plus this is kinda the tools doing their jobs.
bool_
exists for whatever reason. its not abool
but functionally equivalent.the static type checker mpy, correctly, states
bool_
andbool
aren’t compatible. in the same way other type different types aren’t compatibleTechnically the Python bool is fine, but it’s part of what makes numpy special. Under the hood numpy uses c type data structures, (can look into cython if you want to learn more).
It’s part of where the speed comes from for numpy, these more optimized c structures, this means if you want to compare things (say an array of booleans to find if any are false) you either need to slow back down and mix back in Python’s frameworks, or as numpy did, keep everything cython, make your own data type, and keep on trucking knowing everything is compatible.
There’s probably more reasons, but that’s the main one I see. If they depend on any specific logic (say treating it as an actual boolean and not letting you adding two True values together and getting an int like you do in base Python) then having their own also ensures that logic.
This is the only actual explanation I’ve found for why numpy leverages its own implementation of what is in most languages a primitive data type, or a derivative of an integer.
You know, at some point in my career I thought, it was kind of silly that so many programming languages optimize speed so much.
But I guess, that’s what you get for not doing it. People having to leave your ecosystem behind and spreading across Numpy/Polars, Cython, plain C/Rust and probably others. 🫠