Explanation: Python is a programming language. NumPy is a library for Python that makes it possible to run large computations much faster than in native Python. To make that possible, it needs to keep its own set of data types that are different from Python's native data types, which means you now have two different `bool` types and two different sets of `True` and `False`. Lovely.
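To make the "two different bools" point concrete, here is a minimal sketch, assuming NumPy is installed. The names `arr` and `result` are made up for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3])
result = (arr > 1).any()         # NumPy reductions return NumPy scalars

print(type(result))              # NumPy's boolean scalar type
print(type(True))                # Python's built-in bool
print(isinstance(result, bool))  # False: NumPy's bool is not a subclass of bool
print(result is True)            # False: different objects
print(result == True)            # True: they still compare equal
```

So the values behave the same in an `if`, but they are instances of unrelated classes, which is what a type checker sees.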
Mypy is a type checker for Python (Python supports static type annotations, but doesn't actually enforce them). Mypy treats NumPy's `bool_` and Python's native `bool` as incompatible types, leading to the asinine error message above. Mypy is "technically" correct, since they are two completely different classes. But in practice, there is little functional difference between `bool` and `bool_`. So you have to do dumb workarounds like declaring every bool value as `bool | np.bool_` or casting `bool_` down to `bool`. Ugh. Both NumPy and mypy declared this issue a WONTFIX. Lovely.
Well, yeah, but they do mean the exact same thing, hopefully: true or false.

Although thinking about it, someone above mentioned that the numpy `bool_` is an object, so I guess that is really: true or false or null/None.

In an abstract sense, they do mean the same things but, in a technical sense, the one most relevant to programming, they do not.
The standard Python `bool` type is a subclass of the integer type. In CPython that means `True` and `False` are full Python objects (two singletons) with the usual object overhead, not raw 4-byte (`int32`) or 8-byte (`int64`) values; `sys.getsizeof(True)` reports 28 bytes on a typical 64-bit CPython build.

The `numpy.bool_` type is something closer to a native C boolean and is stored in 1 byte per element inside an array.

So, memory-wise, the two are laid out very differently: an array of `numpy.bool_` is a packed buffer of 1-byte values, while a list of Python `bool` is a list of pointers to objects. You can't drop one representation directly into the other's storage. Storing a Python `bool` in a `numpy.bool_` array converts the value on assignment, and reading an element back gives you a `numpy.bool_` scalar rather than a Python `bool`. Python and NumPy handle that conversion for you, so the mismatch costs you conversions and type-checker complaints rather than a buffer overflow or illegal memory access.

What about converting on the fly? Well, that can be done but will come at a performance cost, as every function that can accept a `numpy.bool_` now has to perform additional type checking, validation, and conversion on every single function call. That adds up quick when processing data on scales where NumPy is called for.
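The storage details are easy to check from a REPL. This is a minimal sketch, assuming CPython with NumPy installed. The names `flags` and `py_flag` are made up for illustration, and the exact `sys.getsizeof` value depends on the interpreter build.

```python
import sys

import numpy as np

print(issubclass(bool, int))        # True: bool is a subclass of int
print(sys.getsizeof(True))          # size of the Python object, build-dependent
print(np.dtype(np.bool_).itemsize)  # 1: bytes per element in a NumPy array

flags = np.zeros(4, dtype=np.bool_)
flags[0] = True                     # the Python bool is converted on assignment
print(flags.nbytes)                 # 4: four elements, one byte each

py_flag = bool(flags[0])            # explicit conversion back to a Python bool
print(type(flags[0]), type(py_flag))
```

The explicit `bool(...)` at the end is the per-value conversion that would have to happen somewhere if the two types were made interchangeable.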