Add "dict lookup performance" section

Closes: #208
Yonatan Goldschmidt 2020-07-04 18:36:27 +03:00
parent 7457ffb848
commit 0b74f9ba5d
1 changed files with 32 additions and 0 deletions

README.md vendored

@@ -92,6 +92,7 @@ So, here we go...
* [Section: Miscellaneous](#section-miscellaneous)
+ [▶ `+=` is faster](#--is-faster)
+ [▶ Let's make a giant string!](#-lets-make-a-giant-string)
+ [▶ `dict` lookup performance](#-dict-lookup-performance)
+ [▶ Minor Ones *](#-minor-ones-)
- [Contributing](#contributing)
- [Acknowledgements](#acknowledgements)
@@ -3348,6 +3349,37 @@ Let's increase the number of iterations by a factor of 10.
---
### ▶ `dict` lookup performance
```py
>>> some_dict = {str(i): 1 for i in range(1_000_000)}
>>> %timeit some_dict['5']
28.6 ns ± 0.115 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> some_dict[1] = 1
>>> %timeit some_dict['5']
37.2 ns ± 0.265 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# why did it become much slower?
```
#### 💡 Explanation:
+ CPython has a generic dictionary lookup function that handles keys of any type (`str`, `int`, any object ...), and a specialized one for the common case of dictionaries whose keys are all `str`.
+ The specialized function (named `lookdict_unicode` in CPython's sources) can assume that every existing key, as well as the looked-up key, is a string. It therefore compares keys with the faster and simpler string comparison instead of calling the `__eq__` method.
+ The first time a `dict` instance is accessed with a non-`str` key, the instance is modified so that all future lookups go through the generic function.
+ This switch is not reversible for that particular `dict` instance, and the key doesn't even have to exist in the dictionary. A failed lookup has the same effect:
```py
>>> some_dict = {str(i): 1 for i in range(1_000_000)}
>>> %timeit some_dict['5']
28.5 ns ± 0.142 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> some_dict[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 1
>>> %timeit some_dict['5']
38.5 ns ± 0.0913 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```
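The `%timeit` lines above need IPython. As a rough sketch, the same comparison can be run from a plain script with the standard `timeit` module. The helper names `time_lookup` and `compare` are made up for this illustration and are not part of the original example. The absolute numbers will differ from the ones shown above (the statement is evaluated through a globals lookup), and whether any gap shows up at all depends on the interpreter version and build.

```py
import timeit


def time_lookup(d, key, number=1_000_000, repeat=5):
    """Return the best observed time for one d[key] lookup, in nanoseconds."""
    timer = timeit.Timer("d[key]", globals={"d": d, "key": key})
    # Take the minimum of several runs to reduce noise from other processes.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


def compare(size=100_000, number=1_000_000):
    """Time a str-key lookup before and after a failed non-str lookup."""
    some_dict = {str(i): 1 for i in range(size)}
    before = time_lookup(some_dict, "5", number=number)
    try:
        some_dict[1]  # non-str key that is not in the dict
    except KeyError:
        pass
    after = time_lookup(some_dict, "5", number=number)
    return before, after


if __name__ == "__main__":
    before, after = compare()
    print(f"str-only keys:        {before:.1f} ns per lookup")
    print(f"after non-str lookup: {after:.1f} ns per lookup")
```

Building a fresh `dict` inside `compare` matters here: because the switch described above is one-way, reusing an instance that has already seen a non-`str` key would hide the difference.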
---
### ▶ Minor Ones *
<!-- Example ID: f885cb82-f1e4-4daa-9ff3-972b14cb1324 --->
* `join()` is a string operation instead of list operation. (sort of counter-intuitive at first usage)