{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Exploring and understanding Python through surprising snippets.
\n", "\n", "Translations: [Chinese \u4e2d\u6587](https://github.com/leisurelicht/wtfpython-cn) | [Vietnamese Ti\u1ebfng Vi\u1ec7t](https://github.com/vuduclyunitn/wtfptyhon-vi) | [Add translation](https://github.com/satwikkansal/wtfpython/issues/new?title=Add%20translation%20for%20[LANGUAGE]&body=Expected%20time%20to%20finish:%20[X]%20weeks.%20I%27ll%20start%20working%20on%20it%20from%20[Y].)\n", "\n", "Other modes: [Interactive](https://colab.research.google.com/github/satwikkansal/wtfpython/blob/master/irrelevant/wtf.ipynb) | [CLI](https://pypi.python.org/pypi/wtfpython)\n", "\n", "Python, being a beautifully designed high-level and interpreter-based programming language, provides us with many features for the programmer's comfort. But sometimes, the outcomes of a Python snippet may not seem obvious at first sight.\n", "\n", "Here's a fun project attempting to explain what exactly is happening under the hood for some counter-intuitive snippets and lesser-known features in Python.\n", "\n", "While some of the examples you see below may not be WTFs in the truest sense, they'll reveal some of the interesting parts of Python that you might be unaware of. I find it a nice way to learn the internals of a programming language, and I believe that you'll find it interesting too!\n", "\n", "If you're an experienced Python programmer, you can take it as a challenge to get most of them right on the first attempt. You may have already experienced some of them before, and I might be able to revive sweet old memories of yours! :sweat_smile:\n", "\n", "PS: If you're a returning reader, you can learn about the new modifications [here](https://github.com/satwikkansal/wtfpython/releases/) (the examples marked with an asterisk are the ones added in the latest major revision). 
\n", "\n", "So, here we go...\n", "\n", "\n", "# Structure of the Examples\n", "\n", "All the examples are structured as follows:\n", "\n", "> ### \u25b6 Some fancy Title\n", ">\n", "> ```py\n", "> # Set up the code.\n", "> # Preparation for the magic...\n", "> ```\n", ">\n", "> **Output (Python version(s)):**\n", ">\n", "> ```py\n", "> >>> triggering_statement\n", "> Some unexpected output\n", "> ```\n", "> (Optional): One line describing the unexpected output.\n", ">\n", ">\n", "> #### \ud83d\udca1 Explanation:\n", ">\n", "> * Brief explanation of what's happening and why it is happening.\n", "> ```py\n", "> # Set up code\n", "> # More examples for further clarification (if necessary)\n", "> ```\n", "> **Output (Python version(s)):**\n", ">\n", "> ```py\n", "> >>> trigger # some example that makes it easy to unveil the magic\n", "> # some justified output\n", "> ```\n", "\n", "**Note:** All the examples are tested on the Python 3.5.2 interactive interpreter, and they should work for all the Python versions unless explicitly specified before the output.\n", "\n", "# Usage\n", "\n", "A nice way to get the most out of these examples, in my opinion, is to read them in order, and for every example:\n", "- Carefully read the initial code for setting up the example. If you're an experienced Python programmer, you'll successfully anticipate what's going to happen next most of the time.\n", "- Read the output snippets and,\n", " + Check if the outputs are the same as you'd expect.\n", " + Make sure you know the exact reason behind the output being the way it is.\n", " - If the answer is no (which is perfectly okay), take a deep breath, and read the explanation (and if you still don't understand, shout out! 
and create an issue [here](https://github.com/satwikkansal/wtfpython/issues/new)).\n", " - If yes, give yourself a gentle pat on the back, and you may skip to the next example.\n", "\n", "PS: You can also read WTFPython at the command line using the [PyPI package](https://pypi.python.org/pypi/wtfpython),\n", "```sh\n", "$ pip install wtfpython -U\n", "$ wtfpython\n", "```\n", "---\n", "\n", "# \ud83d\udc40 Examples\n", "\n", "\n\n## Hosted notebook instructions\n\nThis is just an experimental attempt at browsing wtfpython through Jupyter notebooks. Some examples are read-only because either:\n- they require a version of Python that's not supported in the hosted runtime, or\n- they can't be reproduced in the notebook environment.\n\nThe expected outputs are already present in collapsed cells following the code cells. Google Colab provides Python2 (2.7) and Python3 (3.6, default) runtimes. You can switch between these for Python2-specific examples. For examples specific to other minor versions, you can simply refer to the collapsed outputs (it's not possible to control the minor version in hosted notebooks as of now). You can check the active version using\n\n```py\n>>> import sys\n>>> sys.version\n# Prints out Python version here.\n```\n\nThat being said, most of the examples do work as expected. If you face any trouble, feel free to consult the original content on wtfpython and create an issue in the repo. 
Have fun!\n\n---\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \u25b6 Strings can be tricky sometimes\n", "1\\.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140420665652016\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a = \"some_string\"\n", "id(a)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140420665652016\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "id(\"some\" + \"_\" + \"string\") # Notice that both the ids are same.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "2\\.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a = \"wtf\"\n", "b = \"wtf\"\n", "a is b\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a = \"wtf!\"\n", "b = \"wtf!\"\n", "a is b\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "3\\.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a, b = \"wtf!\", \"wtf!\"\n", "a is b # All versions except 3.7.x\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], 
"source": [ "a = \"wtf!\"; b = \"wtf!\"\n", "a is b # This will print True or False depending on where you're invoking it (python shell / ipython / as a script)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "metadata": { "collapsed": true }, "execution_count": null, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [] } ], "source": [ "# This time in file some_file.py\n", "a = \"wtf!\"\n", "b = \"wtf!\"\n", "print(a is b)\n", "\n", "# prints True when the module is invoked!\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "4\\.\n", "\n", "**Output (< Python3.7 )**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "'a' * 20 is 'aaaaaaaaaaaaaaaaaaaa'\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "'a' * 21 is 'aaaaaaaaaaaaaaaaaaaaa'\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Makes sense, right?\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### \ud83d\udca1 Explanation:\n", "+ The behavior in first and second snippets is due to a CPython optimization (called string interning) that tries to use existing immutable objects in some cases rather than creating a new object every time.\n", "+ After being \"interned,\" many variables may reference the same string object in memory (saving memory thereby).\n", "+ In the snippets above, strings are implicitly interned. The decision of when to implicitly intern a string is implementation-dependent. 
There are some rules that can be used to guess if a string will be interned or not:\n", " * All length 0 and length 1 strings are interned.\n", " * Strings are interned at compile time (`'wtf'` will be interned but `''.join(['w', 't', 'f'])` will not be interned)\n", " * Strings that are not composed of ASCII letters, digits or underscores, are not interned. This explains why `'wtf!'` was not interned: it contains `!`. The CPython implementation of this rule can be found [here](https://github.com/python/cpython/blob/3.6/Objects/codeobject.c#L19).\n", " ![image](/images/string-intern/string_intern.png)\n", "+ When `a` and `b` are set to `\"wtf!\"` in the same line, the Python interpreter creates a new object, then makes the second variable reference it as well. If you do it on separate lines, it doesn't \"know\" that there's already `\"wtf!\"` as an object (because `\"wtf!\"` is not implicitly interned as per the facts mentioned above). It's a compile-time optimization. This optimization doesn't apply to 3.7.x versions of CPython (check this [issue](https://github.com/satwikkansal/wtfpython/issues/100) for more discussion).\n", "+ A compile unit in an interactive environment like IPython consists of a single statement, whereas it consists of the entire module in case of modules. `a, b = \"wtf!\", \"wtf!\"` is a single statement, whereas `a = \"wtf!\"; b = \"wtf!\"` is two statements on a single line. This explains why the identities are different in `a = \"wtf!\"; b = \"wtf!\"`, and also explains why they are the same when invoked in `some_file.py`.\n", "+ The abrupt change in the output of the fourth snippet is due to a [peephole optimization](https://en.wikipedia.org/wiki/Peephole_optimization) technique known as constant folding. This means the expression `'a'*20` is replaced by `'aaaaaaaaaaaaaaaaaaaa'` during compilation to save a few clock cycles during runtime. Constant folding only occurs for strings having a length of less than 21. (Why? 
Imagine the size of `.pyc` file generated as a result of the expression `'a'*10**10`). [Here's](https://github.com/python/cpython/blob/3.6/Python/peephole.c#L288) the implementation source for the same.\n", "+ Note: In Python 3.7, Constant folding was moved out from peephole optimizer to the new AST optimizer with some change in logic as well, so the fourth snippet doesn't work for Python 3.7. You can read more about the change [here](https://bugs.python.org/issue11549). \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \u25b6 Be careful with chained operations\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "(False == False) in [False] # makes sense\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "False == (False in [False]) # makes sense\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "False == False in [False] # now what?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "True is False == False\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "False is False is False\n" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "1 > 0 < 1\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "(1 > 0) < 1\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "1 > (0 < 1)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### \ud83d\udca1 Explanation:\n", "\n", "As per https://docs.python.org/3/reference/expressions.html#membership-test-operations\n", "\n", "> Formally, if a, b, c, ..., y, z are expressions and op1, op2, ..., opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent to a op1 b and b op2 c and ... 
y opN z, except that each expression is evaluated at most once.\n", "\n", "While such behavior might seem silly to you in the above examples, it's fantastic with stuff like `a == b == c` and `0 <= x <= 100`.\n", "\n", "* `False is False is False` is equivalent to `(False is False) and (False is False)`\n", "* `True is False == False` is equivalent to `True is False and False == False` and since the first part of the statement (`True is False`) evaluates to `False`, the overall expression evaluates to `False`.\n", "* `1 > 0 < 1` is equivalent to `1 > 0 and 0 < 1` which evaluates to `True`.\n", "* The expression `(1 > 0) < 1` is equivalent to `True < 1` and\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " 1\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " int(True)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " 2\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " True + 1 #not relevant for this example, but just for fun\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " So, `1 < 1` evaluates to `False`\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \u25b6 How not to use `is` operator\n", "The following is a very famous example present all over the internet.\n", "\n", "1\\.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a = 256\n", "b = 256\n", "a is b\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": 
null } ], "source": [ "a = 257\n", "b = 257\n", "a is b\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "2\\.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a = []\n", "b = []\n", "a is b\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a = tuple()\n", "b = tuple()\n", "a is b\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "3\\.\n", "**Output**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a, b = 257, 257\n", "a is b\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Output (Python 3.7.x specifically)**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a, b = 257, 257\n", "a is b\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### \ud83d\udca1 Explanation:\n", "\n", "**The difference between `is` and `==`**\n", "\n", "* The `is` operator checks whether both operands refer to the same object (i.e., it checks whether the identities of the operands match).\n", "* The `==` operator compares the values of both operands and checks whether they are the same.\n", "* So `is` is for reference equality and `==` is for value equality. 
An example to clear things up,\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " class A: pass\n", " A() is A() # These are two empty objects at two different memory locations.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**`256` is an existing object but `257` isn't**\n", "\n", "When you start up Python, the numbers from `-5` to `256` are allocated. These numbers are used a lot, so it makes sense just to have them ready.\n", "\n", "Quoting from https://docs.python.org/3/c-api/long.html\n", "> The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behavior of Python, in this case, is undefined. 
:-)\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "10922528\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "id(256)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "10922528\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a = 256\n", "b = 256\n", "id(a)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "10922528\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "id(b)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140084850247312\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "id(257)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140084850247440\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "x = 257\n", "y = 257\n", "id(x)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140084850247344\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "id(y)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Here the interpreter isn't smart enough while executing `y = 257` to recognize that we've already created an integer of the value `257,` and so it goes on to create another object in the memory.\n", "\n", "Similar optimization applies to other **immutable** objects like empty tuples as well. 
Since lists are mutable, `[] is []` returns `False`, whereas `() is ()` returns `True`. This explains our second snippet. Let's move on to the third one.\n", "\n", "**Both `a` and `b` refer to the same object when initialized with the same value on the same line.**\n", "\n", "**Output**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140640774013296\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a, b = 257, 257\n", "id(a)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140640774013296\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "id(b)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140640774013392\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a = 257\n", "b = 257\n", "id(a)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "140640774013488\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "id(b)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "* When `a` and `b` are set to `257` in the same line, the Python interpreter creates a new object, then makes the second variable reference it as well. If you do it on separate lines, it doesn't \"know\" that there's already `257` as an object.\n", "\n", "* It's a compiler optimization and specifically applies to the interactive environment. When you enter two lines in a live interpreter, they're compiled separately, therefore optimized separately. 
If you were to try this example in a `.py` file, you would not see the same behavior, because the file is compiled all at once. This optimization is not limited to integers; it also works for other immutable data types like strings (check the \"Strings can be tricky sometimes\" example) and floats,\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "a, b = 257.0, 257.0\n", "a is b\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "* Why didn't this work for Python 3.7? The abstract reason is that such compiler optimizations are implementation-specific (i.e. they may change with version, OS, etc.). I'm still figuring out what exact implementation change caused the issue; you can check out this [issue](https://github.com/satwikkansal/wtfpython/issues/100) for updates.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \u25b6 Hash brownies\n", "1\\.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "some_dict = {}\n", "some_dict[5.5] = \"JavaScript\"\n", "some_dict[5.0] = \"Ruby\"\n", "some_dict[5] = \"Python\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Output:**\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "\"JavaScript\"\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "some_dict[5.5]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "\"Python\"\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } 
], "source": [ "some_dict[5.0] # \"Python\" destroyed the existence of \"Ruby\"?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "\"Python\"\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "some_dict[5] \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "complex\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "complex_five = 5 + 0j\n", "type(complex_five)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "\"Python\"\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "some_dict[complex_five]\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "So, why is Python all over the place?\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### \ud83d\udca1 Explanation\n", "\n", "* Uniqueness of keys in a Python dictionary is by *equivalence*, not identity. So even though `5`, `5.0`, and `5 + 0j` are distinct objects of different types, since they're equal, they can't both be in the same `dict` (or `set`). 
As soon as you insert any one of them, attempting to look up any distinct but equivalent key will succeed with the original mapped value (rather than failing with a `KeyError`):\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " 5 == 5.0 == 5 + 0j\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " 5 is not 5.0 is not 5 + 0j\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " some_dict = {}\n", " some_dict[5.0] = \"Ruby\"\n", " 5.0 in some_dict\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " (5 in some_dict) and (5 + 0j in some_dict)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* This applies when setting an item as well. 
So when you do `some_dict[5] = \"Python\"`, Python finds the existing item with the equivalent key `5.0 -> \"Ruby\"`, overwrites its value in place, and leaves the original key alone.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " {5.0: 'Ruby'}\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " some_dict\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " {5.0: 'Python'}\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " some_dict[5] = \"Python\"\n", " some_dict\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* So how can we update the key to `5` (instead of `5.0`)? We can't actually do this update in place, but what we can do is first delete the key (`del some_dict[5.0]`), and then set it (`some_dict[5]`) to get the integer `5` as the key instead of the float `5.0`, though this should rarely be needed.\n", "\n", "* How did Python find `5` in a dictionary containing `5.0`? Python does this in constant time without having to scan through every item by using hash functions. When Python looks up a key `foo` in a dict, it first computes `hash(foo)` (which runs in constant time). 
Since in Python it is required that objects that compare equal also have the same hash value ([docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__) here), `5`, `5.0`, and `5 + 0j` have the same hash value.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " 5 == 5.0 == 5 + 0j\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " hash(5) == hash(5.0) == hash(5 + 0j)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " **Note:** The inverse is not necessarily true: Objects with equal hash values may themselves be unequal. (This causes what's known as a [hash collision](https://en.wikipedia.org/wiki/Collision_(computer_science)), and degrades the constant-time performance that hashing usually provides.)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \u25b6 Deep down, we're all the same.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "class WTF:\n", " pass\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Output:**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "WTF() == WTF() # two different instances can't be equal\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n" ] }, 
"output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "WTF() is WTF() # identities are also different\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "hash(WTF()) == hash(WTF()) # hashes _should_ be different as well\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "id(WTF()) == id(WTF())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### \ud83d\udca1 Explanation:\n", "\n", "* When `id` was called, Python created a `WTF` class object and passed it to the `id` function. The `id` function takes its `id` (its memory location), and throws away the object. The object is destroyed.\n", "* When we do this twice in succession, Python allocates the same memory location to this second object as well. Since (in CPython) `id` uses the memory location as the object id, the id of the two objects is the same.\n", "* So, the object's id is unique only for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.\n", "* But why did the `is` operator evaluated to `False`? 
Let's see with this snippet.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "class WTF(object):\n", "    def __init__(self): print(\"I\")\n", "    def __del__(self): print(\"D\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " **Output:**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " I\n", " I\n", " D\n", " D\n", " False\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " WTF() is WTF()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ " I\n", " D\n", " I\n", " D\n", " True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ " id(WTF()) == id(WTF())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " As you may observe, the order in which the objects are destroyed is what made all the difference here.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \u25b6 Disorder within order *\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "from collections import OrderedDict\n", "\n", "dictionary = dict()\n", "dictionary[1] = 'a'; dictionary[2] = 'b';\n", "\n", "ordered_dict = OrderedDict()\n", "ordered_dict[1] = 'a'; ordered_dict[2] = 'b';\n", "\n", "another_ordered_dict = OrderedDict()\n", "another_ordered_dict[2] = 'b'; another_ordered_dict[1] = 'a';\n", "\n", "class DictWithHash(dict):\n", " \"\"\"\n", " A dict that also implements __hash__ magic.\n", " \"\"\"\n", " __hash__ = lambda self: 0\n", "\n", "class 
OrderedDictWithHash(OrderedDict):\n", " \"\"\"\n", " An OrderedDict that also implements __hash__ magic.\n", " \"\"\"\n", " __hash__ = lambda self: 0\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Output**\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "dictionary == ordered_dict # If a == b\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "True\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "dictionary == another_ordered_dict # and b == c\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "False\n", "\n", "# We all know that a set consists of only unique elements,\n", "# let's try making a set of these dictionaries and see what happens...\n", "\n" ] }, "output_type": "execute_result", "metadata": {}, "execution_count": null } ], "source": [ "ordered_dict == another_ordered_dict # then why isn't c == a ??\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "Traceback (most recent call last):\n", " File \"