Compiling

  1. Python code -> Parse tree
  2. Parse tree -> AST
  3. Symbol table generated
  4. Control flow graph generated
  5. Bytecode generated from the control flow graph
  6. Bytecode optimization (peephole optimization)
  7. Code object generated
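
Each stage can be observed from Python itself. A minimal end-to-end sketch (the example source and the "<string>" filename are only illustrative):

import ast
import dis
import symtable

source = "x = 2 + 2"

tree = ast.parse(source)                                # parse tree -> AST
table = symtable.symtable(source, "<string>", "exec")   # symbol table
code_object = compile(tree, "<string>", "exec")         # code object with bytecode
print(table.get_identifiers())                          # names in the module scope, here just 'x'
dis.dis(code_object)                                    # human-readable bytecode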

1. Lexer

The lexer breaks the source code up into tokens, the individual "words" and symbols of the language.

  • Parser/tokenizer.c -> PyTokenizer_FromString
  • Parser/parsetok.c -> parsetok
  • Lib/tokenize.py

Tokenizing

A token is the name given to a particular kind of symbol in the source.

For example:

a = 4
if (a <= 3):
    print("hello")

so the lexer turns it into a list of tokens like the one below:

  • NAME: a
  • EQUAL: =
  • NUMBER: 4
  • IF: if
  • LPAREN: (
  • etc

python3 -m tokenize test.py

(image: _images/tokenize.png, the output of python3 -m tokenize test.py)
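
The same information is available programmatically through the tokenize module. A small sketch re-using the snippet above (exact_type makes operators show up as EQUAL, LPAREN, and so on):

import io
import tokenize

source = 'a = 4\nif (a <= 3):\n    print("hello")\n'
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.exact_type], repr(tok.string))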

2. Parsing

The parser does not know what the source file means; it only sees the tokens generated by the lexer, and the token stream hands them to the parser one at a time via next().
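
This hand-off can be imitated at the Python level with a token generator. An illustrative sketch only, since the real CPython parser consumes tokens in C rather than through this API:

import io
import tokenize

tokens = tokenize.generate_tokens(io.StringIO("x = 2 + 2").readline)
print(next(tokens))   # TokenInfo for NAME 'x'
print(next(tokens))   # TokenInfo for OP '='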

  • Python/pythonrun.c -> PyParser_ASTFromStringObject
>>> import parser
>>> code = "x = 2 + 2"
>>> st = parser.suite(code)
>>> print(parser.st2list(st))
[257, [269, [270, [271, [272, [274, [305, [309, [310, [311, [312, [315, [316, [317, [318, [319, [320, [321, [322, [323, [324, [1, 'x']]]]]]]]]]]]]]]]], [22, '='], [274, [305, [309, [310, [311, [312, [315, [316, [317, [318, [319, [320, [321, [322, [323, [324, [2, '2']]]]]], [14, '+'], [320, [321, [322, [323, [324, [2, '2']]]]]]]]]]]]]]]]]]], [4, '']]], [4, ''], [0, '']]

(The parser module shown here was deprecated in Python 3.9 and removed in 3.10.)

The grammar is designed for an LL(1) parser. Full Grammar specification: https://docs.python.org/3/reference/grammar.html

3. AST

The ast module builds the AST directly from source:

import ast

tree = ast.parse("x = 2 + 2")
print(ast.dump(tree))
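
The tree can also be traversed programmatically, for example with ast.walk (node class names are those of the current ast module):

import ast

tree = ast.parse("x = 2 + 2")
for node in ast.walk(tree):
    print(type(node).__name__)   # Module, Assign, Name, BinOp, ...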

AST example:

x = 1 + 1
y = x + 2
print(y)
(image: _images/AST.png, the AST for the example above)

Generated by Python AST Visualizer: https://vpyast.appspot.com/
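
The same structure can be printed as text with ast.dump; the indent argument requires Python 3.9 or newer:

import ast

source = "x = 1 + 1\ny = x + 2\nprint(y)\n"
print(ast.dump(ast.parse(source), indent=4))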

4. Compiler

Python/compile.c

compile() turns the AST into a code object, which dis can disassemble:

import ast
import dis

tree = ast.parse("x = 2 + 2")
code_object = compile(tree, 'test.py', mode='exec')
dis.dis(code_object)

# A whole source file can be compiled the same way:
c = compile(open('test.py').read(), 'test.py', 'exec')
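
The resulting code object carries the constants, names, and raw bytecode that dis renders. A small sketch (the values in the comments are approximate and may differ between CPython versions):

import ast
import dis

code_object = compile(ast.parse("x = 2 + 2"), 'test.py', mode='exec')
print(code_object.co_consts)   # (4, None): 2 + 2 has already been constant-folded
print(code_object.co_names)    # ('x',)
print(code_object.co_code)     # the raw bytecode as a bytes object
dis.dis(code_object)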