A Light Reading of the Python 3 Source: The PyObject Object

How CPython Declares Objects

Object and type object interface

Let's start with the comment in object.h.

/* Object and type object interface */
/*
Objects are structures allocated on the heap. Special rules apply to
the use of objects to ensure they are properly garbage-collected.
Objects are never allocated statically or on the stack; they must be
accessed through special macros and functions only. (Type objects are
exceptions to the first rule; the standard types are represented by
statically initialized type objects, although work on type/class unification
for Python 2.2 made it possible to have heap-allocated type objects too).
An object has a 'reference count' that is increased or decreased when a
pointer to the object is copied or deleted; when the reference count
reaches zero there are no references to the object left and it can be
removed from the heap.
An object has a 'type' that determines what it represents and what kind
of data it contains. An object's type is fixed when it is created.
Types themselves are represented as objects; an object contains a
pointer to the corresponding type object. The type itself has a type
pointer pointing to the object representing the type 'type', which
contains a pointer to itself!).
Objects do not float around in memory; once allocated an object keeps
the same size and address. Objects that must hold variable-size data
can contain pointers to variable-size parts of the object. Not all
objects of the same type have the same size; but the size cannot change
after allocation. (These restrictions are made so a reference to an
object can be simply a pointer -- moving an object would require
updating all the pointers, and changing an object's size would require
moving it if there was another object right next to it.)
Objects are always accessed through pointers of the type 'PyObject *'.
The type 'PyObject' is a structure that only contains the reference count
and the type pointer. The actual memory allocated for an object
contains other data that can only be accessed after casting the pointer
to a pointer to a longer structure type. This longer type must start
with the reference count and type fields; the macro PyObject_HEAD should be
used for this (to accommodate for future changes). The implementation
of a particular object type can cast the object pointer to the proper
type and back.
A standard interface exists for objects that contain an array of items
whose size is determined when the object is allocated.
*/

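In short:

  • Objects are normally allocated on the heap (statically initialized type objects are the exception).
  • Every object carries a reference count.
  • Every object has a type that determines what the object actually is; the type is fixed when the object is created.
  • Once an object is allocated, its address and size never change.
  • Any object can be accessed through a PyObject * pointer. The PyObject struct holds only two fields, the reference count and the type pointer. The object's actual data lives in a longer struct that starts with those same two fields, and is reached by casting the pointer to that struct type. The type pointer refers to the type object (a _typeobject struct), which describes the object's behavior rather than holding its data.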
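The last point may be easier to see in code. What follows is a minimal sketch, not CPython's real definitions: ToyType, ToyObject, ToyFloat and the two functions are hypothetical names that only mimic the "common header first, payload after" layout described in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for _typeobject. */
typedef struct ToyType {
    const char *name;
} ToyType;

/* Hypothetical stand-in for PyObject: reference count plus type pointer. */
typedef struct ToyObject {
    ptrdiff_t refcnt;   /* plays the role of ob_refcnt */
    ToyType  *type;     /* plays the role of ob_type   */
} ToyObject;

/* A concrete object: the common header comes first, the payload follows. */
typedef struct ToyFloat {
    ToyObject head;
    double    value;
} ToyFloat;

ToyType toy_float_type = { "toy_float" };

/* Allocate on the heap and hand back the generic pointer. */
ToyObject *toy_float_new(double value)
{
    ToyFloat *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->head.refcnt = 1;
    f->head.type = &toy_float_type;
    f->value = value;
    return (ToyObject *)f;
}

/* Check the type pointer, then cast to the longer struct to reach the payload. */
double toy_float_value(ToyObject *ob)
{
    assert(ob->type == &toy_float_type);
    return ((ToyFloat *)ob)->value;
}
```

A caller only ever holds a ToyObject *; toy_float_value(ob) recovers the double by casting back to ToyFloat *, which is the same move the comment describes for PyObject *.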

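PyObject and PyVarObject

In Python everything is an object, and CPython implements each kind of object with a corresponding struct. Objects come in two flavors, fixed-size and variable-size, and object.h declares two structs, PyObject and PyVarObject, to represent them: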

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

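As the two definitions show, PyVarObject embeds a PyObject as its first member, which is how C emulates inheritance here. Because the header sits at offset zero, any pointer to a Python object can be cast to PyObject *, and any pointer to a variable-size Python object can be cast to PyVarObject *.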
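To make the embedding concrete, here is a small sketch under stated assumptions: ToyObject, ToyVarObject and ToyIntTuple are made-up names, and the tuple-like type uses a C99 flexible array member purely for illustration. It is not how any particular CPython type is declared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size header, in the spirit of PyObject. */
typedef struct ToyObject {
    ptrdiff_t refcnt;
    void     *type;
} ToyObject;

/* Hypothetical variable-size header: the fixed header is its first member. */
typedef struct ToyVarObject {
    ToyObject ob_base;
    ptrdiff_t ob_size;
} ToyVarObject;

/* A tuple-like object: variable-size header, then items sized at allocation. */
typedef struct ToyIntTuple {
    ToyVarObject ob_base;
    int          items[];   /* C99 flexible array member */
} ToyIntTuple;

ToyIntTuple *toy_tuple_new(ptrdiff_t n)
{
    if (n < 0)
        return NULL;
    ToyIntTuple *t = malloc(sizeof *t + (size_t)n * sizeof t->items[0]);
    if (t == NULL)
        return NULL;
    t->ob_base.ob_base.refcnt = 1;
    t->ob_base.ob_base.type = NULL;   /* no type object in this sketch */
    t->ob_base.ob_size = n;
    for (ptrdiff_t i = 0; i < n; i++)
        t->items[i] = 0;
    return t;
}

/* Both views are valid because each header sits at offset zero. */
ToyObject *toy_as_object(void *ob)
{
    return (ToyObject *)ob;
}

ToyVarObject *toy_as_var(void *ob)
{
    return (ToyVarObject *)ob;
}
```

The casts are safe only because C guarantees that a pointer to a struct, suitably converted, points to its first member. Move ob_base anywhere else in the struct and both helpers would read garbage.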

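A brief description of each field:

Field                 Description
_PyObject_HEAD_EXTRA  Pointers for a doubly linked list of all live objects. It expands to nothing unless Python is built with Py_TRACE_REFS, so it is a debugging aid rather than part of the garbage collector
ob_refcnt             The reference count
ob_type               Pointer to the object's type object (a _typeobject struct), which defines what the object is and how it behaves
ob_size               Number of items in a variable-size object, for example the length of a list or tuple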

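Main structs

Struct             Description
PyNumberMethods    Numeric methods
PySequenceMethods  Sequence methods
PyMappingMethods   Mapping methods
PyAsyncMethods     Async methods
_typeobject        The type object

Inside the _typeobject struct we find the following fragment: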

PyNumberMethods *tp_as_number;
PySequenceMethods *tp_as_sequence;
PyMappingMethods *tp_as_mapping;

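These pointers are how a type object binds its groups of methods: each one refers to a table of function pointers, and is NULL when the type does not support that protocol.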
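The dispatch idea can be sketched in a few lines. Everything below is hypothetical (ToyNumberMethods, ToyType, toy_number_add and friends); it borrows the field names tp_as_number and nb_add from CPython but is a toy model, not the interpreter's actual lookup logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyObject ToyObject;
typedef ToyObject *(*toy_binaryfunc)(ToyObject *, ToyObject *);

/* Hypothetical, much smaller cousin of PyNumberMethods. */
typedef struct ToyNumberMethods {
    toy_binaryfunc nb_add;
} ToyNumberMethods;

typedef struct ToyType {
    const char       *tp_name;
    ToyNumberMethods *tp_as_number;   /* NULL if the type is not numeric */
} ToyType;

struct ToyObject {
    ptrdiff_t ob_refcnt;
    ToyType  *ob_type;
};

typedef struct ToyInt {
    ToyObject ob_base;
    long      value;
} ToyInt;

ToyObject *toy_int_add(ToyObject *a, ToyObject *b);

ToyNumberMethods toy_int_as_number = { toy_int_add };
ToyType toy_int_type  = { "toy_int",  &toy_int_as_number };
ToyType toy_none_type = { "toy_none", NULL };

ToyObject *toy_int_new(long v)
{
    ToyInt *i = malloc(sizeof *i);
    if (i == NULL)
        return NULL;
    i->ob_base.ob_refcnt = 1;
    i->ob_base.ob_type = &toy_int_type;
    i->value = v;
    return (ToyObject *)i;
}

/* The type-specific implementation stored in the method table. */
ToyObject *toy_int_add(ToyObject *a, ToyObject *b)
{
    if (a->ob_type != &toy_int_type || b->ob_type != &toy_int_type)
        return NULL;
    return toy_int_new(((ToyInt *)a)->value + ((ToyInt *)b)->value);
}

/* Generic entry point: find the slot through the object's type pointer. */
ToyObject *toy_number_add(ToyObject *a, ToyObject *b)
{
    ToyNumberMethods *nm = a->ob_type->tp_as_number;
    if (nm == NULL || nm->nb_add == NULL)
        return NULL;   /* operation not supported by this type */
    return nm->nb_add(a, b);
}
```

toy_number_add never mentions ToyInt: it follows ob_type to the method table and calls whatever is stored there. An object whose type has a NULL tp_as_number simply does not support addition.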

A few macros worth noting

// Get ob's reference count
#define Py_REFCNT(ob) (((PyObject*)(ob))->ob_refcnt)
// Get ob's type
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
// Get the number of items in ob
#define Py_SIZE(ob) (((PyVarObject*)(ob))->ob_size)
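Since the three macros quoted above are nothing more than a cast followed by a member access, they work on a pointer to any object struct and can be used on either side of an assignment. The sketch below mirrors them with hypothetical TOY_ names and a made-up ToyBytes type. To keep it short, the type field is a plain string instead of a real type object, which is an assumption of this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyObject {
    ptrdiff_t   ob_refcnt;
    const char *ob_type;    /* simplified: a name instead of a type object */
} ToyObject;

typedef struct ToyVarObject {
    ToyObject ob_base;
    ptrdiff_t ob_size;
} ToyVarObject;

/* Toy counterparts of Py_REFCNT, Py_TYPE and Py_SIZE. */
#define TOY_REFCNT(ob) (((ToyObject *)(ob))->ob_refcnt)
#define TOY_TYPE(ob)   (((ToyObject *)(ob))->ob_type)
#define TOY_SIZE(ob)   (((ToyVarObject *)(ob))->ob_size)

/* Hypothetical variable-size object with a small fixed buffer. */
typedef struct ToyBytes {
    ToyVarObject ob_base;
    char         data[16];
} ToyBytes;

/* Create an object and fill in its header through the macros alone. */
ToyBytes *toy_bytes_new(ptrdiff_t len)
{
    if (len < 0 || len > (ptrdiff_t)sizeof(((ToyBytes *)0)->data))
        return NULL;
    ToyBytes *b = calloc(1, sizeof *b);
    if (b == NULL)
        return NULL;
    TOY_REFCNT(b) = 1;          /* each macro expands to an lvalue */
    TOY_TYPE(b) = "toy_bytes";
    TOY_SIZE(b) = len;
    return b;
}
```

Note that the macros take a ToyBytes * without complaint: the cast inside each macro is what lets one accessor serve every object type.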
// Some code has been removed; the core logic is unaffected.
// Call the type's tp_dealloc to free the object; each type supplies its own.
#define _Py_Dealloc(op) ((*Py_TYPE(op)->tp_dealloc)((PyObject *)(op)))
// Increment the reference count.
#define Py_INCREF(op) (((PyObject *)(op))->ob_refcnt++)
// Decrement the reference count; when it reaches 0, call _Py_Dealloc.
#define Py_DECREF(op) \
    do { \
        PyObject *_py_decref_tmp = (PyObject *)(op); \
        if (_Py_DEC_REFTOTAL _Py_REF_DEBUG_COMMA \
            --(_py_decref_tmp)->ob_refcnt != 0) \
            _Py_CHECK_REFCNT(_py_decref_tmp) \
        else \
            _Py_Dealloc(_py_decref_tmp); \
    } while (0)
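Stripped of the debug hooks, the life cycle these macros implement looks roughly like the sketch below. All names are hypothetical (TOY_INCREF, TOY_DECREF, toy_generic_dealloc, and a toy_dealloc_calls counter that exists only so the effect can be observed). It is a simplified model of the pattern, not the CPython macros themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct ToyObject ToyObject;

typedef struct ToyType {
    const char *tp_name;
    void (*tp_dealloc)(ToyObject *);   /* type-specific destructor */
} ToyType;

struct ToyObject {
    ptrdiff_t ob_refcnt;
    ToyType  *ob_type;
};

/* Take a new reference. */
#define TOY_INCREF(op) (((ToyObject *)(op))->ob_refcnt++)

/* Drop a reference; the last one out calls the type's destructor. */
#define TOY_DECREF(op)                                   \
    do {                                                 \
        ToyObject *toy_tmp = (ToyObject *)(op);          \
        if (--toy_tmp->ob_refcnt == 0)                   \
            toy_tmp->ob_type->tp_dealloc(toy_tmp);       \
    } while (0)

int toy_dealloc_calls = 0;   /* instrumentation for this demo only */

void toy_generic_dealloc(ToyObject *op)
{
    toy_dealloc_calls++;
    free(op);
}

ToyType toy_plain_type = { "toy_plain", toy_generic_dealloc };

/* New objects start with one reference, owned by the caller. */
ToyObject *toy_new(void)
{
    ToyObject *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->ob_refcnt = 1;
    op->ob_type = &toy_plain_type;
    return op;
}
```

Two details carry over from the real macro. The temporary variable ensures the argument is evaluated only once, and the do { ... } while (0) wrapper makes the macro behave like a single statement, so it is safe inside an unbraced if/else.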