python - Why is subclassing so slow in Python?


I was working on a simple class that extends dict, and I realized that key lookup and the use of pickle are very slow.

I thought it was a problem with my class, so I did some trivial benchmarks:


(venv) marco@buzz:~/sources/python-frozendict/test$ python --version
Python 3.9.0a0
(venv) marco@buzz:~/sources/python-frozendict/test$ sudo pyperf system tune --affinity 3
[sudo] password for marco: 
Tune the system configuration to run benchmarks

Actions
=======

CPU Frequency: Minimum frequency of CPU 3 set to the maximum frequency

System state
============

CPU: use 1 logical CPUs: 3
Perf event: Maximum sample rate: 1 per second
ASLR: Full randomization
Linux scheduler: No CPU is isolated
CPU Frequency: 0-3=min=max=2600 MHz
CPU scaling governor (intel_pstate): performance
Turbo Boost (intel_pstate): Turbo Boost disabled
IRQ affinity: irqbalance service: inactive
IRQ affinity: Default IRQ affinity: CPU 0-2
IRQ affinity: IRQ affinity: IRQ 0,2=CPU 0-3; IRQ 1,3-17,51,67,120-131=CPU 0-2
Power supply: the power cable is plugged

Advices
=======

Linux scheduler: Use isolcpus=<cpu list> kernel parameter to isolate CPUs
Linux scheduler: Use rcu_nocbs=<cpu list> kernel parameter (with isolcpus) to not schedule RCU on isolated CPUs
(venv) marco@buzz:~/sources/python-frozendict/test$ python -m pyperf timeit --rigorous --affinity 3 -s '
x = {0:0, 1:1, 2:2, 3:3, 4:4}
' 'x[4]'
.........................................
Mean +- std dev: 35.2 ns +- 1.8 ns
(venv) marco@buzz:~/sources/python-frozendict/test$ python -m pyperf timeit --rigorous --affinity 3 -s '
class A(dict):
    pass

x = A({0:0, 1:1, 2:2, 3:3, 4:4})
' 'x[4]'
.........................................
Mean +- std dev: 60.1 ns +- 2.5 ns
(venv) marco@buzz:~/sources/python-frozendict/test$ python -m pyperf timeit --rigorous --affinity 3 -s '
x = {0:0, 1:1, 2:2, 3:3, 4:4}
' '5 in x'
.........................................
Mean +- std dev: 31.9 ns +- 1.4 ns
(venv) marco@buzz:~/sources/python-frozendict/test$ python -m pyperf timeit --rigorous --affinity 3 -s '
class A(dict):
    pass

x = A({0:0, 1:1, 2:2, 3:3, 4:4})
' '5 in x'
.........................................
Mean +- std dev: 64.7 ns +- 5.4 ns
(venv) marco@buzz:~/sources/python-frozendict/test$ python
Python 3.9.0a0 (heads/master-dirty:d8ca2354ed, Oct 30 2019, 20:25:01) 
[GCC 9.2.1 20190909] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> class A(dict):
...     def __reduce__(self):
...         return (A, (dict(self), ))
...
>>> timeit("dumps(x)", """
... from pickle import dumps
... x = {0:0, 1:1, 2:2, 3:3, 4:4}
... """, number=10000000)
6.70694484282285
>>> timeit("dumps(x)", """
... from pickle import dumps
... x = A({0:0, 1:1, 2:2, 3:3, 4:4})
... """, number=10000000, globals={"A": A})
31.277778962627053
>>> timeit("loads(x)", """
... from pickle import dumps, loads
... x = dumps({0:0, 1:1, 2:2, 3:3, 4:4})
... """, number=10000000)
5.767975459806621
>>> timeit("loads(x)", """
... from pickle import dumps, loads
... x = dumps(A({0:0, 1:1, 2:2, 3:3, 4:4}))
... """, number=10000000, globals={"A": A})
22.611666693352163

The result is really a surprise: while key lookup (and `in`) is about 2x slower, pickle is about 4-5x slower.
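As a quick check, the slowdown factors can be computed directly from the numbers in the transcript. A minimal sketch; the `timings` layout is only an illustration, the values are copied verbatim from the runs above:

```python
# Timings copied from the transcript: pyperf means (ns) and timeit totals (s),
# stored as (plain dict, subclass A).
timings = {
    "x[4]": (35.2, 60.1),
    "5 in x": (31.9, 64.7),
    "dumps(x)": (6.70694484282285, 31.277778962627053),
    "loads(x)": (5.767975459806621, 22.611666693352163),
}

# Slowdown factor of the dict subclass relative to a plain dict.
ratios = {op: sub / base for op, (base, sub) in timings.items()}

for op, ratio in ratios.items():
    print(f"{op:10} {ratio:.1f}x")
```

This prints 1.7x, 2.0x, 4.7x and 3.9x respectively, which is where "about 2x" for lookups and "about 4-5x" for pickle come from.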

And how fast are get(), __eq__(), __init__(), and iteration over keys(), by comparison?
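One rough way to compare these operations without pyperf is the standard `timeit` module. This is a sketch only: the names `A`, `STATEMENTS` and `compare` are hypothetical, it takes the best of several runs instead of pyperf's statistics, and absolute numbers will vary by machine and Python version.

```python
import timeit

# Hypothetical trivial subclass, mirroring `class A(dict): pass` from the benchmarks.
class A(dict):
    pass

DATA = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}

# Statements to compare; `x` and `y` are either plain dicts or A instances,
# `cls` is the class under test.
STATEMENTS = {
    "x[4]": "x[4]",
    "5 in x": "5 in x",
    "x.get(4)": "x.get(4)",
    "x == y": "x == y",
    "init": "cls(DATA)",
    "iter keys()": "for k in x.keys(): pass",
}

def compare(number=100_000, repeat=5):
    """Return {name: (dict_seconds, subclass_seconds)}, best of `repeat` runs."""
    results = {}
    for name, stmt in STATEMENTS.items():
        per_cls = []
        for cls in (dict, A):
            env = {"cls": cls, "DATA": DATA, "x": cls(DATA), "y": cls(DATA)}
            runs = timeit.repeat(stmt, globals=env, number=number, repeat=repeat)
            per_cls.append(min(runs))
        results[name] = tuple(per_cls)
    return results

if __name__ == "__main__":
    for name, (base, sub) in compare().items():
        print(f"{name:12} dict={base:.4f}s  A={sub:.4f}s  ratio={sub / base:.2f}")
```

Taking the minimum of the repeats reduces noise from other processes, but it is no substitute for pyperf's tuned, multi-process runs when the differences are only tens of nanoseconds.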

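EDIT: I took a look at the source code of Python 3.9, in Objects/dictobject.c, where __getitem__() is implemented by dict_subscript(). dict_subscript() slows down subclasses only when the key is missing, since a subclass can implement __missing__() and dict_subscript() tries to see whether it exists. But the benchmark used an existing key.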
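The __missing__() behaviour described here can be observed from Python: the hook only runs when the key is absent, so a lookup of an existing key never reaches it. A minimal sketch; the class name `Defaulting` and its call counter are hypothetical, added just to make the calls visible.

```python
class Defaulting(dict):
    """Hypothetical subclass that counts how often __missing__ is consulted."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.missing_calls = 0  # dict subclass instances have a __dict__

    def __missing__(self, key):
        self.missing_calls += 1
        return f"default for {key!r}"

d = Defaulting({4: "four"})
print(d[4])             # four            (existing key: __missing__ not called)
print(d.missing_calls)  # 0
print(d[5])             # default for 5   (missing key: __missing__ called)
print(d.missing_calls)  # 1
print(5 in d)           # False           (__missing__ does not insert the key)
print(d.get(5))         # None            (get() never calls __missing__)
```

Since the benchmarks only look up key 4, which exists, this fallback path is never taken there.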

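But I noticed something: __getitem__() is defined with the flag METH_COEXIST. And __contains__(), the other method that is 2x slower, has the same flag. From the official documentation: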

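The method will be loaded in place of existing definitions. Without METH_COEXIST, the default is to skip repeated definitions. Since slot wrappers are loaded before the method table, the existence of a sq_contains slot, for example, would generate a wrapped method named __contains__() and preclude the loading of a corresponding PyCFunction with the same name. With the flag defined, the PyCFunction will be loaded in place of the wrapper object and will co-exist with the slot. This is helpful because calls to PyCFunctions are optimized more than wrapper object calls.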
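The effect of METH_COEXIST is visible in dict's own namespace: names loaded from the method table appear as method descriptors, while names generated only from a C slot appear as slot wrappers. A minimal sketch, assuming CPython; the set of names inspected is an arbitrary choice, and reading "method descriptor" as "PyCFunction from tp_methods" follows the documentation quoted above.

```python
import types

# Classify a few entries of dict's own namespace by descriptor type.
for name in ("__getitem__", "__contains__", "__len__", "__iter__"):
    descr = dict.__dict__[name]
    if isinstance(descr, types.MethodDescriptorType):
        kind = "method descriptor (PyCFunction loaded from the method table)"
    elif isinstance(descr, types.WrapperDescriptorType):
        kind = "slot wrapper (generated from a C slot)"
    else:
        kind = type(descr).__name__
    print(f"dict.{name:13} -> {kind}")
```

On CPython, __getitem__ and __contains__ should be reported as method descriptors and __len__ as a slot wrapper, which matches the two METH_COEXIST entries mentioned above.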

So, if I understood correctly, in theory METH_COEXIST should speed things up, but it seems to have the opposite effect. Why?

EDIT 2: I discovered something more.

__getitem__() and __contains__() are flagged as METH_COEXIST because they are declared twice in PyDict_Type.

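They are both present, once, in the slot tp_methods, where they are explicitly declared as __getitem__() and __contains__(). But the official documentation says that tp_methods is not inherited by subclasses.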

So a subclass of dict does not call __getitem__(), but calls the subslot mp_subscript. Indeed, mp_subscript is contained in the slot tp_as_mapping, which allows a subclass to inherit its subslots.

The problem is that both __getitem__() and mp_subscript use the same function, dict_subscript. Is it possible that it's only the way it was inherited that slows it down?
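At the Python level, a trivial subclass defines neither method itself and simply finds dict's descriptors through the MRO. A minimal sketch; note that it cannot show which C function ends up in the subclass's mp_subscript or sq_contains slots, since those pointers are not exposed to Python code.

```python
class A(dict):  # the same trivial subclass used in the benchmarks
    pass

# A's own namespace defines neither method...
print("__getitem__" in vars(A))             # False
print("__contains__" in vars(A))            # False

# ...so attribute lookup walks the MRO and returns dict's descriptors unchanged.
print(A.__getitem__ is dict.__getitem__)    # True
print(A.__contains__ is dict.__contains__)  # True

x = A({0: 0, 1: 1, 2: 2, 3: 3, 4: 4})
print(x[4], 5 in x)                         # 4 False
```

So any difference between dict and A for x[4] and 5 in x has to come from how the C slots are filled in when the subclass is created, not from a different Python-visible method.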

...