Python Type Annotations and Custom Types

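Many developers like Python for its dynamic nature. As a project grows more complex, though, that same nature often becomes a headache. Imagine reading a piece of code without being able to tell what type its variables have, so that you cannot confidently write the next line. Even if you push ahead, errors only surface at runtime, which clearly hurts development efficiency.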

# Type System

By the usual classification, Python is a dynamically and strongly typed language. Python also supports duck typing.

Duck typing is usually summarized as:

If it walks like a duck and it quacks like a duck, then it must be a duck.

In programming terms: if a variable supports every operation of type A, we can treat it as a variable of type A. Consider the following code:

	from typing import Iterable

	def print_items(items: Iterable):
	    for item in items:
	        print(item)

	print_items([1, 2, 3])
	print_items({4, 5, 6})
	print_items({"A": 1, "B": 2, "C": 3})

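The print_items function above works as long as the items argument supports the `__iter__` operation. In other words, anything that supports `__iter__` can be treated as a variable of type Iterable. The Iterable written with the `<var>: <type>` syntax is a Python type annotation. It helps developers understand the code and lets static type checkers detect potential type errors. The details are covered later in this article.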
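
As a small sketch of the same idea, the class below is neither a list, a set, nor a dict, yet it can be passed wherever an Iterable is expected because it defines `__iter__`. The names Countdown and collect_items are hypothetical and exist only for this illustration.

```python
from collections.abc import Iterable, Iterator


class Countdown:
    """Hypothetical class: counts down from start to 1."""

    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        # Defining __iter__ is all it takes to be treated as an Iterable.
        return iter(range(self.start, 0, -1))


def collect_items(items: Iterable) -> list:
    # Works for any object that supports iteration.
    return [item for item in items]


countdown_items = collect_items(Countdown(3))
```

Countdown never inherits from Iterable; supporting the operation is enough.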

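Duck typing is very convenient, but it is a double-edged sword:

- Duck typing can make code more robust: by coding against an abstract interface, a function can accept many types of values without type-specific code.
- Overused, however, duck typing can confuse developers, because some of the operations an object supports may not behave intuitively.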

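# Type Annotations

Python's dynamic typing is convenient for small programs, but as a codebase grows it becomes hard to know the type of a variable in a complicated piece of code; the type is only known at runtime. Guido van Rossum, the creator of Python, put it this way: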

I’ve learned a painful lesson that for small programs dynamic typing is great. For large programs you have to have a more disciplined approach and it helps if the language actually gives you that discipline, rather than telling you "Well, you can do whatever you want."

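To help with this problem, Python gained type hints in version 3.5 (PEP 484) and variable annotations in 3.6 (PEP 526); from about 3.7 onward the typing support is fairly complete.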

A Python type annotation is a type hint that tells developers which type a variable is expected to have. The syntax for a function looks like this:

	import datetime
	def find_workers_available_for_time(open_time: datetime.datetime) -> list[str]: ...

Ordinary variable definitions can be annotated as well:

	number: int = 0
	text: str = "useless"
	values: list[float] = [1.2, 3.4, 6.0]
	worker: Worker = Worker()

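Before Python 3.9, the code above can raise an error, because built-in collections such as list cannot be subscripted in annotations. Adding `from __future__ import annotations` (available since Python 3.7) at the top of the file avoids this. On even older Python versions, type annotations can be written as comments: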

	ratio = get_ratio(5, 3)  # type: float
	def get_workers(open):  # type: (datetime.datetime) -> List[str]
	    ...

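This style is verbose and hard to read, so a recent Python version is recommended.

Many people worry that the extra code will hurt runtime performance. There is little reason to: the interpreter records annotations but never checks them while the program runs.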

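Besides telling developers the types of variables, type annotations help IDEs offer autocompletion, and type checkers such as mypy can use them to verify that the code is correct.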
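
The sketch below illustrates that annotations are not enforced at runtime. The function double is a made-up example; a type checker such as mypy would be expected to flag the call with a string, while the interpreter simply runs it.

```python
def double(value: int) -> int:
    return value * 2


# The annotation says int, but the interpreter does not enforce it:
# passing a str runs without error, and str * 2 repeats the string.
result = double("ab")

# The hints are only stored as metadata on the function object.
hints = double.__annotations__
```

This is why a static checker is needed to get any value out of the annotations beyond documentation.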

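For all these benefits, annotations are not needed everywhere. In simple code, too many annotations get in the way of reading the actual logic.

# Complex Types

The annotations in the previous section were built from Python's basic types. This section introduces further types for building richer annotations.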

# `Optional`

Any variable in Python can be rebound at any time, and any of them can be assigned None. The Optional type expresses this possibility in an annotation.

	from typing import Optional
	maybe_a_string: Optional[str] = "abcdef"  # This has a value
	maybe_a_string = None  # This is the absence of a value

Optional signals that a variable may be None, which helps distinguish an empty value (such as an empty string) from the absence of a value.
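
A minimal sketch of how an Optional return value is typically handled: check for None first, after which a type checker treats the value as a plain str. The functions find_nickname and greeting are hypothetical.

```python
from typing import Optional


def find_nickname(nicknames: dict[str, str], name: str) -> Optional[str]:
    # dict.get returns None when the key is missing.
    return nicknames.get(name)


def greeting(nicknames: dict[str, str], name: str) -> str:
    nickname = find_nickname(nicknames, name)
    if nickname is None:
        # No entry at all: fall back to the full name.
        return f"Hello, {name}"
    # Past the None check, a type checker knows nickname is a str.
    return f"Hello, {nickname}"


known = greeting({"Patrick": "Pat"}, "Patrick")
unknown = greeting({"Patrick": "Pat"}, "Dana")
```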

# `Union`

Union represents the union of several independent types. For example, Union[int, str] means a variable may be either an int or a str. Union[int, None] is equivalent to Optional[int].
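
As an illustration, code that receives a Union usually narrows it with isinstance before using type-specific operations. The function describe is a hypothetical example. From Python 3.10 the same annotation can also be written as `int | str`.

```python
from typing import Union


def describe(user_id: Union[int, str]) -> str:
    # isinstance narrows the Union to one member in each branch.
    if isinstance(user_id, int):
        return f"numeric id {user_id:04d}"
    return f"name {user_id.upper()}"


by_number = describe(42)
by_name = describe("pat")
```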

# `Literal`

Literal restricts a variable to a specific set of values.

	from dataclasses import dataclass
	from typing import Literal

	@dataclass
	class Error:
	    error_code: Literal[1, 2, 3, 4, 5]
	    disposed_of: bool

	@dataclass
	class Snack:
	    name: Literal["Pretzel", "Hot Dog", "Veggie Burger"]
	    condiments: set[Literal["Mustard", "Ketchup"]]

Literal was introduced in Python 3.8. It is a little more lightweight than a Python enumeration.
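
Like other annotations, Literal is only checked statically. If the same restriction is also needed at runtime, one possible approach is to read the allowed values back with typing.get_args. The alias Condiment and the helper is_valid_condiment are assumptions made for this sketch.

```python
from typing import Literal, get_args

Condiment = Literal["Mustard", "Ketchup"]


def is_valid_condiment(value: str) -> bool:
    # get_args returns the allowed values of the Literal as a tuple.
    return value in get_args(Condiment)


allowed = get_args(Condiment)
```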

# `Annotated`

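Literal can only restrict basic types to fixed values. It cannot express constraints such as "a string of a specific length" or "a string matching a specific regular expression".

In such cases, the Annotated type can be used.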

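	from typing import Annotated
	# ValueRange and MatchesRegex are illustrative metadata classes, not part of the standard library
	x: Annotated[int, ValueRange(3, 5)]
	y: Annotated[str, MatchesRegex('[0-9]{4}')]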

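However, a type checker cannot catch violations of these constraints, because they are too complex to establish by static analysis. We still have to validate the input in our own code. Even so, the annotation documents the intended range of values and makes the code clearer.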
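
One way to do that manual validation is to read the Annotated metadata back at runtime. The sketch below assumes Python 3.9 or later; ValueRange is a hypothetical metadata class defined here for the example, and set_level is a made-up function.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class ValueRange:
    """Hypothetical metadata class; not part of the standard library."""
    low: int
    high: int


def set_level(level: Annotated[int, ValueRange(3, 5)]) -> int:
    # Read the metadata back from the annotation and enforce it by hand.
    hint = get_type_hints(set_level, include_extras=True)["level"]
    value_range = hint.__metadata__[0]
    if not value_range.low <= level <= value_range.high:
        raise ValueError(
            f"level must be in [{value_range.low}, {value_range.high}]"
        )
    return level


accepted = set_level(4)
try:
    set_level(9)
    rejected = False
except ValueError:
    rejected = True
```

The check still happens at runtime; the annotation only keeps the constraint next to the parameter it describes.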

# `NewType`

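NewType helps express more specialized types. It creates a new type based on an existing one, with the same fields and methods as the existing type. Even so, the new type and the original are not interchangeable.

Consider the following example: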

	from typing import NewType

	class HotDog:
	    ''' Used to represent an unservable hot dog'''
	    # ... snip hot dog class implementation ...

	ReadyToServeHotDog = NewType("ReadyToServeHotDog", HotDog)

	def dispense_to_customer(hot_dog: ReadyToServeHotDog):
	    ...

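In this code, ReadyToServeHotDog and HotDog are not equivalent. Where a ReadyToServeHotDog is required, passing a HotDog is not allowed, but the reverse is fine.

We also need to provide a way to convert to the new type; otherwise developers have no idea how to obtain an object of it.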

	def prepare_for_serving(hot_dog: HotDog) -> ReadyToServeHotDog:
	    assert not hot_dog.is_plated(), "Hot dog should not already be plated"
	    hot_dog.put_on_plate()
	    hot_dog.add_napkins()
	    return ReadyToServeHotDog(hot_dog)

	def make_snack():
	    # Nothing stops a caller from bypassing prepare_for_serving like this
	    dispense_to_customer(ReadyToServeHotDog(HotDog()))

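This way, every ReadyToServeHotDog created through prepare_for_serving is checked against the required conditions, which guarantees that functions are called correctly. Such a function is called a blessed function. Developers must be told to create the new type only through blessed functions. At present a comment is the only way to do so; there is no other effective means of warning developers explicitly.

We could achieve a similar effect by writing a new class, which would also guard more effectively against illegal values. By comparison, NewType is more lightweight.

Note that NewType is not the same as a type alias. A type alias is fully equivalent to the original type and can be swapped for it anywhere. A NewType cannot.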

For example, with IdOrName = Union[str, int], IdOrName and Union[str, int] are equivalent. Type aliases make complex nested types more readable: IDOrNameLookup is clearly easier to read than Union[dict[int, User], list[dict[str, User]]].
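
The difference between the two can be sketched as follows. UserId and lookup are hypothetical names; the point is that a NewType is distinct only for the type checker, while an alias is the same object as the type it names.

```python
from typing import NewType, Union

UserId = NewType("UserId", int)  # distinct type for the type checker
IdOrName = Union[str, int]       # alias: interchangeable with the Union


def lookup(user_id: UserId) -> str:
    return f"user-{user_id}"


uid = UserId(7)
# At runtime NewType is an identity call: uid is an ordinary int.
runtime_type = type(uid)
label = lookup(uid)
# lookup(7) would also run, but a type checker reports it as an error.
```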

# `Final`

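Final was introduced in Python 3.8. Once a variable of this type has been assigned, it cannot be rebound to anything else.

For example, a brand name that we define should not change casually: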

	from typing import Final
	VENDOR_NAME: Final[str] = "Viafore's Auto-Dog"

If a developer later tries to modify it by mistake, the type checker reports an error:

	def display_vendor_information():
	    vendor_info = "Auto-Dog v1.0"
	    # whoops, copy-paste error, this code should be vendor_info += VENDOR_NAME
	    VENDOR_NAME += VENDOR_NAME
	    print(vendor_info)

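Note, however, that Final differs from const in C++. Python does not prevent an object's contents from being modified through its methods; Final only prevents the name from being rebound to another object.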
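
A short sketch of that distinction, using a hypothetical TOPPINGS constant: mutating the list is accepted, whereas rebinding the name is what a type checker would reject.

```python
from typing import Final

TOPPINGS: Final[list[str]] = ["Mustard"]

# Allowed: this mutates the list object, it does not rebind the name.
TOPPINGS.append("Ketchup")

# Rejected by a type checker (although Python itself would run it):
# TOPPINGS = ["Relish"]
```

If the contents must not change either, an immutable type such as a tuple is the usual choice.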

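# Container Types

Besides basic types such as int, Python code makes heavy use of container types such as list, dict, and set. Their annotations are more involved than those of single-value variables.

Consider the following example: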

	from collections import defaultdict

	def create_author_count_mapping(cookbooks: list) -> dict:
	    counter = defaultdict(lambda: 0)
	    for book in cookbooks:
	        counter[book.author] += 1
	    return counter

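We know the input is a list and the output is a dict, but we still do not know the types of the objects inside them.

We can add type information for the objects in the containers: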

	AuthorToCountMapping = dict[str, int]

	def create_author_count_mapping(cookbooks: list[Cookbook]) -> AuthorToCountMapping:
	    counter = defaultdict(lambda: 0)
	    for book in cookbooks:
	        counter[book.author] += 1
	    return counter

Here a type alias names the return type, which states the intent of the code more clearly in this context.

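# Homogeneous vs. Heterogeneous Data

When describing the types of objects in a container, one question comes up often: if the objects in the container do not all have the same type, how do we express that?

Containers can be divided into homogeneous collections and heterogeneous collections, depending on whether their elements share a type.

In general we should prefer homogeneous collections, because heterogeneous ones tend to require special-case handling, which is error-prone. Homogeneous does not necessarily mean that every element has the same primitive type: as long as exactly the same operations can be applied to all of them, the elements can be considered homogeneous.

For heterogeneous collections, Union can describe the element types: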

	from typing import Union

	Ingredient = tuple[str, int, str]  # (name, quantity, units)
	Recipe = list[Union[int, Ingredient]]  # the list can be servings or ingredients

	def adjust_recipe(recipe: Recipe, servings):
	    ...

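If the types in a heterogeneous container become too complex, we will probably need a lot of type-checking code. In that case a user-defined class may be a better fit.

If a container holds too many different element types, we can also use Any to denote any type. Every type is then valid, but the annotation no longer provides any useful information.

A tuple, however, often holds heterogeneous elements by design.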

	Cookbook = tuple[str, int]  # name, page count

Of course, code like this easily becomes hard to follow, because we have to remember what the element at each index means. We could use a dict instead:

	food_lab = {
	    "name": "The Food Lab",
	    "page_count": 958
	}

But then the keys of the dict map to values of different types, and we need dict[str, Union[int, str]] to describe it.

For dictionaries with such complex types, TypedDict is recommended.

# `TypedDict`

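TypedDict was introduced in Python 3.8 for cases where heterogeneous data must be stored in a dictionary.

A dict parsed from a JSON or YAML file usually contains heterogeneous data. If we control how the dict is created, we can manage the data with a dataclass or a regular class. For content parsed from a file, we would otherwise have to consult documentation or similar sources to confirm its shape.

TypedDict solves this problem.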

	from typing import TypedDict

	class Range(TypedDict):
	    min: float
	    max: float

	class NutritionInformation(TypedDict):
	    value: int
	    unit: str
	    confidenceRange95Percent: Range
	    standardDeviation: float

	class RecipeNutritionInformation(TypedDict):
	    recipes_used: int
	    calories: NutritionInformation
	    fat: NutritionInformation
	    protein: NutritionInformation
	    carbs: NutritionInformation

	nutrition_information: RecipeNutritionInformation = \
	    get_nutrition_from_spoonacular(recipe_name)

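The code above states the key and value types of the dictionary clearly. When the shape of the dictionary changes, mypy can help check it: if we forget to update the TypedDict, mypy helps us find the mismatch.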
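
It may help to see that a TypedDict value is still an ordinary dict at runtime; only the type checker sees the schema. The sketch below reuses the Range shape with a hypothetical helper named width.

```python
from typing import TypedDict


class Range(TypedDict):
    min: float
    max: float


def width(r: Range) -> float:
    return r["max"] - r["min"]


# A TypedDict value is a plain dict at runtime.
calories_range: Range = {"min": 90.0, "max": 110.0}
is_plain_dict = type(calories_range) is dict
span = width(calories_range)
```

Because nothing is validated at runtime, data parsed from an external file still needs its own checks before it can be trusted to match the declared shape.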

# Creating New Container Types

# `Generics`

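If the existing types really cannot express what we need, generics help us build new container types.

A generic type usually means we do not care about the concrete type inside, while still preventing users from using the wrong types.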

	def reverse(coll: list) -> list:
	    return coll[::-1]

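For the reverse function we do not care about the concrete element type, but we do know that the returned list has the same element type as the list passed in. This can be expressed as follows: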

	from typing import TypeVar

	T = TypeVar('T')

	def reverse(coll: list[T]) -> list[T]:
	    return coll[::-1]

This way, a list of int can never produce a list of str.

The same approach lets us express more complex types:

	from collections import defaultdict
	from typing import Generic, TypeVar

	Node = TypeVar("Node")
	Edge = TypeVar("Edge")

	# directed graph
	class Graph(Generic[Node, Edge]):
	    def __init__(self):
	        self.edges: dict[Node, list[Edge]] = defaultdict(list)

	    def add_relation(self, node: Node, to: Edge):
	        self.edges[node].append(to)

	    def get_relations(self, node: Node) -> list[Edge]:
	        return self.edges[node]

We can then use Graph to express richer types:

	cookbooks: Graph[Cookbook, Cookbook] = Graph()
	recipes: Graph[Recipe, Recipe] = Graph()
	cookbook_recipes: Graph[Cookbook, Recipe] = Graph()
	recipes.add_relation(Recipe('Pasta Bolognese'),
	                     Recipe('Pasta with Sausage and Basil'))
	cookbook_recipes.add_relation(Cookbook('The Food Lab'),
	                              Recipe('Pasta Bolognese'))

Generics let us reuse more code and reduce errors.

Generics have other uses as well. Consider these functions:

	def get_nutrition_info(recipe: str) -> Union[NutritionInfo, APIError]:
	    ...
	def get_ingredients(recipe: str) -> Union[list[Ingredient], APIError]:
	    ...
	def get_restaurants_serving(recipe: str) -> Union[list[Restaurant], APIError]:
	    ...

Clearly, this approach requires adding APIError to every return type, which is tedious. We can rewrite it as:

	T = TypeVar("T")
	APIResponse = Union[T, APIError]

	def get_nutrition_info(recipe: str) -> APIResponse[NutritionInfo]:
	    ...
	def get_ingredients(recipe: str) -> APIResponse[list[Ingredient]]:
	    ...
	def get_restaurants_serving(recipe: str) -> APIResponse[list[Restaurant]]:
	    ...
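
A self-contained sketch of the generic alias follows. APIError is a hypothetical stand-in class, and get_ingredients returns placeholder data rather than calling a real API.

```python
from dataclasses import dataclass
from typing import TypeVar, Union


@dataclass
class APIError:
    """Hypothetical error type standing in for the one used above."""
    message: str


T = TypeVar("T")
APIResponse = Union[T, APIError]


def get_ingredients(recipe: str) -> APIResponse[list[str]]:
    if not recipe:
        return APIError("empty recipe name")
    return ["pasta", "tomato"]  # placeholder data, not a real API call


# Subscripting the alias substitutes T, giving back the longhand Union.
same_type = APIResponse[int] == Union[int, APIError]
ok = get_ingredients("Pasta Bolognese")
failed = get_ingredients("")
```

Callers still have to check with isinstance which member of the Union they received.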

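# Modifying Existing Types

Sometimes we can modify an existing type to get the behavior we want. Suppose we want a dictionary to support aliases, so that different keys point to the same value. If we copied the value under several keys, an update could easily miss some of them. Instead, we can create a subclass of dict: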

	class NutritionalInformation(dict):
	    def __getitem__(self, key):
	        try:
	            return super().__getitem__(key)
	        except KeyError:
	            pass
	        for alias in get_aliases(key):
	            try:
	                return super().__getitem__(alias)
	            except KeyError:
	                pass
	        raise KeyError(f"Could not find {key} or any of its aliases")

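However, this implementation has a problem. When we inherit from dict, there is no guarantee that its built-in methods will call the methods we override. Many methods of built-in types inline their logic for performance. Subclassing a built-in type is fine if we only add extra methods, but a later change could still introduce this kind of bug, so it is best to avoid inheriting from built-in types.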
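
The pitfall is easy to reproduce. In this sketch UpperDict is a hypothetical dict subclass that upper-cases values on lookup; indexing goes through the override, but the built-in get method does not.

```python
class UpperDict(dict):
    """Hypothetical dict subclass that upper-cases string values on lookup."""

    def __getitem__(self, key):
        return super().__getitem__(key).upper()


d = UpperDict(name="pretzel")
via_getitem = d["name"]  # goes through the override
via_get = d.get("name")  # dict.get bypasses the override
```

The two lookups return different results for the same key, which is exactly the kind of inconsistency described above.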

To solve this problem, we can use UserDict:

	from collections import UserDict

	class NutritionalInformation(UserDict):
	    def __getitem__(self, key):
	        try:
	            return self.data[key]
	        except KeyError:
	            pass
	        for alias in get_aliases(key):
	            try:
	                return self.data[alias]
	            except KeyError:
	                pass
	        raise KeyError(f"Could not find {key} or any of its aliases")

self.data gives access to the underlying built-in dict. UserList and UserString offer the same approach for list and str. Note that the User* types may carry some performance cost, so weigh them against your actual needs.
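
As a minimal sketch of why UserDict behaves more predictably, the hypothetical UpperUserDict below overrides only `__getitem__`, and the inherited get method picks up the override as well.

```python
from collections import UserDict


class UpperUserDict(UserDict):
    """Hypothetical UserDict subclass that upper-cases values on lookup."""

    def __getitem__(self, key):
        return self.data[key].upper()


u = UpperUserDict(name="pretzel")
via_getitem = u["name"]  # uses the override
via_get = u.get("name")  # get is written in terms of self[key]
raw = u.data             # the underlying dict keeps the original value
```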

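# Abstract Types

By defining abstract classes, we can create custom container types. collections.abc provides many abstract base classes that we can use as needed.

The previous section mentioned UserDict, UserList, and UserString, but there is no UserSet. In this section we build one on top of an ABC.

collections.abc.Set provides the abstract base class for sets. It requires the following methods:

- `__contains__`: checks whether an element is present
- `__iter__`: iterates over the elements
- `__len__`: returns the number of elements in the container

Once these three methods are implemented, we have a set-like type.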

	import collections.abc

	class AliasedIngredients(collections.abc.Set):
	    def __init__(self, ingredients: set[str]):
	        self.ingredients = ingredients

	    def __contains__(self, value: str):
	        return value in self.ingredients or any(alias in self.ingredients for alias in get_aliases(value))

	    def __iter__(self):
	        return iter(self.ingredients)

	    def __len__(self):
	        return len(self.ingredients)

	>>> ingredients = AliasedIngredients({'arugula', 'eggplant', 'pepper'})
	>>> for ingredient in ingredients:
	...     print(ingredient)
	...
	arugula
	eggplant
	pepper

	>>> print(len(ingredients))
	3
	>>> print('arugula' in ingredients)
	True
	>>> print('rocket' in ingredients)
	True
	>>> list(ingredients | AliasedIngredients({'garlic'}))
	['pepper', 'arugula', 'eggplant', 'garlic']
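
The session above depends on a get_aliases helper that is not shown. The following self-contained sketch fills it in with a hypothetical alias table (ALIASES is an assumption, not the original implementation) and copies the input into a real set so that any iterable is accepted.

```python
from collections.abc import Set

# Hypothetical alias table; the original get_aliases is not shown.
ALIASES = {"rocket": ["arugula"], "aubergine": ["eggplant"]}


def get_aliases(name: str) -> list[str]:
    return ALIASES.get(name, [])


class AliasedIngredients(Set):
    def __init__(self, ingredients):
        # Copy into a real set so iterables of any kind are accepted.
        self.ingredients = set(ingredients)

    def __contains__(self, value):
        return value in self.ingredients or any(
            alias in self.ingredients for alias in get_aliases(value)
        )

    def __iter__(self):
        return iter(self.ingredients)

    def __len__(self):
        return len(self.ingredients)


pantry = AliasedIngredients({"arugula", "eggplant", "pepper"})
# The | operator is supplied by the Set mixin, not written by hand.
combined = pantry | AliasedIngredients({"garlic"})
```

Only three methods are written by hand; operators such as `|`, `&`, and `<=` come from the abstract base class.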

ABCs can also be used in type annotations:

	import collections.abc

	def print_items(items: collections.abc.Iterable):
	    for item in items:
	        print(item)

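Any object that supports the `__iter__` method satisfies the parameter requirement of this function. ABCs let us define more complex parameter types, and they are central to how duck typing is expressed in annotations.

Python 3.9 provides 25 different abstract base classes in collections.abc; see the documentation for details [1].

[1] collections.abc: Abstract Base Classes for Containers, Python 3.11.4 documentation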
