{
"cells": [
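{
"cell_type": "markdown",
"id": "9571b890-f74b-47da-9d17-89fd797fa493",
"metadata": {},
"source": [
"#### Basic Jupyter Notebook operations\n",
"\n",
"- **Create a new notebook**: On the Jupyter home page, click the \"New\" button in the upper-right corner and select \"Python 3\" to create a new notebook.\n",
"\n",
"- **Run a cell**: Select a cell and press `Shift + Enter` to run it. A code cell executes its code and displays the output.\n",
"\n",
"- **Save the notebook**: Click the save icon in the toolbar, or press `Ctrl + S` (Windows) or `Cmd + S` (macOS).\n",
"\n",
"#### Using comments\n",
"\n",
"Comments are important when writing code: they help you and others understand what the code is for and how it works. Comments are not executed; they exist only as explanation.\n",
"\n",
"- **Single-line comments**: Start with the `#` symbol. Everything after `#` on that line is treated as a comment.\n"
]
},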
{
"cell_type": "code",
"execution_count": 15,
"id": "02f18fde-16d9-45af-b3df-308d3d8bd9eb",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello, World!\n"
]
}
],
"source": [
"# This is a single-line comment\n",
"print(\"Hello, World!\")  # This is also a comment"
]
},
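{
"cell_type": "markdown",
"id": "12b4e652-cb8b-4c2e-abb8-237b44dc1570",
"metadata": {},
"source": [
"### Code blocks\n",
"\n",
"In Jupyter Notebook, a code block is a code cell, which can contain one or more lines of Python code. Each code cell is an independent unit of execution: you write code in it, run it, and view the output. The details of code cells and their input and output are explained below.\n",
"\n",
"- **Code cells**: In Jupyter Notebook, \"code block\" usually refers to a code cell. You type Python code into a code cell and execute it by running the cell.\n",
"\n",
"- **Running a code block**: Select a code cell and press `Shift + Enter`, or click the \"Run\" button in the toolbar. After the cell runs, its output appears below the cell.\n",
"\n",
"### Input and output\n",
"\n",
"- **Input (In)**: Each code cell has a label on its left, such as `In [1]:`. The number shows the order in which the cell was executed. The input is the code you write in the cell.\n",
"\n",
"- **Output (Out)**: When you run a code cell, the result appears below the cell; this is the output. The output can be the result of any Python expression, such as a number, string, list, or chart.\n",
"\n",
"#### Example"
]
},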
{
"cell_type": "markdown",
"id": "d5bbfeb8-7507-47b6-8145-f9858462fd81",
"metadata": {},
"source": [
"- **Input**: All the lines of code in the code cell below.\n",
"- **Output**: After the cell runs, the output is `15`, the value of the last expression, `sum`.\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "b1a5270e-036a-4949-9f16-a2309831eba5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"15"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
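],
"source": [
"# This is a simple code block\n",
"a = 5\n",
"b = 10\n",
"sum = a + b  # note: this name shadows Python's built-in sum() function\n",
"sum  # The value of this expression is displayed as the output"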
]
},
{
"cell_type": "markdown",
"id": "a903b6a9-ca5c-4ddb-8d6e-b693e481ad6f",
"metadata": {},
"source": [
"### Multiple outputs\n",
"\n",
"In a code cell, only the result of the last expression is displayed automatically. To show several outputs from one cell, use the `print()` function."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c607e8f0-f191-40e1-bff0-6689f5e65639",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5\n",
"10\n",
"15\n"
]
}
],
"source": [
"# Display multiple outputs\n",
"print(a)  # Print the value of a\n",
"print(b)  # Print the value of b\n",
"print(sum)  # Print the value of sum"
]
},
{
"cell_type": "markdown",
"id": "5affaf61-f01d-43d8-b807-ed872d659563",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "7ef935bd-0289-48c7-aa49-36750d5dcda2",
"metadata": {},
"source": [
"### 2. Python basics\n",
"#### 2.1 Variables and data types"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "777ef098-4dd3-46ba-a642-45bc8eef3636",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'int'> <class 'float'> <class 'str'> <class 'bool'>\n"
]
}
],
"source": [
"# Integer\n",
"a = 10\n",
"# Floating-point number\n",
"b = 20.5\n",
"# String\n",
"c = \"Hello, World\"\n",
"# Boolean\n",
"d = True\n",
"\n",
"print(type(a), type(b), type(c), type(d))"
]
},
{
"cell_type": "markdown",
"id": "55fcc17d-8684-4cda-9df3-a2256e17f81f",
"metadata": {},
"source": [
"#### 2.2 Basic operations"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "13f3fadf-e16b-4b4e-91af-630cbc0bdf98",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"30.5 -10.5 205.0 0.4878048780487805\n"
]
}
],
"source": [
"# Arithmetic operations\n",
"sum = a + b\n",
"difference = a - b\n",
"product = a * b\n",
"quotient = a / b\n",
"\n",
"print(sum, difference, product, quotient)"
]
},
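{
"cell_type": "markdown",
"id": "62d1f7f3-cdcf-4a1f-84aa-e12e39ce59ad",
"metadata": {},
"source": [
"#### 2.3 List\n",
"\n",
"A list (`list`) is a very commonly used data structure. It can store multiple elements, and those elements can be of different types. Some common list operations are shown below."
]
},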
{
"cell_type": "code",
"execution_count": 28,
"id": "b11a3e0c-4c4d-4d83-b231-cb57f841946f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"apple\n",
"cherry\n"
]
}
],
"source": [
"# Create a list\n",
"fruits = [\"apple\", \"banana\", \"cherry\"]\n",
"print(fruits[0])  # Access the first element\n",
"print(fruits[-1])  # Access the last element"
]
},
{
"cell_type": "markdown",
"id": "8cd1edab-0c73-4c74-a81e-4ec3af3e5010",
"metadata": {},
"source": [
"Use the `append()` method to add an element to the end of a list."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "8b65db8c-3013-4809-8027-8e543a9bd784",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['apple', 'banana', 'cherry', 'orange']"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Add an element to the end\n",
"fruits.append('orange')\n",
"# fruits is now ['apple', 'banana', 'cherry', 'orange']\n",
"fruits"
]
},
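{
"cell_type": "markdown",
"id": "70032e23-b96b-46fb-bd6f-6d4caea1719d",
"metadata": {},
"source": [
"Use the `remove()` method to delete the first element equal to a given value, or the `pop()` method to delete and return the element at a given position (by default, the last one)."
]
},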
{
"cell_type": "code",
"execution_count": 30,
"id": "21e56204-b7c9-411f-8e54-3c55c9738e9c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['apple', 'banana', 'cherry']"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remove the last element\n",
"last_fruit = fruits.pop()\n",
"# fruits is now ['apple', 'banana', 'cherry']\n",
"fruits"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "f4532754-3c89-40fa-bbe1-e52f47748a9b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'orange'"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# last_fruit is 'orange'\n",
"last_fruit"
]
},
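{
"cell_type": "markdown",
"id": "added-remove-example-md",
"metadata": {},
"source": [
"The cells above demonstrate `pop()` only. As a minimal sketch of `remove()`, the next cell uses a separate, illustrative list named `colors` (not part of the original notebook), so that `fruits` stays unchanged for the later cells. `remove()` deletes only the first element equal to the given value and raises `ValueError` when no such element exists."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-remove-example-code",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative list; the name `colors` is an assumption made for this example\n",
"colors = ['red', 'green', 'blue', 'green']\n",
"colors.remove('green')  # removes only the first 'green'\n",
"print(colors)  # ['red', 'blue', 'green']\n",
"\n",
"# Removing a value that is not in the list raises ValueError\n",
"try:\n",
"    colors.remove('purple')\n",
"except ValueError:\n",
"    print(\"'purple' is not in the list\")"
]
},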
{
"cell_type": "markdown",
"id": "474fd23f-e596-45c5-b400-c3d14826fc38",
"metadata": {},
"source": [
"Use the `sort()` method to sort a list in place."
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "785754cf-a820-4558-a61b-7045127f8436",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 1, 2, 3, 4, 5, 9]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numbers = [3, 1, 4, 1, 5, 9, 2]\n",
"numbers.sort()\n",
"numbers"
]
},
{
"cell_type": "markdown",
"id": "f4efda3a-7b00-42a1-bbda-76bc8bc59425",
"metadata": {},
"source": [
"Use slicing to get a subset of a list."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "ced60078-2aaa-470c-8f65-4f03b8a9c5d4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['apple', 'banana']\n",
"['banana', 'cherry']\n"
]
}
],
"source": [
"# Get the first and second elements\n",
"subset = fruits[0:2]  # ['apple', 'banana']\n",
"print(subset)\n",
"\n",
"# Get every element from the second one to the end\n",
"subset = fruits[1:]  # ['banana', 'cherry']\n",
"print(subset)"
]
},
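{
"cell_type": "markdown",
"id": "d9e93e52-f799-4696-9c9f-d636075f50f2",
"metadata": {},
"source": [
"#### 2.4 Dict\n",
"\n",
"A dictionary (`dict`) is a data structure that stores key-value pairs. Every key in a dictionary is unique, and a value can be looked up quickly by its key. Some common dictionary operations are shown below.\n"
]
},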
{
"cell_type": "code",
"execution_count": 37,
"id": "6f7a187a-9b2d-4a39-aef3-8e0d22af5670",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"25"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create an empty dictionary\n",
"my_dict = {}\n",
"\n",
"# Create a dictionary with some key-value pairs\n",
"person = {'name': 'Alice', 'age': 25, 'city': 'New York'}\n",
"\n",
"# Access the value for the key 'name'\n",
"name = person['name']  # 'Alice'\n",
"\n",
"# Access the value for the key 'age'\n",
"person['age']  # 25"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "1c053ace-6613-4046-a269-9400edf8b28b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'name': 'Alice', 'age': 25, 'city': 'New York', 'email': 'alice@example.com'}"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Add a new key-value pair\n",
"person['email'] = 'alice@example.com'\n",
"person"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "426d7a12-f908-47f0-b4ec-c37104fe890d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'name': 'Alice', 'age': 25, 'email': 'alice@example.com'}"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Delete the key-value pair whose key is 'city'\n",
"del person['city']\n",
"person"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "e56c8bb6-1cb9-4866-9eac-4cd9111be196",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True False\n"
]
}
],
"source": [
"# Check whether 'name' is in the dictionary\n",
"has_name = 'name' in person  # True\n",
"\n",
"# Check whether 'city' is in the dictionary\n",
"has_city = 'city' in person  # False\n",
"print(has_name, has_city)"
]
},
{
"cell_type": "markdown",
"id": "06d6079d-e554-4e65-ae86-026db05a94dd",
"metadata": {},
"source": [
"### 3. Control structures\n",
"#### 3.1 Conditional statements"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4e1b439a-59d2-4786-9d8e-78f0139a8782",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a is less than or equal to b\n"
]
}
],
"source": [
"if a > b:\n",
"    print(\"a is greater than b\")\n",
"else:\n",
"    print(\"a is less than or equal to b\")"
]
},
{
"cell_type": "markdown",
"id": "d3c4da0f-69dd-4c5b-8296-5203cc3ced2b",
"metadata": {},
"source": [
"### 4. Functions"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0a404d8d-b5f5-409e-924a-b6f644c9e64d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello, Alice!\n"
]
}
],
"source": [
"def greet(name):\n",
"    return f\"Hello, {name}!\"\n",
"\n",
"print(greet(\"Alice\"))"
]
},
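{
"cell_type": "markdown",
"id": "8014998b-9966-4211-a73b-2630783a26de",
"metadata": {},
"source": [
"### 5. Text processing\n",
"#### 5.1 String operations\n",
"\n",
"Python provides many built-in methods for working with strings. The code below uses `split()`, a string method that splits a string into a **list** of substrings. By default it splits on whitespace (spaces, tabs, newlines, and so on).\n",
"\n",
"A list is a Python data structure that stores an ordered collection of elements. The elements can be of any data type, and lists are mutable, meaning they can be modified. Here, `words = text.split()` returns a list whose elements are the individual words of the string `text`."
]
},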
{
"cell_type": "code",
"execution_count": 7,
"id": "12593239-b389-4081-a00f-f462fafc9b68",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['Python', 'is', 'great', 'for', 'text', 'processing']\n"
]
}
],
"source": [
"text = \"Python is great for text processing\"\n",
"words = text.split()\n",
"print(words)"
]
},
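{
"cell_type": "markdown",
"id": "added-split-separator-md",
"metadata": {},
"source": [
"`split()` also accepts an explicit separator as its argument. The next cell is a minimal sketch using a comma-separated string; the names `csv_line` and `parts` are made up for this example and do not appear elsewhere in the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-split-separator-code",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative input; `csv_line` and `parts` are assumed names\n",
"csv_line = \"apple,banana,cherry\"\n",
"parts = csv_line.split(\",\")  # split on commas instead of whitespace\n",
"print(parts)  # ['apple', 'banana', 'cherry']"
]
},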
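{
"cell_type": "markdown",
"id": "3f2a12b5-746b-45cf-9d41-62c129a6399f",
"metadata": {},
"source": [
"#### 5.2 String concatenation\n",
"\n",
"The code below shows several common ways to build and format strings in Python:\n",
"\n",
"1. **String concatenation**\n",
"   - Using the plus sign (`+`) is the most basic way to concatenate strings. It joins several strings into a new string. In the example, `greeting`, `name`, and the other strings are joined with `+` to form the complete greeting `Hello, Alice!`.\n",
"\n",
"2. **The `join` method**\n",
"   - `join` is a string method that joins the elements of an iterable (such as a list or tuple) into one string, using the string it is called on as the separator between elements. In the example, `\" \".join(words)` joins the elements of the list `words` with spaces to form the sentence `Python is fun`.\n",
"\n",
"3. **f-strings (formatted string literals)**\n",
"   - f-strings are a string-formatting mechanism introduced in Python 3.6. Prefix the string with the letter `f` and put variable names or expressions inside braces `{}` to interpolate their values. In the example, `f\"My name is {name} and I am {age} years old.\"` inserts the values of `name` and `age` into the string, producing `My name is Alice and I am 24 years old.`."
]
},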
{
"cell_type": "code",
"execution_count": 22,
"id": "bc287103-f8bc-4c8e-8255-f5dfc679fc05",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Hello, Alice!\n",
"Python is fun\n",
"My name is Alice and I am 24 years old.\n"
]
}
],
"source": [
"# Concatenate strings with the plus sign\n",
"greeting = \"Hello\"\n",
"name = \"Alice\"\n",
"message = greeting + \", \" + name + \"!\"\n",
"print(message)\n",
"\n",
"# Use the join method\n",
"words = [\"Python\", \"is\", \"fun\"]\n",
"sentence = \" \".join(words)\n",
"print(sentence)\n",
"\n",
"name = \"Alice\"\n",
"age = 24\n",
"\n",
"# Use an f-string\n",
"formatted_text = f\"My name is {name} and I am {age} years old.\"\n",
"print(formatted_text)"
]
},
{
"cell_type": "markdown",
"id": "e4b23e4c-ba67-4fe7-a95c-172ad71ab8b4",
"metadata": {},
"source": [
"#### 5.3 Finding and replacing substrings"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "94435998-7580-49fe-b812-b65a75258a54",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Position of 'great': 10\n",
"Python is excellent for text processing\n"
]
}
],
"source": [
"text = \"Python is great for text processing\"\n",
"\n",
"# Find a substring (returns the index where it starts)\n",
"position = text.find(\"great\")\n",
"print(f\"Position of 'great': {position}\")\n",
"\n",
"# Replace a substring\n",
"new_text = text.replace(\"great\", \"excellent\")\n",
"print(new_text)"
]
},
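{
"cell_type": "markdown",
"id": "added-find-missing-md",
"metadata": {},
"source": [
"When the substring is not present, `find()` returns `-1` instead of raising an error, so it is worth checking the result before using it as an index. The next cell is a small sketch of that check; the names `sample` and `missing` are illustrative and not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-find-missing-code",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative names; `sample` and `missing` are assumptions for this example\n",
"sample = \"Python is great for text processing\"\n",
"missing = sample.find(\"awesome\")  # \"awesome\" does not occur in sample\n",
"print(missing)  # -1\n",
"\n",
"if missing == -1:\n",
"    print(\"'awesome' was not found\")"
]
},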
{
"cell_type": "markdown",
"id": "75cc2929-68e4-484c-8489-e8079277014e",
"metadata": {},
"source": [
"#### 5.4 Changing string case"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "021c7729-bf6d-4e60-a28c-fdf67b38ba1d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"python is great\n",
"PYTHON IS GREAT\n",
"Python Is Great\n"
]
}
],
"source": [
"text = \"Python is Great\"\n",
"\n",
"# Convert everything to lowercase\n",
"lower_text = text.lower()\n",
"print(lower_text)\n",
"\n",
"# Convert everything to uppercase\n",
"upper_text = text.upper()\n",
"print(upper_text)\n",
"\n",
"# Capitalize the first letter of every word\n",
"title_text = text.title()\n",
"print(title_text)"
]
},
{
"cell_type": "markdown",
"id": "0707ad7d-a6b4-496e-8f1f-d28d6b811bc3",
"metadata": {},
"source": [
"#### 5.5 Stripping whitespace"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "8ca8471c-46de-4495-ab53-9e9075764d0f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"|Python is great|\n",
"|Python is great |\n",
"| Python is great|\n"
]
}
],
"source": [
"text = \" Python is great \"\n",
"\n",
"# Strip whitespace from both ends\n",
"trimmed_text = text.strip()\n",
"print(f\"|{trimmed_text}|\")\n",
"\n",
"# Strip whitespace from the left side\n",
"left_trimmed_text = text.lstrip()\n",
"print(f\"|{left_trimmed_text}|\")\n",
"\n",
"# Strip whitespace from the right side\n",
"right_trimmed_text = text.rstrip()\n",
"print(f\"|{right_trimmed_text}|\")"
]
},
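{
"cell_type": "markdown",
"id": "a749e823-4ab8-43d2-8da7-fc9b34f98ff1",
"metadata": {},
"source": [
"### 6. Loops\n",
"\n",
"Loops are an important construct in programming: they let us execute a block of code repeatedly. Python has two main loop constructs, the `for` loop and the `while` loop. The following sections show how to use them together with `range`, `list`, and `dict`.\n",
"\n",
"#### 6.1 Looping over a list with for\n",
"\n",
"A list (`list`) is a Python data structure that can store multiple values. We can use a `for` loop to visit each element of a list."
]
},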
{
"cell_type": "code",
"execution_count": 41,
"id": "6645da47-7378-4446-af5b-c8eeeb8d6680",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"apple\n",
"banana\n",
"cherry\n"
]
}
],
"source": [
"# Define a list\n",
"fruits = ['apple', 'banana', 'cherry']\n",
"\n",
"# Iterate over the list\n",
"for fruit in fruits:\n",
"    print(fruit)"
]
},
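{
"cell_type": "markdown",
"id": "added-range-example-md",
"metadata": {},
"source": [
"The introduction to this section mentions `range`. As a minimal sketch (not part of the original notebook), `range(n)` produces the integers from 0 up to but not including `n`, so a `for` loop can repeat a fixed number of times or walk a list by index. The list `colors` and the loop variable `i` are illustrative names."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-range-example-code",
"metadata": {},
"outputs": [],
"source": [
"# Repeat three times; i takes the values 0, 1, 2\n",
"for i in range(3):\n",
"    print(i)\n",
"\n",
"# Walk a list by index; `colors` is an illustrative list\n",
"colors = ['red', 'green', 'blue']\n",
"for i in range(len(colors)):\n",
"    print(i, colors[i])  # prints '0 red', then '1 green', then '2 blue'"
]
},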
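{
"cell_type": "markdown",
"id": "571bdd40-a16f-41c0-886b-227ae63752c7",
"metadata": {},
"source": [
"#### 6.3 Looping over a dictionary with for\n",
"\n",
"A dictionary (`dict`) is a data structure of key-value pairs. We can use a `for` loop to iterate over a dictionary's keys and values."
]
},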
{
"cell_type": "code",
"execution_count": 42,
"id": "7d837497-6fb9-4d9c-a7ae-d7a2f06aae8d",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"name Alice\n",
"age 25\n",
"city New York\n"
]
}
],
"source": [
"# Define a dictionary\n",
"person = {'name': 'Alice', 'age': 25, 'city': 'New York'}\n",
"\n",
"# Iterate over the dictionary's keys\n",
"for key in person:\n",
"    print(key, person[key])"
]
},
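{
"cell_type": "markdown",
"id": "added-dict-items-md",
"metadata": {},
"source": [
"The loop above looks up each value with `person[key]`. An alternative sketch uses the dictionary's `items()` method, which yields each key together with its value. The dictionary `profile` is an illustrative copy defined only for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-dict-items-code",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative dictionary; the name `profile` is an assumption for this example\n",
"profile = {'name': 'Alice', 'age': 25, 'city': 'New York'}\n",
"\n",
"# items() yields (key, value) pairs, unpacked here into two loop variables\n",
"for key, value in profile.items():\n",
"    print(f\"{key}: {value}\")"
]
},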
{
"cell_type": "markdown",
"id": "2a2f9289-ed6f-466b-b2e9-1b6deb64d7e6",
"metadata": {},
"source": [
"#### 6.4 While loops\n",
"\n",
"A `while` loop repeats a block of code as long as the given condition is `True`. It suits cases where the condition changes inside the loop."
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "9d494e70-d499-4fa7-9454-af5677b7d275",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0\n",
"1\n",
"2\n",
"3\n",
"4\n"
]
}
],
"source": [
"# Use a while loop to print 0 through 4\n",
"count = 0\n",
"while count < 5:\n",
"    print(count)\n",
"    count += 1"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}