diff --git a/ai_lecture.md b/ai_lecture.md
new file mode 100644
index 00000000..42e9d67e
--- /dev/null
+++ b/ai_lecture.md
@@ -0,0 +1,219 @@
+# Recognizing MNIST Handwritten Digits with an MLP
+
+Presenter: 陈永源, 2024-01-05
+
+Note:
+This course assumes you have a programming background but are new to machine learning. I will explain the principles and the accompanying code in an intuitive way, without going into too much mathematical detail.
+
+---
+
+## Course Overview
+
+In this course we will use machine learning to classify images of handwritten digits, covering:
+
+- Defining a PyTorch MLP model
+- Using the DataLoader dataset utility
+- Training a model from scratch
+- Evaluating the model on the test set
+
+Note:
+
+A key difference between machine learning and traditional programming is that
+you never explicitly tell the computer how to recognize a digit.
+A 6, say, has a vertical stroke and a loop,
+but you do not hand the computer rules like that.
+Machine learning picks these features up from the dataset automatically,
+uses them to train its own neural network step by step,
+and gradually improves its accuracy,
+much like how humans and other organisms learn.
+
+---
+
+## A Brief Introduction to Neural Networks
+
+The term "machine learning" was coined in 1959
+
+The core idea of a neural network comes from the network of neurons in the human brain: it consists of many interconnected nodes ("neurons") that receive inputs, weight them differently depending on the data, and produce outputs
+
+![nn](./media/nn.png)
+
+
+Demo:
+
+Note:
+
+The term was coined in 1959 by Arthur Samuel at IBM, who was developing an artificial intelligence program that could play checkers. Half a century later, predictive models are embedded in many of the products we use every day, performing two fundamental jobs: one is classifying data, for example whether there is another car on the road or whether a patient has cancer; the other is predicting future outcomes, such as whether a stock will go up.
+
+Neural networks are a particularly important family of machine learning models. Their core idea comes from the network of neurons in the human brain: many interconnected nodes ("neurons") that receive inputs, weight them, and produce outputs. They are especially powerful on data such as images or natural language, because they can extract and create features automatically, with no manual feature engineering.
+
+// Open the demo
+
+This is a very intuitive demonstration of a neural network. The task is to train a network that classifies the orange and blue points on the right.
+Looking at it ourselves, we see at a glance
+that the blue points sit in the middle, surrounded by a ring of orange points.
+But how do we get the computer to learn that rule on its own?
+Here we design a network with two hidden layers:
+four neurons in the first and two in the second,
+using two linear features as inputs.
+// Press play: after a hundred or so iterations the network has learned the features we expect.
+// Hover over each neuron to see the feature it has learned and its weights to the other neurons.
+Beyond that, there are parameters we can tune, such as the learning rate, which you can think of as the step size of learning: a value that is too large may never find the optimum, and one that is too small may take far longer, or never get there at all.
+Now let us start the handwritten digit classification task.
+
+---
+
+## The MNIST Handwritten Digit Dataset
+
+- 70,000 images in total, including 10,000 test images
+- 10 classes, representing the digits 0 through 9
+
+![mnist handwriting digits](./media/minst.jpg)
+
+Note:
+This dataset contains 70,000 images.
+Each one is a black-and-white handwritten digit,
+falling into 10 classes,
+representing 0 through 9.
+Of these 70,000 images, 60,000 form the training set and 10,000 the test set. We only use the training images to train the network. The test images are never shown to the network during training, but are used to measure its performance in the evaluation phase, so we can see whether it also classifies images it has never seen correctly.
+
+---
+
+## Network Architecture
+
+1. Input layer size: 28*28 = 784
+2. Hidden layers
+3. Output layer size: 10
+
+![mlp structure](./media/mlp.jpeg)
+
+Note:
+This is the architecture of the network we are about to build.
+The input layer accepts an input of dimension 28*28, that is, 784.
+The data then passes through a few hidden layers and arrives at an output layer of 10 neurons,
+completing the handwritten digit classification task.
+
+---
+
+## The Optimizer
+
+The optimizer's job is to search for the combination of parameters that minimizes the loss function
+
+![optimizer](./media/optimizer.gif)
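+
+To make this concrete, here is a minimal sketch of a single SGD update written out by hand (an illustration only: the toy names `w`, `lr`, and `loss` are made up for this example, and in the training code later PyTorch's `optim.SGD` performs this step for us):
+
+```python
+import torch
+
+w = torch.randn(3, requires_grad=True)  # a toy weight vector
+lr = 0.01                               # learning rate = step size
+
+loss = (w ** 2).sum()                   # a toy loss function
+loss.backward()                         # compute d(loss)/d(w)
+
+with torch.no_grad():
+    w -= lr * w.grad                    # step against the gradient
+    w.grad.zero_()                      # reset the gradient for the next step
+```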
+
+---
+
+## Code Walkthrough
+
+```python [0-6|8-14|16-20|18|19|22-27]
+import torch
+import torchvision
+import torch.nn as nn
+import torch.optim as optim
+from torchvision import datasets, transforms
+from torch.utils.data import DataLoader
+
+# Check which device is available
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+print('Using device:', device)
+# Hyperparameters
+batch_size = 64
+learning_rate = 0.01
+epochs = 5
+
+# Dataset transforms
+transform = transforms.Compose([
+    transforms.ToTensor(),
+    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
+])
+
+# Load the datasets
+train_dataset = datasets.MNIST(root='./data_mlp', train=True, transform=transform, download=True)
+test_dataset = datasets.MNIST(root='./data_mlp', train=False, transform=transform, download=True)
+
+train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
+test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
+```
+
+Note:
+0-6 The opening lines import the libraries we need, including the neural network layers, the optimizer, and the dataset utilities.
+8-14
+Next we initialize some variables and hyperparameters.
+This snippet automatically checks whether a CUDA GPU is available
+and falls back to the CPU if it is not.
+The batch_size hyperparameter is how many images the network
+looks at in a single iteration.
+During training the network does not look at images one at a time;
+several images are grouped into a batch and processed together.
+Generally you set the batch size as high as you can, so the GPU can make full use of its parallelism, but it depends on the case: a value that is too high may lead to overfitting and hurt the model's final performance.
+learning_rate is the learning rate we discussed earlier:
+the step size by which the network adjusts
+its own weights in one update.
+The larger it is, the faster the network learns,
+but too large a value can make learning fail entirely.
+The last hyperparameter, epochs, is how many passes to make over the whole dataset.
+Too few, and training may stop
+before the digit features are learned. Too many, and
+the network may memorize every bit of noise in the dataset and end up performing poorly on the test set.
+16-20
+The transform section tells the dataset loader
+which format conversions to apply after reading the data, before it is fed to the network.
+18
+First, ToTensor converts the 0-255 integer pixels into float tensors scaled to [0, 1], the array type PyTorch can compute with.
+19
+Then Normalize standardizes the data with the dataset's mean (0.1307) and standard deviation (0.3081), so the inputs are roughly zero-mean with unit variance.
+In some datasets the scales of different values do not match. Say we analyze the relationship between height and airfare: height is measured in meters and tops out a little above two, while ticket prices range from thousands to tens of thousands. Normalization is essential there.
+After this adjustment, the values entering the network stay in a reasonable range.
+
+---
+
+## Code Walkthrough
+
+```python [1-14|16-17|19-21|23-32|34-48]
+# Define the MLP model
+class MLP(nn.Module):
+    def __init__(self):
+        super(MLP, self).__init__()
+        self.fc1 = nn.Linear(28*28, 512)
+        self.fc2 = nn.Linear(512, 256)
+        self.fc3 = nn.Linear(256, 10)
+
+    def forward(self, x):
+        x = x.view(-1, 28*28)  # flatten each image to 784 values
+        x = torch.relu(self.fc1(x))
+        x = torch.relu(self.fc2(x))
+        x = self.fc3(x)
+        return x
+
+# Create the model and move it to the chosen device
+model = MLP().to(device)
+
+# Define the loss function and optimizer
+criterion = nn.CrossEntropyLoss()
+optimizer = optim.SGD(model.parameters(), lr=learning_rate)
+
+# Train the model
+for epoch in range(epochs):
+    model.train()
+    for batch_idx, (data, target) in enumerate(train_loader):
+        data, target = data.to(device), target.to(device)
+        optimizer.zero_grad()
+        output = model(data)
+        loss = criterion(output, target)
+        loss.backward()
+        optimizer.step()
+
+# Evaluate on the test set
+model.eval()
+test_loss = 0
+correct = 0
+with torch.no_grad():
+    for data, target in test_loader:
+        data, target = data.to(device), target.to(device)
+        output = model(data)
+        test_loss += criterion(output, target).item()
+        pred = output.argmax(dim=1, keepdim=True)  # most likely digit
+        correct += pred.eq(target.view_as(pred)).sum().item()
+
+test_loss /= len(test_loader)  # criterion returns a per-batch mean
+accuracy = correct / len(test_loader.dataset)
+print(f'Test loss: {test_loss:.4f}, accuracy: {accuracy:.4f}')
+```
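+
+As a final illustration, here is a minimal sketch of using the trained model to predict a single held-out image (the `image`, `label`, and `pred` names are made up for this example; it reuses `test_dataset`, `model`, and `device` from the code above):
+
+```python
+# Classify one test image with the trained model.
+image, label = test_dataset[0]                     # one image and its true digit
+with torch.no_grad():
+    logits = model(image.unsqueeze(0).to(device))  # add a batch dimension
+    pred = logits.argmax(dim=1).item()             # most likely class
+print(f'predicted {pred}, true label {label}')
+```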
\ No newline at end of file
diff --git a/index.html b/index.html
index 2097df32..ee2a0715 100644
--- a/index.html
+++ b/index.html
@@ -16,8 +16,7 @@
-				<section>Slide 1</section>
-				<section>Slide 2</section>
+				<section data-markdown="ai_lecture.md"></section>
@@ -31,6 +30,8 @@
 			// - https://revealjs.com/config/
 			Reveal.initialize({
 				hash: true,
+				width: 1920,
+				height: 1080,
 
 				// Learn about plugins: https://revealjs.com/plugins/
 				plugins: [ RevealMarkdown, RevealHighlight, RevealNotes ]
diff --git a/media/Machine Learning Explained in 100 Seconds [PeMlggyqz0Y].tsv b/media/Machine Learning Explained in 100 Seconds [PeMlggyqz0Y].tsv
new file mode 100644
index 00000000..8efbaafa
--- /dev/null
+++ b/media/Machine Learning Explained in 100 Seconds [PeMlggyqz0Y].tsv
@@ -0,0 +1,31 @@
+start	end	text
+0	6480	Machine learning. Teach a computer how to perform a task, without explicitly programming it to perform said task.
+6620	13420	Instead, feed data into an algorithm to gradually improve outcomes with experience, similar to how organic life learns.
+13580	20400	The term was coined in 1959 by Arthur Samuel at IBM, who was developing artificial intelligence that could play checkers.
+20540	26880	Half a century later, and predictive models are embedded in many of the products we use every day, which perform two fundamental jobs.
+26880	32040	One is to classify data, like "Is there another car on the road?" or "Does this patient have cancer?"
+32040	38600	The other is to make predictions about future outcomes, like "Will this stock go up?" or "Which YouTube video do you want to watch next?"
+38600	43280	The first step in the process is to acquire and clean up data. Lots and lots of data.
+43480	47780	The better the data represents the problem, the better the results. Garbage in, garbage out.
+47900	52160	The data needs to have some kind of signal to be valuable to the algorithm for making predictions.
+52160	59920	And data scientists perform a job called feature engineering to transform raw data into features that better represent the underlying problem.
+60240	64240	The next step is to separate the data into a training set and testing set.
+64460	71800	The training data is fed into an algorithm to build a model, then the testing data is used to validate the accuracy or error of the model.
+71980	77700	The next step is to choose an algorithm, which might be a simple statistical model like linear or logistic regression,
+77940	81260	or a decision tree that assigns different weights to features in the data.
+81260	86640	Or you might get fancy with a convolutional neural network, which is an algorithm that also assigns
+86640	91300	weights to features, but also takes the input data and creates additional features automatically.
+91640	96300	And that's extremely useful for datasets that contain things like images or natural language,
+96420	99020	where manual feature engineering is virtually impossible.
+99260	103960	Every one of these algorithms learns to get better by comparing its predictions to an error function.
+104160	109840	If it's a classification problem, like "Is this animal a cat or a dog?" the error function might be accuracy.
+109840	115900	If it's a regression problem, like "How much will a loaf of bread cost next year?" then it might be mean absolute error.
+116220	121780	Python is the language of choice among data scientists, but R and Julia are also popular options,
+121920	125320	and there are many supporting frameworks out there to make the process approachable.
+125500	132680	The end result of the machine learning process is a model, which is just a file that takes some input data in the same shape that it was trained on,
+132860	136900	then spits out a prediction that tries to minimize the error that it was optimized for.
+136900	141980	It can then be embedded on an actual device or deployed to the cloud to build a real-world product.
+142180	144500	This has been Machine Learning in 100 Seconds.
+144580	147160	Like and subscribe if you want to see more short videos like this,
+147320	150500	and leave a comment if you want to see more machine learning content on this channel.
+150620	153040	Thanks for watching, and I will see you in the next one.
diff --git a/media/minst.jpg b/media/minst.jpg
new file mode 100644
index 00000000..07ff1edd
Binary files /dev/null and b/media/minst.jpg differ
diff --git a/media/mlp.jpeg b/media/mlp.jpeg
new file mode 100644
index 00000000..5c0910c9
Binary files /dev/null and b/media/mlp.jpeg differ
diff --git a/media/nn.png b/media/nn.png
new file mode 100644
index 00000000..90c50a07
Binary files /dev/null and b/media/nn.png differ
diff --git a/media/optimizer.gif b/media/optimizer.gif
new file mode 100644
index 00000000..ff931147
Binary files /dev/null and b/media/optimizer.gif differ