Deep Learning Midterm Project - Image Classification

#Mission: Show the image classification results using the four algorithms (DNN, CNN, and ResNet from the lecture, plus a customized ResNet)

> Installing PyTorch (CPU version)

In Anaconda Prompt, run 'conda create -n pytorch ipykernel' to create a virtual environment named pytorch (the argument after -n is the environment name; ipykernel is included so the environment can be used from Jupyter Notebook).

Activate the environment with 'conda activate pytorch'.

Install PyTorch and its companion packages with 'pip3 install torch torchvision torchaudio', the command shown at https://pytorch.org/get-started/locally/.

At this point Jupyter Notebook failed to start. I ran 'conda list jupyter' to check which Jupyter packages were installed in the current environment. When I then tried 'conda install jupyter', it turned out that OpenSSL was not installed on the system.

Following the blog post at https://blog.naver.com/PostView.nhn?blogId=baekmg1988&logNo=221454486746, I installed OpenSSL and registered the environment variables.

However, the following error still appeared: 'CondaSSLError: OpenSSL appears to be unavailable on this machine. OpenSSL is required to download and install packages.' I resolved it by following Method 2 in this video: https://youtu.be/-6puHFu8zDY

Now 'conda install jupyter' works.

I then launched Jupyter Notebook with the 'jupyter notebook' command and created a new file, but got '500: Internal Server Error'. Running 'pip install chardet' and 'pip install --upgrade charset-normalizer' fixed it.

Everything is now ready.
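
Before moving on, it can help to confirm from inside the notebook that the packages are visible to the kernel. The snippet below is only a sketch using the standard library's importlib.util.find_spec; the helper name check_packages is my own and not part of any installer.

```python
from importlib.util import find_spec


def check_packages(names):
    """Return {package name: True if it can be imported in this environment}."""
    return {name: find_spec(name) is not None for name in names}


# Packages installed in the steps above.
for name, ok in check_packages(["torch", "torchvision", "torchaudio"]).items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If a package shows up as missing, the notebook is probably running on a kernel from a different environment.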

> DNN

I used Epochs=50 and Batch size=32. A larger batch size needs more memory; a smaller one means more iterations per epoch and slower training, but it tends to generalize better.
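
To make the "more iterations" part concrete, here is a rough count of the weight updates per epoch for the two batch sizes used in this post. It assumes the 50,000-image CIFAR100 training split and a final partial batch that is kept (the DataLoader default); the helper name steps_per_epoch is made up for this sketch.

```python
import math

TRAIN_IMAGES = 50_000  # size of the CIFAR100 training split


def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer updates in one pass over the data."""
    return math.ceil(num_samples / batch_size)


for batch_size in (32, 64):
    print(batch_size, steps_per_epoch(TRAIN_IMAGES, batch_size))
```

Halving the batch size roughly doubles the number of updates per epoch.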

The CIFAR100 dataset is loaded into train_loader and test_loader. The other three models load the data the same way.

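# Imports used by the code in this post (the original listing omitted them)
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
from torchvision import datasets, transforms

EPOCHS = 50      # DNN settings stated above
BATCH_SIZE = 32

train_loader = torch.utils.data.DataLoader(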
    datasets.CIFAR100('./.data',train=True,download=True,
                     transform=transforms.Compose([
                         transforms.RandomCrop(32, padding=4),transforms.RandomHorizontalFlip(),
                         transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5),
                                                                    (0.5, 0.5, 0.5))])),
    batch_size=BATCH_SIZE, shuffle=True)
 
test_loader = torch.utils.data.DataLoader(
    datasets.CIFAR100('./.data',train=False,
                     transform=transforms.Compose([
                         transforms.ToTensor(),
                         transforms.Normalize((0.5, 0.5, 0.5),
                                              (0.5, 0.5, 0.5))])),
    batch_size=BATCH_SIZE, shuffle=True)
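
As a small aside on Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)): ToTensor produces values in [0, 1], and Normalize then applies (value - mean) / std per channel. The sketch below reproduces that arithmetic in plain Python for a single value; the function name normalize is only for illustration.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel operation of transforms.Normalize: (value - mean) / std."""
    return (value - mean) / std


for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

So with these settings the inputs end up in the range [-1, 1].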

Define the DNN model.

class DNN(nn.Module): 
    def __init__(self):
        super(DNN, self).__init__()
        self.fc1 = nn.Linear(32 * 32 * 3, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 100)
        
    def forward(self, x):
        x = x.view(-1, 32 * 32 * 3)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
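
To get a feel for the size of this model, the parameter count can be worked out by hand from the three nn.Linear layers. The following is a plain-Python sketch of that arithmetic (weights plus biases); linear_params is a helper I made up for the calculation.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


layer_sizes = [(32 * 32 * 3, 512), (512, 256), (256, 100)]
per_layer = [linear_params(i, o) for i, o in layer_sizes]
total = sum(per_layer)
print(per_layer, total)
```

Almost all of the parameters sit in the first layer, because every one of the 3,072 input values is connected to every hidden unit.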

The learning rate is set to 0.001, a value I chose so that training would not be too slow, and a training function is defined.

model = DNN()  # runs on the CPU
optimizer = optim.SGD(model.parameters(), lr=0.001)
train_losses = []
test_losses = []
 
def train(model, train_loader, optimizer):
    model.train()
    total_loss=0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to('cpu'), target.to('cpu')  # train on the CPU
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
 
    avg_loss = total_loss / len(train_loader)
    train_losses.append(avg_loss)
    

Next, define the test function, including the code that computes the loss and accuracy.

def evaluate(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to('cpu'), target.to('cpu')
            output = model(data)
            # Sum the loss over every sample in the batch
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            # The class with the largest output is the model's prediction;
            # compare it with the label and count the matches in correct
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    test_accuracy = 100. * correct / len(test_loader.dataset)
    test_losses.append(test_loss)
    return test_loss, test_accuracy
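
For reference, here is what the two quantities in evaluate boil down to, written out in plain Python on a toy batch. This is a sketch with made-up logits, not the PyTorch implementation; cross_entropy and predict are illustrative helper names.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[target])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


def predict(logits):
    """Index of the largest logit, i.e. the predicted class."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy batch: three samples, three classes (made-up numbers).
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 3.0], [1.0, 1.5, 0.2]]
targets = [0, 2, 0]

loss_sum = sum(cross_entropy(z, t) for z, t in zip(batch_logits, targets))
correct = sum(predict(z) == t for z, t in zip(batch_logits, targets))
print(loss_sum / len(targets), 100.0 * correct / len(targets))
```

The third sample is misclassified (class 1 has the largest logit, the label is 0), so two of the three predictions count as correct.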
 

Finally, a loop reports the results for each epoch, and the training and test losses are plotted. At the end I saved the entire model with torch.save.

for epoch in range(1, EPOCHS + 1):
    train(model, train_loader, optimizer)
    test_loss, test_accuracy = evaluate(model, test_loader)
    
    print('[{}] Test Loss: {:.4f}, Accuracy: {:.2f}%'.format(
        epoch, test_loss, test_accuracy))
    
plt.plot(range(1, len(train_losses) + 1), train_losses, label='Training Loss')
plt.plot(range(1, len(test_losses) + 1), test_losses, label='Test Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Test Losses')
plt.show()    
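
The torch.save call mentioned above is not part of the listing. A minimal sketch of what it could look like follows; the file name dnn_model.pt and the small stand-in nn.Linear model are assumptions made so that the snippet runs on its own, and in the real notebook the trained model would take their place.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in model so the sketch runs on its own; in the post this would be
# the trained DNN instance.
model = nn.Linear(4, 2)

# Hypothetical file name (the original post does not give one).
path = os.path.join(tempfile.gettempdir(), "dnn_model.pt")

# Save the whole model object (architecture + weights), as the post describes.
torch.save(model, path)

# Loading a whole pickled model needs weights_only=False on recent PyTorch
# releases, and the model's class must be importable at load time.
restored = torch.load(path, weights_only=False)

# Saving only the state_dict is the more portable alternative.
torch.save(model.state_dict(), path)
fresh = nn.Linear(4, 2)
fresh.load_state_dict(torch.load(path))
```

Saving the whole object is convenient inside one notebook, while the state_dict route tends to survive code changes better.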

I expected a DNN to be poorly suited to image data such as CIFAR100. Training for 50 epochs took about 1 hour 15 minutes and reached roughly 15% accuracy.

> CNN

Define the CNN model.

EPOCHS = 100
BATCH_SIZE = 64
 
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(20 * 5 * 5, 500)  # 20 channels of 5x5 feature maps after two conv+pool stages
        self.fc2 = nn.Linear(500, 100)  # 100 CIFAR100 classes
        
    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 20 * 5 * 5)  # flatten to (batch, 500)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return x
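
The 20 * 5 * 5 in fc1 comes from following the image size through the two convolution and pooling steps. The sketch below redoes that bookkeeping with the usual output-size formula for unpadded windows; conv_out is a helper name introduced just for this check.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                  # CIFAR100 images are 32x32
size = conv_out(size, kernel=5)            # conv1 -> 28
size = conv_out(size, kernel=2, stride=2)  # max_pool2d(2) -> 14
size = conv_out(size, kernel=5)            # conv2 -> 10
size = conv_out(size, kernel=2, stride=2)  # max_pool2d(2) -> 5
flat_features = 20 * size * size           # 20 output channels of conv2
print(size, flat_features)
```

If the kernel sizes or the input resolution change, this is the number that has to be updated in fc1 and in the view call.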

Define the training function, adding code that computes the average training loss for each epoch.

model = CNN()
optimizer = optim.SGD(model.parameters(), lr=0.01)
train_losses = []
test_losses = []
 
def train(model, train_loader, optimizer, epoch):
    model.train()
    total_loss = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to('cpu'), target.to('cpu')
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        
        if batch_idx % 200 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
    
    # Average training loss over this epoch
    avg_loss = total_loss / len(train_loader)
    train_losses.append(avg_loss)

Next is the test function, which computes the test loss and test accuracy.

def evaluate(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to('cpu'), target.to('cpu')
            output = model(data)
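            # Sum the per-sample losses so that dividing by the dataset size
            # below gives a true mean (adding batch means here would shrink
            # the reported loss by roughly the batch size)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()

            # The index with the largest output is the prediction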
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
        
    test_loss /= len(test_loader.dataset)
    test_losses.append(test_loss)
    test_accuracy = 100. * correct / len(test_loader.dataset)
    return test_loss, test_accuracy
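
How the test loss is normalized matters a lot for the plotted value. The numeric sketch below, with made-up numbers, compares two ways of averaging: adding per-sample losses and dividing by the sample count, versus adding per-batch mean losses and dividing by the sample count. Both function names are illustrative only.

```python
def mean_of_sums(batch_losses, num_samples):
    """Add per-sample losses, then divide by the sample count (true mean)."""
    return sum(sum(batch) for batch in batch_losses) / num_samples


def batch_means_over_samples(batch_losses, num_samples):
    """Add per-batch MEAN losses, then divide by the sample count."""
    return sum(sum(batch) / len(batch) for batch in batch_losses) / num_samples


# Made-up example: 4 batches of 64 samples, every sample has loss 3.0.
batches = [[3.0] * 64 for _ in range(4)]
n = 4 * 64

print(mean_of_sums(batches, n))              # 3.0
print(batch_means_over_samples(batches, n))  # 3.0 / 64
```

The second variant understates the loss by a factor equal to the batch size, which can make a curve look as if it were hugging zero.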

Finally, write the loop that reports the results for each epoch.

for epoch in range(1, EPOCHS + 1):
    train(model, train_loader, optimizer,epoch)
    test_loss, test_accuracy = evaluate(model, test_loader)
    
    print('[{}] Test Loss: {:.4f}, Accuracy: {:.2f}%'.format(epoch, test_loss, test_accuracy))
 
plt.plot(range(1, len(train_losses) + 1), train_losses, label='Training Loss')
plt.plot(range(1, len(test_losses) + 1), test_losses, label='Test Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Test Losses')
plt.show()

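Training for 100 epochs took about 3 hours. At epoch 50 the accuracy was 24%, higher than the DNN at the same point, but it only reached 28% at epoch 100, so I judged that accuracy was improving slowly with further training. The CNN test loss appeared to converge to 0 in the plot. That value came from adding up per-batch mean losses and then dividing by the number of test samples, which scales it down by roughly the batch size (64), so it is not directly comparable to the test losses of the other models.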

> ResNet

I used Epochs=50 and Batch size=64.

Next, add the BasicBlock class. It is the residual block of the ResNet model and implements the residual learning structure that helps train deep networks.

A residual block passes its input through two convolution layers, each followed by batch normalization, and then adds the original input to the result.

This skip connection lets the signal bypass layers, which mitigates the vanishing gradient problem and makes deeper networks trainable in practice.

If the input and output shapes differ ('stride != 1 or in_planes != planes', where in_planes and planes are the input and output channel counts), a 1x1 convolution and batch normalization are placed on the skip connection to match the dimensions, so that the addition is well defined.
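
Written as a formula, the block looks roughly like this. The symbols are my own notation rather than anything from the lecture: F stands for the two convolution + batch normalization layers and S for the shortcut (the identity, or the 1x1 projection when the shapes differ).

```latex
\begin{aligned}
y &= \mathrm{ReLU}\bigl(\mathcal{F}(x) + \mathcal{S}(x)\bigr), \\
\mathcal{S}(x) &= x \quad \text{(identity shortcut)}, \\
\frac{\partial}{\partial x}\bigl(\mathcal{F}(x) + x\bigr) &= \frac{\partial \mathcal{F}}{\partial x} + I .
\end{aligned}
```

The identity term I in the last line is why the skip connection helps with vanishing gradients: even when the gradient through F becomes small, a direct path back to the input remains.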

EPOCHS = 50
BATCH_SIZE = 64
 
class BasicBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes))
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out
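
To see which blocks actually get the 1x1 projection shortcut, the condition can be traced in plain Python. The sketch below assumes the stage configuration (16, 2, 1), (32, 2, 2), (64, 2, 2) used by the ResNet in this post; needs_projection and trace_blocks are hypothetical helper names.

```python
def needs_projection(in_planes, planes, stride):
    """Same condition BasicBlock uses to choose a 1x1 conv shortcut."""
    return stride != 1 or in_planes != planes


def trace_blocks(stages, in_planes=16):
    """List (in_planes, planes, stride, projection?) for every block.

    stages is a list of (planes, num_blocks, stride) tuples.
    """
    rows = []
    for planes, num_blocks, stride in stages:
        for block_stride in [stride] + [1] * (num_blocks - 1):
            rows.append((in_planes, planes, block_stride,
                         needs_projection(in_planes, planes, block_stride)))
            in_planes = planes
    return rows


for row in trace_blocks([(16, 2, 1), (32, 2, 2), (64, 2, 2)]):
    print(row)
```

Only the first block of the second and third stages changes shape, so only those two use the projection; every other block keeps the plain identity shortcut.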

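Next, define the ResNet architecture. The __init__ method initializes the model. The first layers, self.conv1 and self.bn1, apply a 3x3 convolution to the input image followed by batch normalization, turning the 3-channel input into 16 channels.

The stages are then built with the _make_layer method, which stacks several BasicBlock instances to form each stage.

self.linear is a fully connected layer that maps the 64-dimensional vector left after global average pooling to a vector of 100 class scores.

In _make_layer, planes is the number of channels used inside each BasicBlock and num_blocks is the number of blocks in the stage.

The forward method applies the layers in order: convolution, batch normalization, each stage, and then global average pooling, which removes the spatial dimensions.

Global average pooling summarizes each feature map by its mean, and the pooled vector is what the final layer turns into class scores. It ties feature-level patterns to classes and, because it has no parameters of its own and shrinks the input to the final layer, it reduces the parameter count and helps computational efficiency.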
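
As a tiny illustration of global average pooling, the sketch below reduces two made-up 8x8 feature maps to one number each. It is plain Python rather than the F.avg_pool2d call, and global_average_pool is a name chosen for this example.

```python
def global_average_pool(feature_maps):
    """Reduce each channel's 2-D map to its mean value."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two made-up 8x8 channels: one constant, one with a single active cell.
constant = [[2.0] * 8 for _ in range(8)]
single = [[0.0] * 8 for _ in range(8)]
single[3][4] = 64.0

print(global_average_pool([constant, single]))
```

Each channel ends up as a single value no matter where in the map the activation sat, which is what "removing the spatial dimensions" means here.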

class ResNet(nn.Module):
    def __init__(self, num_classes=100):  # CIFAR100 has 100 classes
        super(ResNet, self).__init__()
        self.in_planes = 16
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        self.layer1 = self._make_layer(16, 2, stride=1)
        self.layer2 = self._make_layer(32, 2, stride=2)
        self.layer3 = self._make_layer(64, 2, stride=2)
        self.linear = nn.Linear(64, num_classes)
    def _make_layer(self, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(BasicBlock(self.in_planes, planes, stride))
            self.in_planes = planes
        return nn.Sequential(*layers)
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.avg_pool2d(out, 8)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
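
A quick way to check that F.avg_pool2d(out, 8) really is a global pooling step is to follow the feature-map size through the three stages. This sketch assumes a 32x32 input and the 3x3, padding=1 convolutions shown above; stage_output is a helper name made up for the trace.

```python
def stage_output(size, stride):
    """Spatial size after a stage whose first block uses the given stride
    (3x3 convolutions with padding=1 keep the size when stride is 1)."""
    return (size + 2 * 1 - 3) // stride + 1


size, channels = 32, 16          # after conv1/bn1 on a 32x32 image
shapes = [(channels, size)]
for planes, stride in [(16, 1), (32, 2), (64, 2)]:
    size = stage_output(size, stride)
    channels = planes
    shapes.append((channels, size))
print(shapes)
```

The last stage leaves 64 channels of 8x8 maps, so an 8x8 average pool covers each map entirely and the linear layer receives 64 values.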
 

Next is the code for training and testing.

model = ResNet()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
 
train_losses = []
test_losses = []
 
def train(model, train_loader, optimizer):
    model.train()
    total_loss=0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to('cpu'), target.to('cpu') 
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        
    avg_loss = total_loss / len(train_loader)
    train_losses.append(avg_loss)
 
def evaluate(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to('cpu'), target.to('cpu')
            output = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    test_accuracy = 100. * correct / len(test_loader.dataset)
    test_losses.append(test_loss)
    return test_loss, test_accuracy
 

Finally, the results are again checked epoch by epoch. Training for all 50 epochs took about 5 hours.

for epoch in range(1, EPOCHS + 1):
    train(model, train_loader, optimizer)
    test_loss, test_accuracy = evaluate(model, test_loader)
    scheduler.step()
    print('[{}] Test Loss: {:.4f}, Accuracy: {:.2f}%'.format(epoch, test_loss, test_accuracy))
    
plt.plot(range(1, len(train_losses) + 1), train_losses, label='Training Loss')
plt.plot(range(1, len(test_losses) + 1), test_losses, label='Test Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Test Losses')
plt.show()
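
One thing worth checking is what StepLR(step_size=50, gamma=0.1) does over a 50-epoch run. The sketch below uses the closed form lr = base_lr * gamma ** (steps // step_size) in plain Python instead of the scheduler object; step_lr is an illustrative name, and the loop mirrors the one above, where scheduler.step() is called at the end of each epoch.

```python
def step_lr(base_lr, gamma, step_size, steps_taken):
    """Learning rate after scheduler.step() has been called steps_taken times."""
    return base_lr * gamma ** (steps_taken // step_size)


EPOCHS = 50
# scheduler.step() runs at the end of each epoch, so epoch e (1-based)
# trains with the rate obtained after e - 1 scheduler steps.
rates = [step_lr(0.1, 0.1, 50, epoch - 1) for epoch in range(1, EPOCHS + 1)]
print(set(rates))                 # a single value: no decay within 50 epochs
print(step_lr(0.1, 0.1, 50, 50))  # rate that a 51st epoch would have used
```

With these settings the learning rate stays at 0.1 for the whole run, so a smaller step_size would be needed for the decay to have any effect.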
 

> Custom ResNet

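This code is a modified version of the previous ResNet. The number of blocks per stage was increased from (16, 2), (32, 2), (64, 2) to (16, 2), (32, 3), (64, 4), where each pair is (channels, blocks). Repeating more BasicBlocks lets each stage learn more features, and I expected that learning more features would capture the key characteristics of the images better and raise classification accuracy.

Because this model needs more computation than the original, I cut the number of epochs to half of the original ResNet's.

For data augmentation I added RandomRotation(10), which rotates each training image by a random angle within 10 degrees, in the hope of improving performance.

I also added ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1), which applies a small color change of 0.1 to every setting.

In short, I modified part of the ResNet by adding these commonly used augmentation techniques to increase data diversity and by adjusting the block counts so that the model learns more features.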

DEVICE = torch.device("cpu")
EPOCHS = 25
BATCH_SIZE = 64
 
train_loader = torch.utils.data.DataLoader(
    datasets.CIFAR100('./.data',train=True,download=True,
                     transform=transforms.Compose([
                         transforms.RandomCrop(32, padding=4),
                         transforms.RandomHorizontalFlip(),
                         transforms.RandomRotation(10),
                         # Added augmentation: RandomRotation (random rotation of up to 10 degrees)
                         transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.1),
                         # Color change: small jitter of 0.1 for each setting
                         transforms.ToTensor(),transforms.Normalize((0.5, 0.5, 0.5),
                                                                    (0.5, 0.5, 0.5))])),
    batch_size=BATCH_SIZE, shuffle=True)
 
test_loader = torch.utils.data.DataLoader(
    datasets.CIFAR100('./.data',train=False,
                     transform=transforms.Compose([
                         transforms.ToTensor(),
                         transforms.Normalize((0.5, 0.5, 0.5),
                                              (0.5, 0.5, 0.5))])),
    batch_size=BATCH_SIZE, shuffle=True)
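
To make the augmentation settings more tangible, the sketch below draws the kind of random parameters those two transforms use: an angle in [-10, 10] degrees, multiplicative factors in [0.9, 1.1] for brightness, contrast and saturation, and a hue shift in [-0.1, 0.1]. It is plain Python that only samples the numbers, it does not touch any image, and sample_augmentation is a made-up helper.

```python
import random


def sample_augmentation(rng, rotation=10, jitter=0.1, hue=0.1):
    """Draw one set of random augmentation parameters."""
    return {
        "angle": rng.uniform(-rotation, rotation),          # degrees
        "brightness": rng.uniform(1 - jitter, 1 + jitter),  # multiplicative factor
        "contrast": rng.uniform(1 - jitter, 1 + jitter),
        "saturation": rng.uniform(1 - jitter, 1 + jitter),
        "hue": rng.uniform(-hue, hue),                      # shift of the hue channel
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for _ in range(3):
    print(sample_augmentation(rng))
```

Every training image gets a fresh draw each epoch, so the model rarely sees exactly the same pixels twice.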
 

class BasicBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes))
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

class CustomResNet(nn.Module):
    def __init__(self, num_classes=100):  # CIFAR100 has 100 classes
        super(CustomResNet, self).__init__()
        self.in_planes = 16
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)
        # More blocks per stage so that each stage can learn more features
        self.layer1 = self._make_layer(16, 2, stride=1)
        self.layer2 = self._make_layer(32, 3, stride=2)
        self.layer3 = self._make_layer(64, 4, stride=2)
        self.linear = nn.Linear(64, num_classes)
    def _make_layer(self, planes, num_blocks, stride):
        strides = [stride] + [1]*(num_blocks-1)
        layers = []
        for stride in strides:
            layers.append(BasicBlock(self.in_planes, planes, stride))
            self.in_planes = planes
        return nn.Sequential(*layers)
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.avg_pool2d(out, 8)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
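
To compare the depth of the two networks, the convolution layers can simply be counted from the block configuration. This is a plain-Python sketch under the assumption that both models use the same three stages of 16, 32 and 64 channels; count_conv_layers is a helper written for this comparison.

```python
def count_conv_layers(blocks_per_stage, stage_planes=(16, 32, 64), in_planes=16):
    """Count 3x3 convolutions and 1x1 projection shortcuts in the network."""
    conv3x3 = 1  # the stem convolution (conv1)
    projections = 0
    for planes, num_blocks in zip(stage_planes, blocks_per_stage):
        conv3x3 += 2 * num_blocks          # each BasicBlock has two 3x3 convs
        if in_planes != planes:            # first block of the stage changes shape
            projections += 1
        in_planes = planes
    return conv3x3, projections


print(count_conv_layers((2, 2, 2)))  # original ResNet
print(count_conv_layers((2, 3, 4)))  # CustomResNet
```

The custom model goes from 13 to 19 3x3 convolutions, with the extra blocks added in the two later stages.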
 

This is the training and testing part.

model = CustomResNet()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.0005)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
 
train_losses = []
test_losses = []
 
def train(model, train_loader, optimizer):
    model.train()
    total_loss=0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to('cpu'), target.to('cpu') 
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        
    avg_loss = total_loss / len(train_loader)
    train_losses.append(avg_loss)
 
def evaluate(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to('cpu'), target.to('cpu')
            output = model(data)
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    test_accuracy = 100. * correct / len(test_loader.dataset)
    test_losses.append(test_loss)
    return test_loss, test_accuracy
 

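I ran it for 25 epochs and checked the results. It took about 5 hours: the accuracy was not much different from the original ResNet, but the training time per epoch doubled (25 epochs took as long as the original's 50). Training took much longer, as expected, yet the changes do not appear to have improved accuracy so far. More epochs might have changed the outcome, but I stopped here to keep the total training time manageable.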

for epoch in range(1, EPOCHS + 1):
    train(model, train_loader, optimizer)
    test_loss, test_accuracy = evaluate(model, test_loader)
    scheduler.step()
    print('[{}] Test Loss: {:.4f}, Accuracy: {:.2f}%'.format(epoch, test_loss, test_accuracy))
    
plt.plot(range(1, len(train_losses) + 1), train_losses, label='Training Loss')
plt.plot(range(1, len(test_losses) + 1), test_losses, label='Test Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.title('Training and Test Losses')
plt.show()

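> Conclusion

Overall, every loss tends to decrease as training proceeds. For the DNN, the training and test losses are almost identical, and the CNN test loss converges to 0. The two ResNets behave similarly to each other: compared with the training loss, the test loss rises and falls repeatedly, but it gradually decreases as training goes on.

Comparing the four models overall, training time ranks CustomResNet > ResNet > CNN > DNN, and accuracy up to the epochs I trained ranks ResNet = CustomResNet > CNN > DNN.

Training for more epochs to find each model's accuracy ceiling would make it possible to judge precisely which model suits the CIFAR-100 dataset best. Based only on the current results, though, the promising directions seem to be either modifying the ResNet code so that accuracy and training time improve together, or improving the CNN code and training it for longer.

One regret is that I could not observe overfitting or the optimal stopping point, which would only show up with more epochs. It would be better to vary the hyperparameters, the loss function, and the number of epochs to find the best configuration.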

Author Description

The Void