Korea University DMQA Lab

Using a 1×1 convolution can reduce the number of channels.

February 14, 2025, 6:49 AM

Written by

Professor 김성범

What does it mean to say that "using a 1×1 convolution can reduce the number of channels"? Convolution is one of the core operations in convolutional neural network (CNN) models. Simply put, it applies a filter (or kernel) of a given size to a feature map: at each position, the filter's values (weights) are multiplied element-wise with the corresponding feature map values, and the products are summed. One of the choices left to the user is the filter size, and setting it to 1×1 gives a 1×1 convolution. Spatially, a 1×1 filter is literally a single number per input channel, so applying it simply multiplies each pixel of the feature map by that number. With a single-channel feature map, a 1×1 convolution therefore does nothing special: it only rescales the map.

When the feature map has multiple channels, however, the 1×1 filter holds one weight per channel and combines the pixels at the same position across all channels into a single value, a weighted sum. The spatial size of the feature map is preserved, because a 1×1 filter covers exactly one pixel position at a time. Each distinct 1×1 filter yields a different summary value and hence one output channel, so the number of filters determines the number of output channels. For example, if the original feature map has 250 channels and you apply 10 different 1×1 filters, you get 10 output channels, each with the same spatial size as the original feature map. Because the filter weights are learned during training, these 10 channels can retain much of the information in the original 250 channels. In other words, 250 channels can be reduced to 10 without much information loss. Reducing the number of channels in turn cuts the computation of subsequent layers substantially, since the cost of a convolution grows in proportion to its number of input channels.
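As a rough illustration of the idea, the pure-Python sketch below applies 1×1 filters as a per-pixel weighted sum across channels and then compares the multiply-accumulate counts of a follow-up layer with and without the reduction. The function names `conv1x1` and `conv_macs`, the 4×4 spatial size, the random weights (stand-ins for learned ones), and the 64-filter 3×3 follow-up layer are all assumptions made for illustration; only the 250-to-10 channel reduction comes from the essay.

```python
import random


def conv1x1(feature_map, filters):
    """Apply 1x1 filters to a multi-channel feature map (bias omitted).

    feature_map: list of C_in channels, each an H x W list of lists.
    filters: list of C_out filters, each a list of C_in weights
             (one weight per input channel).
    Returns a list of C_out channels, each H x W.
    """
    c_in = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    output = []
    for weights in filters:
        assert len(weights) == c_in, "one weight per input channel"
        # Each output pixel is a weighted sum of the same pixel across channels.
        channel = [
            [
                sum(weights[c] * feature_map[c][i][j] for c in range(c_in))
                for j in range(width)
            ]
            for i in range(height)
        ]
        output.append(channel)
    return output


def conv_macs(height, width, c_in, c_out, kernel):
    """Multiply-accumulate count of a stride-1 convolution that keeps H x W."""
    return height * width * c_in * c_out * kernel * kernel


# Toy sizes: 250 input channels reduced to 10, as in the essay's example.
random.seed(0)
H, W, C_IN, C_OUT = 4, 4, 250, 10
x = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C_IN)]
# Random weights stand in for weights that would normally be learned.
w = [[random.gauss(0, 1) for _ in range(C_IN)] for _ in range(C_OUT)]

y = conv1x1(x, w)
print(len(y), len(y[0]), len(y[0][0]))  # 10 4 4: fewer channels, same spatial size

# Hypothetical follow-up layer: a 3x3 convolution with 64 filters.
direct = conv_macs(H, W, 250, 64, 3)
reduced = conv_macs(H, W, 250, 10, 1) + conv_macs(H, W, 10, 64, 3)
print(direct, reduced)  # 2304000 132160
```

In this toy setting the reduced path needs roughly 17 times fewer multiply-accumulates, and the saving scales linearly with the number of channels removed. Whether 10 channels preserve enough information depends on the trained weights and the task, which this sketch does not attempt to show.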
