Deep Learning: How Machines “See” the World

Karen
3 min read · May 13, 2024

In this blog post, I am going to talk about convolutional neural networks (CNNs), a technique that lets machines process images in a way loosely inspired by how humans see. Pretty fascinating, isn’t it?

If you have ever edited a picture, great news: you already have some hands-on experience with the ideas behind computer vision! Let’s start with an example. When you open the image editing feature on your phone, imagine a tiny brush that changes one small section of the picture at a time. That mini-brush is called a kernel. To be technical, a kernel is a small matrix of numbers (weights) used to apply effects like sharpening and blurring.
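To make the "mini-brush" concrete, here is a minimal sketch in plain Python. It assumes a grayscale image, where each pixel holds a single brightness value from 0 to 255, and uses two common textbook 3x3 kernels, a box blur and a sharpen kernel. One "dab" of the brush is the weighted sum of a 3x3 patch of pixels. The names `BOX_BLUR`, `SHARPEN`, `apply_at`, and the patch values are hypothetical, chosen for illustration.

```python
# Two classic 3x3 kernels, written as plain nested lists (rows of weights).
BOX_BLUR = [
    [1 / 9, 1 / 9, 1 / 9],
    [1 / 9, 1 / 9, 1 / 9],
    [1 / 9, 1 / 9, 1 / 9],
]  # every neighbor counts equally, so the result is a local average

SHARPEN = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]  # boosts the center pixel and subtracts its four direct neighbors


def apply_at(patch, kernel):
    """One 'dab' of the brush: weighted sum of a 3x3 patch of brightness values."""
    return sum(
        patch[r][c] * kernel[r][c]
        for r in range(3)
        for c in range(3)
    )


# A hypothetical 3x3 patch of grayscale brightness values (0 = black, 255 = white)
# with one bright pixel in the middle.
patch = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]

print(round(apply_at(patch, BOX_BLUR)))  # 20: the bright spot is averaged away
print(apply_at(patch, SHARPEN))          # 460: the bright spot is exaggerated
# (a real editor would clip results like 460 back into the 0-255 range)
```

The blur weights add up to 1, so a flat region keeps its brightness. The sharpen kernel's weights also add up to 1, but the negative neighbors make differences between a pixel and its surroundings stand out.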

So how is a kernel different from a pixel? Pixels are the tiny building blocks that form an image. Each pixel stores the color of a single point in the image, while a kernel is a small grid of weights that we move across those pixels.
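As a rough illustration, an image can be stored as a grid of pixels, with one (red, green, blue) value per pixel. The sketch below assumes a made-up 3x4 image with channels in the 0 to 255 range. It also collapses each pixel to a single brightness value with a plain average, which is a simplifying assumption. Image libraries often weight the three channels differently.

```python
# A tiny hypothetical 3x4 RGB image: 3 rows, 4 columns, one (red, green, blue)
# tuple per pixel, each channel in the range 0-255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)],
    [(0, 0, 0), (128, 128, 128), (255, 255, 0), (0, 255, 255)],
    [(255, 0, 255), (64, 64, 64), (200, 100, 50), (10, 20, 30)],
]

height = len(image)
width = len(image[0])
print(height, width)  # 3 4

# Each pixel describes the color of exactly one point, addressed by (row, column).
print(image[0][2])  # (0, 0, 255): pure blue


def to_gray(pixel):
    """Collapse an RGB pixel to one brightness value using a simple average
    (an illustrative choice; real libraries often weight channels differently)."""
    r, g, b = pixel
    return (r + g + b) // 3


gray = [[to_gray(p) for p in row] for row in image]
print(gray[1])  # [0, 128, 170, 170]
```

Working with a single brightness value per pixel keeps the later kernel examples simple. For color images, the same idea is applied to each channel.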

How does all of this relate to deep learning? Kernels are used in a feature extraction technique that highlights the most important patterns in an image, such as edges and textures. This technique is called convolution, and it is what we are seeing in our image below, or in any image editing tool.
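To make the sliding idea concrete, here is a minimal sketch of 2D convolution in plain Python. It assumes a grayscale image stored as a nested list of brightness values. The function name `convolve2d`, the "valid" border handling (no padding, so the output shrinks), and the hand-picked edge kernel `EDGE` are illustrative assumptions, not a particular library's API. The kernel is flipped before sliding, as in the mathematical definition of convolution. For symmetric kernels such as a box blur, the flip changes nothing.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over a 2D grayscale `image` (both nested lists) and return
    the output grid. Only positions where the kernel fits entirely inside the
    image are computed ("valid" mode), so the output is smaller than the input."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # Flip the kernel top-to-bottom and left-to-right, as convolution requires.
    flipped = [row[::-1] for row in kernel[::-1]]
    output = []
    for y in range(ih - kh + 1):          # every vertical kernel position
        out_row = []
        for x in range(iw - kw + 1):      # every horizontal kernel position
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[y + i][x + j] * flipped[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical 4x5 image: dark (0) on the left, bright (9) on the right.
image = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

# A simple hand-picked vertical-edge kernel (after flipping it becomes
# [-1, 0, 1] per row, i.e. "right neighbor minus left neighbor").
EDGE = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

print(convolve2d(image, EDGE))  # [[27, 27, 0], [27, 27, 0]]
```

The output is large wherever the window straddles the dark-to-bright boundary and zero over the flat bright area, which is the sense in which a kernel "extracts" a feature. In a convolutional neural network, the kernel weights are learned from data instead of being picked by hand.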

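In math lingo, a convolution combines two functions to produce a third. Here, we can think of our base image as a function that maps each pixel position to a color value, and our kernel as a function that maps each nearby position to a weight. Convolving the two gives, at every pixel, a weighted sum of that pixel’s neighborhood.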
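For reference, the discrete 2D convolution used on images is commonly written as below. Here $I(x, y)$ is the value of the image at pixel $(x, y)$, and $K(i, j)$ is the kernel weight at offset $(i, j)$. These symbol names are our own choice, and the sums run over all offsets where the kernel is defined.

```latex
(I * K)(x, y) = \sum_{i} \sum_{j} I(x - i,\, y - j)\, K(i, j)
```

The minus signs are what "flip" the kernel before it is slid across the image. For symmetric kernels such as a box blur, the flipped kernel is identical to the original, so the flip makes no difference.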

  • Brightness is making the center…


Written by Karen

Hi 🙋🏻‍♀️ HCI/UX Researcher here. I enjoy breaking down research papers into insightful bits. Thank you so much for visiting 🙏