Introduction to Neural Networks: A Beginner’s Guide with Python

Learn how neural networks work by building one from scratch in Python.

March 28, 2026

Neural networks are no longer a specialized research topic confined to academic labs — they are the engine underneath the tools billions of people use every day. The autocomplete on your phone, the recommendation queue on your streaming service, the voice assistant you ask about the weather: all of these are powered by neural networks. In 2026, understanding how they work is essential for anyone who wants to build, evaluate, or critically assess the AI systems reshaping every industry.

The good news is that the core intuition behind neural networks is accessible to anyone with a basic grasp of algebra and a willingness to write a little Python. This tutorial will take you from zero to having built and trained your own neural network from scratch using nothing but NumPy. By the time you finish, you will understand what a neural network is, what the mathematics actually involves, how training works, and where to go next.

Neural network visualization
A neural network loosely mimics how biological neurons communicate.

What Is a Neural Network? The Intuition

Start with the biology — not because it’s a perfect analogy, but because it’s a useful one. Your brain contains roughly 86 billion neurons. Each neuron receives electrical signals from other neurons, processes those signals, and if the combined input crosses a threshold, fires its own signal. Complex behavior — vision, language, memory — emerges from trillions of these simple firing events happening in patterns shaped by experience.

Artificial neural networks borrow this architecture loosely. An artificial neuron receives numerical inputs, multiplies each by a weight (representing how much that input matters), sums them up, passes the result through an activation function that decides whether and how strongly the neuron fires, and passes its output to the next layer. The learning happens when those weights are adjusted based on how wrong the network’s output was.
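The description above can be sketched as a single artificial neuron in NumPy. This is a minimal illustration, not part of the network built later: the function name neuron and the input, weight, and bias values are made up for the example, and sigmoid is assumed as the activation.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, passed through the activation.
    z = np.dot(inputs, weights) + bias
    return sigmoid(z)

# Illustrative values only: three inputs and one weight per input.
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.8, 0.2, -0.4])
bias = 0.1

# z = 0.4 - 0.2 - 0.8 + 0.1 = -0.5, so the output is sigmoid(-0.5).
output = neuron(inputs, weights, bias)
print(output)
```

Changing a weight changes how much the corresponding input contributes to the output, which is exactly what training adjusts.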

A modern neural network is organized into layers. The input layer receives your raw data. One or more hidden layers learn increasingly abstract representations of that data. The output layer produces the final prediction. Each connection between neurons in adjacent layers has an associated weight. Training the network means finding the set of weights that makes the output as accurate as possible.

The Mathematics You Actually Need

You need three things: dot products, activation functions, and the chain rule of calculus (which we’ll handle conceptually).

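A dot product is how a neuron computes its weighted input. If a neuron receives inputs [x1, x2, x3] with weights [w1, w2, w3], the weighted sum is x1*w1 + x2*w2 + x3*w3, plus a bias term b. In vector notation, a single neuron computes z = w·x + b. For a whole layer, the weight vectors of its neurons stack into a matrix W, giving z = W·x + b.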
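As a worked example, here is z = W·x + b for a small layer of two neurons, with one row of W per neuron. The numbers are arbitrary and chosen so the arithmetic is easy to follow by hand; the explicit loop is only there to confirm that the matrix product matches the per-neuron weighted sums.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])          # three inputs
W = np.array([
    [0.1, 0.2, 0.3],                   # weights of neuron 0
    [0.0, -1.0, 0.5],                  # weights of neuron 1
])
b = np.array([0.5, -0.5])              # one bias per neuron

# Matrix form: all neurons in the layer at once.
z = W @ x + b

# The same computation written out neuron by neuron.
manual = np.array([
    sum(w_i * x_i for w_i, x_i in zip(row, x)) + bias
    for row, bias in zip(W, b)
])

# Neuron 0: 0.1 + 0.4 + 0.9 + 0.5 = 1.9
# Neuron 1: 0.0 - 2.0 + 1.5 - 0.5 = -1.0
print(z)
print(manual)
```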

Activation functions introduce non-linearity, which is what allows neural networks to learn patterns that are not just straight lines. Without them, stacking layers would be mathematically equivalent to a single linear transformation.

  • Sigmoid: sigma(z) = 1 / (1 + e^(-z)). Squashes any input to the range (0, 1). Used in output layers for binary classification.
  • ReLU (Rectified Linear Unit): f(z) = max(0, z). If the input is negative, output 0. If positive, pass it through. A common default in modern deep networks because it is computationally cheap and, unlike sigmoid and tanh, does not saturate for positive inputs, which mitigates the vanishing gradient problem. Its gradient is zero for negative inputs, so individual units can stop learning (the “dying ReLU” problem).
  • Tanh: tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)). Squashes input to (-1, 1). Zero-centered, which can help optimization.
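The three activation functions listed above take only a few lines of NumPy. The same sketch also checks the earlier claim that stacking layers without an activation collapses into a single linear transformation. The shapes and random weights are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def tanh(z):
    return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
print("sigmoid:", sigmoid(z))
print("relu:   ", relu(z))
print("tanh:   ", tanh(z))

# Two linear layers with no activation in between (random illustrative weights).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(1, 4))
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=(1, 2))

stacked = (X @ W1 + b1) @ W2 + b2

# The same mapping expressed as one linear layer.
W_single = W1 @ W2
b_single = b1 @ W2 + b2
collapsed = X @ W_single + b_single

print("stacked linear layers equal one layer:", np.allclose(stacked, collapsed))
```

Placing relu, sigmoid, or tanh between the two layers breaks this equivalence, which is what lets extra layers add expressive power.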

Building Your First Neural Network from Scratch with Python

We will build a two-layer neural network (one hidden layer plus an output layer) that learns the XOR function, the classic problem a single-layer perceptron cannot solve because its outputs are not linearly separable. XOR takes two binary inputs and outputs 1 if exactly one of them is 1, otherwise 0.

import numpy as np

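# XOR truth table: four input pairs and their target outputs
X = np.array([[0, 0],[0, 1],[1, 0],[1, 1]])
y = np.array([[0],[1],[1],[0]])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_derivative(z):
    # d(sigmoid)/dz = sigmoid(z) * (1 - sigmoid(z))
    s = sigmoid(z)
    return s * (1 - s)

# Small random weights and zero biases: 2 inputs -> 4 hidden units -> 1 output
np.random.seed(42)
W1 = np.random.randn(2, 4) * 0.1
b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1) * 0.1
b2 = np.zeros((1, 1))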

learning_rate = 0.5

for epoch in range(10000):
    # Forward pass
    Z1 = X @ W1 + b1
    A1 = sigmoid(Z1)
    Z2 = A1 @ W2 + b2
    A2 = sigmoid(Z2)

    # Loss: Mean Squared Error
    loss = np.mean((y - A2) ** 2)

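    # Backward pass: apply the chain rule from the output back to the input
    dA2 = -2 * (y - A2) / y.shape[0]          # dLoss/dA2 for mean squared error
    dZ2 = dA2 * sigmoid_derivative(Z2)        # through the output sigmoid
    dW2 = A1.T @ dZ2                          # gradient for output-layer weights
    db2 = np.sum(dZ2, axis=0, keepdims=True)  # gradient for output-layer bias
    dA1 = dZ2 @ W2.T                          # error passed back to the hidden layer
    dZ1 = dA1 * sigmoid_derivative(Z1)        # through the hidden sigmoid
    dW1 = X.T @ dZ1                           # gradient for hidden-layer weights
    db1 = np.sum(dZ1, axis=0, keepdims=True)  # gradient for hidden-layer bias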

    # Update weights
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1

    if epoch % 1000 == 0:
        print(f"Epoch {epoch:5d} | Loss: {loss:.6f}")

print("Final predictions:", np.round(A2, 3))

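Run this and the loss should start near 0.25 and fall as training proceeds, with final predictions approaching [0, 1, 1, 0], the correct XOR outputs. The exact numbers depend on the random seed and the initialization scale. If the loss stalls near 0.25, try training for more epochs or using larger initial weights (for example, dropping the 0.1 factor).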
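For contrast, it is worth seeing what a model with no hidden layer can do on the same data. The sketch below fits the best possible linear model (weights plus a bias) to XOR by least squares. This is a swapped-in technique used only for comparison, not how the network above is trained.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Append a column of ones so the fit includes a bias term.
X_bias = np.hstack([X, np.ones((4, 1))])

# Least-squares best fit for a model with no hidden layer and no activation.
coeffs, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
linear_predictions = X_bias @ coeffs
linear_mse = np.mean((y - linear_predictions) ** 2)

print("predictions:", linear_predictions)
print("MSE:", linear_mse)
```

The best linear fit predicts 0.5 for every input, giving a mean squared error of 0.25. That is the same loss the untrained network starts from, so any drop below it comes from the hidden layer and its non-linearity.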

Training the Network — Understanding Gradient Descent

The weight update rule is simple: W -= learning_rate * gradient. This is gradient descent. The gradient tells us the direction of steepest ascent of the loss landscape. Moving in the opposite direction takes us toward lower loss.
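The update rule is easier to see on a one-parameter problem. The sketch below minimizes the toy loss (w - 3)^2, which is an assumption chosen because its minimum at w = 3 is known in advance. The starting point, learning rate, and step count are arbitrary.

```python
def loss(w):
    # Toy loss with a known minimum at w = 3.
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the loss with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
history = [loss(w)]

for step in range(100):
    # Step against the gradient, i.e. downhill on the loss.
    w -= learning_rate * gradient(w)
    history.append(loss(w))

print("final w:", w)
print("first loss:", history[0], "last loss:", history[-1])
```

Each step shrinks the distance to the minimum by a constant factor here. Real loss landscapes are not this well behaved, but the update is the same.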

The learning rate controls how big each step is. Too large, and you overshoot minima and the loss oscillates or diverges. Too small, and training takes forever. A learning rate of 0.1 to 0.5 is a reasonable starting range for small networks like ours. For deep networks, adaptive optimizers like Adam automatically adjust the effective learning rate per parameter.
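A small experiment makes the three regimes concrete. The helper run_gradient_descent below is hypothetical and minimizes the toy function f(w) = w^2. The specific learning rates are chosen to show each behavior on this function only and do not transfer directly to real networks.

```python
def run_gradient_descent(learning_rate, steps=20, start=1.0):
    # Minimize f(w) = w**2, whose gradient is 2*w, and return the final |w|.
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return abs(w)

too_small = run_gradient_descent(0.001)   # barely moves from the start
reasonable = run_gradient_descent(0.3)    # converges quickly toward 0
too_large = run_gradient_descent(1.1)     # overshoots further on every step

print("too small: ", too_small)
print("reasonable:", reasonable)
print("too large: ", too_large)
```

With the largest rate, every step lands farther from the minimum than the last, which is the divergence that shows up as an exploding or NaN loss in a real training run.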

Overfitting occurs when a network learns the training data so well that it memorizes noise rather than learning generalizable patterns. Remedies include dropout, L2 regularization, and early stopping.
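Two of these remedies are simple enough to sketch without a framework. The helpers l2_regularized_step and early_stopping_epoch are hypothetical names, the validation losses are made-up numbers, and the patience and penalty strength are arbitrary. Dropout is left out because it needs a full training loop to demonstrate.

```python
import numpy as np

def l2_regularized_step(W, grad, learning_rate=0.1, l2_strength=0.01):
    # A penalty of l2_strength * sum(W**2) adds 2 * l2_strength * W to the
    # gradient, which pulls every weight toward zero on each update.
    return W - learning_rate * (grad + 2.0 * l2_strength * W)

def early_stopping_epoch(val_losses, patience=3):
    # Return the epoch at which training would stop: the first epoch where
    # validation loss has failed to improve `patience` times in a row.
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best:
            best = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

# With a zero data gradient, the L2 term alone shrinks the weights.
W = np.ones((2, 2))
W_after = l2_regularized_step(W, grad=np.zeros((2, 2)))

# Made-up validation losses that improve, then start rising.
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.65]
stop_epoch = early_stopping_epoch(val_losses, patience=3)

print(W_after)
print("stop at epoch:", stop_epoch)
```

In practice you would also keep a copy of the weights from the best validation epoch (epoch 3 in this made-up sequence) and restore them after stopping.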

Python code on screen
Python remains the language of choice for ML practitioners thanks to its rich ecosystem.

Moving to Frameworks — PyTorch vs TensorFlow

Our NumPy implementation is invaluable for understanding what’s happening under the hood. In practice, you will use a deep learning framework. The two dominant frameworks in 2026 are PyTorch (originally developed at Meta, now governed by the PyTorch Foundation) and TensorFlow (developed by Google). PyTorch dominates academic research and is increasingly the default for practitioners: its dynamic computation graph lets you debug with ordinary Python tools, and its API reads like idiomatic Python. Here is the same XOR network in PyTorch:

import torch
import torch.nn as nn
import torch.optim as optim

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(
    nn.Linear(2, 4),
    nn.Sigmoid(),
    nn.Linear(4, 1),
    nn.Sigmoid()
)

loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.5)

for epoch in range(10000):
    optimizer.zero_grad()
    predictions = model(X)
    loss = loss_fn(predictions, y)
    loss.backward()
    optimizer.step()
    if epoch % 1000 == 0:
        print(f"Epoch {epoch:5d} | Loss: {loss.item():.6f}")

print("Final predictions:", model(X).detach().round())
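The loss.backward() call is where PyTorch replaces the hand-written backward pass from the NumPy version. A minimal sketch of that mechanism, using a made-up scalar function rather than the XOR model, shows how the graph is recorded as ordinary Python code runs:

```python
import torch

# A scalar input that autograd should track.
x = torch.tensor(3.0, requires_grad=True)

# Ordinary Python control flow decides which operations get recorded.
if x > 0:
    y = x ** 2 + 2 * x
else:
    y = -x

# Backpropagate through whatever operations actually ran.
y.backward()

# dy/dx = 2*x + 2, which is 8 at x = 3.
print(x.grad)
```

Because the graph is built while the code executes, you can put a print statement or a debugger breakpoint anywhere in the forward pass.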

Common Beginner Mistakes

  • Not normalizing your input data. If features have very different scales, gradients will be wildly unbalanced. Always normalize continuous inputs before feeding them to a neural network.
  • Using the wrong learning rate. If your loss goes to NaN or oscillates wildly, your learning rate is too high. If it barely moves after many epochs, it is too low. Plot your training loss — it should decrease smoothly and then flatten.
  • Not shuffling your training data between epochs. When you train on mini-batches and your dataset is ordered (for example, sorted by class), the network will update weights repeatedly toward one class before seeing the other, causing unstable training.
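The first and last items in this list come down to a few lines of preprocessing. The sketch below standardizes each feature to zero mean and unit variance, then draws a fresh ordering for every epoch. The feature values are made up to have very different scales, and the training step itself is left as a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up features on very different scales (for example, age and income).
X = np.array([
    [25.0, 40_000.0],
    [32.0, 85_000.0],
    [47.0, 120_000.0],
    [51.0, 62_000.0],
])
y = np.array([0, 1, 1, 0])

# Standardize each column: subtract its mean and divide by its standard deviation.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std

for epoch in range(3):
    # Draw a new random ordering each epoch, applied to inputs and labels together.
    order = rng.permutation(len(X_scaled))
    X_epoch, y_epoch = X_scaled[order], y[order]
    # Mini-batch updates over X_epoch and y_epoch would go here.

print(X_scaled)
```

Compute the mean and standard deviation on the training set only, then reuse those same values to scale validation and test data.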

Your Next Steps

Courses worth your time: fast.ai’s Practical Deep Learning for Coders is a widely recommended top-down course in which you train an image classifier in the first lesson. deeplearning.ai’s Deep Learning Specialization (Andrew Ng) is more bottom-up and mathematically thorough.

Projects to try next: Train a digit classifier on MNIST. Build a simple sentiment classifier on movie review data. Explore a pre-trained ResNet or BERT model and fine-tune it on a domain-specific dataset. Each project will teach you something you cannot learn from a tutorial.

Neural networks went from a fascinating curiosity to the foundation of modern AI within a single decade. Understanding them — really understanding them, not just calling an API — puts you in a fundamentally different position as an engineer, a researcher, or a product builder. The fundamentals you have covered here are the same fundamentals used in the largest language models in the world. Keep going.
