🤖 AI Summary
This work systematically investigates the statistical relationship between a neural network's output-layer activation function and its loss function. Applying maximum likelihood estimation, it unifies the derivation of common loss functions, including mean squared error (MSE), mean absolute error (MAE), and cross-entropy, by exposing the implicit probabilistic assumptions behind linear, sigmoid, ReLU, and softmax activations, thereby revealing their intrinsic connection to generalized linear models. The key contribution is a formal theoretical framework, grounded in statistical modeling, that maps output structure, loss function, and data distribution to one another, yielding interpretable design principles for practical scenarios such as heavy-tailed distributions, bounded outputs, and non-negative predictions. By bridging deep learning practice with classical statistical theory, this work makes the selection of neural output architectures more principled and interpretable.
📝 Abstract
From a statistical point of view, the loss function used to train a neural network is strongly connected to its output layer. This technical report analyzes common output-layer activation functions, such as linear, sigmoid, ReLU, and softmax, detailing their mathematical properties and appropriate use cases. The selection of a suitable loss function for training a deep learning model has a strong statistical justification. This report connects common loss functions such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and various Cross-Entropy losses to the statistical principle of Maximum Likelihood Estimation (MLE). Choosing a specific loss function is equivalent to assuming a specific probability distribution for the model output, which highlights the link between these functions and the Generalized Linear Models (GLMs) that underlie network output layers. Additional scenarios of practical interest are also considered, such as alternative output encodings, constrained outputs, and distributions with heavy tails.
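As a minimal numerical illustration of the MLE connection described above, the sketch below checks that, for a Gaussian output assumption with unit variance, the negative log-likelihood equals the squared-error loss up to an additive constant that does not depend on the model. The data values and predictions here are hypothetical, chosen only to make the identity concrete.

```python
import math

def gaussian_nll(y, mu, sigma=1.0):
    """Negative log-likelihood of observation y under N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + (y - mu)**2 / (2 * sigma**2)

def mse(ys, mus):
    """Mean squared error between targets and predictions."""
    return sum((y - m)**2 for y, m in zip(ys, mus)) / len(ys)

ys  = [1.0, 2.5, -0.3]   # hypothetical targets
mus = [0.8, 2.0,  0.1]   # hypothetical model outputs

total_nll = sum(gaussian_nll(y, m) for y, m in zip(ys, mus))
const = len(ys) * 0.5 * math.log(2 * math.pi)  # model-independent term

# With sigma = 1, total NLL = const + (n/2) * MSE, so minimizing the
# Gaussian NLL over mu is exactly minimizing the MSE.
assert abs((total_nll - const) - 0.5 * len(ys) * mse(ys, mus)) < 1e-12
```

The same pattern extends to the other pairings the report derives: a Laplace output assumption recovers MAE, and a Bernoulli or categorical assumption recovers the cross-entropy losses.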