Softmax

One term you’ll run into when picking up almost any machine learning topic is “softmax.” In the past, I’ve written it off as something I’ll need to get a firmer grasp on someday. Well, today is that day.

What is softmax?

Softmax is an operation that takes an input vector (i.e. an ordered collection of numbers), and outputs a vector of the same length, but with different values.

The values of the result vector add up to 1, making the result usable as a probability distribution. This is a handy property: the output is always normalized, with each value falling between 0 and 1.

Calculation

The math is actually pretty straightforward. Borrowing notation from a more thorough explanation (see Further Reading), where the superscript \(L\) labels a network layer, the value \(a^L_j\) of the output at index \(j\), given the input vector \(z^L\), can be written as:

\[ \begin{eqnarray} a^L_j = \frac{e^{z^L_j}}{\sum_k e^{z^L_k}} \end{eqnarray} \]

That \(e^{z^L_j}\) numerator is the natural exponential of the value at index \(j\).

The \(\sum_k e^{z^L_k}\) denominator is the sum of the natural exponentials of all the values.
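For instance, unrolling the sum for a three-element input vector, the first output element would be:

\[ \begin{eqnarray} a^L_1 = \frac{e^{z^L_1}}{e^{z^L_1} + e^{z^L_2} + e^{z^L_3}} \end{eqnarray} \]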

Implementation

A simple implementation would consist of three steps: calculate an intermediate vector of the natural exponentials, sum it, then divide each exponential by that sum to get the final vector.
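Here is a minimal sketch of those three steps in Python, using only the standard library (the function name softmax is my own choice, not anything prescribed):

    import math

    def softmax(values):
        # Step 1: take the natural exponential of each input value
        exponentials = [math.exp(v) for v in values]
        # Step 2: sum the exponentials to get the denominator
        total = sum(exponentials)
        # Step 3: divide each exponential by that sum
        return [e / total for e in exponentials]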

Example

  1. Say we have four numbers:

     [2, 3, 4, 5]
    
  2. The natural exponential of each would be:

     [7.39, 20.09, 54.60, 148.41]
    
  3. Summing these, we get our denominator:

     230.49
    
  4. Dividing each value in step 2 by the sum in step 3, we get our result:

     [0.03, 0.09, 0.24, 0.64]
    

Notice that the values in step 4 sum to 1.
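Assuming the softmax sketch from the Implementation section above, the whole walkthrough can be checked in a couple of lines:

    result = softmax([2, 3, 4, 5])
    print([round(x, 2) for x in result])  # [0.03, 0.09, 0.24, 0.64]
    print(sum(result))                    # 1.0 (up to floating-point error)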

Usage

Where I have most typically run into softmax is in machine learning, where it converts the output values of a classification neural network into a probability distribution. Each value in the result is the predicted probability that the input falls into the given category.

Say the example above consisted of scores for classifying images of fruit: apple, banana, orange, kiwi. After applying softmax, our model indicates a 3% chance of the image being an apple, a 9% chance it is a banana, a 24% chance it’s an orange, and a 64% chance that it is a kiwi. So it’s probably a kiwi.
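Continuing that fruit example in code (again assuming the softmax sketch from above; the labels and scores here are purely illustrative):

    labels = ["apple", "banana", "orange", "kiwi"]
    scores = [2, 3, 4, 5]  # raw scores from a hypothetical classifier

    for label, p in zip(labels, softmax(scores)):
        print(f"{label}: {p:.0%}")
    # apple: 3%
    # banana: 9%
    # orange: 24%
    # kiwi: 64%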

Further Reading

  1. https://en.wikipedia.org/wiki/Softmax_function
  2. http://neuralnetworksanddeeplearning.com/chap3.html#softmax
  3. https://peterroelants.github.io/posts/neural_network_implementation_intermezzo02/#Softmax-function