Chapter 1

Hyperbolic Functions: The Other Trigonometry

Mathematical Functions - The Building Blocks

Learning Objectives

By the end of this section you will be able to:

  1. Define $\sinh$, $\cosh$, $\tanh$ from the exponential function and explain why those particular combinations matter.
  2. Prove the central identity $\cosh^2 x - \sinh^2 x = 1$ and see why it forces the curve onto a hyperbola.
  3. Read the graphs of hyperbolic functions: their symmetries, asymptotes, and growth rates.
  4. Connect hyperbolic functions to real phenomena: the catenary, special relativity, and the $\tanh$ activation in neural networks.
  5. Compute $\sinh$ from its Taylor series and use PyTorch's $\tanh$ in a forward pass.

The Story: A Chain, a Sail, a Surprise

Hang a uniform chain between two poles and step back. What shape does it make? Galileo, in 1638, declared it a parabola. He was wrong, but the error was so subtle that the correct answer took another fifty-three years to appear. In 1690 Jakob Bernoulli posed the problem as a public challenge; in 1691 his brother Johann, Huygens, and Leibniz each published a solution:

$$y = a \cosh\!\left(\tfrac{x}{a}\right)$$

That “cosh” is the hyperbolic cosine. It is not a trigonometric function at all: it is built directly from $e^x$. The same family of functions describes the shape of a sail filled with wind, the velocity of a falling object with air drag, the rapidity of a relativistic particle, and the activation curve of a neuron in a deep network. They show up wherever there is an underlying balance between exponential growth and decay.

The whole idea in one line: hyperbolic functions are the symmetric and antisymmetric parts of $e^x$. Every fact about them follows from that one decomposition.

Definitions from the Exponential

Any function $f(x)$ can be uniquely split into an even part $\tfrac{1}{2}(f(x) + f(-x))$ and an odd part $\tfrac{1}{2}(f(x) - f(-x))$. Apply this to $f(x) = e^x$. The two pieces are so important that they get their own names:

$$\sinh x \;=\; \dfrac{e^x - e^{-x}}{2}$$

$$\cosh x \;=\; \dfrac{e^x + e^{-x}}{2}$$

$$\tanh x \;=\; \dfrac{\sinh x}{\cosh x} \;=\; \dfrac{e^x - e^{-x}}{e^x + e^{-x}}$$

These are not arbitrary combinations. They are the unique way to decompose $e^x$ into an odd piece ($\sinh$) and an even piece ($\cosh$). Adding them puts the exponential back together:

$$e^x = \cosh x + \sinh x \qquad e^{-x} = \cosh x - \sinh x$$

Think of $\cosh$ and $\sinh$ as the “real” and “imaginary” parts of $e^x$, but on the real axis. The structural analogy with $\cos$ and $\sin$ (which split $e^{ix}$) is what makes them “trigonometry, but for the hyperbola.”

We also define cosecant, secant, cotangent versions for completeness:

$$\operatorname{csch} x = \frac{1}{\sinh x}, \quad \operatorname{sech} x = \frac{1}{\cosh x}, \quad \coth x = \frac{1}{\tanh x}$$
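The definitions above translate almost word for word into code. The following is a minimal sketch using only `math.exp`; the helper names `my_sinh`, `my_cosh`, and `my_tanh` are illustrative choices, not library functions, and the standard-library versions appear only as a reference.

```python
import math

def my_sinh(x: float) -> float:
    """Odd part of e^x: (e^x - e^-x) / 2."""
    return (math.exp(x) - math.exp(-x)) / 2.0

def my_cosh(x: float) -> float:
    """Even part of e^x: (e^x + e^-x) / 2."""
    return (math.exp(x) + math.exp(-x)) / 2.0

def my_tanh(x: float) -> float:
    """Ratio of the odd part to the even part."""
    return my_sinh(x) / my_cosh(x)

for x in (-2.0, 0.0, 0.5, 1.0):
    # The two parts recombine into e^x and e^-x.
    assert math.isclose(my_cosh(x) + my_sinh(x), math.exp(x))
    assert math.isclose(my_cosh(x) - my_sinh(x), math.exp(-x))
    # They agree with the library versions at these sample points.
    assert math.isclose(my_sinh(x), math.sinh(x), abs_tol=1e-15)
    assert math.isclose(my_cosh(x), math.cosh(x))
    assert math.isclose(my_tanh(x), math.tanh(x), abs_tol=1e-15)

print(my_sinh(1.0), my_cosh(1.0), my_tanh(1.0))
```

The direct formula is fine for moderate inputs. For very small $|x|$ the subtraction in `my_sinh` loses relative precision, and for very large $|x|$ `math.exp` overflows, which is one reason to prefer the library functions in real code.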

Graphs and Behavior

Drag the slider below and watch the three core curves move together. Hide and show $e^x$ and $e^{-x}$ on the same axes: cosh sits at their midpoint, sinh sits at their half-difference. That single visual is most of the intuition.

Interactive Hyperbolic Graphs

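[Interactive figure: graphs of sinh(x), cosh(x), and tanh(x) with optional overlays of e^x and e^{-x}; the minimum cosh(0) = 1 is marked. Sample readout at x = 1.00: sinh(x) = 1.1752, cosh(x) = 1.5431, tanh(x) = 0.7616, e^x = 2.7183, cosh² − sinh² = 1.000000.]

Drag the slider. Notice cosh(x) is the average of e^x and e^{-x}; sinh(x) is half their difference. The identity cosh² − sinh² = 1 holds at every x.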

Three behaviors to commit to memory:

| Function | At x = 0 | Symmetry | Behavior as x → ±∞ |
| --- | --- | --- | --- |
| sinh x | 0 | odd: sinh(−x) = −sinh x | ≈ ½ eˣ (for x large positive) |
| cosh x | 1 (minimum) | even: cosh(−x) = cosh x | ≈ ½ eˣ (for x large positive) |
| tanh x | 0 | odd | → +1 as x → ∞, → −1 as x → −∞ |

Two things to notice. First, $\cosh x \geq 1$ everywhere, so the chain never sags below its lowest point. Second, $\tanh x$ is bounded between $-1$ and $+1$; it is the squashing function that maps the whole real line into the open interval $(-1, 1)$. That single property is why neural networks use it.
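The three behaviors in the table can be spot-checked numerically. This is a small sketch at a handful of sample points (the list `xs` is an arbitrary choice), not a proof.

```python
import math

xs = [-5.0, -1.0, -0.25, 0.0, 0.25, 1.0, 5.0]

# Symmetry: sinh is odd, cosh is even.
odd_sinh = all(math.isclose(math.sinh(-x), -math.sinh(x)) for x in xs)
even_cosh = all(math.isclose(math.cosh(-x), math.cosh(x)) for x in xs)

# cosh never drops below 1; its minimum over the samples is at x = 0.
cosh_min = min(math.cosh(x) for x in xs)

# tanh stays strictly inside (-1, 1) at these moderate inputs.
tanh_bounded = all(-1.0 < math.tanh(x) < 1.0 for x in xs)

# For large positive x both sinh and cosh approach e^x / 2.
x_big = 10.0
ratio_sinh = math.sinh(x_big) / (math.exp(x_big) / 2.0)
ratio_cosh = math.cosh(x_big) / (math.exp(x_big) / 2.0)

print("sinh odd:", odd_sinh, "| cosh even:", even_cosh)
print("min cosh over samples:", cosh_min)
print("tanh in (-1, 1):", tanh_bounded)
print("sinh / (e^x/2) at x=10:", ratio_sinh)
print("cosh / (e^x/2) at x=10:", ratio_cosh)
```

Both ratios differ from 1 by about $e^{-20}$, which is why the two curves are visually indistinguishable from $\tfrac{1}{2}e^x$ on the right side of the plot.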


The Fundamental Identity

Every property of sinh and cosh flows from one equation:

$$\boxed{\cosh^2 x - \sinh^2 x = 1}$$

It is the hyperbolic mirror of $\cos^2 x + \sin^2 x = 1$. Notice the minus sign: that is the only difference, and it changes the geometry from a circle to a hyperbola. Let's prove it in three lines from the definitions:

$$\begin{aligned}
\cosh^2 x - \sinh^2 x &= \left(\frac{e^x + e^{-x}}{2}\right)^2 - \left(\frac{e^x - e^{-x}}{2}\right)^2 \\
&= \frac{(e^{x} + e^{-x})^2 - (e^{x} - e^{-x})^2}{4} \\
&= \frac{4 \cdot e^{x}\cdot e^{-x}}{4} = e^{0} = 1.
\end{aligned}$$

The step from the second line to the third uses the algebra identity $(a+b)^2 - (a-b)^2 = 4ab$. With $a = e^x$ and $b = e^{-x}$, the cross term is $4ab = 4 \, e^x e^{-x} = 4$. The whole identity collapses to $1$.

Confirm it numerically with the slider in the explorer above. The cell labeled cosh² − sinh² always reads $1.000000$, no matter how you drag $x$.
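The same numerical confirmation can be done outside the explorer. Below is a minimal sketch; `identity_residual` is a hypothetical helper name, and the sample points are arbitrary moderate values.

```python
import math

def identity_residual(x: float) -> float:
    """Return cosh(x)^2 - sinh(x)^2 - 1, which is 0 in exact arithmetic."""
    return math.cosh(x) ** 2 - math.sinh(x) ** 2 - 1.0

for x in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
    print(f"x = {x:5.1f}   residual = {identity_residual(x):.2e}")
```

For moderate $x$ the residual is at the level of floating-point rounding. For large $|x|$ (say $x = 20$) the two squares are each around $10^{16}$ and differ by only 1, which double precision cannot resolve, so the computed residual is no longer small even though the identity itself still holds exactly.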


Why “Hyperbolic”? Circle vs Hyperbola

The name comes from a perfect geometric parallel. Recall:

  1. The point $(\cos t, \sin t)$ traces the unit circle $x^2 + y^2 = 1$. And $t$ equals twice the area of the circular sector swept from the positive $x$-axis to that point.
  2. The point $(\cosh t, \sinh t)$ traces the unit hyperbola $x^2 - y^2 = 1$. And $t$ equals twice the area of the hyperbolic sector swept from $(1,0)$ to that point.

That is the entire reason for the name. In trigonometry, the parameter is an angle. In hyperbolic trigonometry, the parameter is a hyperbolic angle — a swept area, not an angle in radians.

[Interactive figure: circle vs hyperbola, with a slider for t. Left panel: unit circle x² + y² = 1 with the point (cos t, sin t). Right panel: unit hyperbola x² − y² = 1 with the point (cosh t, sinh t). In both panels the shaded sector swept from the x-axis has area t/2. Sample readout at t = 0.80: cos(t) = 0.6967, sin(t) = 0.7174, cos² + sin² = 1.000000; cosh(t) = 1.3374, sinh(t) = 0.8881, cosh² − sinh² = 1.000000; both shaded areas equal 0.400.]

Slide t. The two shaded regions always have area t/2. That is the meaning of t for sinh and cosh — it is the hyperbolic analog of an angle, not an angle in radians.
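The area claim for the hyperbola can be checked by brute force. The sketch below computes the sector as the triangle under the ray from the origin to $(\cosh t, \sinh t)$ minus the area under the hyperbola $y = \sqrt{x^2 - 1}$, using a plain midpoint rule. The function name and the number of subintervals are arbitrary choices for illustration.

```python
import math

def hyperbolic_sector_area(t: float, n: int = 200_000) -> float:
    """Area bounded by the x-axis, the ray from the origin to
    (cosh t, sinh t), and the unit hyperbola, for t >= 0.

    Computed as (triangle under the ray) - (area under the hyperbola),
    where the second piece is a midpoint-rule integral of sqrt(x^2 - 1)
    from x = 1 to x = cosh t.
    """
    x_end, y_end = math.cosh(t), math.sinh(t)
    triangle = 0.5 * x_end * y_end
    h = (x_end - 1.0) / n
    under_curve = h * sum(
        math.sqrt((1.0 + (i + 0.5) * h) ** 2 - 1.0) for i in range(n)
    )
    return triangle - under_curve

for t in (0.4, 0.8, 1.5):
    area = hyperbolic_sector_area(t)
    print(f"t = {t}:  sector area = {area:.6f},  t/2 = {t / 2:.6f}")
```

At $t = 0.8$ this reproduces the value 0.400 shown in the explorer readout, up to the quadrature error of the midpoint rule.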

Analogy clincher. The circle and the hyperbola differ by a single sign: $x^2 + y^2 = 1$ vs $x^2 - y^2 = 1$. Most identities, derivatives, and series for hyperbolic functions can be obtained from their circular cousins by a careful sign flip (Osborn's rule: change the sign of every term that contains a product of two sines).

Algebraic Identities

Compare these side-by-side with their circular relatives. Notice how often a sign quietly flips:

| Trigonometric | Hyperbolic |
| --- | --- |
| sin² + cos² = 1 | cosh² − sinh² = 1 |
| sin(a + b) = sin a cos b + cos a sin b | sinh(a + b) = sinh a cosh b + cosh a sinh b |
| cos(a + b) = cos a cos b − sin a sin b | cosh(a + b) = cosh a cosh b + sinh a sinh b |
| sin(2a) = 2 sin a cos a | sinh(2a) = 2 sinh a cosh a |
| cos(2a) = cos²a − sin²a | cosh(2a) = cosh²a + sinh²a |
| 1 + tan² = sec² | 1 − tanh² = sech² |

These are not coincidence. They drop out of the definitions in a few lines. For example, expand $\sinh(a+b)$:

$$\begin{aligned}
\sinh(a+b) &= \tfrac{1}{2}\left(e^{a+b} - e^{-(a+b)}\right) \\
&= \tfrac{1}{2}\left(e^{a}e^{b} - e^{-a}e^{-b}\right) \\
&= \sinh a \, \cosh b + \cosh a \, \sinh b.
\end{aligned}$$

The last step is just regrouping $e^a e^b - e^{-a} e^{-b}$ into the two pieces $\sinh a \, \cosh b$ and $\cosh a \, \sinh b$. Try it on paper; it is two lines of algebra.

Looking ahead: in Chapter 5 we will see that $\frac{d}{dx}\sinh x = \cosh x$ and $\frac{d}{dx}\cosh x = \sinh x$ (no minus sign, unlike the circular case). That is yet another consequence of the same one-sign flip.
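Before doing the algebra on paper, it can be reassuring to spot-check the hyperbolic column of the table numerically. The sketch below tests each identity at a given pair of inputs; `identities_hold` is a hypothetical helper, and the tolerances are loose choices suited to moderate arguments.

```python
import math

def identities_hold(a: float, b: float, tol: float = 1e-12) -> bool:
    """Check the hyperbolic identities from the table at the point (a, b)."""
    sh, ch, th = math.sinh, math.cosh, math.tanh
    checks = [
        (ch(a) ** 2 - sh(a) ** 2, 1.0),                    # fundamental identity
        (sh(a + b), sh(a) * ch(b) + ch(a) * sh(b)),        # sinh addition
        (ch(a + b), ch(a) * ch(b) + sh(a) * sh(b)),        # cosh addition
        (sh(2 * a), 2 * sh(a) * ch(a)),                    # sinh double angle
        (ch(2 * a), ch(a) ** 2 + sh(a) ** 2),              # cosh double angle
        (1.0 - th(a) ** 2, 1.0 / ch(a) ** 2),              # 1 - tanh^2 = sech^2
    ]
    return all(
        math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=tol) for lhs, rhs in checks
    )

for a, b in [(0.3, -1.2), (1.0, 1.0), (-0.7, 2.0), (0.0, 0.0)]:
    print(f"a = {a:5.2f}, b = {b:5.2f}  ->  all identities hold: {identities_hold(a, b)}")
```

A numerical check like this does not replace the proof, but it catches a mis-remembered sign quickly.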
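The derivative claim can be previewed numerically with a central difference, without any calculus machinery. This is only a sketch: `central_diff` is a hypothetical helper and the step size `h` is an arbitrary small value.

```python
import math

def central_diff(f, x: float, h: float = 1e-5) -> float:
    """Symmetric finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-1.0, 0.0, 0.5, 2.0):
    d_sinh = central_diff(math.sinh, x)
    d_cosh = central_diff(math.cosh, x)
    print(
        f"x = {x:4.1f}:  d/dx sinh ≈ {d_sinh:.6f} (cosh x = {math.cosh(x):.6f}),  "
        f"d/dx cosh ≈ {d_cosh:.6f} (sinh x = {math.sinh(x):.6f})"
    )
```

The estimated slope of cosh matches $+\sinh x$, with no minus sign, in contrast to $\frac{d}{dx}\cos x = -\sin x$.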

Inverse Hyperbolic Functions

Because $\sinh$ and $\tanh$ are strictly increasing, they are one-to-one on all of $\mathbb{R}$ and have proper inverses ($\operatorname{arcsinh}$ is defined on all of $\mathbb{R}$, $\operatorname{arctanh}$ only on $(-1, 1)$, the range of $\tanh$). $\cosh$ is not one-to-one (it's even), so we restrict to $x \geq 0$ to invert it. The beautiful surprise: all three inverses have closed-form logarithm expressions.

$$\operatorname{arcsinh} x = \ln\!\left(x + \sqrt{x^2 + 1}\right), \quad x \in \mathbb{R}$$

$$\operatorname{arccosh} x = \ln\!\left(x + \sqrt{x^2 - 1}\right), \quad x \geq 1$$

$$\operatorname{arctanh} x = \tfrac{1}{2}\ln\!\left(\tfrac{1+x}{1-x}\right), \quad |x| < 1$$

Let's derive $\operatorname{arcsinh}$ to show how clean the algebra is. Set $y = \sinh x = \tfrac{e^x - e^{-x}}{2}$, then solve for $x$. Multiply through by $2 e^x$:

$$\begin{aligned}
2y\,e^x &= e^{2x} - 1 \\
(e^x)^2 - 2y(e^x) - 1 &= 0 \\
e^x &= y + \sqrt{y^2 + 1} \\
x &= \ln\!\left(y + \sqrt{y^2 + 1}\right).
\end{aligned}$$

That is the quadratic formula applied to a hidden quadratic in $e^x$. The plus root is taken because $e^x > 0$: since $\sqrt{y^2 + 1} > |y|$, the minus root $y - \sqrt{y^2 + 1}$ is always negative and is ruled out.

These log formulas are a reminder that $\sinh$, $\cosh$, and $\tanh$, being rational expressions in $e^x$, can be inverted using $\ln$ and square roots. No new transcendental function is needed.
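The three log formulas can be coded directly and compared with the standard library's `math.asinh`, `math.acosh`, and `math.atanh`. The function names ending in `_log` are illustrative, and the domain checks simply mirror the restrictions stated with the formulas.

```python
import math

def arcsinh_log(x: float) -> float:
    """ln(x + sqrt(x^2 + 1)), valid for every real x."""
    return math.log(x + math.sqrt(x * x + 1.0))

def arccosh_log(x: float) -> float:
    """ln(x + sqrt(x^2 - 1)), valid for x >= 1."""
    if x < 1.0:
        raise ValueError("arccosh requires x >= 1")
    return math.log(x + math.sqrt(x * x - 1.0))

def arctanh_log(x: float) -> float:
    """0.5 * ln((1 + x) / (1 - x)), valid for |x| < 1."""
    if not -1.0 < x < 1.0:
        raise ValueError("arctanh requires |x| < 1")
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

# Round trips: apply the forward function, then the log-formula inverse.
for x in (0.25, 1.0, 2.0):
    print(
        f"x = {x}:  "
        f"arcsinh(sinh x) = {arcsinh_log(math.sinh(x)):.12f},  "
        f"arccosh(cosh x) = {arccosh_log(math.cosh(x)):.12f},  "
        f"arctanh(tanh x) = {arctanh_log(math.tanh(x)):.12f}"
    )

# Agreement with the library inverses at a sample point.
print(arcsinh_log(1.5) - math.asinh(1.5))
print(arccosh_log(1.5) - math.acosh(1.5))
print(arctanh_log(0.5) - math.atanh(0.5))
```

One caveat on the design: for large negative $x$ the sum $x + \sqrt{x^2 + 1}$ subtracts two nearly equal numbers, so the direct formula loses accuracy there. Using the oddness of $\operatorname{arcsinh}$ (compute it for $|x|$ and restore the sign) is a common workaround.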

The Catenary — Where cosh Really Lives

Now we earn the story we opened with. A flexible chain of uniform mass per unit length, hanging under gravity, satisfies a balance equation between tension and weight. Without doing the derivation (we'll do it carefully in Chapter 22), the resulting shape is:

$$y(x) = a \cosh\!\left(\tfrac{x}{a}\right)$$

The constant $a$ is the ratio of horizontal tension to weight per unit length. Small $a$ means a slack, heavy chain (deep sag); large $a$ means a light, taut chain (gentle curve). Drag the sliders below to feel it.

The Catenary — a hanging chain is exactly a cosh

Hang a uniform chain between two poles. Gravity + tension force the shape y = a · cosh(x / a). Compare it against the famous lookalike y = a + x²/(2a) (which is only a 2nd-order Taylor approximation).

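[Interactive figure: the catenary y = a · cosh(x/a) plotted against the parabola y = a + x²/(2a), with sliders for a (sag) and span. Sample readout at a = 1.00, span = 3.00: y_min = a = 1.000; y at endpoints = 2.3524; sag (y_end − y_min) = 1.3524; parabola error at x = span/2 = 0.2274.]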

Increase the span and watch the parabola peel away from the real chain. For small x/a the two agree (that's why Galileo thought the chain was a parabola), but the true shape is cosh — a fact only nailed down in 1691 by the Bernoullis and Leibniz.

Galileo missed it by one term. The Taylor series of cosh starts as $\cosh u = 1 + \tfrac{u^2}{2} + \tfrac{u^4}{24} + \dots$ For a shallow chain ($|x/a|$ small) only the first two terms matter, so $y \approx a + \tfrac{x^2}{2a}$, a parabola. That's why Galileo's eye couldn't tell. The discrepancy grows as $x^4/(24 a^3)$.
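The size of Galileo's error is easy to compute. The sketch below compares the catenary with its parabolic approximation at the end of the span and sets the gap next to the leading error term $x^4/(24a^3)$. The values $a = 1$ and half-span $1.5$ match the sample readout of the explorer; the function names are illustrative.

```python
import math

def catenary(x: float, a: float) -> float:
    """Height of the hanging chain y = a * cosh(x / a)."""
    return a * math.cosh(x / a)

def parabola(x: float, a: float) -> float:
    """Second-order Taylor approximation y = a + x^2 / (2a)."""
    return a + x * x / (2.0 * a)

a = 1.0
half_span = 1.5  # endpoint of a chain with span 3.0

gap = catenary(half_span, a) - parabola(half_span, a)
leading_term = half_span ** 4 / (24.0 * a ** 3)

print(f"catenary at endpoint : {catenary(half_span, a):.4f}")
print(f"parabola at endpoint : {parabola(half_span, a):.4f}")
print(f"gap                  : {gap:.4f}")
print(f"x^4 / (24 a^3)       : {leading_term:.4f}")
```

The gap comes out at about 0.2274, and the leading term $x^4/(24a^3) \approx 0.2109$ already accounts for most of it. Shrinking `half_span` makes both numbers fall off like the fourth power, which is why a shallow chain looks parabolic.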

Famous catenaries in the real world:

  • The Gateway Arch in St. Louis is an inverted weighted catenary — designed precisely so that every part of the arch is in pure compression.
  • Suspension bridge cables hang very close to a parabola because the deck's weight dominates, not the cable weight. The pure catenary is the answer only when the chain itself is the load.
  • The shape of an idealized hanging soap film between two rings is a catenoid, the surface of revolution of a catenary.

Worked Example

Work through this by hand before peeking. The example threads almost every idea in this section through a single computation.

Worked Example — Compute every hyperbolic value at x = 1, then verify the identity, the double-angle formula, and the inverse.

Step 1. Compute $e^1$ and $e^{-1}$:

e = 2.718281828...
1/e = 0.367879441...

Step 2. Plug into the definitions:

$$\sinh 1 = \tfrac{e - 1/e}{2} = \tfrac{2.71828 - 0.36788}{2} = \tfrac{2.35040}{2} = 1.17520$$

$$\cosh 1 = \tfrac{e + 1/e}{2} = \tfrac{2.71828 + 0.36788}{2} = \tfrac{3.08616}{2} = 1.54308$$

$$\tanh 1 = \tfrac{\sinh 1}{\cosh 1} = \tfrac{1.17520}{1.54308} = 0.76159$$

Step 3. Check the fundamental identity:

cosh²(1) − sinh²(1) = (1.54308)² − (1.17520)² = 2.38110 − 1.38110 = 1.00000 ✓

Step 4. Use the double-angle formula $\sinh(2x) = 2 \sinh x \cosh x$:

sinh(2) ?= 2 · sinh(1) · cosh(1)
2 · 1.17520 · 1.54308 = 3.62686
direct: sinh(2) = (e² − e⁻²)/2 = (7.38906 − 0.13534)/2 = 3.62686 ✓

Step 5. Invert with the log-formula $\operatorname{arcsinh} y = \ln(y + \sqrt{y^2+1})$:

arcsinh(1.17520) = ln(1.17520 + √(1.17520² + 1)) = ln(1.17520 + √2.38110) = ln(1.17520 + 1.54308) = ln(2.71828) = 1.00000 ✓

Notice the round-trip: $x = 1 \xrightarrow{\sinh} 1.17520 \xrightarrow{\operatorname{arcsinh}} 1$. The numbers $1.17520$ and $1.54308$ reappear inside the log, which is exactly the algebra that derived the closed-form inverse.


Plain Python: Building sinh from Scratch

Before reaching for a library, let's build $\sinh$ ourselves from its Taylor series. Recall:

$$\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} \frac{x^{2n+1}}{(2n+1)!}$$

Only the odd powers survive, because $\sinh$ is the odd part of $e^x$ (whose full Taylor series is $\sum x^n / n!$; even terms cancel when we subtract $e^{-x}$). We'll watch the partial sum converge to $\sinh 1$ term by term.

Build sinh from its Taylor Series (Plain Python)

```python
import math

# Goal: compute sinh(1) by summing the Taylor series term by term.
# sinh(x) = x + x^3/3! + x^5/5! + x^7/7! + ...
x = 1.0
total = 0.0

for n in range(8):
    power = 2 * n + 1
    term = x**power / math.factorial(power)
    total += term
    print(f"n={n}: term = x^{power}/{power}! = {term:.10f},  running total = {total:.10f}")

print("\nlibrary sinh(1) =", math.sinh(1))
print("our  approx     =", total)
print("absolute error  =", abs(total - math.sinh(1)))
```

Notes on the code, statement by statement:

`import math`: Standard library, no third-party packages. We only need `math.factorial` for the denominators and `math.sinh` at the end to grade ourselves.

`x = 1.0`: The point at which we evaluate sinh. We chose 1 because the true answer is the constant (e − 1/e)/2 ≈ 1.1752. Any |x| < 1 converges even faster, because the factorials in the denominators crush the powers.

`total = 0.0`: The running partial sum. We start at 0 and accumulate term by term; the first term added (n = 0) is x¹/1! = x.

`for n in range(8)`: Eight terms is plenty. The highest power is x¹⁵ and its denominator is 15! = 1,307,674,368,000. The first omitted term is 1/17! ≈ 2.8e-15, so the remaining tail sits at the edge of double precision.

Loop trace (8 iterations):

| n | power | term | total |
| --- | --- | --- | --- |
| 0 | 1 | 1/1! = 1.0000000000 | 1.0000000000 |
| 1 | 3 | 1/3! = 0.1666666667 | 1.1666666667 |
| 2 | 5 | 1/5! = 0.0083333333 | 1.1750000000 |
| 3 | 7 | 1/7! = 0.0001984127 | 1.1751984127 |
| 4 | 9 | 1/9! = 0.0000027557 | 1.1752011684 |
| 5 | 11 | 1/11! = 0.0000000251 | 1.1752011935 |
| 6 | 13 | 1/13! = 0.0000000002 | 1.1752011936 |
| 7 | 15 | 1/15! ≈ 7.6e-13 | 1.1752011936 |

`power = 2 * n + 1`: Only odd exponents. For n = 0, 1, 2, 3, ... the exponents are 1, 3, 5, 7, ..., exactly the odd integers. This is the structural reason sinh is the odd part of eˣ: the even powers of e^x and e^-x cancel when you subtract.

`term = x**power / math.factorial(power)`: Two numbers and a division. Notice the denominators grow factorially (1, 6, 120, 5040, ...) while the numerators grow only as powers (1, 1, 1, 1, ... at x = 1). Factorial wins, and that is what guarantees convergence.

`total += term`: Plain accumulation. By n = 7 the term being added is far smaller than the running total, so the sum has effectively converged. From this point the accuracy is limited by floating-point rounding (roughly 16 significant digits in double precision), not by the series.

`print(...)` inside the loop: Watch the convergence in real time. Reading the trace top to bottom: 1.0000 → 1.1667 → 1.1750 → 1.1751984 → ... Each new term pins down roughly two more digits. By n = 7 all ten printed decimal places agree with `math.sinh(1)`.

`math.sinh(1)`: The reference value. C library implementations of sinh typically rely on carefully tuned approximations (often built on exp or expm1 rather than a raw Taylor sum), optimized for accuracy and speed. At x = 1 our naive loop matches the library to within rounding.

Execution state after the loop: `math.sinh(1)` = 1.1752011936438014, `total` = 1.1752011936..., absolute error on the order of 1e-15.
Try modifying the snippet locally: set $x = 5$ and watch the series take many more terms to converge. This trade-off between input size and series length is one reason numerical libraries first reduce the argument to a small range before applying a polynomial approximation.
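To see the input-size effect without editing the loop by hand, the series can be wrapped in a function that keeps adding terms until they drop below a tolerance. This is a sketch under assumptions: `sinh_series`, `tol`, and `max_terms` are hypothetical names and defaults, and each term is built from the previous one instead of recomputing the factorial.

```python
import math

def sinh_series(x: float, tol: float = 1e-12, max_terms: int = 200):
    """Sum x^(2n+1)/(2n+1)! until the next term is smaller than tol.

    Returns (approximation, number_of_terms_used).
    """
    term = x        # the n = 0 term, x^1 / 1!
    total = 0.0
    n = 0
    while abs(term) >= tol and n < max_terms:
        total += term
        n += 1
        # Ratio of consecutive terms: x^2 / ((2n)(2n + 1)).
        term *= x * x / ((2 * n) * (2 * n + 1))
    return total, n

for x in (0.5, 1.0, 5.0, 10.0):
    approx, used = sinh_series(x)
    print(
        f"x = {x:4.1f}:  terms used = {used:2d},  "
        f"series = {approx:.10f},  math.sinh = {math.sinh(x):.10f}"
    )
```

The term count grows with $|x|$ because the powers $x^{2n+1}$ keep winning against the factorials for longer before the terms start to shrink.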

PyTorch: tanh as a Neural Activation

In deep learning, the tanh function is a squashing non-linearity: it takes any real number and maps it into the open interval $(-1, 1)$. It is symmetric about the origin (unlike sigmoid, which is centered at $0.5$), which makes gradient flow through deep networks behave better. Here is the smallest meaningful neural-network forward pass that uses it.

A One-Layer tanh Network in PyTorch

```python
import torch

# A tiny "network": one linear layer followed by tanh.
# Pre-activation:   z = W x + b
# Activation:       a = tanh(z)
torch.manual_seed(0)

x = torch.tensor([1.0, -2.0, 0.5])      # input vector  (3 features)
W = torch.tensor([[ 0.5, -0.3, 0.2],    # weight matrix (2 outputs, 3 inputs)
                  [-0.1,  0.4, 0.8]])
b = torch.tensor([0.1, -0.2])           # bias    (2 outputs)

z = W @ x + b                            # linear pre-activation
a = torch.tanh(z)                        # squash to (-1, 1)

print("z =", z)
print("a =", a)
print("|a| < 1 everywhere ->", torch.all(a.abs() < 1).item())
```

Notes on the code, statement by statement:

`import torch`: PyTorch gives us tensor objects with batched arithmetic and automatic differentiation. For this section we only use the forward pass; Chapter 5 will return to compute the derivative of tanh.

`torch.manual_seed(0)`: Deterministic randomness. The weights here are hard-coded, so the seed has no effect yet, but it is good practice for when the same script later switches to `nn.Linear`, which initialises randomly. Reproducibility costs nothing.

`x = torch.tensor([1.0, -2.0, 0.5])`: A 3-dimensional input vector. Treat it as three features observed for one example, for instance pixel intensity, edge strength, and color saturation. Execution state: `x.shape` = torch.Size([3]), `x` = tensor([ 1.0000, -2.0000, 0.5000]).

`W = torch.tensor([[...], [...]])`: Weight matrix with shape (2, 3): two output neurons, each connected to all three inputs. Row 0 holds neuron 0's weights; row 1 holds neuron 1's weights. In a real network these are learned by gradient descent. Execution state: `W.shape` = torch.Size([2, 3]), `W[0]` = tensor([ 0.5000, -0.3000, 0.2000]), `W[1]` = tensor([-0.1000, 0.4000, 0.8000]).

`b = torch.tensor([0.1, -0.2])`: Bias term, one number per output neuron. It lets the neuron fire even when all inputs are zero. Shape (2,) to match the two rows of W. Execution state: `b` = tensor([ 0.1000, -0.2000]).

`z = W @ x + b`: The linear pre-activation, a matrix-vector product (the `@` operator is matmul). Computing row by row: z[0] = 0.5·1 + (−0.3)·(−2) + 0.2·0.5 + 0.1 = 0.5 + 0.6 + 0.1 + 0.1 = 1.3, and z[1] = (−0.1)·1 + 0.4·(−2) + 0.8·0.5 + (−0.2) = −0.1 − 0.8 + 0.4 − 0.2 = −0.7. Shape is (2,). Execution state: `z` = tensor([ 1.3000, -0.7000]).

`a = torch.tanh(z)`: The non-linear squash, applied element-wise: tanh(1.3) ≈ 0.8617 and tanh(−0.7) ≈ −0.6044. Without this non-linearity, stacking many linear layers would still be linear; tanh is what gives the network expressive power. Execution state: `a` = tensor([ 0.8617, -0.6044]).

`print("z =", z)`: Expected stdout: `z = tensor([ 1.3000, -0.7000])`. The pre-activations can be anywhere on the real line; they are unbounded by design.

`print("a =", a)`: Expected stdout: `a = tensor([ 0.8617, -0.6044])`. After tanh, both values sit inside (−1, 1). That bounded range is the point of using a hyperbolic activation: it keeps signals from exploding as they propagate through many layers.

`torch.all(a.abs() < 1)`: Sanity check. We expect `True` because |tanh| < 1 for all real inputs (the horizontal asymptotes y = ±1 we plotted earlier). One caveat: in floating point, tanh rounds to exactly ±1 once |z| is large enough, so this strict check can fail for heavily saturated units even though the mathematical bound holds. Execution state: `torch.all(a.abs() < 1).item()` = True.

The reason early deep networks often chose $\tanh$ over $\text{sigmoid}$ is that $\tanh$ is zero-centered: $\tanh(0) = 0$ and its outputs are symmetric about zero. With sigmoid, every activation is positive, which pushes the weight gradients of the next layer toward a common sign and can slow learning. Modern networks often replace both with ReLU, but $\tanh$ remains standard inside LSTM and GRU cells and in some attention variants such as additive attention.


Where Hyperbolic Functions Show Up

Five places, drawn from very different fields, where you cannot avoid $\sinh$, $\cosh$, or $\tanh$:

| Field | Where it appears | Formula |
| --- | --- | --- |
| Architecture / Civil eng. | Hanging chains and arches | y = a · cosh(x/a) |
| Special relativity | Velocity addition via rapidity φ | tanh(φ₁ + φ₂) is the relativistic sum |
| Mechanics | Terminal velocity with quadratic drag | v(t) = v_T · tanh(g t / v_T) |
| Deep learning | tanh activation (e.g. in LSTM and GRU cells) | a = tanh(W x + b) |
| Geometry / Cartography | Mercator projection latitude | y = ln(tan(π/4 + ϕ/2)) = arctanh(sin ϕ) |

Take the falling-object example for a moment. With air drag proportional to $v^2$, Newton's second law $m\dot{v} = m g - k v^2$ has the closed-form solution $v(t) = v_T \tanh(g t / v_T)$ for an object released from rest. The terminal velocity $v_T = \sqrt{m g / k}$ is the height of the horizontal asymptote you saw in the explorer. Falling raindrops, skydivers, and ping-pong balls all trace out a $\tanh$ curve when plotted velocity vs. time. The same shape governs how a neuron saturates. Same math, different universe.
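The closed form can be checked against a direct numerical integration of the equation of motion. The sketch below integrates $m\dot{v} = mg - kv^2$ from rest with a classical fourth-order Runge-Kutta step and compares the result with $v_T \tanh(g t / v_T)$. The mass, drag coefficient, and step size are made-up illustrative values, not measurements, and `simulate_fall` is a hypothetical helper.

```python
import math

def simulate_fall(m: float, k: float, g: float = 9.81,
                  dt: float = 1e-3, t_end: float = 10.0) -> float:
    """Integrate m * dv/dt = m*g - k*v^2 from v(0) = 0 with classical RK4.

    Returns the velocity at time t_end.
    """
    def accel(v: float) -> float:
        return g - (k / m) * v * v

    v = 0.0
    steps = round(t_end / dt)
    for _ in range(steps):
        k1 = accel(v)
        k2 = accel(v + 0.5 * dt * k1)
        k3 = accel(v + 0.5 * dt * k2)
        k4 = accel(v + dt * k3)
        v += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return v

# Illustrative (assumed) parameters: 80 kg body, k = 0.25 kg/m.
m, k, g, t_end = 80.0, 0.25, 9.81, 10.0

v_T = math.sqrt(m * g / k)                    # terminal velocity
v_closed = v_T * math.tanh(g * t_end / v_T)   # closed-form solution
v_numeric = simulate_fall(m, k, g=g, t_end=t_end)

print(f"terminal velocity v_T     = {v_T:.4f} m/s")
print(f"closed form v({t_end:.0f} s)      = {v_closed:.6f} m/s")
print(f"RK4 integration v({t_end:.0f} s)  = {v_numeric:.6f} m/s")
```

The two velocities agree closely, and both stay below $v_T$, which plays the role of the asymptote $y = 1$ of the plain $\tanh$ curve.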

Don't confuse $\tanh$ with the logistic / sigmoid function $\sigma(x) = 1/(1 + e^{-x})$. They are related by $\tanh(x) = 2\,\sigma(2x) - 1$: a horizontal squeeze, a vertical stretch by 2, and a vertical shift. Tanh maps to $(-1, 1)$; sigmoid maps to $(0, 1)$. Same family, different range.
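The relation between the two functions is quick to verify numerically. A minimal sketch in plain Python follows; `sigmoid` is defined locally for illustration and the sample points are arbitrary.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-3.0, -0.7, 0.0, 1.3, 3.0):
    lhs = math.tanh(x)
    rhs = 2.0 * sigmoid(2.0 * x) - 1.0
    print(f"x = {x:5.2f}:  tanh(x) = {lhs:+.6f},  2*sigmoid(2x) - 1 = {rhs:+.6f}")
```

Solving the same relation for the sigmoid gives $\sigma(x) = \tfrac{1}{2}\left(1 + \tanh(x/2)\right)$, which is the form needed to convert a tanh output into a sigmoid output.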

Summary

  1. Definitions. $\sinh x = \tfrac{e^x - e^{-x}}{2}$, $\cosh x = \tfrac{e^x + e^{-x}}{2}$, $\tanh x = \sinh x / \cosh x$. They are the odd and even parts of $e^x$.
  2. Identity. $\cosh^2 x - \sinh^2 x = 1$. This is what puts $(\cosh t, \sinh t)$ on the hyperbola $x^2 - y^2 = 1$ for every $t$.
  3. Geometry. Trigonometric functions parametrize the unit circle; hyperbolic functions parametrize the unit hyperbola. In both cases the parameter $t$ is twice the swept sector area. On the circle that coincides with the ordinary angle; on the hyperbola it does not.
  4. Inverses. Every inverse has a closed-form log expression, e.g. $\operatorname{arctanh} x = \tfrac{1}{2}\ln\!\tfrac{1+x}{1-x}$.
  5. Applications. The catenary $y = a\cosh(x/a)$, relativistic rapidity, terminal velocity, and the $\tanh$ neural activation all share the same underlying object.

Exercises

  1. Quick definitions. Without a calculator, compute $\sinh(\ln 2)$ and $\cosh(\ln 2)$ in closed form. Hint: $e^{\ln 2} = 2$.
  2. Identity hunt. Starting from the definitions, prove the double-angle formula $\cosh(2x) = 2\cosh^2 x - 1$. Then write $\sinh^2 x$ in terms of $\cosh(2x)$.
  3. Inverse derivation. Mimic the $\operatorname{arcsinh}$ derivation in this section to derive the closed form for $\operatorname{arctanh} x$. You will end up solving a linear equation in $e^{2x}$, not a quadratic.
  4. Catenary fit. A telephone wire hangs between two poles 60 m apart, dipping 5 m below the attachment points. Find $a$ so that $y(x) = a\cosh(x/a)$ matches. Solve numerically; there is no closed form for $a$.
  5. Coding exercise. Extend the plain-Python series in this section so it computes $\cosh$ from its Taylor series (only even powers!). Verify your answer against $\cosh(1) \approx 1.54308$ from the worked example.
  6. Neural net check. In the PyTorch snippet, replace $\tanh$ with the sigmoid $\sigma(x) = 1/(1 + e^{-x})$. By hand, predict the new outputs using $\tanh(x) = 2\sigma(2x) - 1$, then run the code to confirm.