Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section you will be able to:

Define $\sinh, \cosh, \tanh$ from the exponential function and explain why those particular combinations matter.
Prove the central identity $\cosh^2 x - \sinh^2 x = 1$ and see why it forces the curve onto a hyperbola.
Read the graphs of hyperbolic functions — their symmetries, asymptotes, and growth rates.
Connect hyperbolic functions to real phenomena: the catenary, special relativity, and the $\tanh$ activation in neural networks.
Compute $\sinh$ from its Taylor series and use PyTorch's $\tanh$ in a forward pass.

The Story: A Chain, a Sail, a Surprise

Hang a uniform chain between two poles and step back. What shape does it make? Galileo, in 1638, declared it a parabola. He was wrong, but the error was so subtle that the correct answer took another fifty-three years to find. In 1690 Jakob Bernoulli posed the problem as a public challenge; in 1691 his brother Johann, Huygens, and Leibniz each published the answer:

y = a \cosh\!\left(\tfrac{x}{a}\right)

That “cosh” is the hyperbolic cosine. It is not a trigonometric function at all — it is built directly from $e^x$ . The same family of functions describes the shape of a sail filled with wind, the velocity of a falling object with air drag, the rapidity of a relativistic particle, and the activation curve of a neuron in a deep network. They show up everywhere there is an underlying balance between exponential growth and decay.

The whole idea in one line: hyperbolic functions are the symmetric and antisymmetric parts of $e^x$ . Every fact about them follows from that one decomposition.

Definitions from the Exponential

Any function $f(x)$ can be uniquely split into an even part $\tfrac{1}{2}(f(x) + f(-x))$ and an odd part $\tfrac{1}{2}(f(x) - f(-x))$ . Apply this to $f(x) = e^x$ . The two pieces are so important that they get their own names:

\sinh x \;=\; \dfrac{e^x - e^{-x}}{2}

\cosh x \;=\; \dfrac{e^x + e^{-x}}{2}

\tanh x \;=\; \dfrac{\sinh x}{\cosh x} \;=\; \dfrac{e^x - e^{-x}}{e^x + e^{-x}}

These are not arbitrary combinations. They are the unique way to decompose $e^x$ into an odd piece (sinh) and an even piece (cosh). Adding them puts the exponential back together:

e^x = \cosh x + \sinh x \qquad e^{-x} = \cosh x - \sinh x

Think of $\cosh$ and $\sinh$ as the “real” and “imaginary” parts of $e^x$ — but on the real axis. The structural analogy with $\cos$ and $\sin$ (which split $e^{ix}$ ) is what makes them “trigonometry, but for the hyperbola.”

We also define cosecant, secant, cotangent versions for completeness:

\operatorname{csch} x = \frac{1}{\sinh x}, \quad \operatorname{sech} x = \frac{1}{\cosh x}, \quad \coth x = \frac{1}{\tanh x}
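The definitions above translate almost word for word into code. The following is a minimal sketch, assuming only Python's standard `math` module; the helper names `sinh_exp`, `cosh_exp`, and `tanh_exp` are illustrative and not part of any library. It checks that the odd and even parts of $e^x$ add back up to the exponential and agree with the built-in functions.

```python
import math


def sinh_exp(x: float) -> float:
    """Odd part of e^x: (e^x - e^-x) / 2."""
    return (math.exp(x) - math.exp(-x)) / 2


def cosh_exp(x: float) -> float:
    """Even part of e^x: (e^x + e^-x) / 2."""
    return (math.exp(x) + math.exp(-x)) / 2


def tanh_exp(x: float) -> float:
    """Ratio of the odd part to the even part."""
    return sinh_exp(x) / cosh_exp(x)


for x in (-2.0, 0.0, 0.5, 1.0):
    # Adding or subtracting the two parts rebuilds e^x and e^-x.
    assert math.isclose(cosh_exp(x) + sinh_exp(x), math.exp(x))
    assert math.isclose(cosh_exp(x) - sinh_exp(x), math.exp(-x))
    # The hand-built versions agree with the standard library.
    assert math.isclose(sinh_exp(x), math.sinh(x), abs_tol=1e-12)
    assert math.isclose(cosh_exp(x), math.cosh(x), abs_tol=1e-12)
    assert math.isclose(tanh_exp(x), math.tanh(x), abs_tol=1e-12)
    print(f"x = {x:+.1f}  sinh = {sinh_exp(x):+.4f}  "
          f"cosh = {cosh_exp(x):.4f}  tanh = {tanh_exp(x):+.4f}")
```

This direct formula is fine for moderate inputs; for very large $|x|$ the call to `math.exp` overflows, which is one reason to prefer the library versions in practice.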

Graphs and Behavior

Drag the slider below and watch the three core curves move together. Hide and show $e^x$ and $e^{-x}$ on the same axes — cosh sits at their midpoint, sinh sits at their half-difference. That single visual is most of the intuition.

Interactive Hyperbolic Graphs (interactive explorer). Sample readout at x = 1.00: sinh(x) = 1.1752, cosh(x) = 1.5431, tanh(x) = 0.7616, e^x = 2.7183, cosh²−sinh² = 1.000000.

Drag the slider. Notice cosh(x) is the average of e^x and e^-x; sinh(x) is half their difference. The identity cosh²−sinh² = 1 holds at every x.

Three behaviors to commit to memory:

Function	At x = 0	Symmetry	Behavior for large |x|
sinh x	0	odd: sinh(−x) = −sinh x	≈ ½ eˣ as x → +∞, ≈ −½ e⁻ˣ as x → −∞
cosh x	1 (minimum)	even: cosh(−x) = cosh x	≈ ½ e^|x| in both directions
tanh x	0	odd	→ +1 as x → ∞, → −1 as x → −∞

Two things to notice. First, $\cosh x \geq 1$ everywhere, with equality only at $x = 0$ : the chain never sags below its lowest point. Second, $\tanh x$ is bounded between $-1$ and $+1$ ; it is a squashing function that maps the whole real line into the open interval $(-1, 1)$ . That boundedness is a main reason neural networks use it.

The Fundamental Identity

Every property of sinh and cosh flows from one equation:

\boxed{\cosh^2 x - \sinh^2 x = 1}

It is the hyperbolic mirror of $\cos^2 x + \sin^2 x = 1$ . Notice the minus sign — that is the only difference, and it changes the geometry from a circle to a hyperbola. Let's prove it in three lines from the definitions:

\cosh^2 x - \sinh^2 x = \left(\frac{e^x + e^{-x}}{2}\right)^2 - \left(\frac{e^x - e^{-x}}{2}\right)^2

= \frac{(e^{x} + e^{-x})^2 - (e^{x} - e^{-x})^2}{4}

= \frac{4 \cdot e^{x}\cdot e^{-x}}{4} = e^{0} = 1.

The middle line uses the algebraic identity $(a+b)^2 - (a-b)^2 = 4ab$ . With $a = e^x$ and $b = e^{-x}$ , the cross term is $4ab = 4 \, e^x e^{-x} = 4$ . The whole identity collapses to $1$ .

Confirm it numerically with the slider in the explorer above. The cell labeled cosh²−sinh² always reads $1.000000$ , no matter how you drag $x$ .
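If the explorer is not at hand, the same confirmation can be sketched in a few lines of Python. This is an illustrative check, not a proof; the helper name `identity_gap` and the sample points are assumptions chosen for the example. For moderate $x$ the gap sits at rounding-error level. For large $|x|$ both squares are huge and nearly equal, so the floating-point subtraction loses precision even though the identity itself is exact.

```python
import math


def identity_gap(x: float) -> float:
    """Return cosh(x)^2 - sinh(x)^2 - 1, which is 0 in exact arithmetic."""
    return math.cosh(x) ** 2 - math.sinh(x) ** 2 - 1.0


samples = [-3.0, -1.0, 0.0, 0.5, 1.0, 3.0]
for x in samples:
    # Print in scientific notation so rounding-level gaps are visible.
    print(f"x = {x:+.1f}   cosh^2 - sinh^2 - 1 = {identity_gap(x):+.2e}")
```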

Why “Hyperbolic”? Circle vs Hyperbola

The name comes from a perfect geometric parallel. Recall:

The point $(\cos t, \sin t)$ traces the unit circle $x^2 + y^2 = 1$ . And $t$ equals twice the area of the circular sector swept from the positive $x$ -axis to that point.
The point $(\cosh t, \sinh t)$ traces the unit hyperbola $x^2 - y^2 = 1$ . And $t$ equals twice the area of the hyperbolic sector swept from $(1,0)$ to that point.

That is the entire reason for the name. In trigonometry, the parameter is an angle. In hyperbolic trigonometry, the parameter is a hyperbolic angle — a swept area, not an angle in radians.
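The swept-area claim for the hyperbola can be tested numerically. The sketch below is one way to do it, under these assumptions: $t \geq 0$, the sector is the triangle under the ray to $(\cosh t, \sinh t)$ minus the region under the curve $y = \sqrt{x^2 - 1}$, and that region is estimated with a plain midpoint rule. The function name `hyperbolic_sector_area` and the step count are illustrative choices.

```python
import math


def hyperbolic_sector_area(t: float, n: int = 100_000) -> float:
    """Area between the x-axis, the ray to (cosh t, sinh t), and x^2 - y^2 = 1.

    Assumes t >= 0. Uses a midpoint rule with n slices for the curved part.
    """
    x_end, y_end = math.cosh(t), math.sinh(t)
    # Triangle with vertices (0, 0), (x_end, 0), (x_end, y_end).
    triangle = 0.5 * x_end * y_end
    # Region under y = sqrt(x^2 - 1) from x = 1 to x = x_end.
    h = (x_end - 1.0) / n
    under_curve = 0.0
    for i in range(n):
        u = 1.0 + (i + 0.5) * h
        under_curve += math.sqrt(u * u - 1.0) * h
    return triangle - under_curve


for t in (0.4, 0.8, 1.5):
    area = hyperbolic_sector_area(t)
    print(f"t = {t:.1f}   sector area = {area:.6f}   t/2 = {t / 2:.6f}")
```

The printed area should match $t/2$ to several decimal places, which is the sense in which $t$ is "twice the swept area".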

Interactive figure: unit circle vs. unit hyperbola. Sample readout at t = 0.80: on the unit circle x² + y² = 1, x = cos(t) = 0.6967 and y = sin(t) = 0.7174, with cos² + sin² = 1.000000; on the unit hyperbola x² − y² = 1, x = cosh(t) = 1.3374 and y = sinh(t) = 0.8881, with cosh² − sinh² = 1.000000.

Slide t. The two shaded regions always have area t/2. That is the meaning of t for sinh and cosh: it is the hyperbolic analog of an angle, not an angle in radians.

Analogy clincher. The circle and the hyperbola differ by a single sign: $x^2 \, \textbf{+} \, y^2 = 1$ vs $x^2 \, \textbf{-} \, y^2 = 1$ . Identities for hyperbolic functions can be obtained from their circular cousins by replacing $\sin, \cos$ with $\sinh, \cosh$ and flipping the sign of every term that contains a product of two sines (Osborn's rule). Derivatives and series change by similar sign flips.

Algebraic Identities

Compare these side-by-side with their circular relatives. Notice how often a sign quietly flips:

Trigonometric	Hyperbolic
sin² + cos² = 1	cosh² − sinh² = 1
sin(a + b) = sin a cos b + cos a sin b	sinh(a + b) = sinh a cosh b + cosh a sinh b
cos(a + b) = cos a cos b − sin a sin b	cosh(a + b) = cosh a cosh b + sinh a sinh b
sin(2a) = 2 sin a cos a	sinh(2a) = 2 sinh a cosh a
cos(2a) = cos²a − sin²a	cosh(2a) = cosh²a + sinh²a
1 + tan² = sec²	1 − tanh² = sech²

These are no coincidence. Each one drops out of the definitions in a few lines. For example, expand $\sinh(a+b)$ :

\sinh(a+b) = \tfrac{1}{2}\left(e^{a+b} - e^{-(a+b)}\right)

= \tfrac{1}{2}\left(e^{a}e^{b} - e^{-a}e^{-b}\right)

= \sinh a \, \cosh b + \cosh a \, \sinh b.

The last step is just regrouping $e^a e^b - e^{-a} e^{-b}$ into the two pieces $\sinh a \, \cosh b$ and $\cosh a \, \sinh b$ . Try it on paper — it is two lines of algebra.
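For readers who want the regrouping spelled out, it is easiest to run it backwards: expand the right-hand side from the definitions and watch the mixed terms cancel. The following is a worked sketch of that step, using only the definitions of $\sinh$ and $\cosh$ given earlier.

```latex
\begin{aligned}
\sinh a \cosh b + \cosh a \sinh b
  &= \tfrac{1}{4}\left[(e^{a} - e^{-a})(e^{b} + e^{-b}) + (e^{a} + e^{-a})(e^{b} - e^{-b})\right] \\
  &= \tfrac{1}{4}\left[\left(e^{a+b} + e^{a-b} - e^{-a+b} - e^{-a-b}\right)
                     + \left(e^{a+b} - e^{a-b} + e^{-a+b} - e^{-a-b}\right)\right] \\
  &= \tfrac{1}{4}\left[2e^{a+b} - 2e^{-(a+b)}\right] \\
  &= \tfrac{1}{2}\left(e^{a+b} - e^{-(a+b)}\right) = \sinh(a+b).
\end{aligned}
```

The cross terms $e^{a-b}$ and $e^{-a+b}$ appear once with each sign and cancel, which is exactly the regrouping the paragraph above describes.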

Looking ahead: in Chapter 5 we will see that $\frac{d}{dx}\sinh x = \cosh x$ and $\frac{d}{dx}\cosh x = \sinh x$ (no minus sign, unlike the circular case). That is yet another consequence of the same one-sign flip.

Inverse Hyperbolic Functions

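Because $\sinh$ and $\tanh$ are strictly increasing on $\mathbb{R}$ , each has a proper inverse: $\operatorname{arcsinh}$ is defined on all of $\mathbb{R}$ , while $\operatorname{arctanh}$ is defined on $(-1, 1)$ , the range of $\tanh$ . $\cosh$ is not one-to-one (it's even), so we restrict to $x \geq 0$ to invert it, which gives $\operatorname{arccosh}$ on $[1, \infty)$ . The beautiful surprise: all three inverses have closed-form logarithm expressions.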

\operatorname{arcsinh} x = \ln\!\left(x + \sqrt{x^2 + 1}\right), \quad x \in \mathbb{R}

\operatorname{arccosh} x = \ln\!\left(x + \sqrt{x^2 - 1}\right), \quad x \geq 1

\operatorname{arctanh} x = \tfrac{1}{2}\ln\!\left(\tfrac{1+x}{1-x}\right), \quad |x| < 1

Let's derive $\operatorname{arcsinh}$ to show how clean the algebra is. Set $y = \sinh x = \tfrac{e^x - e^{-x}}{2}$ , then solve for $x$ . Multiply through by $2 e^x$ :

2y\,e^x = e^{2x} - 1

(e^x)^2 - 2y(e^x) - 1 = 0

e^x = y + \sqrt{y^2 + 1}

x = \ln\!\left(y + \sqrt{y^2 + 1}\right).

That is the quadratic formula applied to a hidden quadratic in $e^x$ . The plus root is taken because $e^x > 0$ rules out the minus root.

These log-formulas are a nice reminder that anything built from $e^x$ can be inverted by something built from $\ln$ . There is no “new transcendental function” needed.
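The three log-formulas are easy to test against the standard library. Here is a minimal sketch; the helper names `arcsinh_log`, `arccosh_log`, and `arctanh_log` are hypothetical, and the comparison targets are Python's built-in `math.asinh`, `math.acosh`, and `math.atanh`.

```python
import math


def arcsinh_log(x: float) -> float:
    """ln(x + sqrt(x^2 + 1)), valid for every real x."""
    return math.log(x + math.sqrt(x * x + 1.0))


def arccosh_log(x: float) -> float:
    """ln(x + sqrt(x^2 - 1)), valid for x >= 1."""
    return math.log(x + math.sqrt(x * x - 1.0))


def arctanh_log(x: float) -> float:
    """0.5 * ln((1 + x) / (1 - x)), valid for |x| < 1."""
    return 0.5 * math.log((1.0 + x) / (1.0 - x))


# Compare each closed form with the library inverse on a few sample points.
for x in (-2.0, 0.0, 0.5, 3.0):
    assert math.isclose(arcsinh_log(x), math.asinh(x), rel_tol=1e-12, abs_tol=1e-12)
for x in (1.0, 1.5, 10.0):
    assert math.isclose(arccosh_log(x), math.acosh(x), rel_tol=1e-12, abs_tol=1e-12)
for x in (-0.9, 0.0, 0.5):
    assert math.isclose(arctanh_log(x), math.atanh(x), rel_tol=1e-12, abs_tol=1e-12)

# Round trip: x -> sinh(x) -> arcsinh recovers x.
print(arcsinh_log(math.sinh(1.0)))
```

The formulas are written here exactly as stated in the text. For inputs of very large magnitude the library functions are the safer choice, since `x + sqrt(x*x + 1)` can lose precision or overflow.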

The Catenary — Where cosh Really Lives

Now we earn the story we opened with. A flexible chain of uniform mass per unit length, hanging under gravity, satisfies a balance equation between tension and weight. Without doing the derivation (we'll do it carefully in Chapter 22), the resulting shape is:

y(x) = a \cosh\!\left(\tfrac{x}{a}\right)

The constant $a$ is the ratio of horizontal tension to weight per unit length. Small $a$ means a slack or heavy chain (deep, sharp dip); large $a$ means a light, taut chain (gentle curve). Drag the sliders below to feel it.

The Catenary: a hanging chain is exactly a cosh

Hang a uniform chain between two poles. Gravity + tension force the shape y = a · cosh(x / a). Compare it against the famous lookalike y = a + x²/(2a) (which is only a 2nd-order Taylor approximation).

Interactive figure (sample readout at a = 1.00, span = 3.00): y at endpoints = 2.3524; sag (y_end − y_min) = 1.3524; parabola error at x = span/2 = 0.2274.

Increase the span and watch the parabola peel away from the real chain. For small x/a the two agree (that's why Galileo thought the chain was a parabola), but the true shape is cosh, a fact only nailed down in 1691 by the Bernoullis and Leibniz.

Galileo missed it by one term. The Taylor series of cosh starts as $\cosh u = 1 + \tfrac{u^2}{2} + \tfrac{u^4}{24} + \dots$ For a shallow chain ( $|x/a|$ small) only the first two terms matter, so $y \approx a + \tfrac{x^2}{2a}$ — a parabola. That's why Galileo's eye couldn't tell. The discrepancy grows as $x^4/(24 a^3)$ .
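The size of Galileo's error can be tabulated directly. The sketch below compares the catenary with its parabolic lookalike and with the leading correction $x^4/(24a^3)$ quoted above; the function names and the choice $a = 1$ are illustrative assumptions.

```python
import math


def catenary(x: float, a: float) -> float:
    """True chain shape: a * cosh(x / a)."""
    return a * math.cosh(x / a)


def parabola(x: float, a: float) -> float:
    """Second-order Taylor approximation: a + x^2 / (2a)."""
    return a + x * x / (2.0 * a)


def quartic_term(x: float, a: float) -> float:
    """Leading term of the discrepancy: x^4 / (24 a^3)."""
    return x ** 4 / (24.0 * a ** 3)


a = 1.0
for x in (0.1, 0.5, 1.0, 1.5):
    gap = catenary(x, a) - parabola(x, a)
    print(f"x = {x:.1f}   catenary - parabola = {gap:.6f}   "
          f"x^4/(24 a^3) = {quartic_term(x, a):.6f}")
```

For small $x/a$ the two right-hand columns agree closely; by $x = 1.5$ the gap is about $0.2274$, the same value the explorer reports for a span of 3, and the quartic term alone already underestimates it.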

Famous catenaries in the real world:

The Gateway Arch in St. Louis is an inverted weighted catenary — designed precisely so that every part of the arch is in pure compression.
Suspension bridge cables hang very close to a parabola because the deck's weight dominates, not the cable weight. The pure catenary is the answer only when the chain itself is the load.
An idealized soap film stretched between two coaxial rings forms a catenoid, the surface of revolution of a catenary.

Worked Example

Work through this by hand before peeking. The example threads almost every idea in this section through a single computation.

Worked Example — Compute every hyperbolic value at x = 1, then verify the identity, the double-angle formula, and the inverse.

Step 1. Compute $e^1$ and $e^{-1}$ :

e = 2.718281828...
1/e = 0.367879441...

Step 2. Plug into the definitions:

\sinh 1 = \tfrac{e - 1/e}{2} = \tfrac{2.71828 - 0.36788}{2} = \tfrac{2.35040}{2} = 1.17520

\cosh 1 = \tfrac{e + 1/e}{2} = \tfrac{2.71828 + 0.36788}{2} = \tfrac{3.08616}{2} = 1.54308

\tanh 1 = \tfrac{\sinh 1}{\cosh 1} = \tfrac{1.17520}{1.54308} = 0.76159

Step 3. Check the fundamental identity:

cosh²(1) − sinh²(1) = (1.54308)² − (1.17520)² = 2.38110 − 1.38110 = 1.00000 ✓

Step 4. Use the double-angle formula $\sinh(2x) = 2 \sinh x \cosh x$ :

sinh(2) ?= 2 · sinh(1) · cosh(1)
formula: 2 · 1.17520 · 1.54308 = 3.62686
direct: sinh(2) = (e² − e⁻²)/2 = (7.38906 − 0.13534)/2 = 3.62686 ✓

Step 5. Invert with the log-formula $\operatorname{arcsinh} y = \ln(y + \sqrt{y^2+1})$ :

arcsinh(1.17520) = ln(1.17520 + √(1.17520² + 1)) = ln(1.17520 + √2.38110) = ln(1.17520 + 1.54308) = ln(2.71828) = 1.00000 ✓

Notice the round-trip: $x = 1 \xrightarrow{\sinh} 1.17520 \xrightarrow{\operatorname{arcsinh}} 1$ . The numbers $1.17520$ and $1.54308$ reappear inside the log — that's exactly the algebra that derived the closed-form inverse.

Plain Python: Building sinh from Scratch

Before reaching for a library, let's build $\sinh$ ourselves from its Taylor series. Recall:

\sinh x = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} \frac{x^{2n+1}}{(2n+1)!}

Only the odd powers survive, because $\sinh$ is the odd part of $e^x$ (whose full Taylor series is $\sum x^n / n!$ ; even terms cancel when we subtract $e^{-x}$ ). We'll watch the partial sum converge to $\sinh 1$ term by term.

Build sinh from its Taylor Series — Plain Python


import math

Standard library, no third-party packages. We only need math.factorial for the denominators and math.sinh at the end to grade ourselves.

x = 1.0

The point at which we evaluate sinh. We chose 1 because the true answer is the famous constant (e − 1/e)/2 ≈ 1.1752. Any |x| < 1 converges even faster — the factorials in the denominators crush the powers.

EXECUTION STATE

x = 1.0

total = 0.0: the running partial sum

We accumulate term-by-term into total. Starting at 0 because the series itself starts at term n=0 with value x¹/1! = x.

EXECUTION STATE

total = 0.0

for n in range(8): eight terms is plenty

Eight iterations means the highest power is x¹⁵ and the largest denominator is 15! = 1,307,674,368,000. The remaining tail of the series starts at 1/17! ≈ 2.8e-15, only about an order of magnitude above double-precision rounding error at this magnitude.

LOOP TRACE · 8 iterations

n	power	term	total
0	1	1/1! = 1.0000000000	1.0000000000
1	3	1/3! = 0.1666666667	1.1666666667
2	5	1/5! = 0.0083333333	1.1750000000
3	7	1/7! = 0.0001984127	1.1751984127
4	9	1/9! = 0.0000027557	1.1752011684
5	11	1/11! = 0.0000000251	1.1752011935
6	13	1/13! = 0.0000000002	1.1752011936
7	15	1/15! ≈ 7.6e-13	1.1752011936

power = 2*n + 1: only odd exponents

For n=0,1,2,3,... the exponents are 1,3,5,7,... exactly the odd integers. This is the structural reason sinh is the ODD part of eˣ: the even powers of e^x and e^-x cancel when you subtract.

term = x**power / math.factorial(power)

Two numbers and a division. Notice the denominators grow factorially (1, 6, 120, 5040, ...) while the numerators grow only as powers (1, 1, 1, 1, ... at x=1). Factorial wins — that is what guarantees convergence.

total += term

Plain accumulation. By n=7 the term (about 7.6e-13) is tiny compared with the running total, so further terms barely change the sum. With enough terms the series is accurate to roughly the rounding limit of double precision; going beyond that would need higher-precision arithmetic.

print: watch the convergence in real time

Reading the trace top-to-bottom: 1.0000 → 1.1667 → 1.1750 → 1.1751984 → ... Each new term fixes roughly two more digits. By n=7 the sum agrees with math.sinh(1) to about 14 decimal places, although the .10f format only displays ten.

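math.sinh(1): the reference value

Library implementations of sinh rest on the same broad idea (approximating the function with a small number of arithmetic operations) but use carefully tuned approximations, often built on exp or expm1, rather than a raw Taylor loop. At x = 1 our naive loop agrees with the library value to within a few times 1e-15.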

EXECUTION STATE

math.sinh(1) = 1.1752011936438014

total = 1.1752011936...

abs error ≈ 3e-15 (on the order of the first omitted term, 1/17!)


import math

# Goal: compute sinh(1) by summing the Taylor series term by term.
# sinh(x) = x + x^3/3! + x^5/5! + x^7/7! + ...
x = 1.0
total = 0.0

for n in range(8):
    power = 2 * n + 1
    term = x**power / math.factorial(power)
    total += term
    print(f"n={n}: term = x^{power}/{power}! = {term:.10f},  running total = {total:.10f}")

print("\nlibrary sinh(1) =", math.sinh(1))
print("our  approx     =", total)
print("absolute error  =", abs(total - math.sinh(1)))

Try modifying the snippet locally: set x = 5 and watch the series take many more terms to converge. This trade-off between input size and series length is why numerical libraries first reduce the argument to a small range before applying a polynomial approximation.
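To make that experiment concrete, here is a minimal sketch that keeps adding terms until they stop mattering and reports how many were needed. It is a variation on the loop above, not the original snippet: the function name `sinh_taylor`, the relative tolerance, and the term-update recurrence are choices made for this example.

```python
import math


def sinh_taylor(x: float, tol: float = 1e-15, max_terms: int = 100):
    """Sum the sinh Taylor series until a term is negligible.

    Returns (approximation, number_of_terms_used).
    """
    term = x          # first term: x^1 / 1!
    total = 0.0
    n = 0
    while n < max_terms:
        total += term
        n += 1
        if abs(term) <= tol * abs(total):
            break
        # Next term: multiply by x^2 / ((2n)(2n + 1)) instead of
        # recomputing the power and the factorial from scratch.
        term *= x * x / ((2 * n) * (2 * n + 1))
    return total, n


for x in (0.5, 1.0, 5.0):
    approx, used = sinh_taylor(x)
    print(f"x = {x:.1f}   terms used = {used:2d}   "
          f"approx = {approx:.12f}   math.sinh = {math.sinh(x):.12f}")
```

Larger inputs need visibly more terms before the factorial in the denominator overtakes the growing power in the numerator.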

PyTorch: tanh as a Neural Activation

In deep learning, the tanh function is a squashing non-linearity: it takes any real number and maps it into the open interval $(-1, 1)$ . It is symmetric about the origin (unlike sigmoid, which is centered at $0.5$ ), which makes gradient flow through deep networks behave better. Here is the smallest meaningful neural-network forward pass that uses it.

A One-Layer tanh Network in PyTorch


import torch

PyTorch gives us tensor objects with batched arithmetic and automatic differentiation. For this section we only use the forward pass — Chapter 5 will return to compute the derivative of tanh.

torch.manual_seed(0)

Fixes the random seed. Nothing in this script is random because the weights are hard-coded, but seeding is good practice in case the script later switches to nn.Linear, which initialises its weights randomly. Reproducibility costs nothing.

x = torch.tensor([1.0, -2.0, 0.5])

A 3-dimensional input vector. Treat it as 'three features observed for one example' — maybe pixel intensity, edge strength, color saturation. The shape is (3,).

EXECUTION STATE

x.shape = torch.Size([3])

x = tensor([ 1.0000, -2.0000, 0.5000])

W = torch.tensor([[...], [...]])

Weight matrix with shape (2, 3): two output neurons, each connected to all three inputs. Row 0 = neuron 0's weights; row 1 = neuron 1's weights. In a real network these are learned by gradient descent.

EXECUTION STATE

W.shape = torch.Size([2, 3])

W[0] = tensor([ 0.5000, -0.3000, 0.2000])

W[1] = tensor([-0.1000, 0.4000, 0.8000])

b = torch.tensor([0.1, -0.2])

Bias term — one number per output neuron. Lets the neuron fire even when all inputs are zero. Shape (2,) to match the two output rows of W.

EXECUTION STATE

b = tensor([ 0.1000, -0.2000])

z = W @ x + b: the linear pre-activation

Matrix-vector product. The @ operator is matmul. Computing row by row: z[0] = 0.5·1 + (-0.3)·(-2) + 0.2·0.5 + 0.1 = 0.5 + 0.6 + 0.1 + 0.1 = 1.3. z[1] = (-0.1)·1 + 0.4·(-2) + 0.8·0.5 + (-0.2) = -0.1 - 0.8 + 0.4 - 0.2 = -0.7. Shape is (2,).

EXECUTION STATE

z = tensor([ 1.3000, -0.7000])

a = torch.tanh(z): the non-linear squash

Element-wise tanh. PyTorch applies tanh(1.3) ≈ 0.8617 and tanh(-0.7) ≈ -0.6044. Without this non-linearity stacking many linear layers would still be linear — tanh is what gives the network expressive power.

EXECUTION STATE

tanh(1.3) = 0.8617

tanh(-0.7) = -0.6044

a = tensor([ 0.8617, -0.6044])

print(z)

Expected stdout: 'z = tensor([ 1.3000, -0.7000])'. The pre-activations can be anywhere on the real line — they are unbounded by design.

print(a)

Expected stdout: 'a = tensor([ 0.8617, -0.6044])'. After tanh, both values sit inside (-1, 1). That bounded range is the whole point of using a hyperbolic activation: it keeps signals from exploding as they propagate through many layers.

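torch.all(a.abs() < 1)

Sanity check. We expect 'True' because |tanh| < 1 for all real inputs (the horizontal asymptotes y = ±1 we plotted earlier). In floating point the guarantee is slightly weaker: for large |z| (around 10 or more in float32) tanh rounds to exactly ±1, so the strict check can fail even though the mathematics says otherwise. For the small pre-activations here it holds.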

EXECUTION STATE

torch.all(a.abs() < 1).item() = True


import torch

# A tiny "network": one linear layer followed by tanh.
# Pre-activation:   z = W x + b
# Activation:       a = tanh(z)
torch.manual_seed(0)

x = torch.tensor([1.0, -2.0, 0.5])      # input vector  (3 features)
W = torch.tensor([[ 0.5, -0.3, 0.2],    # weight matrix (2 outputs, 3 inputs)
                  [-0.1,  0.4, 0.8]])
b = torch.tensor([0.1, -0.2])           # bias    (2 outputs)

z = W @ x + b                            # linear pre-activation
a = torch.tanh(z)                        # squash to (-1, 1)

print("z =", z)
print("a =", a)
print("|a| < 1 everywhere ->", torch.all(a.abs() < 1).item())

The reason early deep networks chose $\tanh$ over $\text{sigmoid}$ is that $\tanh$ is zero-centered: $\tanh(0) = 0$ . With sigmoid, every activation is positive, which biases gradients in one direction and slows learning. Modern networks often replace both with ReLU, but $\tanh$ is still the default inside LSTMs and GRUs, and it appears in additive (Bahdanau-style) attention scoring.

Where Hyperbolic Functions Show Up

Five places, drawn from very different fields, where you cannot avoid $\sinh$ , $\cosh$ , or $\tanh$ :

Field	Where it appears	Formula
Architecture / Civil eng.	Hanging chains and arches	y = a · cosh(x/a)
Special relativity	Velocity addition via rapidity φ (v/c = tanh φ)	rapidities add, so tanh(φ₁ + φ₂) is the relativistic sum
Mechanics	Terminal velocity with quadratic drag	v(t) = v_T · tanh(g t / v_T)
Deep learning	tanh activation; additive attention scores	a = tanh(W x + b)
Geometry / Cartography	Mercator projection latitude	y = ln(tan(π/4 + ϕ/2)) = arctanh(sin ϕ)

Take the falling-object example for a moment. With air drag proportional to $v^2$ , Newton's second law $m\dot{v} = m g - k v^2$ with $v(0) = 0$ has the closed-form solution $v(t) = v_T \tanh(g t / v_T)$ . The terminal velocity $v_T = \sqrt{m g / k}$ sets the height of the horizontal asymptote: it is the $\tanh$ curve from the explorer, scaled by $v_T$ . Falling raindrops, skydivers, and ping-pong balls all trace out approximately a $\tanh$ curve when velocity is plotted against time. The same shape governs how a neuron saturates. Same math, different universe.
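The closed form can be checked against a direct numerical integration of Newton's law. The sketch below uses a classical fourth-order Runge-Kutta step for an object released from rest. The numbers are assumptions made up for illustration (mass 80 kg, drag coefficient 0.25 kg/m, g = 9.81 m/s²), not measured values, and the function names are hypothetical.

```python
import math

G = 9.81        # gravitational acceleration in m/s^2 (assumed)
MASS = 80.0     # mass in kg (illustrative value)
DRAG_K = 0.25   # quadratic drag coefficient in kg/m (illustrative value)
V_T = math.sqrt(MASS * G / DRAG_K)   # terminal velocity sqrt(m g / k)


def dvdt(v: float) -> float:
    """Right-hand side of m dv/dt = m g - k v^2, divided by m."""
    return G - (DRAG_K / MASS) * v * v


def simulate(t_end: float, dt: float = 0.01) -> float:
    """Integrate from v(0) = 0 to t_end with fixed-step RK4."""
    v = 0.0
    for _ in range(round(t_end / dt)):
        k1 = dvdt(v)
        k2 = dvdt(v + 0.5 * dt * k1)
        k3 = dvdt(v + 0.5 * dt * k2)
        k4 = dvdt(v + dt * k3)
        v += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return v


def closed_form(t: float) -> float:
    """v(t) = v_T * tanh(g t / v_T)."""
    return V_T * math.tanh(G * t / V_T)


print(f"terminal velocity = {V_T:.2f} m/s")
for t in (1.0, 5.0, 10.0, 30.0):
    print(f"t = {t:4.1f} s   simulated = {simulate(t):.4f}   "
          f"tanh formula = {closed_form(t):.4f}")
```

The two columns should agree to the displayed precision, and both level off just below the terminal velocity.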

Don't confuse $\tanh$ with the logistic / sigmoid function $\sigma(x) = 1/(1 + e^{-x})$ . They are related by $\tanh(x) = 2\,\sigma(2x) - 1$ : a horizontal squeeze by 2, a vertical stretch by 2, and a vertical shift by $-1$ . Tanh maps to $(-1, 1)$ ; sigmoid maps to $(0, 1)$ . Same family, different range.
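The relation between the two functions takes one line to verify numerically. This is a small sketch using only the standard library; the `sigmoid` helper is defined here for the example, and the sample points are arbitrary.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-3.0, -0.7, 0.0, 1.3, 3.0):
    lhs = math.tanh(x)
    rhs = 2.0 * sigmoid(2.0 * x) - 1.0   # rescaled, shifted sigmoid
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
    print(f"x = {x:+.1f}   tanh(x) = {lhs:+.6f}   2*sigmoid(2x) - 1 = {rhs:+.6f}")
```

Read in the other direction, the same relation gives $\sigma(x) = \tfrac{1}{2}\left(1 + \tanh(x/2)\right)$, which is handy when converting between the two activations.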

Summary

Definitions. $\sinh x = \tfrac{e^x - e^{-x}}{2}$ , $\cosh x = \tfrac{e^x + e^{-x}}{2}$ , $\tanh x = \sinh x / \cosh x$ . They are the odd and even parts of $e^x$ .
Identity. $\cosh^2 x - \sinh^2 x = 1$ . This is what puts $(\cosh t, \sinh t)$ on the hyperbola $x^2 - y^2 = 1$ for every $t$ .
Geometry. Trigonometric functions parametrize the unit circle; hyperbolic functions parametrize the unit hyperbola. In both cases the parameter $t$ is twice the swept sector area. On the unit circle that also equals the arc length, but on the hyperbola it does not.
Inverses. Every inverse has a closed-form log expression, e.g. $\operatorname{arctanh} x = \tfrac{1}{2}\ln\!\tfrac{1+x}{1-x}$ .
Applications. The catenary $y = a\cosh(x/a)$ , relativistic rapidity, terminal velocity, and the $\tanh$ neural activation all share the same underlying object.

Exercises

Quick definitions. Without a calculator, compute $\sinh(\ln 2)$ and $\cosh(\ln 2)$ in closed form. Hint: $e^{\ln 2} = 2$ .
Identity hunt. Starting from the definitions, prove the double-angle formula $\cosh(2x) = 2\cosh^2 x - 1$ . Then write $\sinh^2 x$ in terms of $\cosh(2x)$ .
Inverse derivation. Mimic the $\operatorname{arcsinh}$ derivation in this section to derive the closed form for $\operatorname{arctanh} x$ . You will end up solving a linear equation in $e^{2x}$ , not a quadratic.
Catenary fit. A telephone wire hangs between two poles 60 m apart, dipping 5 m below the attachment points. Find $a$ so that $y(x) = a\cosh(x/a)$ matches. Solve numerically — there is no closed form for $a$ .
Coding exercise. Extend the plain-Python series in this section so it computes $\cosh$ from its Taylor series (only even powers!). Verify your answer against $\cosh 1 = 1.54308$ from the worked example.
Neural net check. In the PyTorch snippet, replace $\tanh$ with the sigmoid $\sigma(x) = 1/(1 + e^{-x})$ . By hand, predict the new outputs using $\tanh(x) = 2\sigma(2x) - 1$ , then run the code to confirm.