Boo-AI — Master Artificial Intelligence by Building from Scratch

Learning Objectives

After this section you will be able to:

Explain why the sine, cosine, and tangent functions are not invertible on their full domain, and how restricting their domain fixes the problem.
State the principal-branch domain and range of $\arcsin x$ , $\arccos x$ , and $\arctan x$ — and know why each range is chosen.
Use the cancellation identities $\sin(\arcsin x) = x$ and $\arcsin(\sin x) = x$ (and recognize when the second one fails).
Reach for atan2 instead of plain arctan whenever you have separate $x$ and $y$ components, and explain why it covers all four quadrants.
Implement the inverse trig functions in plain Python and use the differentiable PyTorch versions for ML pipelines.
Apply them to real problems: angles of elevation, robotics joint angles, phase recovery, computer-graphics surface normals.

The Problem: From Ratio Back to Angle

The forward trig functions answer the question: “Given an angle, what is the ratio?” For instance, climbing a ramp at $30^{\circ}$ means you rise $\sin 30^{\circ} = 0.5$ units for every 1 unit of slope.

But in real engineering and science the question almost always runs the other way:

I measured the rise and the slope length. What angle is the ramp at?

That is what the inverse trig functions exist to do. $\arcsin$ , $\arccos$ , and $\arctan$ reverse the arrow: feed them the ratio, get back the angle.

$$\sin: \text{angle} \longrightarrow \text{ratio}, \qquad \arcsin: \text{ratio} \longrightarrow \text{angle}$$

Notation: arcsin and sin⁻¹ mean the same thing

Both $\arcsin x$ and $\sin^{-1} x$ are used interchangeably. The arc prefix is preferred in physics and engineering because the result IS an arc length on the unit circle; the $^{-1}$ notation is preferred in calculus textbooks. Note that $\sin^{-1} x$ does not mean $1/\sin x$; that would be the cosecant.

Why Sin, Cos, Tan Aren't Invertible Yet

A function $f$ is invertible only when each output comes from one and only one input — the so-called horizontal-line test: every horizontal line must hit the graph at most once.

Sine fails this test catastrophically. The horizontal line $y = 0.5$ crosses $\sin x$ at infinitely many places: $30^{\circ}$, $150^{\circ}$, $390^{\circ}$, and so on. So if someone says “sin θ = 0.5, what is θ?”, the honest answer is “infinitely many possibilities”. That is useless.

The fix: restrict the domain

We choose ONE slice of the sine curve where it is strictly increasing: $[-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$. On that slice every output value is hit exactly once, so the inverse exists. That chosen slice is called the principal branch. $\arcsin$ is defined to be the inverse of THAT slice and no other.

Cosine and tangent need the same trick, but with different slices:

Function	Principal-branch slice	Why this slice?
sin x	x ∈ [-π/2, π/2]	sin is strictly increasing here; covers all outputs in [-1, 1] exactly once.
cos x	x ∈ [0, π]	cos is strictly decreasing here; covers [-1, 1] exactly once. (Can't use [-π/2, π/2] — cos is even, so each output is hit twice.)
tan x	x ∈ (-π/2, π/2)	tan is strictly increasing, covers all of ℝ, and the slice is open because tan blows up at ±π/2.
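The many-to-one problem and the principal-branch fix can be checked numerically. The snippet below is a minimal sketch using only Python's standard `math` module; the variable names are illustrative.

```python
import math

# Three different angles (in degrees) that all share the same sine.
angles_deg = [30, 150, 390]
sines = [math.sin(math.radians(a)) for a in angles_deg]

# math.asin can return only one angle: the one inside [-pi/2, pi/2].
recovered_deg = math.degrees(math.asin(0.5))

print([round(s, 6) for s in sines])
print(round(recovered_deg, 6))
```

All three sines agree up to floating-point rounding, yet the inverse hands back only the 30° copy, which is exactly what restricting to the principal branch means.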

arcsin: The Inverse of Sine

Take the sine curve and clip it to the principal slice $[-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$ . Now reflect it across the diagonal line $y = x$ . That mirror image IS $y = \arcsin x$ .

This is the deepest single picture in the section — drag the slider below and watch the mirror move with you.

Interactive figure: Reflecting the sine curve to build arcsin

The blue dot slides along the restricted sine on [-π/2, π/2]. The pink dot is its mirror image across the dashed diagonal, and its trace IS the arcsin curve. Sample readout at θ = 0.600 rad (≈ 34.4°): forward, sin(0.600) = 0.5646; inverse, arcsin(0.5646) = 0.600 rad.

Notice three things in the picture:

The blue point lives on the sine curve, with coordinates $(\theta, \sin\theta)$ .
The pink point lives on the arcsin curve, with coordinates $(\sin\theta, \theta)$ — its x and y are simply swapped. That swap is what "reflection across $y = x$ " means.
The pink curve only exists for inputs in $[-1, 1]$ , because that is the range of sine. Try to ask $\arcsin(1.5)$ and you get a domain error: no angle has a sine of 1.5.

Formal definition. For $x \in [-1, 1]$ , $\arcsin x$ is the unique angle $\theta \in [-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$ such that $\sin\theta = x$ .
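To see the definition in action, here is a small sketch with the standard `math` module. It reuses the ratio 0.5646 from the figure readout and then asks for an input outside $[-1, 1]$.

```python
import math

theta = math.asin(0.5646)  # ratio -> angle
in_range = -math.pi / 2 <= theta <= math.pi / 2

# Inputs outside [-1, 1] have no real arcsin; the standard library raises.
try:
    math.asin(1.5)
    domain_error = False
except ValueError:
    domain_error = True

print(round(theta, 3), in_range, domain_error)
```

The returned angle always lands in the principal range, and out-of-domain inputs fail loudly rather than returning a wrong angle.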

Reading the triangle

If a right triangle has opposite side $o$ and hypotenuse $h$ , then the angle opposite to side $o$ is recovered by $\theta = \arcsin(o/h)$ . Try the interactive triangle below with the "arcsin" tab — slide the ratio and watch the angle change.

Interactive figure: From a ratio back to an angle

Choose which ratio you know (opposite/hypotenuse for arcsin, adjacent/hypotenuse for arccos, opposite/adjacent for arctan) and slide the value. The triangle and the angle on the unit circle update live. Sample readout for ratio 0.50: θ = arcsin(0.50) = 0.524 rad ≈ 30.0°. The slider only moves the ratio; the angle marker (yellow arc) is what arcsin recovers, and the same angle θ is shared by both diagrams.

arccos and arctan

arccos: the "adjacent over hypotenuse" inverse

$\arccos$ takes a value in $[-1, 1]$ and returns an angle in $[0, \pi]$ . Why $[0, \pi]$ and not $[-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$ ? Because cosine is even: $\cos(-\theta) = \cos\theta$ , so the values it produces on $[-\tfrac{\pi}{2}, 0]$ exactly repeat what it does on $[0, \tfrac{\pi}{2}]$ . We need a slice where cosine is strictly monotone, and the natural choice is $[0, \pi]$ , where cosine is strictly decreasing from $+1$ to $-1$ .

$$\arccos: [-1, 1] \longrightarrow [0, \pi]$$

arctan: the "opposite over adjacent" inverse

$\arctan$ accepts any real number and returns an angle in $(-\tfrac{\pi}{2}, \tfrac{\pi}{2})$ — open intervals at both ends, because tangent has vertical asymptotes there. As the input grows without bound, the output approaches but never reaches $\pm\tfrac{\pi}{2}$ .

$$\arctan: (-\infty, \infty) \longrightarrow (-\tfrac{\pi}{2}, \tfrac{\pi}{2})$$

Of the three, arctan is the only one with an unbounded domain and a bounded range. It maps the whole real line to a finite open interval, a property that makes it a popular "soft clamp" in machine learning and signal processing.

A useful limit

$\lim_{x \to +\infty} \arctan x = \dfrac{\pi}{2}$ and $\lim_{x \to -\infty} \arctan x = -\dfrac{\pi}{2}$. The function squeezes the entire infinite line into the bounded interval $(-\tfrac{\pi}{2}, \tfrac{\pi}{2})$.
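The "soft clamp" idea can be sketched in a few lines. `soft_clamp` is a hypothetical helper written for this illustration, not a library function: it rescales arctan so the output stays between `-limit` and `limit`.

```python
import math

def soft_clamp(value: float, limit: float) -> float:
    """Hypothetical helper: squash any real value into (-limit, limit) via arctan."""
    return (2.0 * limit / math.pi) * math.atan(value)

for v in (-1e6, -1.0, 0.0, 1.0, 1e6):
    print(v, soft_clamp(v, limit=1.0))
```

Small inputs pass through almost linearly, while huge inputs saturate just short of the limit. For extremely large magnitudes, floating-point rounding can make the result equal to the limit itself.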

The Three Graphs Side by Side

Switch tabs in the viewer below to see each principal-branch graph with its domain on the x-axis, its range as a shaded band on the y-axis, and a live readout for the value you pick.

Interactive figure: The three principal-branch graphs

Sample readout (arcsin tab, x = 0.500): domain [-1, 1], range (principal branch) [-π/2, π/2].

A few features deserve attention:

arcsin: domain $[-1, 1]$ , range $[-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$ . Passes through the origin. Odd function — symmetric about the origin.
arccos: domain $[-1, 1]$ , range $[0, \pi]$ . Passes through $(0, \tfrac{\pi}{2})$ . Strictly decreasing — bigger cosine means smaller angle (think: the closer the adjacent is to the hypotenuse, the more the triangle is "flattened").
arctan: domain all of $\mathbb{R}$ , range $(-\tfrac{\pi}{2}, \tfrac{\pi}{2})$ . Two horizontal asymptotes. Odd function.

Cancellation Identities (and a Common Trap)

Because $\arcsin$ is the inverse of restricted sine, the cancellation works perfectly in one direction:

$$\sin(\arcsin x) = x \qquad \text{for all } x \in [-1, 1]$$

Same pattern for the other two:

$$\cos(\arccos x) = x \ \text{ for } x \in [-1, 1], \qquad \tan(\arctan x) = x \ \text{ for all real } x$$

But the other direction is delicate. Try plugging $x = 2\pi$ into $\arcsin(\sin 2\pi)$ :

$$\arcsin(\sin 2\pi) = \arcsin(0) = 0 \neq 2\pi$$

The trap: composition with the principal branch

$\arcsin(\sin x) = x$ only when $x \in [-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$. For $x$ outside that range, the composition folds $x$ back into the principal branch first, then returns that folded value. The same caveat applies to $\arccos(\cos x)$ and $\arctan(\tan x)$, each with its own principal branch.

Concretely, if $x = 5\pi/4$ (which is $225^{\circ}$ , in the third quadrant), then $\sin(5\pi/4) = -\tfrac{\sqrt{2}}{2}$ , and $\arcsin(-\tfrac{\sqrt{2}}{2}) = -\tfrac{\pi}{4}$ — definitely not $5\pi/4$ . Information about which "copy" of the angle we started in was destroyed by sine and cannot be recovered by arcsin alone.
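The folding behaviour is easy to reproduce. This sketch (standard library only; `roundtrip` is just an illustrative name) sends three angles through sine and back through arcsin.

```python
import math

def roundtrip(x: float) -> float:
    """Apply sin, then arcsin. Returns x only if x is in [-pi/2, pi/2]."""
    return math.asin(math.sin(x))

inside = roundtrip(0.3)                   # inside the principal branch
folded_2pi = roundtrip(2 * math.pi)       # folds to (approximately) 0
folded_5pi4 = roundtrip(5 * math.pi / 4)  # folds to (approximately) -pi/4

print(inside, folded_2pi, folded_5pi4)
```

Only the first value survives the round trip; the other two come back as their principal-branch representatives, up to floating-point rounding.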

Useful Pythagorean identities

Frequently the angle is just an intermediate step; what you really want is another trig value of it. These identities let you skip the angle entirely:

$$\cos(\arcsin x) = \sqrt{1 - x^2}, \quad \sin(\arccos x) = \sqrt{1 - x^2}$$

$$\sec(\arctan x) = \sqrt{1 + x^2}, \quad \tan(\arcsin x) = \dfrac{x}{\sqrt{1 - x^2}}$$

All of these come from drawing a right triangle that realises the inverse-trig input as one of its sides. For $\cos(\arcsin x) = \sqrt{1-x^2}$: let $\theta = \arcsin x$, so the opposite side is $x$ and the hypotenuse is $1$; by Pythagoras the adjacent side is $\sqrt{1 - x^2}$, and $\cos\theta = \text{adj}/\text{hyp} = \sqrt{1 - x^2}$. The positive root is the right one because $\theta \in [-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$, where cosine is non-negative. You can derive every identity in this list that way.
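As a quick numerical check of these identities, the sketch below compares the "go through the angle" route with the algebraic shortcut at a few sample points. The sample values are arbitrary choices for illustration.

```python
import math

samples = (-0.9, -0.3, 0.0, 0.5, 0.8)

errors = []
for x in samples:
    # Left side: compute the angle first. Right side: skip the angle entirely.
    errors.append(abs(math.cos(math.asin(x)) - math.sqrt(1 - x * x)))
    errors.append(abs(math.sin(math.acos(x)) - math.sqrt(1 - x * x)))
    errors.append(abs(1 / math.cos(math.atan(x)) - math.sqrt(1 + x * x)))
    errors.append(abs(math.tan(math.asin(x)) - x / math.sqrt(1 - x * x)))

max_err = max(errors)
print(max_err)
```

The largest discrepancy should be at the level of floating-point rounding, which is what you would expect if both routes compute the same quantity.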

Worked Example: Finding the Angle of a Roof

A construction crew is building a roof. The rise (vertical) is $3.5$ meters and the run (horizontal) is $6.0$ meters. The supervisor needs the angle of the roof in degrees so the trusses can be cut. Let's do this by hand before letting a calculator do it, so the meaning of every step is clear.

Step-by-step worked solution

Step 1 — Identify which ratio you have. Rise is the side opposite the angle you want; run is the side adjacent to it. So the natural ratio is $\dfrac{\text{opp}}{\text{adj}} = \tan\theta$ , which means we use $\arctan$ to recover $\theta$ .

$$\tan\theta = \dfrac{3.5}{6.0} \approx 0.58333$$

Step 2 — Apply arctan. Using the Taylor series $\arctan x = x - \tfrac{x^3}{3} + \tfrac{x^5}{5} - \cdots$ (which converges nicely for $|x| \leq 1$ ):

$$\arctan(0.58333) \approx 0.58333 - \tfrac{0.58333^3}{3} + \tfrac{0.58333^5}{5} - \cdots$$

Computing the first three terms:

$0.58333$
$-0.58333^3 / 3 = -0.06616$
$+0.58333^5 / 5 = +0.01351$

Sum so far: $0.58333 - 0.06616 + 0.01351 \approx 0.5307$ radians. The remaining terms shave off a few thousandths to give the true value $\theta \approx 0.5281$ rad.
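If you would rather let a loop do the arithmetic, the partial sums above can be reproduced with a short sketch. The choice of 30 terms is an arbitrary cutoff for illustration, comfortably more than this input needs.

```python
import math

x = 3.5 / 6.0  # rise / run from the roof example

partial_sums = []
total = 0.0
for k in range(30):
    # k-th term of the series: (-1)^k * x^(2k+1) / (2k+1)
    total += (-1) ** k * x ** (2 * k + 1) / (2 * k + 1)
    partial_sums.append(total)

print(round(partial_sums[2], 4))   # sum of the first three terms
print(round(partial_sums[-1], 4))  # sum after 30 terms
print(round(math.atan(x), 4))      # library value for comparison
```

The third partial sum matches the hand calculation, and the later sums settle onto the library value.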

Step 3 — Convert to degrees. Multiply by $\tfrac{180}{\pi} \approx 57.2958$ :

$$\theta \approx 0.5281 \times 57.2958 \approx 30.26^{\circ}$$

Step 4 — Sanity check with the triangle. The hypotenuse is $\sqrt{3.5^2 + 6.0^2} = \sqrt{12.25 + 36} = \sqrt{48.25} \approx 6.946$ . So $\sin\theta = 3.5/6.946 \approx 0.5039$ , and $\arcsin(0.5039) \approx 0.5281$ rad — the same answer. Two different paths through the triangle converge on the same angle, which is exactly the consistency check the cancellation identities promise.

Engineer's takeaway

Always cross-check inverse-trig calculations by computing the angle two ways (via arctan from the legs, and via arcsin or arccos from one leg and the hypotenuse). If they disagree by more than rounding noise, you have a units bug — almost certainly degrees vs radians.
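The cross-check takes only a few lines in code. This is a minimal sketch with the roof's numbers, using the standard `math` module.

```python
import math

rise, run = 3.5, 6.0
hyp = math.hypot(rise, run)          # sqrt(rise^2 + run^2)

theta_atan = math.atan(rise / run)   # from the two legs
theta_asin = math.asin(rise / hyp)   # from opposite and hypotenuse
theta_acos = math.acos(run / hyp)    # from adjacent and hypotenuse

print(round(math.degrees(theta_atan), 2))
print(abs(theta_atan - theta_asin), abs(theta_atan - theta_acos))
```

All three routes should agree to within rounding noise. Note the explicit `math.degrees` call: the library functions return radians, so the conversion is where a units bug would most likely creep in.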

atan2: The Practical Four-Quadrant Inverse

Here is a problem you will hit in your first robotics, game, or graphics project. You know a point's $(x, y)$ in the plane, and you want the angle from the positive $x$ -axis to the point. The naive answer is $\theta = \arctan(y/x)$ . The naive answer is wrong half the time.

Why? $\arctan$ returns angles only in $(-\tfrac{\pi}{2}, \tfrac{\pi}{2})$ , which covers only the right half of the plane. And dividing $y$ by $x$ throws away the sign information you need to tell, for example, the point $(-1, 1)$ in the second quadrant apart from $(1, -1)$ in the fourth — both have $y/x = -1$ .

The fix is a function that takes $y$ and $x$ separately, looks at the sign of each, and returns the correct angle anywhere in $(-\pi, \pi]$. Most programming languages and numeric libraries call it atan2(y, x), though a few tools swap the argument order, so check the documentation.
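The ambiguity is easy to demonstrate with the two points mentioned above. This sketch uses Python's `math.atan` and `math.atan2`; the dictionary keys are just labels.

```python
import math

# (x, y) coordinates of one point in the second quadrant and one in the fourth.
points = {"Q2": (-1.0, 1.0), "Q4": (1.0, -1.0)}

naive = {name: math.atan(y / x) for name, (x, y) in points.items()}
full = {name: math.atan2(y, x) for name, (x, y) in points.items()}

print(naive)  # both entries are the same angle
print(full)   # two different angles, one per quadrant
```

The ratio-based version cannot separate the two points, while the two-argument version reports roughly 135° for the first and -45° for the second.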

Drag the yellow point in the viewer below. The green ray shows the angle that atan2 recovers (always correct). The dashed pink ray shows what arctan(y/x) would give — watch it stay in the right half-plane even when the actual point is on the left.

Interactive figure: arctan(y/x) vs atan2(y, x): why the second one exists

arctan(y/x) only ever returns angles in (-π/2, π/2), so it cannot tell (x, y) apart from (-x, -y). atan2(y, x) uses both signs and returns the correct angle anywhere in (-π, π]. Sample readout for a point in the second quadrant: arctan(y/x) = -0.644 rad (-36.9°), atan2(y, x) = 2.498 rad (143.1°). The two disagree by exactly π: arctan collapsed Q2/Q3 onto Q4/Q1 because it cannot see the sign of x.

The piecewise definition of atan2

Under the hood, atan2(y, x) is just arctan(y/x) plus an offset of $\pm\pi$ that depends on the quadrant. The full definition (using radians) is:

$$\operatorname{atan2}(y, x) = \begin{cases} \arctan(y/x) & \text{if } x > 0 \\ \arctan(y/x) + \pi & \text{if } x < 0,\ y \geq 0 \\ \arctan(y/x) - \pi & \text{if } x < 0,\ y < 0 \\ +\pi/2 & \text{if } x = 0,\ y > 0 \\ -\pi/2 & \text{if } x = 0,\ y < 0 \end{cases}$$

The piecewise nature of the definition is exactly what makes it correct: it inspects the sign of $x$ (which arctan never sees) and pushes the answer into the correct half-plane.

Rule of thumb

Whenever you have separate $y$ and $x$ components (vectors, complex numbers, gradients, polar coordinates), reach for atan2(y, x), never arctan(y/x). The only time plain arctan is the right tool is when you genuinely only have the ratio and know the answer lives in the right half-plane.

Python Implementation: From Scratch

Let's build $\arctan$ ourselves using only the most basic Python — no math.atan, no NumPy — so the inner workings are exposed. The Taylor series gives us the principle; an iterative argument reduction trick makes it fast enough to use.

A Hand-Built arctan Using Taylor Series + Argument Reduction


Line 1. Signature: scalar in, scalar out

We accept a single float x and return a float in radians. n_terms=25 is enough for small arguments: the series for arctan(0.5) reaches float64 precision in roughly 25 terms, and smaller arguments need far fewer. Near |x| = 1 convergence is much slower, as Test 2 shows.

EXECUTION STATE

x = any float

n_terms = 25 (default)

Line 2. Docstring

Documents the contract: input any real number, output the angle in RADIANS whose tangent is x. Calling code that wants degrees has to multiply by 180/pi itself.

Line 4. Odd-symmetry trick

arctan is an odd function: arctan(-x) = -arctan(x). So we record the sign of x once, compute everything for |x|, and reapply the sign at the very end. This halves the cases we have to think about.

EXECUTION STATE

x (incoming) = -3.0 = negative

sign = -1.0

Line 5. Work with |x| from now on

After this line, x is guaranteed non-negative. Any downstream logic only has to handle the right half of the input range.

EXECUTION STATE

x (after abs) = 3.0

Line 10. Detect the slow-convergence region

The Maclaurin series arctan(x) = x - x^3/3 + x^5/5 - ... CONVERGES only when |x| <= 1 (and even at the boundary it converges slowly). For |x| > 1, the terms x^(2k+1) explode before the alternating signs can tame them. So we flag the case x > 1 and apply an identity to reduce it.

EXECUTION STATE

use_complement = True (for x=3.0)

Line 11. Branch on the flag

If x > 1, the line below replaces x with 1/x. From now on the series sees a number in (0, 1), where it converges: quickly when 1/x is small, slowly when 1/x is close to 1.

Line 12. The reduction identity: arctan(x) = pi/2 - arctan(1/x)

This identity holds for every x > 0. Geometric intuition: if a right triangle has legs a (adjacent) and b (opposite) with angle θ at the bottom corner, then tan θ = b/a. The OTHER acute angle (call it φ) has tan φ = a/b = 1/tan θ. And θ + φ = π/2 because they are the two non-right angles of a right triangle. So arctan(b/a) + arctan(a/b) = π/2.

EXECUTION STATE

x (before) = 3.0

x (after 1/x) = 0.3333

Line 15. Initialize the running sum

total holds the partial sum of the Taylor series. It starts at 0 and accumulates one term per loop iteration.

EXECUTION STATE

total = 0.0

Line 16. term = x^(2k+1), starting at k=0

Instead of recomputing x ** (2*k+1) every iteration, we keep a running variable term that we multiply by x² each iteration. This incremental-power update is in the same spirit as Horner's scheme and saves a lot of multiplications.

EXECUTION STATE

term (k=0) = x = 0.3333

Line 17. Precompute x²

x_sq is a constant inside the loop. Computing it once outside the loop is a small but free optimization.

EXECUTION STATE

x_sq = 0.3333² = 0.1111

Line 18. The main loop: 25 terms

Each iteration adds one Taylor term to the running sum and advances `term` from x^(2k+1) to x^(2k+3) by multiplying by x².

LOOP TRACE · 4 iterations

k = 0 (first term)

term / (2*0+1) = 0.3333 / 1 = 0.3333

sign (k even) = +1

total after add = 0.3333

term after *= x_sq = 0.3333 × 0.1111 = 0.03704

k = 1 (second term, subtract)

term / (2*1+1) = 0.03704 / 3 = 0.01235

sign (k odd) = -1

total after sub = 0.3333 - 0.01235 = 0.32099

term after *= x_sq = 0.03704 × 0.1111 = 0.004115

k = 2 (third term, add)

term / (2*2+1) = 0.004115 / 5 = 0.000823

sign (k even) = +1

total after add = 0.32099 + 0.000823 = 0.32181

k = 3 — 24 (remaining terms)

Each new |term/(2k+1)| = previous one × ~0.111 × (2k+1)/(2k+3)

total at k=24 = 0.32175 (converged)

Line 19. Add or subtract this term

Alternating signs: + for even k, - for odd k. Python's conditional expression (1 if k % 2 == 0 else -1) is a compact way to write this inline.

Line 20. Advance to the next odd power

After this line, `term` now equals x^(2k+3), ready for the next iteration. This in-place update costs one multiplication per iteration instead of a fresh power computation.

EXECUTION STATE

term (k=0 → next) = 0.3333 × 0.1111 = 0.03704

Lines 23-24. Undo the reduction

If we replaced x by 1/x earlier, the series gave us arctan(1/x), not arctan(x). The identity says arctan(x) = π/2 - arctan(1/x), so we subtract our partial sum from π/2.

EXECUTION STATE

total (before undo) for x=3 = 0.32175 = arctan(1/3)

total (after undo) = π/2 - 0.32175 = 1.24905 = arctan(3)

Line 24. Use a literal π for clarity

We deliberately hard-code π to ~16 digits of float64 precision so the function has zero dependencies. In production you would import math.pi.

Line 26. Reapply the sign

Now we use the `sign` we recorded at the very top. If x was originally negative, the answer flips sign. The function now returns a value for any finite real input, with accuracy limited mainly by the slow convergence near |x| = 1.

EXECUTION STATE

Final return (x=-3.0) = -1.24905

Line 30. Test 1: arctan(0) = 0

Sanity check. The loop adds the term 0 / 1, then 0 / 3, then 0 / 5, ... all zero. Output: 0.0 exactly. No reduction needed (0 < 1), no sign flip. The function passes the trivial case.

EXECUTION STATE

Expected stdout = 0.0

Line 31. Test 2: arctan(1) = π/4

x=1 is the boundary case. No reduction (use_complement=False since x is not strictly > 1). The series 1 - 1/3 + 1/5 - 1/7 + ... is the famous Leibniz series for π/4, whose error after N terms is about 1/(4N). With 25 terms it gives ~0.7954, so it matches π/4 ≈ 0.7854 to only about 2 decimal places: this is the slowest-converging case. (Reduction for x > 1 pushes the worst case to x ≈ 1, never beyond, but it does not fix the slow convergence near 1 itself.)

EXECUTION STATE

Expected stdout = ~0.7954 (π/4 ≈ 0.7854; error ≈ 0.01)

Line 32. Test 3: arctan(10) ≈ 1.4711

x=10 triggers reduction. Internally the function computes arctan(0.1), which converges in just 3-4 terms (because 0.1^3/3 is already ~0.0003), then returns π/2 - arctan(0.1) ≈ 1.5708 - 0.0997 ≈ 1.4711. The true value to 10 digits is 1.4711276743. The argument-reduction trick is what makes this fast and accurate.

EXECUTION STATE

Expected stdout = ~1.4711 (matches math.atan(10))


```python
def my_arctan(x: float, n_terms: int = 25) -> float:
    """Compute arctan(x) for any real x, in radians."""
    # 1. Symmetry: arctan is odd, so handle the sign once and forget about it.
    sign = 1.0 if x >= 0 else -1.0
    x = abs(x)

    # 2. Reduction: the Maclaurin series does not converge for |x| > 1.
    #    Use the identity arctan(x) = pi/2 - arctan(1/x) to push large
    #    inputs into the convergent region |x| <= 1.
    use_complement = x > 1.0
    if use_complement:
        x = 1.0 / x

    # 3. Maclaurin series: arctan(x) = x - x^3/3 + x^5/5 - x^7/7 + ...
    total = 0.0
    term = x          # the running x^(2k+1)
    x_sq = x * x
    for k in range(n_terms):
        total += term / (2 * k + 1) * (1 if k % 2 == 0 else -1)
        term *= x_sq  # advance from x^(2k+1) to x^(2k+3)

    # 4. Undo the reduction.
    if use_complement:
        total = (3.141592653589793 / 2) - total

    return sign * total


# Try it
print(my_arctan(0.0))    # 0.0
print(my_arctan(1.0))    # ~0.7954 with 25 terms (pi/4 = 0.7853981...; slow at x = 1)
print(my_arctan(10.0))   # ~ 1.4711...
```

Take a moment to appreciate what just happened. We turned the abstract object "the inverse of the tangent function" into a Python function of about 25 lines that any reader can step through with a pencil. The series gives us correctness; the reduction gives us convergence for large inputs; the sign trick gives us coverage of negative inputs. Production atan implementations in libm follow the same overall recipe (symmetry, argument reduction, then a polynomial), but with finer-grained reductions, tuned polynomial coefficients, and more polish for floating-point edge cases.
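The weak spot of the series is inputs near $|x| = 1$, where it converges slowly. One possible remedy, sketched below, is a second reduction based on the half-angle identity $\arctan x = 2\arctan\!\big(x / (1 + \sqrt{1 + x^2})\big)$. The name `my_arctan_fast` and the use of `** 0.5` for the square root are choices made for this illustration; this is not how any particular library does it.

```python
def my_arctan_fast(x: float, n_terms: int = 25) -> float:
    """Sketch: arctan with two reductions so the series argument stays small."""
    sign = 1.0 if x >= 0 else -1.0
    x = abs(x)

    # Reduction 1 (same as my_arctan): arctan(x) = pi/2 - arctan(1/x) for x > 1.
    use_complement = x > 1.0
    if use_complement:
        x = 1.0 / x

    # Reduction 2 (added here): arctan(x) = 2 * arctan(x / (1 + sqrt(1 + x^2))).
    # For x in [0, 1] the new argument is at most sqrt(2) - 1, about 0.414.
    x = x / (1.0 + (1.0 + x * x) ** 0.5)

    # Same Maclaurin series as before, now on a much smaller argument.
    total, term, x_sq = 0.0, x, x * x
    for k in range(n_terms):
        total += term / (2 * k + 1) * (1 if k % 2 == 0 else -1)
        term *= x_sq
    total *= 2.0  # undo reduction 2

    if use_complement:
        total = (3.141592653589793 / 2) - total  # undo reduction 1
    return sign * total


print(my_arctan_fast(1.0))
print(my_arctan_fast(-3.0))
```

With the argument capped near 0.414, 25 terms should be ample even for the former worst case x = 1, at the cost of one square root per call.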

PyTorch: Vectorised and Differentiable

In machine learning the inverse trig functions show up wherever an angle is hidden inside another quantity: angular regression, orientation prediction in 6-DoF pose estimation, normalizing-flow Jacobians, hyperbolic embeddings. PyTorch ships them as first-class differentiable ops. The key point is that they propagate gradients exactly the way calculus says they should.

Derivatives you'll see in autograd

$\dfrac{d}{dx}\arcsin x = \dfrac{1}{\sqrt{1 - x^2}}$
$\dfrac{d}{dx}\arccos x = -\dfrac{1}{\sqrt{1 - x^2}}$
$\dfrac{d}{dx}\arctan x = \dfrac{1}{1 + x^2}$

We will derive these in chapter 5 (Section 5.8 of this book), but PyTorch already knows them — it can backpropagate through any inverse trig op for free.

Inverse Trig in PyTorch: Batched, Differentiable, atan2 Included


Line 1. Import PyTorch

torch gives us tensor objects with overloaded math operations and an autograd engine that records every op so it can compute gradients on demand. All the inverse trig functions are top-level torch attributes: torch.asin, torch.acos, torch.atan, torch.atan2.

Line 4. Build a batch of test ratios

We construct one 1D tensor with four values so we can demonstrate that inverse trig is element-wise vectorised — no loop needed. Default dtype is float32. The values -0.5, 0.0, 0.5, 0.9 all live safely in [-1, 1] so they are valid inputs for asin and acos.

EXECUTION STATE

ratios.dtype = torch.float32

ratios.shape = torch.Size([4])

ratios = tensor([-0.5, 0.0, 0.5, 0.9])

Line 5. torch.asin: element-wise arcsin

Applies arcsin to every entry of the tensor without an explicit Python loop; the element-wise work is dispatched to an optimized CPU or GPU kernel. The output has the same shape as the input. For our four inputs the function returns the four corresponding angles in [-π/2, π/2].

EXECUTION STATE

angles_arcsin = tensor([-0.5236, 0.0000, 0.5236, 1.1198])

asin(-0.5) in degrees = ≈ -30.0°

asin(0.9) in degrees = ≈ 64.16°

Line 6. torch.acos: element-wise arccos

Same shape, different range. arccos lives in [0, π] and is DECREASING — bigger ratio means smaller angle. Notice acos(0.9) is smaller than acos(0.5), unlike asin where the trend is reversed.

EXECUTION STATE

angles_arccos = tensor([2.0944, 1.5708, 1.0472, 0.4510])

acos(0.0) = π/2 ≈ 1.5708

acos(0.9) in degrees = ≈ 25.84°

Line 7. torch.atan: element-wise arctan

arctan accepts any real (not just [-1, 1]) and stays inside the open interval (-π/2, π/2). For small values |x| < 1 the answer is close to x itself (because arctan(x) ≈ x for small x — first Taylor term).

EXECUTION STATE

angles_arctan = tensor([-0.4636, 0.0000, 0.4636, 0.7328])

atan(0.5) vs 0.5 = 0.4636 — about 7% less

Line 8. Print arcsin batch

Expected stdout: 'arcsin: tensor([-0.5236, 0.0000, 0.5236, 1.1198])'. The first three are 30° increments (in radians); the last is close to 64.2°.

Line 9. Print arccos batch

Expected stdout: 'arccos: tensor([2.0944, 1.5708, 1.0472, 0.4510])'. The middle entry π/2 is the giveaway that ratio 0.0 maps to the right angle.

Line 10. Print arctan batch

Expected stdout: 'arctan: tensor([-0.4636, 0.0000, 0.4636, 0.7328])'. Symmetric about the origin because arctan is odd.

Lines 13-14. Set up four (y, x) pairs, one in each quadrant

We deliberately pick the four (y, x) pairs (1,1), (1,-1), (-1,-1), (-1,1), one in each quadrant, so the output will display all four quadrants of atan2.

EXECUTION STATE

y = tensor([1, 1, -1, -1])

x = tensor([1, -1, -1, 1])

(x, y) pairs = (1,1)→Q1, (-1,1)→Q2, (-1,-1)→Q3, (1,-1)→Q4

Line 15. torch.atan2(y, x): element-wise four-quadrant inverse

Notice the argument ORDER is (y, x), not (x, y). This trips up everyone the first time. atan2 inspects the sign of x to decide which half of the plane the angle lives in, then uses arctan(y/x) inside that half. The result for our four points is +45°, +135°, -135°, -45° respectively — every quadrant represented.

EXECUTION STATE

theta (rad) = tensor([0.7854, 2.3562, -2.3562, -0.7854])

theta (deg) = tensor([45.0, 135.0, -135.0, -45.0])

Line 16. rad2deg for human-readable output

torch.rad2deg(x) is just x * 180/π element-wise. Useful for printing — internally you should keep angles in radians because every other math op (sin, cos, derivatives) expects radians.

EXECUTION STATE

Expected stdout = tensor([ 45.0000, 135.0000, -135.0000, -45.0000])

Line 19. Create an input tensor that REQUIRES a gradient

requires_grad=True tells autograd: 'every op I do with this tensor goes onto the tape, so that later I can ask for dL/dx.' We pick a single scalar x = 0.5 so we can compare the autograd result against the analytic formula by hand.

EXECUTION STATE

x_grad = tensor([0.5], requires_grad=True)

Line 20. loss = sum(asin(x_grad))

A scalar loss is what autograd needs in order to call backward(). Here loss = asin(0.5) = π/6 ≈ 0.5236. The .sum() is cosmetic since our tensor has one element, but it's the idiom that scales to batches.

EXECUTION STATE

loss = tensor(0.5236, grad_fn=<SumBackward0>)

loss.grad_fn = SumBackward0 → AsinBackward0 → ...

Line 21. loss.backward(): let autograd run

This walks backward through the recorded computation graph and computes dL/dx for every tensor with requires_grad=True. Because L = asin(x), the chain rule gives dL/dx = 1 / sqrt(1 - x²). PyTorch implements this derivative analytically inside AsinBackward, so no numerical differentiation is involved.

Line 22. Compute the analytic answer by hand

Plug x = 0.5 into 1/sqrt(1 - x²): 1/sqrt(0.75) = 1/0.86603 ≈ 1.1547. That is the slope of the arcsin curve at x = 0.5.

EXECUTION STATE

1 / sqrt(1 - 0.25) = 1 / 0.86603 ≈ 1.1547

Line 22 (continued). Print analytic

Expected stdout: 'analytic 1/sqrt(1 - 0.5^2) = 1.1547005383792515'. This is the ground truth.

Line 23. Print autograd's answer and compare

Expected stdout: 'autograd dL/dx = 1.1547005176544189'. The two agree to ~6 decimal places — the small mismatch is just float32 (autograd) vs float64 (analytic) rounding. Conclusion: PyTorch's AsinBackward implements exactly the calculus formula 1/sqrt(1 - x²), and you can stack arcsin into any deeper network without worrying about gradients vanishing or being incorrect — as long as x stays away from ±1 where the derivative blows up.

EXECUTION STATE

x_grad.grad.item() = 1.1547 (matches analytic)


```python
import torch

# 1. Plain element-wise inverse trig on a batch of ratios.
ratios = torch.tensor([-0.5, 0.0, 0.5, 0.9])
angles_arcsin = torch.asin(ratios)      # rad in [-pi/2, pi/2]
angles_arccos = torch.acos(ratios)      # rad in [0, pi]
angles_arctan = torch.atan(ratios)      # rad in (-pi/2, pi/2)
print("arcsin:", angles_arcsin)
print("arccos:", angles_arccos)
print("arctan:", angles_arctan)

# 2. atan2 works on (y, x) pairs and returns the correct quadrant.
y = torch.tensor([ 1.0,  1.0, -1.0, -1.0])
x = torch.tensor([ 1.0, -1.0, -1.0,  1.0])
theta = torch.atan2(y, x)               # rad in (-pi, pi]
print("atan2 angles (deg):", torch.rad2deg(theta))

# 3. Backpropagate through arcsin to confirm d/dx arcsin x = 1 / sqrt(1 - x^2).
x_grad = torch.tensor([0.5], requires_grad=True)
loss = torch.asin(x_grad).sum()
loss.backward()
print("analytic 1/sqrt(1 - 0.5^2) =", 1.0 / (1.0 - 0.5**2) ** 0.5)
print("autograd dL/dx              =", x_grad.grad.item())
```

Gradient blow-up near the boundary

The derivative $\dfrac{1}{\sqrt{1 - x^2}}$ diverges as $x \to \pm 1$. If you train a network that produces values close to the boundary and then feeds them into asin, your gradients can explode. Common fixes: clamp inputs to $[-1 + \varepsilon, 1 - \varepsilon]$ with a small $\varepsilon$, or rephrase the problem in terms of atan, whose derivative $\dfrac{1}{1 + x^2}$ is bounded everywhere, or atan2, whose gradients stay finite as long as $(x, y)$ is kept away from the origin.
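The blow-up and the clamping fix can be seen without any deep-learning framework. The sketch below evaluates the analytic derivative directly; `clamp` is a hypothetical helper and `eps = 1e-6` is an arbitrary illustrative choice.

```python
import math

def asin_grad(x: float) -> float:
    """Analytic derivative of arcsin: 1 / sqrt(1 - x^2)."""
    return 1.0 / math.sqrt(1.0 - x * x)

def clamp(x: float, eps: float = 1e-6) -> float:
    """Hypothetical helper: keep x inside [-1 + eps, 1 - eps]."""
    return max(-1.0 + eps, min(1.0 - eps, x))

# The gradient grows without bound as x approaches 1.
grads = [asin_grad(x) for x in (0.5, 0.99, 0.9999, 0.999999)]
print(grads)

# Clamping first keeps it finite, roughly 1 / sqrt(2 * eps).
bounded = asin_grad(clamp(1.0))
print(bounded)
```

In PyTorch the same idea is usually written by applying `torch.clamp` to the tensor before `torch.asin`. The price is a small bias for inputs that really do sit at the boundary.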

Real-World Applications

Robotics: inverse kinematics for a 2-link arm

A robotic arm with two segments of lengths $L_1$ and $L_2$ reaches a point $(x, y)$ . The two joint angles are recovered with inverse trig:

$$\theta_2 = \arccos\left(\dfrac{x^2 + y^2 - L_1^2 - L_2^2}{2 L_1 L_2}\right), \quad \theta_1 = \operatorname{atan2}(y, x) - \operatorname{atan2}\!\left(L_2 \sin\theta_2,\ L_1 + L_2 \cos\theta_2\right)$$

Notice $\operatorname{atan2}$ , not $\arctan$ — the target can be in any quadrant, including behind the robot. Using plain arctan would silently put the arm in the wrong half-plane.
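Here is a minimal sketch of that inverse-kinematics recipe, checked by running the angles back through forward kinematics. The function names, link lengths, and target point are illustrative assumptions. The code uses atan2 for the second term of $\theta_1$ as well, which agrees with the plain-arctan form whenever $L_1 + L_2\cos\theta_2 > 0$, and it returns only one of the two possible elbow configurations.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Sketch: one solution for a planar 2-link arm; raises if the target is unreachable."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if not -1.0 <= c2 <= 1.0:
        raise ValueError("target is out of reach")
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

def forward(theta1, theta2, l1, l2):
    """Forward kinematics: joint angles -> end-effector position."""
    fx = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    fy = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return fx, fy

# A target in the second quadrant, i.e. "behind" the robot.
t1, t2 = two_link_ik(-0.5, 1.2, l1=1.0, l2=0.8)
fx, fy = forward(t1, t2, l1=1.0, l2=0.8)
print(fx, fy)
```

The forward-kinematics round trip is a cheap way to confirm the angles really reach the requested point.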

Computer graphics: surface normals

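The angle between a unit surface normal $\mathbf{n}$ and a unit direction $\mathbf{v}$ (toward the camera or a light) is $\arccos(\mathbf{n} \cdot \mathbf{v})$. Diffuse (Lambertian) shading makes a pixel's brightness roughly proportional to $\max(0, \cos\theta)$, and since $\cos\theta = \mathbf{n} \cdot \mathbf{v}$ for unit vectors, renderers usually take the dot product directly and call arccos only when the angle itself is needed.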

Signal processing: phase recovery

A complex number $z = a + bi$ has phase $\varphi = \operatorname{atan2}(b, a)$. FFT bins, OFDM demodulators, and GPS carrier-tracking loops all recover phase this way, and they subtly break if a developer reaches for arctan(b/a) instead.
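In Python the phase of a complex number is available directly as `cmath.phase`, which matches atan2 of the imaginary and real parts. A small sketch with an arbitrarily chosen sample value:

```python
import cmath
import math

z = complex(-1.0, 1.0)                   # second quadrant: a < 0, b > 0

phase = math.atan2(z.imag, z.real)       # four-quadrant phase
library_phase = cmath.phase(z)           # standard-library equivalent
naive_phase = math.atan(z.imag / z.real) # loses the sign of the real part

print(math.degrees(phase), math.degrees(library_phase), math.degrees(naive_phase))
```

The first two values agree at about 135°, while the ratio-based version lands in the wrong quadrant at about -45°.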

Geolocation: bearing between two GPS points

Given two latitude/longitude pairs, the initial compass bearing from one to the other is computed with atan2 of the east and north components of the great-circle direction on the sphere. Navigation software relies on this kind of calculation constantly.
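A sketch of the standard initial-bearing formula on a sphere is shown below. The function name is illustrative, and treating the Earth as a perfect sphere is an assumption. Note the argument order `atan2(east, north)`, which yields a compass bearing measured clockwise from north rather than a mathematical angle from the x-axis.

```python
import math

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Sketch: initial great-circle bearing in degrees, clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0

print(initial_bearing_deg(0.0, 0.0, 10.0, 0.0))   # heading toward the pole
print(initial_bearing_deg(0.0, 0.0, 0.0, 10.0))   # heading along the equator
print(initial_bearing_deg(0.0, 10.0, 0.0, 0.0))   # the reverse trip
```

The modulo at the end maps the $(-180^{\circ}, 180^{\circ}]$ output of atan2 onto the usual 0 to 360 compass scale.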

Machine learning: angular losses

For tasks where the output is an angle (head pose, wind direction, magnetic declination), naive MSE on the raw value fails because of the wraparound: predicting $359^{\circ}$ when the truth is $1^{\circ}$ is a 358° error in raw MSE, but only 2° geometrically. A clean fix: predict the unit-vector $(\cos\theta, \sin\theta)$ and recover the angle with $\operatorname{atan2}$ .
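The wraparound problem and the unit-vector fix can both be sketched with atan2. The helper names here are hypothetical, and degrees are used only to match the example in the text.

```python
import math

def angular_error_deg(pred_deg: float, true_deg: float) -> float:
    """Signed smallest difference between two angles, in (-180, 180]."""
    d = math.radians(pred_deg - true_deg)
    return math.degrees(math.atan2(math.sin(d), math.cos(d)))

def angle_from_unit_vector(cos_t: float, sin_t: float) -> float:
    """Recover an angle in [0, 360) from its (cos, sin) encoding."""
    return math.degrees(math.atan2(sin_t, cos_t)) % 360.0

raw_error = 359.0 - 1.0                       # what a naive difference sees
wrapped_error = angular_error_deg(359.0, 1.0) # what the geometry says

encoded = (math.cos(math.radians(359.0)), math.sin(math.radians(359.0)))
decoded = angle_from_unit_vector(*encoded)

print(raw_error, wrapped_error, decoded)
```

Because $(\cos\theta, \sin\theta)$ varies smoothly across the 0°/360° seam, a model trained on that encoding never sees the artificial jump that breaks raw-angle MSE.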

Summary

Function	Domain	Range	Key idea
arcsin(x)	[-1, 1]	[-π/2, π/2]	Reverses sin on the principal branch.
arccos(x)	[-1, 1]	[0, π]	Reverses cos on [0, π] (cos is even, can't use a symmetric slice).
arctan(x)	(-∞, ∞)	(-π/2, π/2)	Soft-clamps the whole real line into a bounded angle.
atan2(y, x)	(y, x) ∈ ℝ² \ {(0,0)}	(-π, π]	Full four-quadrant inverse — always use this for vector→angle.

Key takeaways

Inverse trig functions exist to recover an angle from a ratio. They are the everyday-use direction.
Sin, cos, tan are not invertible globally. Each one is restricted to a principal branch where it is strictly monotone — that branch determines the range of the inverse.
$\sin(\arcsin x) = x$ always, but $\arcsin(\sin x) = x$ only inside the principal branch. Outside, the composition folds the input back into the branch.
$\arctan(y/x)$ is wrong half the time. Always prefer atan2(y, x) when both components are available.
PyTorch's asin, acos, atan, and atan2 are fully differentiable. Their gradients are the textbook derivatives $\pm 1/\sqrt{1-x^2}$ and $1/(1+x^2)$ .
The Pythagorean identities like $\cos(\arcsin x) = \sqrt{1 - x^2}$ let you skip computing the angle altogether when only another trig ratio is needed downstream.

Exercises

Conceptual

Explain in your own words why $\arccos$ uses the range $[0, \pi]$ instead of $[-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$ like $\arcsin$ . Hint: think about what symmetry cosine has that sine doesn't.
Without computing, decide whether each statement is true or false. Justify briefly.
- (a) $\arcsin(\sin(2)) = 2$
- (b) $\sin(\arcsin(0.7)) = 0.7$
- (c) $\arctan(\tan(10)) = 10$
Sketch $\arctan x$ on the entire real line. Mark the two horizontal asymptotes and the point of inflection.

Computational

A ladder of length 5 m leans against a wall. Its base is 1.5 m from the wall. What angle does the ladder make with the ground? Use $\arccos$ and show the steps.
Simplify, leaving no inverse trig in the answer: $\sin(\arccos(x))$ for $x \in [-1, 1]$ .
Evaluate exactly (no calculator):
- (a) $\arcsin\!\left(-\tfrac{\sqrt{3}}{2}\right)$
- (b) $\arctan(1)$
- (c) $\arccos\!\left(\tfrac{1}{2}\right)$
Compute by hand the bearing from the origin to the point $(-3, 4)$ using both $\arctan(y/x)$ and atan2(y, x). Which one gives the geometrically correct angle? By how much do they differ?

Programming

Extend my_arctan from this section to handle x = ±inf and x = NaN gracefully (returning $\pm\pi/2$ and NaN respectively). Test with the edge cases.
Implement my_atan2(y, x) in plain Python by composing your my_arctan with the piecewise sign-correction rules. Verify on the four points (1, 1), (-1, 1), (-1, -1), (1, -1).
Train a tiny PyTorch model that predicts the angle of a 2D unit vector by outputting $(\cos\hat\theta, \sin\hat\theta)$ and recovering $\hat\theta = \operatorname{atan2}(\sin\hat\theta, \cos\hat\theta)$. Compare its accuracy to a model that predicts $\hat\theta$ directly with MSE. Explain the gap.

Exploration

The Leibniz series for $\pi/4$ is $\arctan(1) = 1 - \tfrac{1}{3} + \tfrac{1}{5} - \dots$ . How many terms do you need to compute $\pi$ to 6 decimal places? Now use Machin's formula $\pi/4 = 4\arctan(1/5) - \arctan(1/239)$ — how many fewer terms are needed?
Investigate the unwrap operation in numpy.unwrap. Why is it needed when you compute the phase of a time-varying complex signal with atan2?

Next we'll meet hyperbolic functions — close cousins to sine and cosine, defined by exponentials instead of circles. They will give us the same kind of story all over again: forward, inverse, identities, derivatives — but with a hyperbola in the picture instead of a unit circle.