# Treachery of Whales

## Part 1

Given a list of integers $a_{1},a_{2},a_{3},…,a_{n}$, find an integer $x∈Z$ which minimizes $∣a_{1}−x∣+∣a_{2}−x∣+⋯$.

I reasoned as follows:

if we have found such an $x$, then either $x+1$ must also be minimal or it
increases the cost. If it is also minimal, meaning the cost didn’t go up it
means we have half the values on one side and the other half on the other, so
we’re “in between” the two medians. If it does increase the cost there must be
an imbalance now — more values on one side, as each step (that isn’t on a
point) increases the cost by the number of values we move away from and
decreases it by the ones we move toward. $x+1$ is now beyond the median point,
so it *was* at the median exactly. The same goes for $x−1$. So we find the
values that minimizes the cost is either only the direct median value itself (if
there is one), or it’s the entire band between (and including) the two shared
median points.

I could probably have explained that better, but whatever.

```
aoc = __import__('aoc-common').aoc()
import numpy as np
text = aoc.input_text(7)
nums = np.array(text.strip().split(','), np.int64)
pseudomed = np.sort(nums)[len(nums)//2] # or indeed, int(np.median(nums))
answer1 = np.abs(nums - pseudomed).sum()
print(f'answer to part 1: {answer1}')
```

Given the nature of AoC, I suspected the problem data would be crafted such that there is only one possible answer though.

## Part 2

Given a list of integers $a_{1},a_{2},a_{3},…,a_{n}$, find an integer $x∈Z$ which minimizes $c(∣a_{1}−x∣)+c(∣a_{2}−x∣)+⋯$ where $c(n)=1+2+⋯+n=2n(n+1) $.

This was a lot harder to reason about and nothing came to me immediately, other than what I already “know” — I had some vague idea that “the mean minimizes the square error,” so I tried to plug that in and it worked. It makes some intuitive sense but I’m not sure I could prove why it is so on the spot, just looking at it here while writing these notes. Hmm, I feel it’s something I ought to be able to show though.

Now the “cost” (error function) is not exactly $∣a_{i}−x∣_{2}$ here, but rather $∣a_{i}−x∣_{2}+∣a_{i}−x∣$. And this will shift the minimum a little when the numbers are not not balanced.

OK, here is where the proper imposter syndrome is starting to set in and I absolutely feel like I’ve failed this problem now, because I’m at a loss for how to prove (even informally) the limit of this bound.

Experimentally it seems that $∣μ−ν∣≤21 $ where $μ$ is the arithmetic mean and $ν∈R$ is the true minimum. Experimentally, and even more curiously, the difference $∣μ−ν∣$ seems to be expressed as some sum of fractions on the form $n(n+1)1 $ (??), based on the how many numbers are less than the mean versus how many are greater than the mean.

It’s not obvious to me why or how, and have now already spent over two hours thinking about this problem and typing this out, so I’m giving up and I’ll just settle for being too stupid to figure it out. I’ve never been good at dealing with bounds.

```
cost = lambda x: x*(x+1)//2
est = nums.mean()
center = min(cost(np.abs(nums - x)).sum() for x in [int(est)-1, int(est), int(est)+1]) # bounds???
print(f'answer to part 2: {center}')
```

Edit: no, fuck, fuck AoC, it can’t be that hard, I will figure it out.
*Otherwise I am worthless.*

OK so. Say. Say we’re in $R$. We’re a little number on the $R$ number line.
We have the other given numbers here, to the left and right. We’re not *on* any
of the given numbers, we’re not an edge case, we’re snugly in between somewhere,
so we can consider some neighborhood… Now. Now we can get rid off all the
stupid absolute values that confused me, because it’s fixed which are negative
and which are positive. So we have a regular polynomials—

Oh God, I am so stupid. Of course. This is basic high school math. Stupid, stupid, stupid. We take the derivate to find the minimum…

$[i=1∑n (a_{i}−x)_{2}]_{′}=i=1∑n 2x−2a_{i}=2n(x−μ)$

So, yes, the regular square error is minimized when $x=μ$.

Now for our expression,

$[i=1∑n (a_{i}−x)_{2}+s_{i}(a_{i}−x)]_{′}=i=1∑n 2x−2a_{i}+s_{i}=2n(x−μ)+i=1∑n s_{i}$

Where $s_{i}$ is $±1$ depending on where $x$ is in relation to $a_{i}$. The sum can be at most $n−1$ in magnitude. So…

$2n(x−μ)−n<2n(x−μ)+i=1∑n s_{i}<2n(x−μ)+n$

$−21 <x−μ<21 $

Good. I’m just gonna handwave whatever edge cases might exist; if we’re “on” a given number, I guess the $s_{i}$ will just be 0 at that point, so it doesn’t matter; and I’m gonna ignore the details of broken derivate at those points. I’m satisfied, at least. And I’m sick to death of this stupid problem.

```
cost = lambda x: x*(x+1)//2
est = nums.mean()
center = min(cost(np.abs(nums - x)).sum() for x in [int(est), int(est)+1])
print(f'answer to part 2: {center}')
```