why the fuck python be like "import a module to implement features that are basic functionality in languages that see use today"

WhyEssEff [she/her] · 2 years ago

Changeling [it/its] · 2 years ago

The old model

Short version

You have a CPU and a bunch of memory. The CPU has a program counter. It points to the memory address of the current instruction. The instruction gets loaded in. The CPU follows the instruction. And then, depending on what the instruction was and how it went, the CPU either adds one to the program counter or sets it to a specific address (that’s called a jump).

Long version

You have:

a clock—sends a pulse at some regular interval
a bunch of memory—usually RAM, can retain its values as long as the computer has power, each byte usually has its own address
a program counter—holds the memory address of the current instruction
buses—circuits to transfer bytes between different modules
registers—“working” memory
Arithmetic/Logic Unit—Part of the CPU, cool math stuff
Control Unit—Part of the CPU, decides what to do next

You start up the machine, and the program counter defaults to some preset value (often 0x00 on simple machines). The instruction at the program counter is loaded into the control unit (CU). The CU decides what needs to be done. It could put a value from memory onto a bus, load a value into a register, instruct the arithmetic/logic unit (ALU) to add two registers together, etc. These are all system-specific possibilities and can vary pretty greatly.

The CU may take a different number of cycles depending on the instruction, but at the end of each instruction, it will either increment the program counter (by 1 on a simple machine like this, or by the instruction’s length on machines with multi-byte instructions) or jump to a specific address based on the outcome of the instruction. Some instructions will always result in a jump. Some never will. Some are conditional. That all depends on the machine you’re writing for.

This all repeats over and over until the instruction that gets loaded in is a halt instruction. And that’s it.
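To make that loop concrete, here’s a minimal sketch in Python. The instruction set (LOAD, ADD, SUB, JMP, JNZ, OUT, HALT) is made up for illustration, and each list index stands in for one memory address. It shows the program counter defaulting to 0, falling through to the next address by default, and the three kinds of instruction: ones that always jump (JMP), ones that never do (OUT), and conditional ones (JNZ).

```python
# Toy fetch-decode-execute loop. The instruction set below is hypothetical;
# real machines vary widely.


def run(program, max_steps=10_000):
    """Run `program`, a list of (opcode, *operands) tuples, and return every
    value emitted by OUT. Each list index acts as one memory address."""
    pc = 0                       # program counter starts at its preset value
    regs = {"A": 0, "B": 0}      # registers: the CPU's "working" memory
    output = []
    for _ in range(max_steps):   # guard against programs that never halt
        op, *args = program[pc]  # fetch the instruction the PC points at
        next_pc = pc + 1         # default: fall through to the next address
        if op == "LOAD":         # LOAD reg, constant
            regs[args[0]] = args[1]
        elif op == "ADD":        # ADD dst, src -> ALU computes dst + src
            regs[args[0]] += regs[args[1]]
        elif op == "SUB":        # SUB dst, src -> ALU computes dst - src
            regs[args[0]] -= regs[args[1]]
        elif op == "JMP":        # always jumps
            next_pc = args[0]
        elif op == "JNZ":        # conditional: jumps only if reg != 0
            if regs[args[0]] != 0:
                next_pc = args[1]
        elif op == "OUT":        # never jumps
            output.append(regs[args[0]])
        elif op == "HALT":       # stop the loop
            return output
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc = next_pc
    raise RuntimeError("step limit reached without HALT")


# Count down from 3, emitting each value.
countdown = [
    ("LOAD", "A", 3),   # 0: A = 3
    ("LOAD", "B", 1),   # 1: B = 1
    ("OUT", "A"),       # 2: emit A
    ("SUB", "A", "B"),  # 3: A = A - B
    ("JNZ", "A", 2),    # 4: if A != 0, jump back to address 2
    ("HALT",),          # 5: stop
]
print(run(countdown))  # [3, 2, 1]
```

The `max_steps` guard isn’t something a real CPU has; it’s there so a buggy toy program raises an error instead of spinning forever.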

Cool videos to learn more

Ben Eater’s 23-part series where he designs and builds a custom computer from scratch on breadboards.
LostT’s DEF CON 24 talk titled “How to build a processor in 10 minutes or less”
Matt Parker’s Numberphile video where he builds computer circuits from dominoes to show the fundamental ideas there.
Matt Parker domino computer, but bigger

Stack and heap

The stack and the heap aren’t part of the modern complications that I mentioned; they’re way too old for that. However, if you go back to those old-school personal computers, having a dedicated stack wasn’t always a given.

This Stack Overflow post is a decent explanation of what they are and how they work.
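As a rough illustration of what a stack looks like at the machine level, here’s a sketch of one kept in ordinary memory with nothing but a stack pointer, roughly what you’d manage by hand on a machine without dedicated stack support. The memory size, the grow-downward layout (common but not universal), and the `call`/`ret` helpers are assumptions for illustration. Return addresses are `pc + 1`, matching the one-address-per-instruction model described earlier.

```python
class MemoryStack:
    """A stack stored in a plain memory array (hypothetical layout:
    it starts at the top of memory and grows toward lower addresses)."""

    def __init__(self, size=256):
        self.memory = [0] * size
        self.sp = size  # stack pointer; starts one past the top of memory

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1
        self.memory[self.sp] = value

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("pop from empty stack")
        value = self.memory[self.sp]
        self.sp += 1
        return value


def call(stack, pc, target):
    """CALL: save the return address on the stack, then jump to `target`."""
    stack.push(pc + 1)  # assumes each instruction occupies one address
    return target


def ret(stack):
    """RET: jump back to the most recently saved return address."""
    return stack.pop()


stack = MemoryStack()
pc = call(stack, pc=10, target=100)   # code at 10 calls a function at 100
pc = call(stack, pc=105, target=200)  # which calls another function at 200
print(pc, stack.sp)                   # 200 254
pc = ret(stack)                       # back to 106
pc = ret(stack)                       # back to 11
print(pc, stack.sp)                   # 11 256
```

Because the most recent return address is always on top, nested calls unwind in the right order for free, which is a big part of why stacks are so useful.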

Modern complications

It used to be that each instruction took a preset number of clock cycles, and the next instruction didn’t start until the current one was done. But that would leave some modules doing absolutely nothing for extended periods of time. So instead, CPUs started overlapping instructions: while one instruction is executing in the ALU, the next one is already being decoded and the one after that fetched. This is the basis for pipelines. And this worked even better if code wasn’t necessarily executed in the exact specified order, but instead in an equivalent order: if a CPU could prove that, for example, a certain section of ALU code would run the same way no matter what the previous code did, it would run it early and let the ALU work on it while the previous code finished. That part is called out-of-order execution.
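Here’s a small sketch of both ideas, assuming a hypothetical 3-stage pipeline (real CPUs have more, and messier, stages) and ignoring stalls. The cycle counts are the standard back-of-the-envelope formulas: without pipelining, n instructions take n × stages cycles; with it, the first instruction takes `stages` cycles and one more finishes every cycle after that. `can_reorder` is a simplified version of the “would it run the same way no matter what” check, comparing only which registers each instruction reads and writes.

```python
# Hypothetical 3-stage pipeline.
STAGES = ("fetch", "decode", "execute")


def cycles_unpipelined(n):
    """Each instruction runs every stage before the next one may start."""
    return n * len(STAGES)


def cycles_pipelined(n):
    """The first instruction needs len(STAGES) cycles to get through; after
    that, one instruction finishes per cycle (assuming no stalls)."""
    return len(STAGES) + n - 1 if n > 0 else 0


def schedule(n):
    """Map each clock cycle to the instruction:stage pairs active in it."""
    table = {}
    for i in range(n):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s, []).append(f"I{i}:{stage}")
    return table


def can_reorder(first, second):
    """Could `second` safely run before or alongside `first`?

    Each instruction is modeled as (destination_register, source_registers).
    Swapping is safe only if neither writes a register the other reads or
    writes; otherwise the result could depend on the order.
    """
    dst1, srcs1 = first
    dst2, srcs2 = second
    return dst1 not in srcs2 and dst2 not in srcs1 and dst1 != dst2


print(cycles_unpipelined(4), cycles_pipelined(4))  # 12 6
for cycle, active in sorted(schedule(3).items()):
    print(cycle, active)
# A = B + C, then D = E + F: no shared registers, so order doesn't matter
print(can_reorder(("A", {"B", "C"}), ("D", {"E", "F"})))  # True
# A = B + C, then D = A + E: the second reads what the first writes
print(can_reorder(("A", {"B", "C"}), ("D", {"A", "E"})))  # False
```

Real out-of-order CPUs also track memory dependencies and rename registers to get around some of these conflicts, which this sketch leaves out.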

Now, as instruction sets grew over time, the people who maintained them found that it was often easier to translate new instructions into a series of simpler internal operations instead of creating new circuitry for every single one. This would be slow to do for every instruction, but modern CPUs decode a bunch of instructions at once and then cache the translated results. This whole situation is called microcode (the translated pieces are often called micro-ops). Your instructions can technically get translated into an arbitrary set of largely undocumented instructions. Usually it’s nbd and makes things faster. Sometimes, it involves accusations of the NSA adding back doors into your encryption instructions.
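A loose sketch of the translate-then-cache idea: the “complex” instructions (`INC`, `SWAP`), the micro-ops (`LOADI`, `MOV`, `ADD`), and the hidden scratch register `T` are all hypothetical, and Python’s `functools.lru_cache` stands in for the CPU’s cache of already-translated instructions. Real decoders do this in hardware and are far more involved.

```python
from functools import lru_cache

# Hypothetical translation table: each "complex" instruction expands into a
# sequence of simpler micro-operations that the hardware actually executes.
# "T" is a scratch register the programmer never sees.
MICROCODE = {
    "INC": lambda reg: [("LOADI", "T", 1), ("ADD", reg, "T")],
    "SWAP": lambda a, b: [("MOV", "T", a), ("MOV", a, b), ("MOV", b, "T")],
}


@lru_cache(maxsize=None)  # stands in for a cache of translated instructions
def translate(instruction):
    op, *args = instruction
    if op in MICROCODE:
        return tuple(MICROCODE[op](*args))
    return (instruction,)  # simple instructions pass straight through


def execute_micro_op(regs, uop):
    op, *args = uop
    if op == "LOADI":    # load an immediate constant
        regs[args[0]] = args[1]
    elif op == "MOV":    # copy one register into another
        regs[args[0]] = regs[args[1]]
    elif op == "ADD":
        regs[args[0]] += regs[args[1]]
    else:
        raise ValueError(f"unknown micro-op {op!r}")


def run_microcoded(program, regs):
    for instruction in program:
        for uop in translate(instruction):
            execute_micro_op(regs, uop)
    return regs


regs = run_microcoded(
    [("INC", "A"), ("SWAP", "A", "B"), ("INC", "A")],
    {"A": 5, "B": 7, "T": 0},
)
print(regs)                    # {'A': 8, 'B': 6, 'T': 1}
print(translate.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
```

The second `INC A` is a cache hit: its micro-ops were already worked out, so no translation work is repeated. The programmer only ever sees `INC` and `SWAP`; the micro-ops and `T` are invisible, which is the “largely undocumented” part.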

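Memory virtualization, I think, is an OS thing (with help from the CPU’s memory management unit) that I don’t super understand, but it basically means that an address in your program is no longer a direct index into the giant array of memory that is your RAM. Instead, the hardware translates it to a physical address using page tables the OS sets up, so two programs can use the same address without stepping on each other.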
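A minimal sketch of that translation step, assuming 4 KiB pages and a single-level page table (real systems use multi-level tables plus a translation cache called the TLB). The address splits into a page number and an offset within the page; only the page number gets remapped. The `PageFault` class and both process tables are hypothetical.

```python
PAGE_SIZE = 4096  # 4 KiB pages: common, but an assumption for this sketch


class PageFault(Exception):
    """Raised when a virtual page has no physical frame mapped to it."""


def to_physical(virtual_address, page_table):
    """Translate a virtual address using a single-level page table
    (virtual page number -> physical frame number)."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise PageFault(f"no mapping for virtual address {virtual_address:#x}")
    return page_table[page] * PAGE_SIZE + offset


# Two hypothetical processes using the same virtual address.
process_a = {0: 7, 1: 3}
process_b = {0: 2, 1: 9}
print(hex(to_physical(0x1234, process_a)))  # 0x3234
print(hex(to_physical(0x1234, process_b)))  # 0x9234
```

A lookup that misses the table raises `PageFault`; on a real system that’s where the OS steps in, either to load the page from disk or to kill the program for touching memory it doesn’t own.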

Memory locality and cache misses are things that affect speed. Basically, your CPU loads memory in chunks (cache lines), so if you store related data close together, it will cut down on trips to main memory because the addresses you need next will already be in cache. This seems like a small detail, but it’s hard to overstate how much accounting for it can speed up your code.
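To see the effect without a profiler, here’s a sketch that simulates a tiny direct-mapped cache and counts misses while walking a 256×256 array two different ways. The line size, cache size, and element size are assumptions chosen to keep the numbers simple; a real CPU cache is much bigger and smarter, so only the direction of the result carries over, not the exact ratio.

```python
LINE_SIZE = 64  # bytes per cache line (a common size; assumed here)
NUM_LINES = 64  # tiny direct-mapped cache: 64 lines x 64 B = 4 KiB
ELEM_SIZE = 8   # bytes per array element, e.g. one 64-bit float


def count_misses(addresses):
    """Count cache misses for a sequence of byte addresses."""
    cache = [None] * NUM_LINES      # which memory chunk each cache slot holds
    misses = 0
    for addr in addresses:
        chunk = addr // LINE_SIZE   # the 64-byte chunk this address falls in
        slot = chunk % NUM_LINES    # direct-mapped: each chunk has one slot
        if cache[slot] != chunk:
            misses += 1
            cache[slot] = chunk     # load the whole chunk, evicting the old one
    return misses


def row_order(rows, cols):
    """Addresses of a[r][c] (stored row by row), visiting c fastest."""
    return [(r * cols + c) * ELEM_SIZE for r in range(rows) for c in range(cols)]


def column_order(rows, cols):
    """Same array, but visiting r fastest: each step jumps a whole row ahead."""
    return [(r * cols + c) * ELEM_SIZE for c in range(cols) for r in range(rows)]


print(count_misses(row_order(256, 256)))     # 8192
print(count_misses(column_order(256, 256)))  # 65536
```

Both walks read the same 65,536 elements. The row-order walk uses all 8 elements of every chunk it loads, so only 1 read in 8 misses; the column-order walk evicts each chunk before it comes back for the neighbors, so every single read misses.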