I'm trying to learn PostgreSQL to get a jerb. I've used MySQL a little bit before but I'm by no means a database expert.
I installed it on my machine to the default location /var/lib/postgres
and followed the instructions on the Arch Wiki to create a database. Then I took a peak inside the directory:
[:/var/lib/postgres]$ ls -lh
total 0
drwx------ 1 postgres postgres 512 Feb 25 23:40 data
[:/var/lib/postgres]$ cd data
[:/var/lib/postgres/data]$ ls -lh
total 56K
drwx------ 1 postgres postgres 32 Feb 23 19:08 base
drwx------ 1 postgres postgres 606 Feb 25 23:41 global
drwx------ 1 postgres postgres 0 Feb 23 18:44 pg_commit_ts
drwx------ 1 postgres postgres 0 Feb 23 18:44 pg_dynshmem
-rw------- 1 postgres postgres 4.7K Feb 23 18:44 pg_hba.conf
-rw------- 1 postgres postgres 1.6K Feb 23 18:44 pg_ident.conf
drwx------ 1 postgres postgres 76 Feb 25 23:45 pg_logical
drwx------ 1 postgres postgres 28 Feb 23 18:44 pg_multixact
drwx------ 1 postgres postgres 0 Feb 23 18:44 pg_notify
drwx------ 1 postgres postgres 0 Feb 23 18:44 pg_replslot
drwx------ 1 postgres postgres 0 Feb 23 18:44 pg_serial
drwx------ 1 postgres postgres 0 Feb 23 18:44 pg_snapshots
drwx------ 1 postgres postgres 0 Feb 25 23:40 pg_stat
drwx------ 1 postgres postgres 92 Feb 26 20:39 pg_stat_tmp
drwx------ 1 postgres postgres 8 Feb 23 18:44 pg_subtrans
drwx------ 1 postgres postgres 0 Feb 23 18:44 pg_tblspc
drwx------ 1 postgres postgres 0 Feb 23 18:44 pg_twophase
-rw------- 1 postgres postgres 3 Feb 23 18:44 PG_VERSION
drwx------ 1 postgres postgres 76 Feb 23 18:44 pg_wal
drwx------ 1 postgres postgres 8 Feb 23 18:44 pg_xact
-rw------- 1 postgres postgres 88 Feb 23 18:44 postgresql.auto.conf
-rw------- 1 postgres postgres 28K Feb 23 18:44 postgresql.conf
-rw------- 1 postgres postgres 48 Feb 25 23:40 postmaster.opts
-rw------- 1 postgres postgres 101 Feb 25 23:40 postmaster.pid
Holy hell! What is all this junk? Where does my newly created database "live"? I don't see any sign of it anywhere. If I create tables in my database, where do those live? Etc.
It reminds me of what the inside of a .git directory looks like: a bunch of binary gobbledygunk. Is this stuff not supposed to be human readable?
Right. The database back-end is practically its own filesystem. How does NTFS, Ext4, or BTRFS work internally? Hell if I know! The database back-end isn't constrained by such long-lived rigid specifications though. It can change from version to version as new features and improved algorithms are implemented. The point of it is to make random access as fast as possible for datasets which are too large to be stored entirely in memory. There is a lot of caching involved, both in memory and on the disk. Also the file layouts need to be designed so data can be inserted and deleted as efficiently - with as little "reformatting" or "garbage collection" as possible. A high performance database is probably one of the most complex types of computer software in existence.