DataBase Notes: Storage


  1. Relational Databases
  2. Storage
  3. Execution
  4. Concurrency Control
  5. Recovery
  6. Distributed databases


  1. random
  2. sequential
  • random read is important.


  • OK, some databases use it
  • No, we cannot control the memory-disk ourself

mmap + msync + madvise.

  • IMDB, influxdb, sqlite, mongodb use it.


  1. File storage
  2. Page layout
  3. tuple layout
  4. data representation
  5. system catalogs

Storage manager:

  1. maintaining 1 or more database files
  2. organize files like collection of pages.

Page is like page in System, but managed by database.

Organization of pages in memory: Page manage like operating systems… You can have pd and pe.

For bulk operations in index order a larger page size can give a noticeable, sometimes significant, performance increase. For very random access, particularly many small writes, there is similar potential for significant performance reduction.

Page structure:

Header | Data

Header fixed size.

Page layout:

  1. tuple-oriented
  2. log-structured

tuple can be stored SLOTTED

Header || slots —> || <— tuples ||

Log structure like bitcask, I wrote it XD.

  • build indexs
  • periodically compact logs

Tuple layout:

  1. header for:
    1. visibility info
    2. bit map

we can denormalize the tuple data, if b->a, you can store B-A, called pre-join.

mongodb use it.

Recordid : Page + bios/slot

  • tuple: sequential of bytes.
  • data storage
    • TIME/DATE/TIMESTAMP: 32/64 bits unix epoch
    • varchar/TEXT/BLOB: header with length and follow by data
    • can use overflow to store data outside
    • can use external files

DB has oltp and olap, here is image:

Storage model:

n-ary: called nsm, row storage.

dsm: use fixed-length offset or tuple id to represent

buffer pool

memory <—> disk

  • Spacial control
  • temporal control

Page table has:

  • dirty flag
  • Pin/Reference counter

FetchPage and UnpinPage will be used to manage pages.

I’m not familiar with pre-fetching, so don’t ask me…

I also know nothing about scan sharing…

most os use O_DIRECT to avoid system cache.