Why do we need a separate datastructure like B-Tree for database and file system?

Technology CommunityCategory: TreesWhy do we need a separate datastructure like B-Tree for database and file system?
VietMX Staff asked 3 years ago

The main reason for the existence of B-Trees is to better utilize the behaviour of devices that read and write large chunks of data. Two properties are important to make the B-Tree better than binary trees when data has to be stored on disk:

  • Access to disk is really slow (compared to memory or caches, random access to data on disk is orders of magnitude slower); and
  • Every single read causes a whole sector to be loaded from the drive – assuming a sector size of 4K, this means 1000 integers, or tens of some larger objects you’re storing.

Hence, we can use the pros of the second fact, while also minimizing the cons – i.e. number of disk accesses.

So, instead of just storing a single number in every node that tells us if we should continue to the left or to the right, we can create a bigger index that tells us if we should continue to the first 1/100, to the second or to the 99-th (imagine books in a library sorted by their first letter, then by the second, and so on). As long as all this data fits on a single sector, it will be loaded anyway, so we might as well use it completely.

This results in roughly logb N lookups, where N is the number of records. This number, while asymptotically the same as log N, is actually a few times smaller with large enough N and b – and since we’re talking about storing data to disk for use in databases, etc., the amount of data is usually large enough to justify this.