A Deep Dive into Postgres Statistics
Thursday, October 24 at 11:50–12:40
You may have heard that Postgres keeps statistics on your data to help choose a query plan. But have you ever wondered how Postgres decides which statistics to keep, or exactly how those statistics influence a query plan? Did you know that your favorite database stores several different types of statistics that the planner can use when building a plan?
You may know that you can CREATE STATISTICS to manually tell Postgres what to do, but it's not always obvious when you should do that instead of relying on the defaults. It can be even trickier when working with a client's database, where the relationships between tables and columns aren't immediately obvious.
What I want you to learn today is exactly that: how to know whether Postgres is keeping the right statistics when you don't necessarily know the data. The best way to answer all of these questions is by looking directly at the Postgres source code. By looking at the source of truth itself, we can understand exactly what is going on under the hood. We'll learn about soft functional dependencies, most-common-values (MCV) lists, and a lot of other fun "math" stuff.