Date: 2018-10-26
Time: 09:30–10:20
Room: New York
Level: Intermediate
Since the release of v9.6 we have seen numerous articles and talks on the benefits of parallel query in PostgreSQL. Still, I could not help but notice queries and discussions where it is either misused or expected to deliver more than it can. In this talk we will discuss how to make better use of intra-query parallelism to improve query performance without overusing it. First, we will explore cases where parallel scans and joins are not required and, if forced, are likely to degrade query performance. Next, we will talk about the mistakes and assumptions that are easy to make when using this feature. Additionally, we will suggest best practices for fine-tuning it in PostgreSQL.
Later in the talk we will discuss where PostgreSQL stands in terms of parallel operators compared with other database engines. For instance, to date PostgreSQL can use parallel scans on only one side of a join; we will discuss the challenges and benefits of using parallelism on both sides. In addition, the parallel query infrastructure in PostgreSQL doesn’t allow inter-worker communication, which rules out many of the benefits of parallelism, such as parallel operations that rely on data redistribution. With data redistribution, an engine can perform a number of smaller joins and collect their results for the final output; this has proven to be one of the highest-performing strategies in many commercial database engines. We will discuss how PostgreSQL makes up for that with its other supported features.
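To make the idea of data redistribution concrete, here is a minimal single-process sketch in Python. It is not PostgreSQL code and does not reflect how any particular engine implements the strategy; the function names (`redistribute`, `hash_join`, `redistributed_join`) and the `n_workers` parameter are hypothetical, chosen only for illustration. Rows from both inputs are routed to partitions by hashing the join key, each partition is joined on its own as a "smaller join", and the partial results are collected at the end.

```python
from collections import defaultdict


def redistribute(rows, key_index, n_workers):
    """Route each row to a partition chosen by hashing its join key."""
    partitions = [[] for _ in range(n_workers)]
    for row in rows:
        partitions[hash(row[key_index]) % n_workers].append(row)
    return partitions


def hash_join(left, right, left_key, right_key):
    """Plain in-memory hash join: build on the right input, probe with the left."""
    table = defaultdict(list)
    for r in right:
        table[r[right_key]].append(r)
    return [l + r for l in left for r in table.get(l[left_key], [])]


def redistributed_join(left, right, left_key, right_key, n_workers=4):
    """Join partition i of the left input with partition i of the right input.

    Rows with equal keys hash to the same partition, so the small joins
    are independent; a real engine would run each one in its own worker.
    """
    left_parts = redistribute(left, left_key, n_workers)
    right_parts = redistribute(right, right_key, n_workers)
    result = []
    for lp, rp in zip(left_parts, right_parts):
        result.extend(hash_join(lp, rp, left_key, right_key))
    return result
```

In this sketch the partitions are processed one after another in a loop. The point is only that, once both inputs are partitioned on the join key, no partition needs rows from any other, which is what lets an engine with inter-worker redistribution run the small joins in parallel.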
Finally, the talk will conclude with a brief summary of how intra-query parallelism is a step toward making PostgreSQL better suited to OLAP workloads, and of the common mistakes and assumptions to avoid in order to get the best from this feature in your environment.
The following slides have been made available for this session: