Problem Statement
Slow-running queries against large tables (>5 GB).
Areas You May Have Already Looked Into
- SQL Query Tuning
- JDBC Code Optimization
 
Areas to Explore More
- DB Design
  - Disk I/O
    - Number of Columns in a Table
    - Choose the Right Data Types for Columns
  - Tune the DB Buffer Size
    - To improve caching
  - The standard recommended solution for dealing with large data set tables: Table Partitioning
    - At a certain point you realize the data size in the database itself is the bottleneck
- Application Design
  - Tune JDBC Code and Design
  - Apply an Application-Level Cache
 
How Postgres Stores Data
To understand the performance improvement suggestions that follow, we first need to understand some of PostgreSQL's basic architecture:
- Disk Files
  - All tables and indexes are stored in OS files
  - The files can be found under <Postgres home>/pgdata/base/
- Page Size
  - The size of a page is fixed at 8,192 bytes (8 KB)
  - You can increase or decrease the page size if you build your own copy of PostgreSQL from source, but all pages within a database will be the same size
  - All disk I/O is performed on a page-by-page basis; when you select a single row from a table, PostgreSQL will read at least one page
- Heap Pages & Index Pages
  - Two different page types: one for the actual table data and another for the index data
- Heap and Index Cache Hits
  - Disk I/O is expensive
  - PostgreSQL caches heap and index pages
  - PostgreSQL itself tracks the access patterns of your data and will, on its own, keep frequently accessed data in cache
  - Memory access is fast; disk access is slow
  - Insertion order matters for effective caching (see the query sketch after this list)
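
To see how well the heap and index caches are doing, you can query PostgreSQL's cumulative statistics views. A minimal sketch using the standard pg_statio_user_tables view:

```sql
-- Approximate heap vs. index cache hit ratios across all user tables
SELECT
  sum(heap_blks_hit) * 100.0
    / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS heap_hit_pct,
  sum(idx_blks_hit) * 100.0
    / NULLIF(sum(idx_blks_hit) + sum(idx_blks_read), 0)   AS idx_hit_pct
FROM pg_statio_user_tables;

-- The fixed page size mentioned above can be confirmed with:
SHOW block_size;  -- typically 8192
```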
 
Minimize Disk I/O
- Normalize the Database
  - Remove unnecessary columns
- Choose the Right Data Types (see the sketch after this list)
  - Avoid using a larger data type when the values are small and fit into a smaller type
  - This has a direct impact on cache hits
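
To illustrate the data-type point, here is a hypothetical events table shown with a wide and a narrow column layout (the table and column names are made up). Narrower types pack more rows into each 8 KB page, so a larger fraction of the table stays in cache:

```sql
-- Hypothetical example: the narrow layout fits more rows per 8 KB page
CREATE TABLE events_wide (
  id         bigint,
  status     varchar(255),  -- only ever holds a handful of short codes
  created_at timestamptz
);

CREATE TABLE events_narrow (
  id         integer,       -- fine as long as ids stay below ~2.1 billion
  status     smallint,      -- map the status codes to small integers
  created_at timestamptz
);
```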
 
Tune DB Buffer Size
- Adjust the DB buffer size (see the sketch after this list)
  - To improve heap-block and index-block cache hits
  - You could increase the cache size until it is large enough to hold the entire table, but that means a large shared memory segment, and if you don't have enough physical memory your system will start to thrash
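
In PostgreSQL the buffer cache is sized by the shared_buffers setting. A minimal sketch, assuming PostgreSQL 9.4 or later for ALTER SYSTEM (on older versions edit postgresql.conf instead); the 2GB value is only an example and should be sized against your tables and physical RAM:

```sql
-- Check the current buffer cache size
SHOW shared_buffers;

-- Example value only; a common starting point is roughly 25% of physical RAM
ALTER SYSTEM SET shared_buffers = '2GB';

-- shared_buffers takes effect only after a server restart
```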
 
Dynamic Table Partitioning for Large Data Set Tables
- Table partitioning is the standard solution for large data set tables (see the sketch after this list)
  - The average number of heap/index blocks you have to navigate in order to find a row goes down
  - Partitioning also helps the planner choose the right scan type for a query
  - There are some maintenance advantages too: you can DROP an individual partition to erase all of the data from that range. This is a common technique for pruning historical data out of a partitioned table, one that avoids the VACUUM cleanup work that DELETE leaves behind.
  - Dynamic partition-creation rules can be set up, which minimizes maintenance overhead and is transparent to the application layer
- Tips:
  - Which column you partition on matters
  - The number of partitions should not be large (< 100)
  - Race condition: if two separate transactions insert at the same time, they may both try to create a child partition table with the same name unless this is handled properly
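
A minimal sketch of range partitioning (table and column names are made up). On PostgreSQL 10 and later this can be declared directly as below; older versions implement the same idea with table inheritance plus an insert trigger that creates child tables on the fly, which is where the partition-naming race condition mentioned in the tips can show up:

```sql
-- Parent table partitioned by a time column
CREATE TABLE measurements (
  id          bigint      NOT NULL,
  recorded_at timestamptz NOT NULL,
  value       numeric
) PARTITION BY RANGE (recorded_at);

-- One child partition per month
CREATE TABLE measurements_2024_01 PARTITION OF measurements
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE measurements_2024_02 PARTITION OF measurements
  FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Pruning historical data: dropping a partition removes the whole range
-- without the VACUUM debt a bulk DELETE would leave behind
DROP TABLE measurements_2024_01;
```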
 