Schedule - PGConf.EU 2013
Next generation of GIN
Date: 2013-10-31
This talk presents a set of advances that significantly improve the GIN index. Their primary goal is to make full-text search (FTS) in PostgreSQL as fast as it is in stand-alone solutions such as Sphinx and Solr, but they have many other applications as well.

The set of advances is as follows:
* Compression of item pointers in the index
* Storing additional information in posting trees and posting lists
* Fast scan: skipping parts of posting trees during a scan
* Sorting results in the index

These advances bring the following benefits to GIN indexes:
* Indexes become about two times smaller without any work in the opclass.
* Using the additional information for filtering enables new features for GIN opclasses: better phrase search, better array similarity search, inverse FTS search (search for tsqueries matching a tsvector), inverse regex search (search for regexes matching a string), and better string similarity using positioned n-grams.
* Fast scan dramatically accelerates GIN search in the "frequent_term & rare_term" case.
* Using the additional information for sorting in the index accelerates ranking in FTS and dramatically reduces its I/O.

We present the results of FTS benchmarks on several datasets (6 M and 15 M documents) under real-life load for the PostgreSQL and Sphinx full-text search engines, and demonstrate that the improved PostgreSQL FTS (with all its ACID overhead) outperforms the standalone Sphinx search engine.

Speakers: Alexander Korotkov, Oleg Bartunov
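
Two of the ideas named in the abstract, compression of item pointers and fast scan, can be illustrated with a small toy model. The Python sketch below is not the GIN implementation: it treats a posting list as a sorted list of integer document IDs, assumes delta plus variable-byte encoding as the compression scheme (the abstract does not say which scheme is used), and models "fast scan" as a binary-search skip through the frequent term's list driven by the rare term's list. All function names are hypothetical.

```python
from bisect import bisect_left


def encode_deltas(sorted_ids):
    """Delta + variable-byte encode a sorted list of non-negative integers.

    Assumed scheme for illustration only; small gaps between neighbouring
    IDs fit in a single byte instead of a fixed-width pointer.
    """
    out = bytearray()
    prev = 0
    for value in sorted_ids:
        delta = value - prev
        prev = value
        # Emit 7 bits per byte; a set high bit means "more bytes follow".
        while delta >= 0x80:
            out.append((delta & 0x7F) | 0x80)
            delta >>= 7
        out.append(delta)
    return bytes(out)


def decode_deltas(data):
    """Inverse of encode_deltas: rebuild the sorted list of IDs."""
    ids = []
    value = 0   # running sum of decoded deltas
    delta = 0   # delta currently being assembled
    shift = 0
    for byte in data:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            value += delta
            ids.append(value)
            delta = 0
            shift = 0
    return ids


def intersect_with_skips(rare, frequent):
    """Intersect two sorted ID lists, skipping ahead in the frequent one.

    Instead of walking every entry of `frequent`, jump directly to the
    first entry >= each ID of `rare` (toy model of the
    "frequent_term & rare_term" case).
    """
    result = []
    pos = 0
    for doc_id in rare:
        pos = bisect_left(frequent, doc_id, pos)
        if pos == len(frequent):
            break  # nothing left in the frequent list
        if frequent[pos] == doc_id:
            result.append(doc_id)
    return result
```

In this model the cost of the intersection grows with the length of the rare list times the logarithm of the frequent list, rather than with the length of the frequent list itself, which is the intuition behind the "frequent_term & rare_term" speedup claimed above.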