This post describes how to write a basic PostgreSQL function for parsing a JSON into Sphinx keywords(space-separated values of the JSON object).
Requirements
- Perl core
- Perl JSON module
- PostgreSQL built with JSON support
Sample Problem
We have workers
table:
CREATE TABLE workers( id integer, props text );where
props
keeps arbitrary worker properties in JSON format.
Let's insert a worker:
INSERT INTO workers VALUES(1, '{"x":"x test", "d":104, "o":{"o1":"o1 name"}, "a":["a1", "a2"]}');
We want to make the worker properties searchable through Sphinx. So we add corresponding PostgreSQL source and an index in our sphinx.conf
:
source src_workers: pgsql { sql_query_pre = set work_mem = '100MB' sql_query = select id, props from workers sql_field_string = props } index idx_workers { source = src_testing path = /home/ruslan/sphinxsearch/idx_workers min_prefix_len = 3 docinfo = extern }
With the current sql_query
the props
field will be cluttered with JSON keys and specific characters. We'll write a PostgreSQL function which will give us a single line of space-separated values:
$ createlang plperlu t # < "t" is a database name $ psql -d t create or replace function json2sphkw(text,text) returns text as $$ use JSON qw( decode_json ); my ($in) = @_; $_SHARED{read_sub} = sub { my ($in) = @_[0]; if (ref($in) eq "HASH") { my @res, $read_sub = $_SHARED{read_sub}; foreach $v (values %$in) { push @res, &$read_sub($v); } return join(' ', @res); } elsif (ref($in) eq "ARRAY") { my @res, $read_sub = $_SHARED{read_sub}; foreach $v (values @$in) { push @res, &$read_sub($v); } return join(' ', @res); } return $in; }; my $h = decode_json $in; my $read_sub = $_SHARED{read_sub}; return &$read_sub($h); $$ LANGUAGE plperlu;
Now we can use it:
SELECT json2sphkw(props, ' ') FROM workers WHERE id = 1; json2sphkw -------------------------- o1 name a1 a2 x test 104 (1 row)And the
sphinx.conf
can be changed as follows:
@@ -1,7 +1,7 @@ source src_workers: pgsql { sql_query_pre = set work_mem = '100MB' - sql_query = select id, props from workers + sql_query = select id, json2sphkw(props, ' ') from workers sql_field_string = props }