This post describes how to write a basic PostgreSQL function for parsing a JSON into Sphinx keywords(space-separated values of the JSON object).
Requirements
- Perl core
- Perl JSON module
- PostgreSQL built with JSON support
Sample Problem
We have workers table:
CREATE TABLE workers( id integer, props text );where
props keeps arbitrary worker properties in JSON format.
Let's insert a worker:
INSERT INTO workers VALUES(1, '{"x":"x test", "d":104, "o":{"o1":"o1 name"}, "a":["a1", "a2"]}');
We want to make the worker properties searchable through Sphinx. So we add corresponding PostgreSQL source and an index in our sphinx.conf:
source src_workers: pgsql
{
sql_query_pre = set work_mem = '100MB'
sql_query = select id, props from workers
sql_field_string = props
}
index idx_workers
{
source = src_testing
path = /home/ruslan/sphinxsearch/idx_workers
min_prefix_len = 3
docinfo = extern
}
With the current sql_query the props field will be cluttered with JSON keys and specific characters. We'll write a PostgreSQL function which will give us a single line of space-separated values:
$ createlang plperlu t # < "t" is a database name
$ psql -d t
create or replace function json2sphkw(text,text) returns text as $$
use JSON qw( decode_json );
my ($in) = @_;
$_SHARED{read_sub} = sub {
my ($in) = @_[0];
if (ref($in) eq "HASH") {
my @res, $read_sub = $_SHARED{read_sub};
foreach $v (values %$in) {
push @res, &$read_sub($v);
}
return join(' ', @res);
} elsif (ref($in) eq "ARRAY") {
my @res, $read_sub = $_SHARED{read_sub};
foreach $v (values @$in) { push @res, &$read_sub($v); }
return join(' ', @res);
}
return $in;
};
my $h = decode_json $in;
my $read_sub = $_SHARED{read_sub};
return &$read_sub($h);
$$ LANGUAGE plperlu;
Now we can use it:
SELECT json2sphkw(props, ' ') FROM workers WHERE id = 1;
json2sphkw
--------------------------
o1 name a1 a2 x test 104
(1 row)
And the sphinx.conf can be changed as follows:
@@ -1,7 +1,7 @@
source src_workers: pgsql
{
sql_query_pre = set work_mem = '100MB'
- sql_query = select id, props from workers
+ sql_query = select id, json2sphkw(props, ' ') from workers
sql_field_string = props
}