This post describes how to write a basic PostgreSQL function for parsing a JSON into Sphinx keywords(space-separated values of the JSON object).
Requirements
- Perl core
 - Perl JSON module
 - PostgreSQL built with JSON support
 
Sample Problem
We have workers table:
CREATE TABLE workers( id integer, props text );where
props keeps arbitrary worker properties in JSON format.
Let's insert a worker:
INSERT INTO workers VALUES(1, '{"x":"x test", "d":104, "o":{"o1":"o1 name"}, "a":["a1", "a2"]}');
We want to make the worker properties searchable through Sphinx. So we add corresponding PostgreSQL source and an index in our sphinx.conf:
source src_workers: pgsql
{
  sql_query_pre = set work_mem = '100MB'
  sql_query = select id, props from workers
  sql_field_string = props
}
index idx_workers
{
  source = src_testing
  path = /home/ruslan/sphinxsearch/idx_workers
  min_prefix_len = 3
  docinfo = extern
}
With the current sql_query the props field will be cluttered with JSON keys and specific characters. We'll write a PostgreSQL function which will give us a single line of space-separated values:
$ createlang plperlu t # < "t" is a database name
$ psql -d t
create or replace function json2sphkw(text,text) returns text as $$
use JSON qw( decode_json );
my ($in) = @_;
$_SHARED{read_sub} = sub {
  my ($in) = @_[0];
  if (ref($in) eq "HASH") {
    my @res, $read_sub = $_SHARED{read_sub};
    foreach $v (values %$in) {
      push @res, &$read_sub($v);
    }
    return join(' ', @res);
  } elsif (ref($in) eq "ARRAY") {
    my @res, $read_sub = $_SHARED{read_sub};
    foreach $v (values @$in) { push @res, &$read_sub($v); }
    return join(' ', @res);
  }
  return $in;
};
my $h = decode_json $in;
my $read_sub = $_SHARED{read_sub};
return &$read_sub($h);
$$ LANGUAGE plperlu;
Now we can use it:
SELECT json2sphkw(props, ' ') FROM workers WHERE id = 1;
        json2sphkw
--------------------------
 o1 name a1 a2 x test 104
(1 row)
And the sphinx.conf can be changed as follows:
@@ -1,7 +1,7 @@
 source src_workers: pgsql
 {
   sql_query_pre = set work_mem = '100MB'
-  sql_query = select id, props from workers
+  sql_query = select id, json2sphkw(props, ' ') from workers
   sql_field_string = props
 }