NAME
ElasticSearch - An API for communicating with ElasticSearch
VERSION
Version 0.53, tested against ElasticSearch server version 0.19.2.
DESCRIPTION
ElasticSearch is an Open Source (Apache 2 license), distributed, RESTful
Search Engine based on Lucene, and built for the cloud, with a JSON API.
This module is a thin API which makes it easy to communicate with an
ElasticSearch cluster.
It maintains a list of all servers/nodes in the ElasticSearch cluster,
and spreads the load across these nodes in round-robin fashion. If the
current active node disappears, then it attempts to connect to another
node in the list.
Forking a process triggers a server list refresh, and a new connection
to a randomly chosen node in the list.
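The round-robin behaviour described above can be pictured with a small standalone sketch. This is not the module's actual implementation: the server names, the "next_server()" helper and the "%failed" bookkeeping are all hypothetical, and are only meant to show how requests rotate through the list while skipping a node that has disappeared.

```perl
use strict;
use warnings;

# Hypothetical node list; in the real module this comes from the cluster.
my @servers = ( 'es1.foo.com:9200', 'es2.foo.com:9200', 'es3.foo.com:9200' );
my $next    = 0;

# Return the next server in round-robin order, skipping any marked as failed.
sub next_server {
    my ($failed) = @_;    # hash ref: server => 1 if unreachable
    for ( 1 .. @servers ) {
        my $server = $servers[ $next++ % @servers ];
        return $server unless $failed->{$server};
    }
    die "No live servers available\n";
}

# Pretend the second node has gone away.
my %failed = ( 'es2.foo.com:9200' => 1 );
my @picked = map { next_server( \%failed ) } 1 .. 4;
print join( ', ', @picked ), "\n";
```

With one node marked as failed, the four requests alternate between the two remaining nodes.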
SYNOPSIS
use ElasticSearch;
my $es = ElasticSearch->new(
servers => 'search.foo.com:9200', # default '127.0.0.1:9200'
transport => 'http' # default 'http'
| 'httplite'
| 'httptiny'
| 'curl'
| 'aehttp'
| 'aecurl'
| 'thrift',
max_requests => 10_000, # default 10_000
trace_calls => 'log_file',
no_refresh => 0 | 1,
);
$es->index(
index => 'twitter',
type => 'tweet',
id => 1,
data => {
user => 'kimchy',
post_date => '2009-11-15T14:12:12',
message => 'trying out Elastic Search'
}
);
$data = $es->get(
index => 'twitter',
type => 'tweet',
id => 1
);
# native elasticsearch query language
$results = $es->search(
index => 'twitter',
type => 'tweet',
query => {
text => { user => 'kimchy' }
}
);
# ElasticSearch::SearchBuilder Perlish query language
$results = $es->search(
index => 'twitter',
type => 'tweet',
queryb => {
message => 'Perl API',
user => 'kimchy',
post_date => {
'>' => '2010-01-01',
'<=' => '2011-01-01',
}
}
);
$dodgy_qs = "foo AND AND bar";
$results = $es->search(
index => 'twitter',
type => 'tweet',
query => {
query_string => {
query => $es->query_parser->filter($dodgy_qs)
},
}
);
See the "examples/" directory for a simple working example.
GETTING ElasticSearch
You can download the latest released version of ElasticSearch from the
ElasticSearch website, which also provides setup instructions.
CALLING CONVENTIONS
I've tried to follow the same terminology as used in the ElasticSearch
docs when naming methods, so it should be easy to tie the two together.
Some methods require a specific "index" and a specific "type", while
others allow a list of indices or types, or allow you to specify all
indices or types. I distinguish between them as follows:
$es->method( index => multi, type => single, ...)
"single" values must be a scalar, and are required parameters
type => 'tweet'
"multi" values can be:
index => 'twitter' # specific index
index => ['twitter','user'] # list of indices
index => undef # (or not specified) = all indices
"multi_req" values work like "multi" values, but at least one value is
required, so:
index => 'twitter' # specific index
index => ['twitter','user'] # list of indices
index => '_all' # all indices
index => [] # error
index => undef # error
Also, see "use_index()/use_type()".
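To make the "multi_req" rule concrete, here is a minimal sketch of the kind of check it implies. "check_multi_req()" is a hypothetical helper written for this illustration; it is not part of the module, and the real validation may differ in detail.

```perl
use strict;
use warnings;

# Sketch of the "multi_req" rule: accept a scalar or a non-empty ARRAY ref,
# and reject undef or an empty list. Hypothetical helper, not module code.
sub check_multi_req {
    my ($value) = @_;
    die "at least one value is required\n" unless defined $value;
    my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
    die "at least one value is required\n" unless @values;
    return \@values;
}

my $one  = check_multi_req('twitter');
my $many = check_multi_req( [ 'twitter', 'user' ] );
my $ok   = eval { check_multi_req( [] ); 1 };    # dies, so $ok is undef
print scalar(@$one), " ", scalar(@$many), " ", ( $ok ? 'ok' : 'error' ), "\n";
```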
as_json
If you pass "as_json => 1" to any request to the ElasticSearch server,
it will return the raw UTF8-decoded JSON response, rather than a Perl
data structure.
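If you later need a Perl structure from such a raw JSON string, you can decode it yourself. The sketch below uses the core JSON::PP module and a hand-written sample string (not real server output). Because the string is described as already UTF8-decoded, it uses the "decode()" method, which takes character strings, rather than "decode_json()", which expects UTF-8 bytes.

```perl
use strict;
use warnings;
use JSON::PP;

# Hand-written sample of a raw JSON response; illustrative only.
my $json = '{"_index":"twitter","_type":"tweet","_id":"1",'
         . '"_source":{"user":"kimchy","message":"trying out Elastic Search"}}';

# decode() works on character strings, which is what as_json is said to return.
my $doc = JSON::PP->new->decode($json);
print "$doc->{_index}/$doc->{_type}/$doc->{_id} by $doc->{_source}{user}\n";
```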
RETURN VALUES AND EXCEPTIONS
Methods that query the ElasticSearch cluster return the raw data
structure that the cluster returns. This may change in the future, but
as these data structures are still in flux, I thought it safer not to
try to interpret.
Anything that is known to be an error throws an exception, eg trying to
delete a non-existent index.
INTEGRATION WITH ElasticSearch::SearchBuilder
ElasticSearch::SearchBuilder provides a concise Perlish
SQL::Abstract-style query language, which gets translated into the
native Query DSL that ElasticSearch uses.
For instance:
{
content => 'search keywords',
-filter => {
tags => ['perl','ruby'],
date => {
'>' => '2010-01-01',
'<=' => '2011-01-01'
},
}
}
Would be translated to:
{ query => {
filtered => {
query => { text => { content => "search keywords" } },
filter => {
and => [
{ terms => { tags => ["perl", "ruby"] } },
{ numeric_range => {
date => {
gt => "2010-01-01",
lte => "2011-01-01"
}}},
],
}
}}}
All you have to do to start using ElasticSearch::SearchBuilder is to
change your "query" or "filter" parameter to "queryb" or "filterb"
(where the extra "b" stands for "builder"):
$es->search(
queryb => { content => 'keywords' }
)
If you want to see what your SearchBuilder-style query is being
converted into, you can either use "trace_calls()" or access it directly
with:
$native_query = $es->builder->query( $query )
$native_filter = $es->builder->filter( $filter )
See the ElasticSearch::SearchBuilder docs for more information about the
syntax.
METHODS
Creating a new ElasticSearch instance
new()
$es = ElasticSearch->new(
transport => 'http',
servers => '127.0.0.1:9200' # single server
| ['es1.foo.com:9200',
'es2.foo.com:9200'], # multiple servers
trace_calls => 1 | '/path/to/log/file' | $fh,
timeout => 30,
max_requests => 10_000, # refresh server list
# after max_requests
no_refresh => 0 | 1 # don't retrieve the live
# server list. Instead, use
# just the servers specified
);
"servers" can be either a single server or an ARRAY ref with a list of
servers. If not specified, then it defaults to "localhost" and the port
for the specified transport (eg 9200 for "http*" or 9500 for "thrift").
These servers are used in a round-robin fashion. If any server fails to
connect, then the other servers in the list are tried, and if any
succeeds, then a list of all servers/nodes currently known to the
ElasticSearch cluster is retrieved and stored.
Every "max_requests" requests (default 10,000), this list of known nodes
is refreshed automatically. To disable this automatic refresh, you can
set "max_requests" to 0.
To force a lookup of live nodes, you can do:
$es->refresh_servers();
no_refresh()
Regardless of the "max_requests" setting, a list of live nodes will
still be retrieved on the first request. This may not be desirable
behaviour if, for instance, you are connecting to remote servers which
use internal IP addresses, or which don't allow remote "nodes()"
requests.
If you want to disable this behaviour completely, set "no_refresh" to 1,
in which case the transport module will round-robin through the
"servers" list only. Failed nodes will be removed from the list (but
added back in every "max_requests" requests or when all nodes have
failed).
Transport Backends
There are various "transport" backends that ElasticSearch can use:
"http" (the default, based on LWP), "httplite" (based on HTTP::Lite),
"httptiny" (based on HTTP::Tiny), "curl" (based on WWW::Curl), "aehttp"
(based on AnyEvent::HTTP), "aecurl" (based on AnyEvent::Curl::Multi) and
"thrift" (which uses the Thrift protocol).
Although the "thrift" interface has the right buzzwords (binary,
compact, sockets), the generated Perl code is very slow. Until that is
improved, I recommend one of the "http" backends instead.
The "httplite" backend is about 30% faster than the default "http"
backend, and will probably become the default after more testing in
production.
The "httptiny" backend is 1% faster again than "httplite".
See also: ElasticSearch::Transport, "timeout()" and "trace_calls()".
Document-indexing methods
index()
$result = $es->index(
index => single,
type => single,
id => $document_id, # optional, otherwise auto-generated
data => {
key => value,
...
},
# optional
consistency => 'quorum' | 'one' | 'all',
create => 0 | 1,
parent => $parent,
percolate => $percolate,
refresh => 0 | 1,
replication => 'sync' | 'async',
routing => $routing,
timeout => eg '1m' or '10s',
version => int,
version_type => 'internal' | 'external',
);
eg:
$result = $es->index(
index => 'twitter',
type => 'tweet',
id => 1,
data => {
user => 'kimchy',
post_date => '2009-11-15T14:12:12',
message => 'trying out Elastic Search'
},
);
Used to add a document to a specific "index" as a specific "type" with a
specific "id". If the "index/type/id" combination already exists, then
that document is updated, otherwise it is created.
Note:
* If the "id" is not specified, then ElasticSearch autogenerates a
unique ID and a new document is always created.
* If "version" is passed, and the current version in ElasticSearch is
different, then a "Conflict" error will be thrown.
* "data" can also be a raw JSON encoded string (but ensure that it is
correctly encoded, otherwise you will see errors when trying to
retrieve it from ElasticSearch).
$es->index(
index => 'foo',
type => 'bar',
id => 1,
data => '{"foo":"bar"}'
);
See also: "bulk()" and "put_mapping()"
set()
"set()" is a synonym for "index()"
create()
$result = $es->create(
index => single,
type => single,
id => $document_id, # optional, otherwise auto-generated
data => {
key => value,
...
},
# optional
consistency => 'quorum' | 'one' | 'all',
parent => $parent,
percolate => $percolate,
refresh => 0 | 1,
replication => 'sync' | 'async',
routing => $routing,
timeout => eg '1m' or '10s',
version => int,
version_type => 'internal' | 'external',
);
eg:
$result = $es->create(
index => 'twitter',
type => 'tweet',
id => 1,
data => {
user => 'kimchy',
post_date => '2009-11-15T14:12:12',
message => 'trying out Elastic Search'
},
);
Used to add a NEW document to a specific "index" as a specific "type"
with a specific "id". If the "index/type/id" combination already exists,
then a "Conflict" error is thrown.
If the "id" is not specified, then ElasticSearch autogenerates a unique
ID.
If you pass a "version" parameter to "create", then it must be 0 unless
you also set "version_type" to "external".
See also: "index()"
update()
$result = $es->update(
index => single,
type => single,
id => single,
# required
script => $script,
# optional
params => { params },
consistency => 'quorum' | 'one' | 'all',
ignore_missing => 0 | 1,
parent => $parent,
percolate => $percolate,
retry_on_conflict => 2,
routing => $routing,
timeout => '10s',
replication => 'sync' | 'async'
);
The "update()" method accepts a script which will update a single doc
without having to retrieve and reindex the doc yourself, eg:
$es->update(
index => 'test',
type => 'foo',
id => 123,
script => 'ctx._source.tags+=[tag]',
params => { tag => 'red' }
);
get()
$result = $es->get(
index => single,
type => single or blank,
id => single,
# optional
fields => 'field' or ['field1',...],
preference => '_local' | '_primary' | $string,
refresh => 0 | 1,
routing => $routing,
ignore_missing => 0 | 1,
);
Returns the document stored at "index/type/id" or throws an exception if
the document doesn't exist.
Example:
$es->get( index => 'twitter', type => 'tweet', id => 1)
Returns:
{
_id => 1,
_index => "twitter",
_source => {
message => "trying out Elastic Search",
post_date=> "2009-11-15T14:12:12",
user => "kimchy",
},
_type => "tweet",
}
By default the "_source" field is returned. Use "fields" to specify a
list of (stored) fields to return instead, or "[]" to return no fields.
Pass a true value for "refresh" to force an index refresh before
performing the get.
If the requested "index", "type" or "id" is not found, then a "Missing"
exception is thrown, unless "ignore_missing" is true.
See also: "bulk()"
mget()
$docs = $es->mget(
index => single,
type => single or blank,
ids => \@ids,
fields => ['field_1','field_2'],
filter_missing => 0 | 1
);
$docs = $es->mget(
index => single or blank,
type => single or blank,
docs => \@doc_info,
fields => ['field_1','field_2'],
filter_missing => 0 | 1
);
"mget" or "multi-get" returns multiple documents at once. There are two
ways to call "mget()":
If all docs come from the same index (and potentially the same type):
$docs = $es->mget(
index => 'myindex',
type => 'mytype', # optional
ids => [1,2,3],
)
Alternatively you can specify each doc separately:
$docs = $es->mget(
docs => [
{ _index => 'index_1', _type => 'type_1', _id => 1 },
{ _index => 'index_2', _type => 'type_2', _id => 2 },
]
)
Or:
$docs = $es->mget(
index => 'myindex', # default index
type => 'mytype', # default type
fields => ['field_1','field_2'], # default fields
docs => [
{ _id => 1 }, # uses defaults
{ _index => 'index_2',
_type => 'type_2',
_id => 2,
fields => ['field_2','field_3'],
},
]
);
If $docs or $ids is an empty array ref, then "mget()" will just return
an empty array ref.
Returns an array ref containing all of the documents requested. If a
document is not found, then its entry will include "{exists => 0}". If
you would rather filter these missing docs, pass "filter_missing => 1".
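If you prefer to keep the missing entries in the response and filter them on the client side instead, something like the following sketch would do it. The "$docs" structure here is hand-written sample data modelled on the description above, not real server output.

```perl
use strict;
use warnings;

# Sample of what mget() might return when one document is missing;
# the values are illustrative only.
my $docs = [
    {   _index  => 'myindex',
        _type   => 'mytype',
        _id     => 1,
        _source => { title => 'first' },
    },
    { _index => 'myindex', _type => 'mytype', _id => 2, exists => 0 },
];

# Drop any entry explicitly flagged with a false "exists" value.
my @found = grep { !exists $_->{exists} || $_->{exists} } @$docs;
print "found ", scalar(@found), " of ", scalar(@$docs), " docs\n";
```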
delete()
$result = $es->delete(
index => single,
type => single,
id => single,
# optional
consistency => 'quorum' | 'one' | 'all',
ignore_missing => 0 | 1,
refresh => 0 | 1,
parent => $parent,
routing => $routing,
replication => 'sync' | 'async',
version => int
);
Deletes the document stored at "index/type/id" or throws a "Missing"
exception if the document doesn't exist and "ignore_missing" is not
true.
If you specify a "version" and the current version of the document is
different (or if the document is not found), a "Conflict" error will be
thrown.
If "refresh" is true, an index refresh will be forced after the delete
has completed.
Example:
$es->delete( index => 'twitter', type => 'tweet', id => 1);
See also: "bulk()"
bulk()
$result = $es->bulk( [ actions ] )
$result = $es->bulk(
actions => [ actions ], # required
index => 'foo', # optional
type => 'bar', # optional
consistency => 'quorum' | 'one' | 'all', # optional
refresh => 0 | 1, # optional
replication => 'sync' | 'async', # optional
);
Perform multiple "index", "create" and "delete" actions in a single
request. This is about 10x as fast as performing each action in a
separate request.
Each "action" is a HASH ref with a key indicating the action type
("index", "create" or "delete"), whose value is another HASH ref
containing the associated metadata.
The "index" and "type" parameters can be specified for each individual
action, or inherited from the top level "index" and "type" parameters,
as shown above.
NOTE: "bulk()" also accepts the "_index", "_type", "_id", "_source",
"_parent", "_routing" and "_version" parameters so that you can pass
search results directly to "bulk()".
"index" and "create" actions
{ index => {
index => 'foo',
type => 'bar',
id => 123,
data => { text => 'foo bar'},
# optional
routing => $routing,
parent => $parent,
percolate => $percolate,
timestamp => $timestamp,
ttl => $ttl,
version => $version,
version_type => 'internal' | 'external'
}}
{ create => { ... same options as for 'index' }}
The "index" and "type" parameters, if not specified, are inherited from
the top level bulk request.
"data" can also be a raw JSON encoded string (but ensure that it is
correctly encoded, otherwise you see errors when trying to retrieve it
from ElasticSearch).
actions => [{
index => {
index => 'foo',
type => 'bar',
id => 1,
data => '{"foo":"bar"}'
}
}]
"delete" action
{ delete => {
index => 'foo',
type => 'bar',
id => 123,
# optional
routing => $routing,
parent => $parent,
version => $version,
version_type => 'internal' | 'external'
}}
The "index" and "type" parameters, if not specified, are inherited from
the top level bulk request.
Return values
The "bulk()" method returns a HASH ref containing:
{
actions => [ the list of actions you passed in ],
results => [ the result of each of the actions ],
errors => [ a list of any errors ]
}
The "results" ARRAY ref contains the same values that would be returned
for individual "index"/"create"/"delete" statements, eg:
results => [
{ create => { _id => 123, _index => "foo", _type => "bar", _version => 1 } },
{ index => { _id => 123, _index => "foo", _type => "bar", _version => 2 } },
{ delete => { _id => 123, _index => "foo", _type => "bar", _version => 3 } },
]
The "errors" key is only present if an error has occurred, so you can do:
$results = $es->bulk(\@actions);
if ($results->{errors}) {
# handle errors
}
Each error element contains the "error" message plus the "action" that
triggered the error. Each "result" element will also contain the error
message, eg:
$result = {
actions => [
## NOTE - num is numeric
{ index => { index => 'bar', type => 'bar', id => 123,
data => { num => 123 } } },
## NOTE - num is a string
{ index => { index => 'bar', type => 'bar', id => 123,
data => { num => 'foo bar' } } },
],
errors => [
{
action => {
index => { index => 'bar', type => 'bar', id => 123,
data => { num => 'foo bar' } }
},
error => "MapperParsingException[Failed to parse [num]]; ...",
},
],
results => [
{ index => { _id => 123, _index => "bar", _type => "bar", _version => 1 }},
{ index => {
error => "MapperParsingException[Failed to parse [num]];...",
id => 123, index => "bar", type => "bar",
},
},
],
};
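As a rough sketch of how such a return value might be processed, the code below walks the "errors" list and reports which action failed and why. The "$result" structure is hand-written to mirror the shape described in this section; it is not captured server output.

```perl
use strict;
use warnings;

# Illustrative bulk() return value, shaped like the one documented here.
my $result = {
    results => [
        { index => { _id => 123, _index => 'bar', _type => 'bar', _version => 1 } },
        { index => {
                error => 'MapperParsingException[Failed to parse [num]]; ...',
                id    => 123, index => 'bar', type => 'bar',
        } },
    ],
    errors => [
        {   action => {
                index => { index => 'bar', type => 'bar', id => 123,
                           data  => { num => 'foo bar' } }
            },
            error => 'MapperParsingException[Failed to parse [num]]; ...',
        },
    ],
};

# The "errors" key is absent when everything succeeded, hence the "|| []".
my @failed;
for my $err ( @{ $result->{errors} || [] } ) {
    my ($type) = keys %{ $err->{action} };    # 'index', 'create' or 'delete'
    my $meta   = $err->{action}{$type};
    push @failed,
        "$type $meta->{index}/$meta->{type}/$meta->{id}: $err->{error}";
}
print "$_\n" for @failed;
```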
bulk_index(), bulk_create(), bulk_delete()
These are convenience methods which allow you to pass just the metadata,
without the "index", "create" or "delete" action for each record.
These methods accept the same parameters as the "bulk()" method, except
that the "actions" parameter is replaced by "docs", eg:
$result = $es->bulk_index( [ docs ] );
$result = $es->bulk_index(
docs => [ docs ], # required
index => 'foo', # optional
type => 'bar', # optional
consistency => 'quorum' | 'one' | 'all', # optional
refresh => 0 | 1, # optional
replication => 'sync' | 'async', # optional
);
For instance:
$es->bulk_index(
index => 'foo',
type => 'bar',
refresh => 1,
docs => [
{ id => 123, data => { text=>'foo'} },
{ id => 124, type => 'baz', data => { text=>'bar'} },
]
);
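When indexing a large number of records, you would normally send them in batches rather than in one huge request. The following sketch only shows the batching and the reshaping into the "docs" form; the record fields ("id", "text") and the batch size are made up for the example, and the actual "bulk_index()" call is left as a comment because it needs a live cluster.

```perl
use strict;
use warnings;

# Hypothetical records to index; 'id' and 'text' are illustrative names.
my @records    = map { +{ id => $_, text => "doc $_" } } 1 .. 5;
my $batch_size = 2;

my @batches;
while ( my @chunk = splice @records, 0, $batch_size ) {

    # Wrap each record in the { id => ..., data => ... } form used by "docs".
    push @batches,
        [ map { +{ id => $_->{id}, data => { text => $_->{text} } } } @chunk ];
}
print scalar(@batches), " batches\n";

# Each batch could then be sent with something like:
#   $es->bulk_index( index => 'foo', type => 'bar', docs => $batch );
```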
reindex()
$es->reindex(
source => $scrolled_search,
# optional
bulk_size => 1000,
dest_index => $index,
quiet => 0 | 1,
transform => sub {....},
)
"reindex()" is a utility method which can be used for reindexing data
from one index to another (eg if the mapping has changed), or copying
data from one cluster to another.
Params
* "source" is a required parameter, and should be an instance of
ElasticSearch::ScrolledSearch.
* "dest_index" is the name of the destination index, ie where the docs
are indexed to. If you are indexing your data from one cluster to
another, and you want to use the same index name in your destination
cluster, then you can leave this blank.
* "bulk_size" - the number of docs that will be indexed at a time.
Defaults to 1,000
* Set "quiet" to 1 if you don't want any progress information to be
printed to "STDOUT"
* "transform" should be a sub-ref which will be called for each doc,
allowing you to transform some element of the doc, or to skip the
doc by returning "undef".
Examples:
To copy the ElasticSearch website index locally, you could do:
my $local = ElasticSearch->new(
servers => 'localhost:9200'
);
my $remote = ElasticSearch->new(
servers => 'search.elasticsearch.org:80',
no_refresh => 1
);
my $source = $remote->scrolled_search(
search_type => 'scan',
scroll => '5m'
);
$local->reindex(source=>$source);
To copy one local index to another, make the title upper case, exclude
docs of type "boring", and to preserve the version numbers from the
original index:
my $source = $es->scrolled_search(
index => 'old_index',
search_type => 'scan',
scroll => '5m',
version => 1
);
$es->reindex(
source => $source,
dest_index => 'new_index',
transform => sub {
my $doc = shift;
return if $doc->{_type} eq 'boring';
$doc->{_source}{title} = uc( $doc->{_source}{title} );
return $doc;
}
);
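A "transform" callback like the one above can be tried out on hand-written documents before running a real reindex. This sketch assumes only the doc shape used in the example ("_type" and "_source"); the sample docs are invented.

```perl
use strict;
use warnings;

# The same kind of callback, exercised against sample docs.
my $transform = sub {
    my $doc = shift;
    return if $doc->{_type} eq 'boring';
    $doc->{_source}{title} = uc( $doc->{_source}{title} );
    return $doc;
};

my @docs = (
    { _type => 'post',   _id => 1, _source => { title => 'hello' } },
    { _type => 'boring', _id => 2, _source => { title => 'skip me' } },
);

# Docs for which the callback returns nothing are dropped.
my @kept = grep { defined } map { scalar $transform->($_) } @docs;
print "$_->{_id}: $_->{_source}{title}\n" for @kept;
```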
NOTE: If some of your docs have parent/child relationships, and you want
to preserve this relationship, then you should add this to your scrolled
search parameters: "fields => ['_source','_parent']".
For example:
my $source = $es->scrolled_search(
index => 'old_index',
search_type => 'scan',
scroll => '5m',
fields => ['_source','_parent'],
version => 1
);
$es->reindex(
source => $source,
dest_index => 'new_index',
);
See also "scrolled_search()", ElasticSearch::ScrolledSearch, and
"search()".
analyze()
$result = $es->analyze(
text => $text_to_analyze, # required
index => single, # optional
# either
field => 'type.fieldname', # requires index
analyzer => $analyzer,
tokenizer => $tokenizer,
filters => \@filters,
# other options
format => 'detailed' | 'text',
prefer_local => 1 | 0
);
The "analyze()" method allows you to see how ElasticSearch is analyzing
the text that you pass in, eg:
$result = $es->analyze( text => 'The Man' )
$result = $es->analyze(
text => 'The Man',
analyzer => 'simple'
);
$result = $es->analyze(
text => 'The Man',
tokenizer => 'keyword',
filters => ['lowercase'],
);
$result = $es->analyze(
text => 'The Man',
index => 'my_index',
analyzer => 'my_custom_analyzer'
);
$result = $es->analyze(
text => 'The Man',
index => 'my_index',
field => 'my_type.my_field',
);
Query methods
search()
$result = $es->search(
index => multi,
type => multi,
# optional
query => { native query },
queryb => { searchbuilder query },
filter => { native filter },
filterb => { searchbuilder filter },
explain => 1 | 0,
facets => { facets },
fields => [$field_1,$field_n],
partial_fields => { my_field => { include => 'foo.bar.*' }},
from => $start_from,
highlight => { highlight },
indices_boost => { index_1 => 1.5,... },
min_score => $score,
preference => '_local' | '_primary' | $string,
routing => [$routing, ...],
script_fields => { script_fields },
search_type => 'dfs_query_then_fetch'
| 'dfs_query_and_fetch'
| 'query_then_fetch'
| 'query_and_fetch'
| 'count'
| 'scan',
size => $no_of_results,
sort => ['_score',$field_1],
scroll => '5m' | '30s',
stats => ['group_1','group_2'],
track_scores => 0 | 1,
timeout => '10s',
version => 0 | 1
);
Searches for all documents matching the query, with a request-body
search. Documents can be matched against multiple indices and multiple
types, eg:
$result = $es->search(
index => undef, # all
type => ['user','tweet'],
query => { term => {user => 'kimchy' }}
);
You can provide either the "query" parameter, which uses the native
ElasticSearch Query DSL, or the "queryb" parameter, which uses the more
concise ElasticSearch::SearchBuilder query syntax.
Similarly, use "filterb" instead of "filter". SearchBuilder can also be
used in facets, for instance, instead of:
$es->search(
facets => {
wow_facet => {
query => { text => { content => 'wow' }},
facet_filter => { term => {status => 'active' }},
}
}
)
You can use:
$es->search(
facets => {
wow_facet => {
queryb => { content => 'wow' }, # note the extra 'b'
facet_filterb => { status => 'active' }, # note the extra 'b'
}
}
)
See "INTEGRATION WITH ElasticSearch::SearchBuilder" for more.
For all of the options that can be included in the native "query"
parameter, see the ElasticSearch Query DSL documentation.
searchqs()
$result = $es->searchqs(
index => multi,
type => multi,
# optional
q => $query_string,
analyze_wildcard => 0 | 1,
analyzer => $analyzer,
default_operator => 'OR' | 'AND',
df => $default_field,
explain => 1 | 0,
fields => [$field_1,$field_n],
from => $start_from,
lowercase_expanded_terms => 0 | 1,
preference => '_local' | '_primary' | $string,
routing => [$routing, ...],
search_type => $search_type,
size => $no_of_results,
sort => ['_score:asc','last_modified:desc'],
scroll => '5m' | '30s',
stats => ['group_1','group_2'],
timeout => '10s',
version => 0 | 1
);
Searches for all documents matching the "q" query_string, with a URI
request. Documents can be matched against multiple indices and multiple
types, eg:
$result = $es->searchqs(
index => undef, # all
type => ['user','tweet'],
q => 'john smith'
);
For all of the options that can be included in the "q" parameter, see
the ElasticSearch search API documentation.
scroll()
$result = $es->scroll(
scroll_id => $scroll_id,
scroll => '5m' | '30s',
);
If a search has been executed with a "scroll" parameter, then the
returned "scroll_id" can be used like a cursor to scroll through the
rest of the results.
If a further scroll request will be issued, then the "scroll" parameter
should be passed as well. For instance:
my $result = $es->search(
query=>{match_all=>{}},
scroll => '5m'
);
while (1) {
my $hits = $result->{hits}{hits};
last unless @$hits; # if no hits, we're finished
do_something_with($hits);
$result = $es->scroll(
scroll_id => $result->{_scroll_id},
scroll => '5m'
);
}
scrolled_search()
"scrolled_search()" returns a convenience iterator for scrolled
searches. It accepts the standard search parameters that would be passed
to "search()" and requires a "scroll" parameter, eg:
$scroller = $es->scrolled_search(
query => {match_all=>{}},
scroll => '5m' # keep the scroll request
# live for 5 minutes
);
See ElasticSearch::ScrolledSearch, "search()", "searchqs()" and
"scroll()".
count()
$result = $es->count(
index => multi,
type => multi,
# optional
routing => [$routing,...],
# one of:
query => { native query },
queryb => { search builder query },
);
Counts the number of documents matching the query. Documents can be
matched against multiple indices and multiple types, eg:
$result = $es->count(
index => undef, # all
type => ['user','tweet'],
queryb => { user => 'kimchy' }
);
Note: "count()" supports ElasticSearch::SearchBuilder-style queries via
the "queryb" parameter. See "INTEGRATION WITH
ElasticSearch::SearchBuilder" for more details.
"query" defaults to "{match_all=>{}}" unless specified.
DEPRECATION: "count()" previously took query types at the top level, eg
"$es->count( term=> { ... })". This form still works, but is deprecated.
Instead use the "queryb" or "query" parameter as you would in
"search()".
See also "search()".
msearch()
$results = $es->msearch(
index => multi,
type => multi,
queries => \@queries | \%queries
);
With "msearch()" you can run multiple searches in parallel. "queries"
can contain either an array of queries, or a hash of named queries.
$results will return either an array or hash of results, depending on
what you pass in.
The top-level "index" and "type" parameters define default values which
will be used for each query, although these can be overridden in the
query parameters:
$results = $es->msearch(
index => 'my_index',
type => 'my_type',
queries => {
first => {
query => { match_all => {}} # my_index/my_type
},
second => {
index => 'other_index',
query => { match_all => {}} # other_index/my_type
},
}
)
In the above example, $results would look like:
{
first => { hits => ... },
second => { hits => ... }
}
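Here is a small sketch of consuming a hash of named results. The "$results" structure is invented sample data; the "total" key inside "hits" is assumed from the usual shape of an ElasticSearch search response rather than taken from this document.

```perl
use strict;
use warnings;

# Illustrative msearch() result for two named queries; counts are made up.
my $results = {
    first  => { hits => { total => 2, hits => [ { _id => 1 }, { _id => 2 } ] } },
    second => { hits => { total => 0, hits => [] } },
};

# Collect the total hit count for each named query.
my %totals = map { $_ => $results->{$_}{hits}{total} } keys %$results;
printf "%-6s %d\n", $_, $totals{$_} for sort keys %totals;
```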
A query can contain the following options:
{
index => 'index_name' | ['index_1',...],
type => 'type_name' | ['type_1',...],
query => { native query },
queryb => { search_builder query },
filter => { native filter },
filterb => { search_builder filter },
facets => { facets },
from => 0,
size => 10,
sort => { sort },
highlight => { highlight },
fields => [ 'field1', ... ],
explain => 0 | 1,
indices_boost => { index_1 => 5, ... },
min_score => 2,
partial_fields => { partial fields },
preference => '_local' | '_primary' | $string,
routing => 'routing' | ['route_1',...],
script_fields => { script fields },
search_type => $search_type,
stats => 'group_1' | ['group_1','group_2'],
timeout => '30s',
track_scores => 0 | 1,
version => 0 | 1,
}
delete_by_query()
$result = $es->delete_by_query(
index => multi,
type => multi,
# optional
consistency => 'quorum' | 'one' | 'all',
replication => 'sync' | 'async',
routing => [$routing,...],
# one of:
query => { native query },
queryb => { search builder query },
);
Deletes any documents matching the query. Documents can be matched
against multiple indices and multiple types, eg:
$result = $es->delete_by_query(
index => undef, # all
type => ['user','tweet'],
queryb => {user => 'kimchy' },
);
Note: "delete_by_query()" supports ElasticSearch::SearchBuilder-style
queries via the "queryb" parameter. See "INTEGRATION WITH
ElasticSearch::SearchBuilder" for more details.
DEPRECATION: "delete_by_query()" previously took query types at the top
level, eg "$es->delete_by_query( term=> { ... })". This form still
works, but is deprecated. Instead use the "queryb" or "query" parameter
as you would in "search()".
See also "search()".
mlt()
# mlt == more_like_this
$results = $es->mlt(
index => single, # required
type => single, # required
id => $id, # required
# optional more-like-this params
boost_terms => float
mlt_fields => 'scalar' or ['scalar_1', 'scalar_n']
max_doc_freq => integer
max_query_terms => integer
max_word_len => integer
min_doc_freq => integer
min_term_freq => integer
min_word_len => integer
pct_terms_to_match => float
stop_words => 'scalar' or ['scalar_1', 'scalar_n']
# optional search params
explain => {explain}
facets => {facets}
fields => {fields}
filter => { native filter },
filterb => { search builder filter },
from => {from}
indices_boost => { index_1 => 1.5,... }
min_score => $score
preference => '_local' | '_primary' | $string
routing => [$routing,...]
script_fields => { script_fields }
search_scroll => '5m' | '10s',
search_indices => ['index1','index2'],
search_from => integer,
search_size => integer,
search_type => $search_type,
search_types => ['type1','type2'],
size => {size}
sort => {sort}
scroll => '5m' | '30s'
timeout => '10s'
)
More-like-this (mlt) finds related/similar documents. It is possible to
run a search query with a "more_like_this" clause (where you pass in the
text you're trying to match), or to use this method, which uses the text
of the document referred to by "index/type/id".
This gets transformed into a search query, so all of the search
parameters are also available.
Note: "mlt()" supports ElasticSearch::SearchBuilder-style filters via
the "filterb" parameter. See "INTEGRATION WITH
ElasticSearch::SearchBuilder" for more details.
validate_query()
$bool = $es->validate_query(
index => multi,
type => multi,
query => { native query }
| queryb => { search builder query }
| q => $query_string
);
Returns true if the passed in "query" (native ES query), "queryb"
(SearchBuilder style query) or "q" (Lucene query string) is valid.
Otherwise returns false.
Index Admin methods
index_status()
$result = $es->index_status(
index => multi,
recovery => 0 | 1,
snapshot => 0 | 1,
);
Returns the status of all indices, or of the specified indices, eg:
$result = $es->index_status(); # all
$result = $es->index_status( index => ['twitter','buzz'] );
$result = $es->index_status( index => 'twitter' );
Throws a "Missing" exception if the specified indices do not exist.
index_stats()
$result = $es->index_stats(
index => multi,
types => multi,
docs => 1|0,
store => 1|0,
indexing => 1|0,
get => 1|0,
all => 0|1, # returns all stats
clear => 0|1, # clears default docs,store,indexing,get,search
flush => 0|1,
merge => 0|1,
refresh => 0|1,
level => 'shards'
);
Throws a "Missing" exception if the specified indices do not exist.
index_segments()
$result = $es->index_segments(
index => multi,
);
Returns low-level Lucene segments information for the specified indices.
Throws a "Missing" exception if the specified indices do not exist.
create_index()
$result = $es->create_index(
index => single,
# optional
settings => {...},
mappings => {...},
);
Creates a new index, optionally passing index settings and mappings, eg:
$result = $es->create_index(
index => 'twitter',
settings => {
number_of_shards => 3,
number_of_replicas => 2,
analysis => {
analyzer => {
default => {
tokenizer => 'standard',
char_filter => ['html_strip'],
filter => [qw(standard lowercase stop asciifolding)],
}
}
}
},
mappings => {
tweet => {
properties => {
user => { type => 'string' },
content => { type => 'string' },
date => { type => 'date' }
}
}
}
);
Throws an exception if the index already exists.
delete_index()
$result = $es->delete_index(
index => multi_req,
ignore_missing => 0 | 1 # optional
);
Deletes one or more existing indices, or throws a "Missing" exception if
a specified index doesn't exist and "ignore_missing" is not true:
$result = $es->delete_index( index => 'twitter' );
index_exists()
$result = $es->index_exists(
index => multi
);
Returns "{ok => 1}" if all specified indices exist, or an empty list if
they don't.
index_settings()
$result = $es->index_settings(
index => multi,
);
Returns the current settings for all, one or many indices.
$result = $es->index_settings( index=> ['index_1','index_2'] );
update_index_settings()
$result = $es->update_index_settings(
index => multi,
settings => { ... settings ...},
);
Update the settings for all, one or many indices. Currently only the
"number_of_replicas" is exposed:
$result = $es->update_index_settings(
settings => { number_of_replicas => 1 }
);
Throws a "Missing" exception if the specified indices do not exist.
See
aliases()
$result = $es->aliases( actions => [actions] | {actions} )
Adds or removes an alias for an index, eg:
$result = $es->aliases( actions => [
{ remove => { index => 'foo', alias => 'bar' }},
{ add => { index => 'foo', alias => 'baz' }}
]);
"actions" can be a single HASH ref, or an ARRAY ref containing multiple
HASH refs.
Note: "aliases()" supports ElasticSearch::SearchBuilder-style filters
via the "filterb" parameter. See "INTEGRATION WITH
ElasticSearch::SearchBuilder" for more details.
$result = $es->aliases( actions => [
{ add => {
index => 'foo',
alias => 'baz',
index_routing => '1',
search_routing => '1,2',
filterb => { foo => 'bar' }
}}
]);
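As a sketch of how the "remove" and "add" actions can be combined, the
hypothetical helper below repoints an alias from one index to another in a
single "aliases()" call. The name "switch_alias" and its argument names are
assumptions made for illustration; only the "actions" structure comes from
the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: repoint $args{alias} from index $args{from}
# to index $args{to}, sending both actions in one aliases() call.
sub switch_alias {
    my ( $es, %args ) = @_;

    for my $required (qw(alias from to)) {
        die "Missing required argument '$required'\n"
            unless defined $args{$required};
    }

    return $es->aliases(
        actions => [
            { remove => { index => $args{from}, alias => $args{alias} } },
            { add    => { index => $args{to},   alias => $args{alias} } },
        ]
    );
}
```

It might be called as "switch_alias( $es, alias => 'current', from =>
'twitter_v1', to => 'twitter_v2' )", where the index names are placeholders.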
See
get_aliases()
$result = $es->get_aliases( index => multi )
Returns a hashref listing all indices and their corresponding aliases,
eg:
{
"foo" : {
"aliases" : {
"foo_1" : {
"search_routing" : "1,2",
"index_routing" : "1",
"filter" : {
"term" : {
"foo" : "bar"
}
}
},
"foo_2" : {}
}
}
}
If you pass in the optional "index" argument, which can be an index name
or an alias name, then it will only return the indices related to that
argument.
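A minimal sketch of walking the structure shown above, assuming the result
has the index => { aliases => { ... } } shape from the example. The helper
name "alias_names" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: map each index name to a sorted list of its aliases.
# Any names passed after $es are sent as the optional "index" argument.
sub alias_names {
    my ( $es, @index ) = @_;

    my $result = $es->get_aliases( @index ? ( index => \@index ) : () );

    my %names;
    for my $index ( keys %$result ) {
        my $aliases = $result->{$index}{aliases} || {};
        $names{$index} = [ sort keys %$aliases ];
    }
    return \%names;
}
```

With the example response above, "alias_names($es)" would give
"{ foo => [ 'foo_1', 'foo_2' ] }".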
See
open_index()
$result = $es->open_index( index => single);
Opens a closed index.
The open and close index APIs allow you to close an index, and later on
open it.
A closed index has almost no overhead on the cluster (except for
maintaining its metadata), and is blocked for read/write operations. A
closed index can be opened which will then go through the normal
recovery process.
See
for more
close_index()
$result = $es->close_index( index => single);
Closes an open index. See
for more
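The open and close calls are typically used as a pair. As a rough sketch,
the hypothetical helper "with_closed_index" below closes an index, runs a
caller-supplied code ref, and reopens the index even if that code dies. What
is done while the index is closed is left entirely to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code while $index is closed, then reopen it.
sub with_closed_index {
    my ( $es, $index, $code ) = @_;

    $es->close_index( index => $index );

    # Trap any error from the callback so the index is reopened first
    my $ok    = eval { $code->(); 1 };
    my $error = $@;

    $es->open_index( index => $index );

    die $error unless $ok;
    return 1;
}
```

Remember that the index is blocked for read/write operations while closed,
and goes through the normal recovery process when reopened.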
create_index_template()
$result = $es->create_index_template(
name => single,
template => $template, # required
mappings => {...}, # optional
settings => {...}, # optional
);
Index templates allow you to define templates that will automatically be
applied to newly created indices. You can specify both "settings" and
"mappings", and a simple pattern "template" that controls whether the
template will be applied to a new index.
For example:
$result = $es->create_index_template(
name => 'my_template',
template => 'small_*',
settings => { number_of_shards => 1 }
);
See
for more.
index_template()
$result = $es->index_template(
name => single
);
Retrieves the named index template.
See
delete_index_template()
$result = $es->delete_index_template(
name => single,
ignore_missing => 0 | 1 # optional
);
Deletes the named index template.
See
flush_index()
$result = $es->flush_index(
index => multi,
full => 0 | 1, # optional
refresh => 0 | 1, # optional
);
Flushes one or more indices, which frees memory from the index by
flushing data to the index storage and clearing the internal transaction
log. By default, ElasticSearch uses memory heuristics in order to
automatically trigger flush operations as required in order to clear
memory.
Example:
$result = $es->flush_index( index => 'twitter' );
Throws a "Missing" exception if the specified indices do not exist.
See
refresh_index()
$result = $es->refresh_index(
index => multi,
);
Explicitly refreshes one or more indices, making all operations
performed since the last refresh available for search. The (near)
real-time capabilities depend on the index engine used. For example,
the robin one requires refresh to be called, but by default a refresh is
scheduled periodically.
Example:
$result = $es->refresh_index( index => 'twitter' );
Throws a "Missing" exception if the specified indices do not exist.
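A minimal sketch of when an explicit refresh is useful: indexing a document
and making it searchable straight away, for instance in a test suite. The
helper name "index_and_refresh" is hypothetical; it uses "index()" as shown
in the SYNOPSIS and "refresh_index()" as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: index one document, then refresh its index so that
# the document is visible to the next search.
sub index_and_refresh {
    my ( $es, %doc ) = @_;

    my $result = $es->index(%doc);
    $es->refresh_index( index => $doc{index} );

    return $result;
}
```

Refreshing after every document is likely to be costly, so for bulk loads
the periodic scheduled refresh is usually the better choice.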
See
optimize_index()
$result = $es->optimize_index(
index => multi,
only_deletes => 0 | 1, # only_expunge_deletes
flush => 0 | 1, # flush after optimization
refresh => 0 | 1, # refresh after optimization
wait_for_merge => 1 | 0, # wait for merge to finish
max_num_segments => int, # number of segments to optimize to
)
Throws a "Missing" exception if the specified indices do not exist.
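As an illustration of the parameters listed above, the sketch below merges
an index down to a single segment and waits for the merge to finish. The
helper name "optimize_to_one_segment" is hypothetical, and the chosen values
are only an example.

```perl
use strict;
use warnings;

# Hypothetical helper: optimize $index (a name or an array ref of names)
# down to one segment.
sub optimize_to_one_segment {
    my ( $es, $index ) = @_;

    return $es->optimize_index(
        index            => $index,
        max_num_segments => 1,    # number of segments to optimize to
        wait_for_merge   => 1,    # wait for merge to finish
        refresh          => 1,    # refresh after optimization
    );
}
```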
See
gateway_snapshot()
$result = $es->gateway_snapshot(
index => multi,
);
Explicitly performs a snapshot through the gateway of one or more
indices (backs them up). By default, each index gateway periodically
snapshots changes, though it can be disabled and be controlled completely
through this API.
Example:
$result = $es->gateway_snapshot( index => 'twitter' );
Throws a "Missing" exception if the specified indices do not exist.
See
and
snapshot_index()
"snapshot_index()" is a synonym for "gateway_snapshot()"
clear_cache()
$result = $es->clear_cache(
index => multi,
bloom => 0 | 1,
field_data => 0 | 1,
filter => 0 | 1,
id => 0 | 1,
fields => 'field1' | ['field1','fieldn',...]
);
Clears the caches for the specified indices. By default, clears all
caches, but if any of "id", "filter", "field_data" or "bloom" are true,
then it clears just the specified caches.
Throws a "Missing" exception if the specified indices do not exist.
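A small sketch of clearing only selected caches. The helper name
"clear_caches" is hypothetical; it accepts the four cache flags from the
signature above and rejects anything else before calling "clear_cache()".

```perl
use strict;
use warnings;

# The cache flags listed in the clear_cache() signature
my %CACHE = map { $_ => 1 } qw(bloom field_data filter id);

# Hypothetical helper: clear the named caches for $index. With no cache
# names, no flags are sent, so all caches are cleared (the default).
sub clear_caches {
    my ( $es, $index, @caches ) = @_;

    my @unknown = grep { !$CACHE{$_} } @caches;
    die "Unknown cache type(s): @unknown\n" if @unknown;

    return $es->clear_cache( index => $index, map { $_ => 1 } @caches );
}
```

For example, "clear_caches( $es, 'twitter', 'filter', 'id' )" would clear
just the filter and id caches of the "twitter" index.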
See
Mapping methods
put_mapping()
$result = $es->put_mapping(
index => multi,
type => single,
mapping => { ... }, # required
ignore_conflicts => 0 | 1
);
A "mapping" is the data definition of a "type". If no mapping has been
specified, then ElasticSearch tries to infer the type of each field in
a document by looking at its contents, eg:
'foo' => string
123 => integer
1.23 => float
However, these heuristics can be confused, so it is safer (and much more
powerful) to specify an official "mapping" instead, eg:
$result = $es->put_mapping(
index => ['twitter','buzz'],
type => 'tweet',
mapping => {
_source => { compress => 1 },
properties => {
user => {type => "string", index => "not_analyzed"},
message => {type => "string", null_value => "na"},
post_date => {type => "date"},
priority => {type => "integer"},
rank => {type => "float"}
}
}
);
See also:
and
DEPRECATION: "put_mapping()" previously took the mapping parameters at
the top level, eg "$es->put_mapping( properties=> { ... })". This form
still works, but is deprecated. Instead use the "mapping" parameter.
delete_mapping()
$result = $es->delete_mapping(
index => multi_req,
type => single,
ignore_missing => 0 | 1,
);
Deletes a mapping/type in one or more indices. See also
Throws a "Missing" exception if the indices or type don't exist and
"ignore_missing" is false.
mapping()
$mapping = $es->mapping(
index => single,
type => multi
);
Returns the mappings for all types in an index, or the mapping for the
specified type(s), eg:
$mapping = $es->mapping(
index => 'twitter',
type => 'tweet'
);
$mappings = $es->mapping(
index => 'twitter',
type => ['tweet','user']
);
# { twitter => { tweet => {mapping}, user => {mapping}} }
Note: the index name used as the key in the results is the actual index
name. If you pass an alias name as the "index" name, then this key will
be the index (or indices) that the alias points to.
See also:
River admin methods
See and
.
create_river()
$result = $es->create_river(
river => $river_name, # required
type => $type, # required
$type => {...}, # depends on river type
index => {...}, # depends on river type
);
Creates a new river with name $river_name, eg:
$result = $es->create_river(
river => 'my_twitter_river',
type => 'twitter',
twitter => {
user => 'user',
password => 'password',
},
index => {
index => 'my_twitter_index',
type => 'status',
bulk_size => 100
}
)
get_river()
$result = $es->get_river(
river => $river_name,
ignore_missing => 0 | 1 # optional
);
Returns the river details eg
$result = $es->get_river ( river => 'my_twitter_river' )
Throws a "Missing" exception if the river doesn't exist and
"ignore_missing" is false.
delete_river()
$result = $es->delete_river( river => $river_name );
Deletes the corresponding river, eg:
$result = $es->delete_river ( river => 'my_twitter_river' )
See .
river_status()
$result = $es->river_status(
river => $river_name,
ignore_missing => 0 | 1 # optional
);
Returns the status doc for the named river.
Throws a "Missing" exception if the river doesn't exist and
"ignore_missing" is false.
Percolate methods
See also:
and
create_percolator()
$es->create_percolator(
index => single,
percolator => $percolator,
# one of queryb or query is required
query => { native query },
queryb => { search builder query },
# optional
data => {data}
)
Create a percolator, eg:
$es->create_percolator(
index => 'myindex',
percolator => 'mypercolator',
queryb => { field => 'foo' },
data => { color => 'blue' }
)
Note: "create_percolator()" supports ElasticSearch::SearchBuilder-style
queries via the "queryb" parameter. See "INTEGRATION WITH
ElasticSearch::SearchBuilder" for more details.
get_percolator()
$es->get_percolator(
index => single,
percolator => $percolator,
ignore_missing => 0 | 1,
)
Retrieves a percolator, eg:
$es->get_percolator(
index => 'myindex',
percolator => 'mypercolator',
)
Throws a "Missing" exception if the specified index or percolator does
not exist, and "ignore_missing" is false.
delete_percolator()
$es->delete_percolator(
index => single,
percolator => $percolator,
ignore_missing => 0 | 1,
)
Deletes a percolator, eg:
$es->delete_percolator(
index => 'myindex',
percolator => 'mypercolator',
)
Throws a "Missing" exception if the specified index or percolator does
not exist, and "ignore_missing" is false.
percolate()
$result = $es->percolate(
index => single,
type => single,
doc => { doc to percolate },
# optional
query => { query to filter percolators },
prefer_local => 1 | 0,
)
Check for any percolators which match a document, optionally filtering
which percolators could match by passing a "query" param, for instance:
$result = $es->percolate(
index => 'myindex',
type => 'mytype',
doc => { text => 'foo' },
query => { term => { color => 'blue' }}
);
Returns:
{
ok => 1,
matches => ['mypercolator']
}
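A minimal sketch of consuming the result shown above, assuming the "matches"
key holds an array ref of percolator names as in that example. The helper
name "matching_percolators" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the percolators that match a
# document. All arguments after $es are passed straight to percolate().
sub matching_percolators {
    my ( $es, %args ) = @_;

    my $result = $es->percolate(%args);

    # Fall back to an empty list if "matches" is absent
    return @{ $result->{matches} || [] };
}
```

In list context this yields the names directly, so "if (
matching_percolators( $es, index => 'myindex', type => 'mytype', doc => $doc
) )" tests whether anything matched.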
Cluster admin methods
cluster_state()
$result = $es->cluster_state(
# optional
filter_blocks => 0 | 1,
filter_nodes => 0 | 1,
filter_metadata => 0 | 1,
filter_routing_table => 0 | 1,
filter_indices => [ 'index_1', ... 'index_n' ],
);
Returns cluster state information.
See
cluster_health()
$result = $es->cluster_health(
index => multi,
level => 'cluster' | 'indices' | 'shards',
timeout => $seconds,
wait_for_status => 'red' | 'yellow' | 'green',
| wait_for_relocating_shards => $number_of_shards,
| wait_for_nodes => eg '>=2',
);
Returns the status of the cluster, or index|indices or shards, where the
returned status means:
"red": Data not allocated
"yellow": Primary shard allocated
"green": All shards allocated
It can block to wait for a particular status (or better), or can block
to wait until the specified number of shards have been relocated (where
0 means all) or the specified number of nodes have been allocated.
If waiting, then a timeout can be specified.
For example:
$result = $es->cluster_health( wait_for_status => 'green', timeout => '10s')
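The "(or better)" ordering of the statuses can be sketched as follows. The
helper name "health_at_least" is hypothetical, and it assumes the response
carries the status under a "status" key, as in the logged response shown
under "trace_calls()".

```perl
use strict;
use warnings;

# Rank the statuses from worst to best
my %RANK = ( red => 0, yellow => 1, green => 2 );

# Hypothetical helper: true if the cluster status is $wanted or better.
# Extra arguments (eg index, timeout) are passed on to cluster_health().
sub health_at_least {
    my ( $es, $wanted, %args ) = @_;

    die "Unknown status '$wanted'\n" unless exists $RANK{$wanted};

    my $health = $es->cluster_health(%args);
    my $status = $health->{status};
    die "No recognised status in response\n"
        unless defined $status && exists $RANK{$status};

    return $RANK{$status} >= $RANK{$wanted} ? 1 : 0;
}
```

Unlike "wait_for_status", this does not block: it checks the current status
once and returns.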
See:
cluster_settings()
$result = $es->cluster_settings()
Returns any cluster wide settings that have been set with
"update_cluster_settings".
See
update_cluster_settings()
$result = $es->update_cluster_settings(
persistent => {...},
transient => {...},
)
For example:
$result = $es->update_cluster_settings(
persistent => {
"discovery.zen.minimum_master_nodes" => 2
},
)
"persistent" settings will survive a full cluster restart. "transient"
settings won't.
See
nodes()
$result = $es->nodes(
nodes => multi,
settings => 0 | 1,
http => 0 | 1,
jvm => 0 | 1,
network => 0 | 1,
os => 0 | 1,
process => 0 | 1,
thread_pool => 0 | 1,
transport => 0 | 1
);
Returns information about one or more nodes or servers in the cluster.
See:
nodes_stats()
$result = $es->nodes_stats(
node => multi,
indices => 1 | 0,
clear => 0 | 1,
all => 0 | 1,
fs => 0 | 1,
http => 0 | 1,
jvm => 0 | 1,
network => 0 | 1,
os => 0 | 1,
process => 0 | 1,
thread_pool => 0 | 1,
transport => 0 | 1,
);
Returns various statistics about one or more nodes in the cluster.
See:
shutdown()
$result = $es->shutdown(
node => multi,
delay => '5s' | '10m' # optional
);
Shuts down one or more nodes (or the whole cluster if no nodes
specified), optionally with a delay.
"node" can also have the values "_local", "_master" or "_all".
See:
restart()
$result = $es->restart(
node => multi,
delay => '5s' | '10m' # optional
);
Restarts one or more nodes (or the whole cluster if no nodes specified),
optionally with a delay.
"node" can also have the values "_local", "_master" or "_all".
See: "KNOWN ISSUES"
current_server_version()
$version = $es->current_server_version()
Returns a HASH containing the version "number" string, the build "date"
and whether or not the current server is a "snapshot_build".
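A sketch of using the returned "number" to guard version-dependent code. The
helper names "server_at_least" and "_version_parts" are hypothetical, and the
comparison assumes "number" is a dotted string such as "0.19.2"; any
non-numeric suffix on a component is ignored.

```perl
use strict;
use warnings;

# Split a dotted version string into its leading numeric components
sub _version_parts {
    my ($string) = @_;
    return map { my ($n) = /^(\d+)/; defined $n ? $n : 0 } split /\./, $string;
}

# Hypothetical helper: true if the server version is $min or later.
sub server_at_least {
    my ( $es, $min ) = @_;

    my $version = $es->current_server_version;
    my @have    = _version_parts( $version->{number} );
    my @want    = _version_parts($min);

    # Compare component by component, treating missing parts as 0
    my $last = $#have > $#want ? $#have : $#want;
    for my $i ( 0 .. $last ) {
        my $cmp = ( $have[$i] || 0 ) <=> ( $want[$i] || 0 );
        return $cmp > 0 ? 1 : 0 if $cmp;
    }
    return 1;    # versions are equal
}
```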
Other methods
use_index()/use_type()
"use_index()" and "use_type()" can be used to set default values for any
"index" or "type" parameter. The default value can be overridden by
passing a parameter (including "undef") to any request.
$es->use_index('one');
$es->use_type(['foo','bar']);
$es->index( # index: one, types: foo,bar
data=>{ text => 'my text' }
);
$es->index( # index: two, type: foo,bar
index=>'two',
data=>{ text => 'my text' }
);
$es->search( type => undef ); # index: one, type: all
trace_calls()
$es->trace_calls(1); # log to STDERR
$es->trace_calls($filename); # log to $filename.$PID
$es->trace_calls(\*STDOUT); # log to STDOUT
$es->trace_calls($fh); # log to given filehandle
$es->trace_calls(0 | undef); # disable logging
"trace_calls()" is used for debugging. All requests to the cluster are
logged either to "STDERR", or the specified filehandle, or the specified
filename, with the current $PID appended, in a form that can be rerun
with curl.
The cluster response will also be logged, and commented out.
Example: "$es->cluster_health" is logged as:
# [Tue Oct 19 15:32:31 2010] Protocol: http, Server: 127.0.0.1:9200
curl -XGET 'http://127.0.0.1:9200/_cluster/health'
# [Tue Oct 19 15:32:31 2010] Response:
# {
# "relocating_shards" : 0,
# "active_shards" : 0,
# "status" : "green",
# "cluster_name" : "elasticsearch",
# "active_primary_shards" : 0,
# "timed_out" : false,
# "initializing_shards" : 0,
# "number_of_nodes" : 1,
# "unassigned_shards" : 0
# }
query_parser()
$qp = $es->query_parser(%opts);
Returns an ElasticSearch::QueryParser object for tidying up query
strings so that they won't cause an error when passed to ElasticSearch.
See ElasticSearch::QueryParser for more information.
transport()
$transport = $es->transport
Returns the Transport object, eg ElasticSearch::Transport::HTTP.
timeout()
$timeout = $es->timeout($timeout)
Convenience method which does the same as:
$es->transport->timeout($timeout)
refresh_servers()
$es->refresh_servers()
Convenience method which does the same as:
$es->transport->refresh_servers()
This tries to retrieve a list of all known live servers in the
ElasticSearch cluster by connecting to each of the last known live
servers (and the initial list of servers passed to "new()") until it
succeeds.
This list of live servers is then used in a round-robin fashion.
"refresh_servers()" is called on the first request and then after every
"max_requests" requests. This automatic refresh can be disabled by
setting "max_requests" to 0:
$es->transport->max_requests(0)
Or:
$es = ElasticSearch->new(
servers => '127.0.0.1:9200',
max_requests => 0,
);
builder_class() | builder()
The "builder_class" is set to ElasticSearch::SearchBuilder by default.
This can be changed, eg:
$es = ElasticSearch->new(
servers => '127.0.0.1:9200',
builder_class => 'My::Builder'
);
"builder()" will "require" the module set in "builder_class()", create
an instance, and store that instance for future use. The "builder_class"
should implement the "filter()" and "query()" methods.
camel_case()
$bool = $es->camel_case($bool)
Gets/sets the camel_case flag. If true, then all JSON keys returned by
ElasticSearch are in camelCase, instead of with_underscores. This flag
does not apply to the source document being indexed or fetched.
Defaults to false.
error_trace()
$bool = $es->error_trace($bool)
If the ElasticSearch server is returning an error, setting "error_trace"
to true will return some internal information about where the error
originates. Mostly useful for debugging.
GLOBAL VARIABLES
$ElasticSearch::DEBUG = 0 | 1;
If $ElasticSearch::DEBUG is set to true, then ElasticSearch exceptions
will include a stack trace.
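Because this is a package variable, it can be switched on for a limited
scope with "local". The sketch below assumes the variable lives in the
"ElasticSearch" package (matching the module name); the helper name
"with_stack_traces" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with stack traces enabled for any
# ElasticSearch exception thrown inside it. "local" restores the
# previous value when the sub returns or dies.
sub with_stack_traces {
    my ($code) = @_;

    no warnings 'once';    # the variable is owned by the ElasticSearch package
    local $ElasticSearch::DEBUG = 1;

    return $code->();
}
```

A call such as "with_stack_traces( sub { $es->search(%params) } )" would then
affect only requests made inside the code ref.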
AUTHOR
Clinton Gormley
KNOWN ISSUES
"get()"
The "_source" key that is returned from a "get()" contains the
original JSON string that was used to index the document initially.
ElasticSearch parses JSON more leniently than JSON::XS, so if
invalid JSON is used to index the document (eg unquoted keys) then
"$es->get(....)" will fail with a JSON exception.
Any documents indexed via this module will not be susceptible to
this problem.
"restart()"
"restart()" is currently disabled in ElasticSearch as it doesn't
work correctly. Instead you can "shutdown()" one or all nodes and
then start them up from the command line.
BUGS
This is a beta module, so there will be bugs, and the API is likely to
change in the future, as the API of ElasticSearch itself changes.
If you have any suggestions for improvements, or find any bugs, please
report them to
. I will be
notified, and then you'll automatically be notified of progress on your
bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc ElasticSearch
You can also look for information at:
* GitHub
* CPAN Ratings
* Search MetaCPAN
ACKNOWLEDGEMENTS
Thanks to Shay Banon, the ElasticSearch author, for producing an
amazingly easy to use search engine.
LICENSE AND COPYRIGHT
Copyright 2010 - 2011 Clinton Gormley.
This program is free software; you can redistribute it and/or modify it
under the terms of either: the GNU General Public License as published
by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.