GemFire functions with Java 8, Nashorn and Groovy

GemFire functions offer a powerful and flexible way to send distributed work to multiple servers. The work can be data-dependent, running as smart units of work that act in parallel on a given region, or it can run in parallel on all available members of the system. An execution can also be filtered to work only on a set of keys or only on a specified subset of members, which can be really convenient depending on the use case being implemented. For example:

✓ If you have some kind of external resource provisioning or initialization of third-party services (a Linux service, for example), you can wrap that in a GemFire function and distribute the command to a subset of members.
✓ If you have a partitioned data set on which you need to perform an aggregation or any other kind of data processing, you can implement a data-dependent GemFire function.

These functions can leverage high availability (HA) features: when a failure occurs GemFire will automatically retry the execution, as configured by com.gemstone.gemfire.cache.client.Pool.retryAttempts, and can fail over to a different node. Clients can also collect partial execution results, among many other options.

Function execution scheme

Another important feature of GemFire functions is remote deployment, performed through the gfsh deploy command by specifying the JAR file that contains the function code and the target: a single server, multiple servers (groups) or an entire cluster. This is a powerful solution that lets developers add new functionality dynamically, in a convenient way, to an already running system that may have hundreds of nodes. In this post I’m going to create some functions that will be deployed to multiple servers (JVMs) and executed remotely from a client application. Catching a ride on the JDK 8 train, I’m also going to leverage JDK 8 Nashorn and implement a GemFire function using JavaScript.

JDK 8 + Nashorn

JDK 8 is finally out (March 18, 2014): after a late train, a secure train and holding the train, the train has arrived. Lambdas are of course one of the hottest things in town, but another great feature of JDK 8 is Nashorn. Nashorn is a JavaScript engine created by Oracle, just like V8 (https://code.google.com/p/v8/) from Google, but running directly on the JVM. The project was publicly announced in 2011 and became open source in 2012 as part of OpenJDK. Since JDK 6 the Oracle JVM has shipped a built-in JavaScript engine based on Rhino (Mozilla), exposed through the scripting API of JSR 223. All these efforts are related to the Da Vinci Machine project (JSR 292), which aims to make the JVM better support dynamic languages such as Groovy, Jython and JRuby through invokedynamic. The greatest advantages of Nashorn are its focus on speed and its use of newer technologies and specs that were not available on the JVM when Rhino was created.

Requirements:

  • JDK 8 (for Nashorn function)
  • Pivotal GemFire 7.+
  • Any IDE with JDK 8 support (IntelliJ, NetBeans or Eclipse)
  • Gradle (optional)

DISCLAIMER: JDK 8 is not officially supported on GemFire 7 yet and this article is only an experiment of what’s possible to implement.

If you do not want to follow the step-by-step or are already familiar with GemFire functions, just clone the git repository, build it and play with the code.

git clone https://github.com/markito/gemfire-functions-sample/

Once you have cloned the project you should have the following structure:

├── build
│   ├── classes
│   ├── dependency-cache
│   ├── libs
│   ├── resources
│   └── tmp
├── build.gradle
├── gradle
│   └── wrapper
├── gradle.properties
├── gradlew
├── gradlew.bat
├── out
│   └── production
├── servers
│   ├── cache.xml
│   ├── locator1
│   ├── server1
│   ├── server2
│   ├── setEnv.sh
│   ├── startServers.sh
│   └── stopServers.sh
├── settings.gradle
└── src
    └── main

Edit gradle.properties and modify the gemfireHome variable to point to your GemFire installation directory.

gemfireHome=/opt/gemfire/install/Pivotal_GemFire_702_b45797

Run the build command. This may take some time on the first execution, since it’s going to download the dependencies.

./gradlew build

Now let’s move into the servers folder and update the setEnv.sh script so that the GEMFIRE_HOME variable points to your current GemFire installation. On my machine this script looks like the following:

export GEMFIRE_HOME=/opt/gemfire/install/Pivotal_GemFire_702_b45797/

Now you can manage the servers using the shell scripts provided in this folder (startServers.sh/stopServers.sh), which start and stop one locator and two GemFire servers (data nodes). Start the servers.

$./startServers.sh

The cache.xml file, based on the sample cache.xml bundled with the GemFire installation, defines a single partitioned region named exampleRegion.

<?xml version="1.0"?>
<!DOCTYPE cache PUBLIC
    "-//GemStone Systems, Inc.//GemFire Declarative Caching 7.0//EN"
    "http://www.gemstone.com/dtd/cache7_0.dtd">

<cache>
        <region name="exampleRegion">
                <region-attributes refid="PARTITION">
                </region-attributes>
        </region>
</cache>

The environment is ready to go. I’m going to continue the next steps using the IntelliJ IDE, but they could easily be adapted to any other common Java IDE.

Creating Functions

Assuming you’re new to GemFire functions, we’re going to start with a very simple function that runs on every node and prints a message, really just an example. Create a new Java class named HelloFunction that extends com.gemstone.gemfire.cache.execute.FunctionAdapter. Implement the required methods execute() and getId(), and let’s just print a message to the server logs. It’s important to give the function a meaningful ID, since that is how you’re going to call it on the system. By default GemFire functions are expected to return results and to be highly available. Since we’re not dealing with data and not returning anything, we’re going to change those settings by overriding hasResult() and isHA() to return false. Both are changed together because GemFire expects an HA function to also return a result.

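package com.pivotal.gemfire.samples.functions;

import com.gemstone.gemfire.cache.execute.FunctionAdapter;
import com.gemstone.gemfire.cache.execute.FunctionContext;

public class HelloFunction extends FunctionAdapter {

    @Override
    public void execute(FunctionContext functionContext) {
        // Ends up in the log of every server that runs the function
        System.out.println("Hello, I'm running here");
    }

    @Override
    public String getId() {
        // The ID used to call the function, for example from gfsh
        return HelloFunction.class.getCanonicalName();
    }

    @Override
    public boolean hasResult() {
        // Fire-and-forget: callers do not wait for a result
        return false;
    }

    @Override
    public boolean isHA() {
        // No automatic re-execution on failure
        return false;
    }
}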

Note: An alternative to overriding hasResult() and isHA() is to return an empty result or a status from the execute() method, for example with functionContext.getResultSender().lastResult(0). Of course, by doing this your clients will wait for the function execution to complete, so setting hasResult() to false may be the better approach.

Now the usual process would be to generate a jar file from this application and deploy it to a running GemFire cluster using gfsh. But I’ve created a Gradle task that simplifies this process even more and can be run directly after the build or from an IDE, for example. Assuming you’re following the article, have the GemFire servers running and have cloned the FunctionSamples project, just run the deployJar Gradle task.

$./gradlew deployJar
Creating properties on demand (a.k.a. dynamic properties) has been deprecated and is scheduled to be removed in Gradle 2.0. Please read http://gradle.org/docs/current/dsl/org.gradle.api.plugins.ExtraPropertiesExtension.html for information on the replacement for dynamic properties.
Deprecated dynamic property: "transitive" on "root project 'FunctionSamples'", value: "true".
Deprecated dynamic property: "libDir" on "task ':deployJar'", value: "build/libs/FunctionSam...".
:compileJava
:compileGroovy
:processResources UP-TO-DATE
:classes
:jar
:deployJar

(1) Executing -  connect

Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=anakin.local, port=1099] ..
Successfully connected to: [host=anakin.local, port=1099]

(2) Executing -  deploy --jar=build/libs/FunctionSamples-1.0.jar

Member  |      Deployed JAR       | Deployed JAR Location
------- | ----------------------- | --------------------------------------------------------------------------------------------------------------------
server1 | FunctionSamples-1.0.jar | /Users/markito/Projects/Pivotal/workspaces/articles/FunctionSamples/servers/server1/vf.gf#FunctionSamples-1.0.jar#1
server2 | FunctionSamples-1.0.jar | /Users/markito/Projects/Pivotal/workspaces/articles/FunctionSamples/servers/server2/vf.gf#FunctionSamples-1.0.jar#1

stty: stdin isn't a terminal

BUILD SUCCESSFUL

Total time: 15.56 secs

First it compiled the new class and generated the jar file. Then the deployJar task connected to the running locator (the GemFire equivalent of a namenode/load balancer) and discovered the other two members of the system. After that it copied the jar file into each server folder, assigning it a version number, so you can check whether a server had a problem during this process and identify which servers have older versions. Now let’s connect to the system and call the function from the gfsh command-line utility, which is a great tool for such tests. If you don’t have gfsh on your PATH, just source the provided setEnv.sh, then connect to the system (connect) and list all available functions (list functions).

(markito@anakin)$ . ./servers/setEnv.sh
Gemfire environment set...
(markito@anakin)$ gfsh
    _________________________     __
   / _____/ ______/ ______/ /____/ /
  / /  __/ /___  /_____  / _____  /
 / /__/ / ____/  _____/ / /    / /
/______/_/      /______/_/    /_/    v7.0.2

Monitor and Manage GemFire
gfsh>connect
Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=anakin.local, port=1099] ..
Successfully connected to: [host=anakin.local, port=1099]

gfsh>list functions
Member  | Function
------- | -------------------------------------------------------------------
server1 | NashornFunction
server1 | com.pivotal.gemfire.samples.functions.ExternalScriptFunctionAdapter
server1 | com.pivotal.gemfire.samples.functions.HelloFunction
server1 | com.pivotal.gemfire.samples.functions.SimpleFunction
server1 | com.pivotal.gemfire.samples.functions.SimpleGroovyFunction
server2 | NashornFunction
server2 | com.pivotal.gemfire.samples.functions.ExternalScriptFunctionAdapter
server2 | com.pivotal.gemfire.samples.functions.HelloFunction
server2 | com.pivotal.gemfire.samples.functions.SimpleFunction
server2 | com.pivotal.gemfire.samples.functions.SimpleGroovyFunction

Here you can see all the functions available in our project, including the ones I’m going to cover later in the article using Nashorn and Groovy (a bonus). In order to execute a function from gfsh we need to, guess what, call the execute function command, passing the function ID and, depending on the function type, a target (region, group or member). Since this function only prints a message, it can be called on every member.

gfsh>execute function --id=com.pivotal.gemfire.samples.functions.HelloFunction
Execution summary

Member ID/Name         | Function Execution Result
------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
anakin(server2:7723):51377 | While executing function : com.pivotal.gemfire.samples.functions.HelloFunction on member : anakin(server2:7723):51377 error occured : Cannot return any result as the Function#hasResult() is false
anakin(server1:7711):9790  | While executing function : com.pivotal.gemfire.samples.functions.HelloFunction on member : anakin(server1:7711):9790 error occured : Cannot return any result as the Function#hasResult() is false

By default the gfsh client expects some result from the function and will complain that the function’s hasResult() method returns false, which is fine. You can now check both the server1 and server2 logs for the printed “Hello” message.

==> server1/server1.log <==
Hello, I'm running here
==> server2/server2.log <==
Hello, I'm running here

Alternatively, you can check the server logs from gfsh as well, using the show log command.

gfsh>show log --member=server2 --lines=3
SystemLog:
Hello, I'm running here
Hello, I'm running here
Hello, I'm running here

Use case: Data clean up

So far so good. Now let’s do something useful with functions by implementing a use case. In the project there is a com.pivotal.gemfire.samples.loader.LoadData class that can be executed to produce some entries in /exampleRegion; run this class before proceeding. It’s a simple GemFire client that connects to the system and puts some Customer objects, generated with fake data for testing. The Customer class has only 4 fields: ID, name, e-mail and credit card number. For testing purposes the loader generates some customers with invalid e-mails and some with invalid credit cards, so we can implement GemFire functions for simple data clean up, one of the 21st century’s biggest problems, right? And to make things more interesting there are versions of the same function using Java 8 syntax, Groovy and JavaScript (through Java 8 Nashorn).

Java Function

SimpleFunction.java validates credit card numbers and sets invalid ones to an empty String (“”). Also, since the region is partitioned and this is a data-dependent function, we need to give it a region to work on, which can be specified in gfsh with --region=/regionName.

…
public void execute(FunctionContext functionContext) {

    RegionFunctionContext rfc = (RegionFunctionContext) functionContext;
    Region<Object, Customer> region = PartitionRegionHelper.getLocalDataForContext(rfc);

    // check every credit card and clear invalid ones
    region.forEach((Object id, Customer customer) -> {
        if (!RandomCreditCardGenerator.isValidCreditCardNumber(customer.getCcNumber())) {
            customer.setCcNumber("");
            System.out.println(String.format("Customer %s has an invalid credit card.", id));
            region.put(id, customer);
        }
    });

    rfc.getResultSender().lastResult("Done.");
}
…
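
Since a GemFire Region is also a java.util.concurrent.ConcurrentMap, the region.forEach call above is the Map.forEach default method that arrived with Java 8. As a minimal sketch of the same clean-up pattern outside GemFire, a plain ConcurrentHashMap stands in here for the region’s local data and a digits-only check stands in for the credit card validator. The class name ForEachCleanupSketch and the clearInvalid helper are hypothetical and not part of the sample project.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

final class ForEachCleanupSketch {

    /** Blanks every non-empty value rejected by isValid and returns how many entries changed. */
    static int clearInvalid(Map<Integer, String> cards, Predicate<String> isValid) {
        // A lambda cannot modify a local int, so the counter lives in an AtomicInteger
        AtomicInteger changed = new AtomicInteger();
        cards.forEach((id, number) -> {
            if (!number.isEmpty() && !isValid.test(number)) {
                cards.put(id, ""); // overwrites the value of an existing key
                changed.incrementAndGet();
            }
        });
        return changed.get();
    }

    public static void main(String[] args) {
        // Stand-in for the local data of a partitioned region (assumption)
        Map<Integer, String> cards = new ConcurrentHashMap<>();
        cards.put(1, "4111111111111111");
        cards.put(2, "not-a-number");

        // Stand-in validator: accepts strings made of digits only (assumption)
        Predicate<String> digitsOnly = s -> s.chars().allMatch(Character::isDigit);

        System.out.println(clearInvalid(cards, digitsOnly)); // prints 1
        System.out.println(clearInvalid(cards, digitsOnly)); // prints 0
    }
}
```

The second call finds nothing left to clean, which mirrors what happens when the GemFire function is executed twice in a row.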

Call the function from gfsh:

gfsh>execute function --id=com.pivotal.gemfire.samples.functions.SimpleFunction --region=/exampleRegion

Check both cache server logs and note that the function is being executed on both nodes and that the data updates are happening locally.

==> server1/server1.log <==
Customer 405 has an invalid credit card.
Customer 123 has an invalid credit card.
Customer 407 has an invalid credit card.
Customer 13 has an invalid credit card.
……
==> server2/server2.log <==
Customer 81 has an invalid credit card.
Customer 391 has an invalid credit card.
Customer 201 has an invalid credit card.
……

Note that calling the function again will not produce these messages since there are no invalid credit cards anymore.
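
The code of RandomCreditCardGenerator.isValidCreditCardNumber() is not shown in this post. Assuming it performs the usual Mod 10 (Luhn) checksum, which the Mod10() name in the JavaScript version hints at, a validator could be sketched as below. LuhnSketch is a hypothetical name and this is not the project’s actual implementation.

```java
final class LuhnSketch {

    /** Returns true when the digit string passes the Luhn (Mod 10) checksum. */
    static boolean isValid(String number) {
        if (number == null || number.isEmpty()) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false; // every second digit, counted from the right, is doubled
        for (int i = number.length() - 1; i >= 0; i--) {
            int digit = Character.digit(number.charAt(i), 10);
            if (digit < 0) {
                return false; // not a digit
            }
            if (doubleIt) {
                digit *= 2;
                if (digit > 9) {
                    digit -= 9; // same as adding the two digits of the product
                }
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid("4111111111111111")); // prints true
        System.out.println(isValid("4111111111111112")); // prints false
    }
}
```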

Nashorn Function

SimpleNashornFunction.java is implemented on the Java side pretty much as a wrapper around the JavaScript file where the actual business logic lives. The execute() method performs an e-mail validation on the customer objects, using one of the traditional JavaScript e-mail validations available everywhere on the internet, and then cleans up the invalid ones. Let’s look at some code:

public SimpleNashornFunction() throws ScriptException, FileNotFoundException, UnsupportedEncodingException {
    engineManager = new ScriptEngineManager();
    engine = engineManager.getEngineByName("nashorn");

    InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream(jsFile);
    BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "UTF-8"));

    engine.eval(reader);

    Invocable invocable = (Invocable) engine;
    _function = invocable.getInterface(com.gemstone.gemfire.cache.execute.Function.class);
}
...

I’m assuming that the JavaScript file is an acceptable implementation of the GemFire Function interface and assigning it to a shadow function object (_function), which is used by the other methods as follows:

...
@Override
public void execute(FunctionContext fc) {
    _function.execute(fc);
}

@Override
public String getId() {
    return _function.getId();
}
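
To see what getInterface() does in isolation, here is a minimal sketch outside GemFire. ScriptedFunction is a hypothetical two-method stand-in for the GemFire Function interface, and the script is passed as a string instead of being read from the classpath. The sketch returns an empty Optional on a JVM that does not provide an engine named nashorn, so it is an illustration under those assumptions and not the wrapper used in the project.

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.util.Optional;

final class GetInterfaceSketch {

    /** Hypothetical stand-in for the GemFire Function interface. */
    public interface ScriptedFunction {
        String getId();
        boolean hasResult();
    }

    /** Evaluates the script and binds its top-level functions to ScriptedFunction. */
    static Optional<ScriptedFunction> bind(String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
        if (engine == null) {
            return Optional.empty(); // this JVM has no Nashorn engine
        }
        try {
            engine.eval(script);
        } catch (ScriptException e) {
            throw new IllegalStateException("Script failed to evaluate", e);
        }
        // getInterface() yields null when the script does not define
        // functions matching the methods of the requested interface
        return Optional.ofNullable(((Invocable) engine).getInterface(ScriptedFunction.class));
    }

    public static void main(String[] args) {
        String js = "function getId() { return 'NashornFunction'; }\n"
                  + "function hasResult() { return true; }\n";
        Optional<ScriptedFunction> fn = bind(js);
        System.out.println(fn.map(ScriptedFunction::getId).orElse("not bound"));
    }
}
```

Because getInterface() can return null, a wrapper like SimpleNashornFunction would only find out about an incomplete script on the first delegated call, so checking the bound object right after creation is a cheap safeguard.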

Now on the JavaScript side there is a NashornFunction.js file under the resources folder, which defines functions such as getId(), execute() and hasResult(), the methods required by the GemFire Function interface. Here are the getId() and execute() functions in JavaScript:

function getId() {
    return "NashornFunction";
}

function execute(context) {
    var PartitionHelper = Java.type("com.gemstone.gemfire.cache.partition.PartitionRegionHelper");
    // declared with var so the counter does not leak into the global scope
    var counter = 0;
    var region = PartitionHelper.getLocalDataForContext(context);
    region.forEach(function (id, customer) {
        //context.getResultSender().sendResult("Processing " + id);
        if ( (customer.email.length > 0) && (!isEmailValid(customer.email)) ) {
            print("Customer " + customer.name + " has an invalid e-mail");
            customer.email = "";
            region.put(id, customer);
            counter++;
        }
    });
    context.getResultSender().lastResult("Done. " + counter + " changed objects");
}

The commented line can be used if you want to receive partial results from the function execution, in this case the IDs, as soon as they’re processed. Other than that the code is very similar to the Java 8 version of the credit card validation. Very simple, huh? With the jar deployed you can call it from gfsh as follows:

gfsh>execute function --id=NashornFunction --region=/exampleRegion
Execution summary

Member ID/Name          | Function Execution Result
------------------------------- | ----------------------------------------------------
anakin(server2:11844):18390 | Done. 124 changed objects
Done. 126 changed objects

Note that each JVM returned its own number of changed objects. You may then want to check the server logs (server1.log and server2.log) for processing information:

==> server1/server1.log <==
Customer John80 has an invalid e-mail
Customer John390 has an invalid e-mail
Customer John278 has an invalid e-mail
Customer John82 has an invalid e-mail
Customer John392 has an invalid e-mail
Customer John200 has an invalid e-mail
…

This wrapper approach looks for NashornFunction.js on the classpath. But since Nashorn is actually parsing the JavaScript dynamically, why not leave the JavaScript file out of the jar package, so you can update the file and run a new version, or even completely new function code, without compilation and deployment to the servers? Of course the file must be available to the JVMs, but that’s simple to solve in a real-world scenario through SAN/NAS, etc. That’s exactly what ExternalScriptFunction does: it receives two parameters, the location of the JavaScript file and the function you want to call in that file, and executes your JavaScript inside the GemFire JVMs. This is a powerful combination that gives you distributed JavaScript execution on the server side, collocated with the data. Let’s take a look at some code:

@Override
public void execute(FunctionContext fc) {

    ScriptEngineManager engineManager = new ScriptEngineManager();
    ScriptEngine engine = engineManager.getEngineByName("nashorn");

    if ((fc.getArguments() != null)) {
        final String jsFile = ((String[]) fc.getArguments())[0];   // full path to the JavaScript file
        final String method = ((String[]) fc.getArguments())[1];   // function to be called

        // try-with-resources closes the reader even if the evaluation fails
        try (FileReader reader = new FileReader(jsFile)) {
            engine.eval(reader);

            Invocable invocable = (Invocable) engine;
            RegionFunctionContext rfc = (RegionFunctionContext) fc;

            // call the requested function on the JavaScript side
            invocable.invokeFunction(method, rfc);

        } catch (IOException | ScriptException | NoSuchMethodException ex) {
            Logger.getLogger(ExternalScriptFunction.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
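
The reload-without-redeploy idea can be tried outside GemFire with the small sketch below. It assumes nothing about the project: ExternalScriptSketch, invoke and demo are hypothetical names, the script lives in a temporary file, and a fresh engine evaluates the file on every call so that edits are picked up. On a JVM without an engine named nashorn the sketch returns an empty Optional.

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

final class ExternalScriptSketch {

    /** Re-reads jsFile on every call, then invokes the named top-level function. */
    static Optional<Object> invoke(Path jsFile, String function, Object... args) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
        if (engine == null) {
            return Optional.empty(); // this JVM has no Nashorn engine
        }
        try (Reader reader = Files.newBufferedReader(jsFile, StandardCharsets.UTF_8)) {
            engine.eval(reader);
            return Optional.ofNullable(((Invocable) engine).invokeFunction(function, args));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ScriptException | NoSuchMethodException e) {
            throw new IllegalStateException("Could not run " + function, e);
        }
    }

    /** Writes two versions of a script to the same file and calls each one without recompiling. */
    static Optional<Object> demo() {
        try {
            Path js = Files.createTempFile("external", ".js");
            Files.write(js, "function answer() { return 'v1'; }".getBytes(StandardCharsets.UTF_8));
            invoke(js, "answer");
            // Simulates editing the script between two executions
            Files.write(js, "function answer() { return 'v2'; }".getBytes(StandardCharsets.UTF_8));
            Optional<Object> second = invoke(js, "answer");
            Files.delete(js);
            return second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo().orElse("Nashorn engine not available on this JVM"));
    }
}
```

Creating a new engine per call trades some speed for simplicity, which is the same choice the execute() method above makes.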

The implementation on the Java side is very straightforward, and in JavaScript you don’t actually need to implement all the methods required by the GemFire Function interface, since the Java side only calls one specific function there anyway. Calling it from gfsh is very simple too, just pass the arguments to the GemFire function:

gfsh>execute function --id=com.pivotal.gemfire.samples.functions.ExternalScriptFunction --arguments="/Users/markito/Projects/Pivotal/workspaces/articles/FunctionSamples/src/main/resources/NashornFunction.js","execute" --region=/exampleRegion

Remember to run LoadData again so you have some invalid data for testing. Here I’m calling the exact same code we called before by passing “execute” as the name of the function to execute in the JavaScript file, performing the e-mail validation again. Let’s now call another function in this JavaScript file:

gfsh>execute function --id=com.pivotal.gemfire.samples.functions.ExternalScriptFunction --arguments="/Users/markito/Projects/Pivotal/workspaces/articles/FunctionSamples/src/main/resources/NashornFunction.js","validateCards" --region=/exampleRegion
…

And if you check both server logs you can see that we’re performing the credit card validation as expected.

==> server2/server2.log <==
Customer Mary125 has an invalid credit card:48720386601453991
Customer Mary325 has an invalid credit card:49297365787823271
Customer Mary163 has an invalid credit card:46489340791567091
…
==> server1/server1.log <==
Customer Mary233 has an invalid credit card:46864508028522361
Customer Mary119 has an invalid credit card:49160166021681981
Customer Mary237 has an invalid credit card:44850318956088501
…

With Java (SimpleFunction.java), or even with the bundled version of the NashornFunction.js file (SimpleNashornFunction.java), changing anything in the code means you have to compile, deploy and run. But since we’re loading an external file here, you can modify the NashornFunction.js file and just call the function again from gfsh. The current validateCards() in NashornFunction.js is not cleaning invalid cards because that code is commented out, so let’s uncomment those lines and call it again from gfsh.

...
function validateCards(context) {
    var PartitionHelper = Java.type("com.gemstone.gemfire.cache.partition.PartitionRegionHelper");
    // declared with var so the counter does not leak into the global scope
    var counter = 0;
    var region = PartitionHelper.getLocalDataForContext(context);
    region.forEach(function (id, customer) {
        //context.getResultSender().sendResult("Processing " + id);
        if ( (customer.ccNumber.length > 0) && (!Mod10(customer.ccNumber)) ) {
            print("Customer " + customer.name + " has an invalid credit card:" + customer.ccNumber);
            customer.ccNumber = ""; // uncomment
            region.put(id, customer); // uncomment
            counter++; // uncomment
        }
    });
    context.getResultSender().lastResult("Done. " + counter + " changed objects");
}

Then call the function again from gfsh and check the server logs. Note that the first execution will return the number of changed objects, but subsequent calls will just return 0.

gfsh>execute function --id=com.pivotal.gemfire.samples.functions.ExternalScriptFunction --arguments="/Users/markito/Projects/Pivotal/workspaces/articles/FunctionSamples/src/main/resources/NashornFunction.js","validateCards" --region=/exampleRegion

Bonus: Groovy Function

Groovy has been a first-class citizen on the JVM for a very long time, so it’s no surprise that you can implement GemFire functions in pure Groovy. The only requirement is to have the Groovy library on the GemFire server classpath. The Groovy class can extend GemFire’s FunctionAdapter and implement the required methods exactly the same way you would in pure Java. One may ask, “Why include a Groovy example if it’s that similar to the Java implementation?” The answer is simple: remember that GemFire is not yet certified to run on Java 8 in production environments, so if you do want to leverage the power of lambdas and closures in GemFire functions, Groovy provides a nice and clean alternative that works on Java 6 and Java 7, and that some people may still prefer over Java 8. The Groovy syntax also has the advantage of being simpler and providing other features…

void execute(FunctionContext functionContext) {
    RegionFunctionContext rfc = (RegionFunctionContext) functionContext;
    Region<Object,Object> region = PartitionRegionHelper.getLocalDataForContext(rfc);

    // check every credit card and clear invalid ones
    region.collect({ id, customer ->
        if (!creditCadGen.isValidCreditCardNumber(customer.ccNumber)) {
            customer.ccNumber = ""
            region.put(id, customer);
            rfc.getResultSender().sendResult("Customer $id modified");
            println("Customer $id has an invalid credit card.");
        }
    });
    rfc.getResultSender().lastResult("Done.");
}

And if you have followed the post this far you should already know how to call the Groovy version using gfsh:

gfsh>execute function --id=com.pivotal.gemfire.samples.functions.SimpleGroovyFunction --region=/exampleRegion
Execution summary

Member ID/Name          | Function Execution Result
------------------------------- | ----------------------------------------------------
anakin(server2:11844):18390 | Done. 107 changed objects
Done. 115 changed objects

Conclusion

✓ GemFire functions offer a very flexible mechanism to run distributed code on multiple JVMs and leverage data locality in order to improve data processing.
✓ Java 8 syntax, with lambdas and other enhancements to the Java collections, can really save time when implementing data processing.
✓ Java 8 Nashorn is a fast JavaScript implementation that is really simple to use, letting developers leverage existing JavaScript knowledge and code on the JVM and mix and match Java objects with JavaScript syntax.
✓ Groovy is still a very powerful alternative for people who are looking for lambdas and closures on Java 6 and Java 7, and even on Java 8. People looking to remove some of the boilerplate that Java requires will also benefit from the Groovy syntax without sacrificing performance.
✓ GemFire remote deployment of functions offers a powerful tool for developers who want to add functionality to a running system without having to restart it or worry about package distribution and versioning.

There are still a bunch of areas to explore using lambdas and parallel processing features of Java 8 that could be used on GemFire functions but that may be something for the next post…
