Thursday, December 22, 2011

Using asynchronous method calls to mine gold faster.

Disclaimer:
First of all let me say, that we are not going to mine any real gold here. However, gold mining seemed as a good example, since the wins through asynchronous communication can be as valuable.
Second, code examples in this post rely on a specific programming language (java) and a specific framework (will be named at the end) but can well be used with any other language or framework. Talking about frameworks, asynchronous communication is all but trivial and its of great value to have a synchronously programmable framework which hides the details from the developer.

Lets go back a century or two and dig for some gold. For that we have developed a special service which offers two methods:

@DistributeMe(asynchSupport=true, asynchCallTimeout=2500)
public interface GoldMinerService extends Service{
/**
* Searches a random location for gold. Can last up to 10 seconds. 
* Returns if anything was found.
* @return
*/
boolean searchForGold();
/**
* Washes gold for a given duration. Returns the amount of washed clumps.
* @param duration
* @return
*/
int washGold(long duration);
}

You probably noticed the @DistributeMe annotation on the top of the class. More on it later.


The first method searches for gold at a location. When we call it, it picks a random location and starts digging. It digs for several seconds (1 to 10) making a meter per second until it finds something. And than this something can be gold or clay. Two gold miners are using this service, the SynchGoldSearcher and the AsynchGoldSearcher. Both have 60 seconds to find possible locations for a gold mine.
The SynchGoldSearcher is pretty straight forward, it just digs until it hits something. Just like the classical service call does. So please allow me to introduce Player1:


public class SynchGoldSearcher {
public static void main(String[] args) throws Exception{
GoldMinerService service = ServiceLocator.getRemote(GoldMinerService.class);
long start = System.currentTimeMillis();
int searchTime = 60;
System.out.println("Searching for gold for "+searchTime+" seconds");
long endTime = start + 1000L*searchTime;
long now;
int foundGold = 0;
int attempts = 0;
while ((now = System.currentTimeMillis())<endTime){
System.out.println("Attempt "+(++attempts));
if (service.searchForGold()){
foundGold++;
System.out.println("Found gold!");
}else{
System.out.println("Nothing here...");
}
}
System.out.println("Found "+foundGold+" gold in "+attempts+" attempts and "+(now - start)/1000+" seconds.");
}
}

There is no real magic in the above code, so I'll skip the comments. Now the second player looks pretty similar except for a small detail:


public class AsynchGoldSearcher {
public static void main(String[] args) throws Exception{
GoldMinerService service = ServiceLocator.getAsynchRemote(GoldMinerService.class);
long start = System.currentTimeMillis();
int searchTime = 60;
System.out.println("Searching for gold for "+searchTime+" seconds");
long endTime = start + 1000L*searchTime;
long now;
int foundGold = 0;
int attempts = 0;
while ((now = System.currentTimeMillis())<endTime){
System.out.println("Attempt "+(++attempts));
try{
if (service.searchForGold()){
foundGold++;
System.out.println("Found gold!");
}else{
System.out.println("Nothing here...");
}
}catch(CallTimeoutedException timeoutException){
System.out.println("too deep, aborted...");
}
}
System.out.println("Found "+foundGold+" gold in "+attempts+" attempts and "+(now - start)/1000+" seconds.");
((AsynchStub)service).shutdown();
}
}

First it asks for an AsynchRemote instead of a 'normal' Remote. Second, a CallTimeoutedException comes into play.
Remember the annotation @DistributeMe on top of the class? It belong to the distributeme framework and tells it to generate RMI code for my interface. Among other stuff it tells the framework that I want to have an asynchronous stub and that the default timeout for calls over this stub are 2500 ms. This means that any call that lasts longer than 2500 ms will be aborted and a CallTimeoutedException will be thrown.
Now lets start both searchers:
The synchronous searcher provides this, shortened output:

Searching for gold for 60 seconds
Attempt 1
Nothing here...
Attempt 2
Nothing here...
...

Attempt 7
Found gold!
...

Attempt 10
Nothing here...
Found 1 gold in 10 attempts and 65 seconds.



The asynchronous searcher provides this, also shortened output:

Searching for gold for 60 seconds
Attempt 1
too deep, aborted...
Attempt 2
Nothing here...
Attempt 3
too deep, aborted...
Attempt 4
too deep, aborted...
Attempt 5
Found gold!
Attempt 6
too deep, aborted...
Attempt 7
Found gold!
...
Attempt 28
too deep, aborted...
Found 6 gold in 28 attempts and 62 seconds.


The asynchronous approach seems to have been more successful? I admit, I run the test multiple times to achieve the results I wanted, but the number of found mining locations are not that important to the showcase as the number of the attempts. The client doesn't really know, how long the server will need to answer the request. In fact there are situation when an RMI call will hang nearly forever on the server side, and you'll be unable to do anything about it. With asynchronous approach the calling thread is detached from the call and from the server side processing, therefore allowing you (or the framework) interrupt it at any time and return.

When can it be useful? There are many scenarios, but the most important is the quality of service to the user. With this approach you can guarantee that the site will be responsive and answer within acceptable limits (even if some times this answer will be: I can't get this information right now).

But wait, we are just getting warm. Most of the web applications and especially portals have pages where they need to combine multiple pieces of information. Often the process of retrieving the information is long and retrieving the information sequentially sums up many long retrievals. Ironically, often enough the resources in question are not competing (for example two database servers) and we could speed up the retrieval by getting the information in parallel, but its just such a hassle to do all the concurrency programming. Thankfully its not ;-)

To demonstrate this, here is another example. After we got some productive mines in previous section we are now wash mined gold. To achieve this we call the method washGold which washes the gold for given amount of time, producing one clump in each second. Again, we have a synchronous and an asynchronous versions, first the synchronous one:


public class SynchGoldWasher {
public static void main(String[] args) throws Exception{
GoldMinerService service = ServiceLocator.getRemote(GoldMinerService.class);
int washTime = 10;
System.out.println("Washing gold for "+washTime+" seconds.");
long start = System.currentTimeMillis();
int washed = service.washGold(washTime*1000L);
long duration = System.currentTimeMillis() - start;
System.out.println("Washed "+washed+" gold clumps in "+duration+" ms.");
}
}

and now the asynchronous one:

public class AsynchGoldWasher {
public static void main(String[] args) throws Exception{
GoldMinerService service = ServiceLocator.getAsynchRemote(GoldMinerService.class);
AsynchGoldMinerService asynchService = (AsynchGoldMinerService)service;
int washTime = 10;
int calls = 5;
System.out.println("Washing gold for "+washTime+" seconds in "+calls+" calls.");
long start = System.currentTimeMillis();
MultiCallCollector collector = new MultiCallCollector(calls);
for (int i=0; i<calls; i++){
asynchService.asynchWashGold(washTime*1000L, collector.createSubCallHandler(""+i));
}
collector.waitForResults(11000);
int washed = 0;
for (int i=0; i<calls; i++){
washed += (Integer)collector.getReturnValue(""+i);
}
long duration = System.currentTimeMillis() - start;
System.out.println("Washed "+washed+" gold clumps in "+duration+" ms.");
asynchService.shutdown();
}
}

Again, we compare them against each other, first the synchronous one:

Washing gold for 10 seconds.
Washed 10 gold clumps in 10193 ms.


and the asynchronous one:


Washing gold for 10 seconds in 5 calls.
Washed 50 gold clumps in 10200 ms.


Now this is a difference! 5 Times faster? Well, of course it was an easy win, since you as a smart reader already noticed that I'm starting the 5 calls in parallel. But the actually amazing thing is that its achieved with minimal code overhead!
If we take a look at the source code, we see that we only need few additional lines to change the behavior of sequential code and we are programming sequentially even we run concurrently:

First we create a new collector and tell it that we are going to call 5 methods (calls is 5):

MultiCallCollector collector = new MultiCallCollector(calls);


Than we call the methods asynchronously via the auto generated asynchronous interface. Each calls returns immediately.

for (int i=0; i<calls; i++){
  asynchService.asynchWashGold(washTime*1000L,      collector.createSubCallHandler(""+i));
}


Now we tell the collector that we are ready calling and want to wait for max 11 seconds for results.

collector.waitForResults(11000);


And finally we have to collect the results.

int washed = 0;
for (int i=0; i<calls; i++){
  washed += (Integer)collector.getReturnValue(""+i);
}


And even if we are calling 5 times to the same service, there are no physical limitations to the amount of called services or methods.
So where would you use it? Example one: you have a portal with a welcome page which gets some information from multiple modules in the system: new messages, favorites online, the weather and so on. Instead of waiting for each of the submodules you only need to wait for the slowest one. And if its too slow, you can always abort it by setting the max call duration via timeout or similar.
Another example: you have to perform a complicated and long lasting calculation, you split it in multiple blocks and let multiple nodes calculate parts of it. It could save your scaleability problems, of course only if your problem is generally dividable in multiple subtasks, but most are.

To round it up, the above can very well be achieved by programming it manually and will help you a lot in running a high trafficked site or a non-trivial calculation. You don't need to have a specific framework, but if you want to save yourself the hassle you can use ours ;-)

More on DistributeMe.
Source code of the examples.

Exact instructions how to run the tests on your local machine will be provided later... if requested ;-)

3 comments:

  1. Excellent article!
    Leon, the annotation @DistributeMe(asynchSupport=true, asynchCallTimeout=2500) implies that the timeout is a part of the service interface.
    Can it be overridden by a client and what was the reason of such decision?

    ReplyDelete
  2. One more question.
    You've mentioned that there are no physical limitations to the amount of called services or methods. Does it means that the service skeleton creates new thread for each incoming call?

    ReplyDelete
  3. Hello Ruslan,
    ad 1: The timeout can be either part of the service interface, or part of the method declaration, or configured globally in the Defaults if not specified exactly. However, the timeout only has effect if you are using the classical style of the interface. If you are explicitly use Asynch-Interface you are free to use whatever timeout you want (by calling waitForResult on a CallHandler) or by not waiting for the method execution at all!

    ad 2: The limitation was related to the number of interfaces / methods that can be called programatically, not necessary concurrently. The example contained only one service and one method in this service. I wanted to say that you can call three services and 5 different methods at the same time at one place in code and other services/methods in another. How many service you can really call concurrently depends on the size of the asynch thread pool, as well as rmi stub/skeleton pools on client/server side, os settings, firewalls and so on.

    ReplyDelete