Sunday, November 10, 2013

Forking around the Christmas tree

Some time ago, I read about the Fork/Join framework that ships with Java 7.

The basic concept is pretty easy and quickly understood:
If you have a large task, split it and give the parts to different threads/processors to better utilize your hardware (hey, there are even low-end smartphones with dual cores nowadays).

While the concept is pretty easy, there are few good examples. Some of them are even labeled as bad examples by their own authors. Others are so complex that you can't see the important parts.

That's why I didn't dig into it until recently. I will try to show you a simple example with a little story:

A Heavily Loaded Christmas Tree


We have bought a Christmas tree with a lot of branches. Of course, every branch needs to be decorated with ornaments. Luckily, we got some elves to do the work.
The ornaments are heavy, so our little elves can't carry more than one ornament from the storage room to the tree.

Let's do this with a single elf:

public void drapeTree(){
  drapeBranch(trunk);
}

public void drapeBranch(Branch branch){
  // Fetch one ornament and drape it on this branch ...
  Ornament o = getOrnamentFromStorage();
  branch.drape(o);
  // ... then recurse into every sub-branch, one after another.
  for(Branch subBranch : branch.getSubBranches()){
    drapeBranch(subBranch);
  }
}

Poor little elf: 200 branches take a lot of time to drape.

Organizing Elves


Int-elves usually come in groups of 2, 4 or 8. But there are even elves (so-called Advanced Micro Elves) that come in groups of 3 or 6. Other elves are much smaller, but are seen in huge groups.

A simple model to organize the elves would be to split the job and assign every elf in the group the same number of branches to drape. In an ideal world, all elves would finish at the same time.
But some ornaments are broken and have to be repaired before they can be draped on a branch. Others need to be polished. This takes longer, so some elves are faster than others.
Another problem is that we need to know beforehand how many elves belong to the group, so we can split the work accordingly.
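
For illustration, here is a minimal sketch of that naive model, reusing the Branch and Ornament types from the story; collectBranches is a hypothetical helper that flattens the tree, and getOrnamentFromStorage is assumed to be available as above (needs java.util.* and java.util.concurrent.* imports):

public void drapeTreeStatically(Branch trunk) throws InterruptedException {
  int elfCount = Runtime.getRuntime().availableProcessors(); // group size must be known up front
  List<Branch> all = collectBranches(trunk); // flatten the whole tree first
  int chunk = (all.size() + elfCount - 1) / elfCount; // branches per elf
  ExecutorService elves = Executors.newFixedThreadPool(elfCount);
  for (int i = 0; i < all.size(); i += chunk) {
    final List<Branch> share = all.subList(i, Math.min(i + chunk, all.size()));
    elves.submit(new Runnable() {
      public void run() {
        for (Branch b : share) {
          b.drape(getOrnamentFromStorage()); // a slow ornament stalls this elf only
        }
      }
    });
  }
  elves.shutdown();
  elves.awaitTermination(1, TimeUnit.HOURS);
}

private List<Branch> collectBranches(Branch b) {
  List<Branch> result = new ArrayList<>();
  result.add(b);
  for (Branch sub : b.getSubBranches()) {
    result.addAll(collectBranches(sub));
  }
  return result;
}

Note how the fixed split bakes both problems in: the chunks are computed once, and an elf stuck with broken ornaments cannot hand his remaining branches to an idle colleague.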

Self Organizing Elves


Luckily, our elves are intelligent enough to organize themselves. Just give them tasks to do:

// from java.util.concurrent: ForkJoinPool, RecursiveAction
class DrapeTask extends RecursiveAction {
  Branch branch;

  DrapeTask(Branch branch) {
    this.branch = branch;
  }

  @Override
  protected void compute() {
    Ornament o = getOrnamentFromStorage();
    branch.drape(o);
    // Fork a task for every sub-branch, then join them all, so that
    // invoke() below doesn't return before the whole tree is draped.
    List<DrapeTask> subTasks = new ArrayList<>();
    for (Branch subBranch : branch.getSubBranches()) {
      DrapeTask task = new DrapeTask(subBranch);
      task.fork();
      subTasks.add(task);
    }
    for (DrapeTask task : subTasks) {
      task.join();
    }
  }
}

ForkJoinPool elfGroup = new ForkJoinPool();

public void drapeTree() {
  elfGroup.invoke(new DrapeTask(trunk));
}
Now the first elf starts by draping the trunk with a tree-topper. On his way back, he notes all branches off the trunk and creates tasks for them. Now every elf in the group gets a task with a dedicated branch. Every time the elves return from a branch, they return with tasks for the sub-branches of that branch. Usually, an elf will work on a branch and all of its sub-branches. If one elf finishes all his tasks faster than the others, he will look at the remaining tasks of the other elves and "steal" some of them.

Meanwhile At The North Pole


Santa Claus watches his elves closely. He notices that the tasks have to have the right size. If they are too small, his elves spend more time on organization (writing tasks, etc.) than on actual work. If the tasks are too big, the work isn't well balanced within the group.
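
Santa's advice translates into a sequential cutoff: below a certain subtree size, an elf drapes the branches himself instead of writing new tasks. Here is a minimal sketch of that idea, reusing the types from above; the THRESHOLD value and the countBranches helper are assumptions, not from the original:

class ThresholdDrapeTask extends RecursiveAction {
  private static final int THRESHOLD = 8; // assumed cutoff, tune by measuring
  private final Branch branch;

  ThresholdDrapeTask(Branch branch) {
    this.branch = branch;
  }

  @Override
  protected void compute() {
    if (countBranches(branch) <= THRESHOLD) {
      drapeSequentially(branch); // small subtree: no task bookkeeping at all
    } else {
      branch.drape(getOrnamentFromStorage());
      List<ThresholdDrapeTask> tasks = new ArrayList<>();
      for (Branch subBranch : branch.getSubBranches()) {
        tasks.add(new ThresholdDrapeTask(subBranch));
      }
      invokeAll(tasks); // forks the tasks and waits for all of them
    }
  }

  // Plain recursion, exactly like the single-elf version.
  private void drapeSequentially(Branch b) {
    b.drape(getOrnamentFromStorage());
    for (Branch sub : b.getSubBranches()) {
      drapeSequentially(sub);
    }
  }

  // Counts this branch plus all of its sub-branches.
  private int countBranches(Branch b) {
    int n = 1;
    for (Branch sub : b.getSubBranches()) {
      n += countBranches(sub);
    }
    return n;
  }
}

In a real implementation you would precompute the subtree sizes (or simply cut off by recursion depth) instead of counting branches on every call.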

There's some deeper stuff (sorry, no elves here) on the heise site.

Friday, March 11, 2011

Bazaar from a Subversion user's point of view

Currently I'm trying out different distributed version control systems (DVCS) as an alternative to an existing Subversion installation.

Today I will share my impressions of Bazaar (bzr).

Overview

While learning the basic concepts of Bazaar, it seemed very promising and easy. Users migrating from a centralized VCS will find the checkout functionality very useful: you can simply commit and update like you did in your old VCS, thus minimizing the learning curve. The only difference is that you can do local commits, and you may unbind your (implicitly created) branch when taking your laptop on the road.
Very impressive: you can even check out your SVN repo and work on it. You could even branch it, but Bazaar wants to rewrite the history when committing, which is deactivated on our repo (for good reasons).

The packaged tools are very good. You can use TortoiseBZR (a Windows Explorer extension), Bazaar Explorer, or both. They are both very good tools (the best prepackaged ones I have seen so far)!

Access / Serving

Bazaar uses a "Smart Server" to access Bazaar repositories through file system access (or through SSH/SFTP, ...). There's also a module for Apache to provide access through HTTP.

Branching Concept

Branches can only be created by cloning. Branches can have default source and target branches, so you can easily pull from the official mainline and push to the development branch for review. A branch may even be bound to another branch, so that a commit goes into both branches or into neither (local commits are still possible).
Bazaar insists that you have pulled and merged all changes from a branch before you are allowed to push your changes.

Wait, what?
Canonical (the company behind Bazaar) says that there may be side effects between two commits, even if they aren't related on a per-file basis. So you should merge all changes and run the unit tests before pushing your changes back to the mainline.
While the point is valid, imagine a project with 30+ full-time developers: change, commit, push, oh, diverged... merge, build and test, (commit), push, oh, diverged again...

Also negative:
We are currently using an unstable-trunk model, with stable branches for stabilizing before a final deployment and for fixing bugs. We merge all fixes back to trunk and mark other things (like Maven dependency changes) as merged. While Subversion's merge tracking is far from perfect, it handles this with ease, and all merges are tracked.

I've looked for a similar feature in Bazaar, but the only function that provides a selective merge of single revisions is "cherry picking". Cherry picking only merges contents, so all merge tracking information is lost. You could also export the revision as a patch and apply it on your mainline.

Repository Concept

Bazaar features shared repositories, where a single repository can hold the history for several branches. This saves disk space and bandwidth.

Conclusion

Bazaar is a very interesting DVCS and supports a lot of workflows better than other systems do. Migration from centralized systems is easy.
SVN support is very good, but slow.
Negative points are earned for forcing you to create clones of your branches and to merge all changes before pushing.
While speed is OK, it is definitely slower than other DVCSs and sometimes even slower than SVN.

Sunday, January 31, 2010

TinyMe 2010 Beta 2 review, or: is Unity strength?

Background

I always wanted a small server for printing, file hosting and Subversion.
A while ago, I got some discarded thin clients from a former employer. Their power consumption is low, so they make perfect mini servers.

Specs: 466 MHz Celeron, 256 MB RAM.
I expanded it with a cheap 4 GB CF card and an IDE-CF adapter.

Vector was the first distribution that ran fine, but it made heavy use of the hard drive, which made the whole system slow and sluggish (did I mention the CF card was cheap?). So I tested Puppy and TinyMe and ended up with TinyMe as my favourite system.

About TinyMe

TinyMe was originally based on PCLinuxOS. A year ago they split from the PCLinuxOS project and founded (together with other distribution teams) a new base distribution called Unity Linux.
TinyMe provides two different versions: one with nearly everything you need for a small office desktop, and another that is a very small, clean system where you have to install the things you need on your own (but it lets you choose your favourites).
The default window manager/desktop is LXDE, which is a good compromise between speed, memory usage and usability.
Now there is a Unity-based beta release: "Acorn" 2010 B2 (only the bigger version is available at the moment).

Installation

TinyMe comes on a live CD.
It asks for language (only a few are currently supported), timezone and keyboard layout on startup.
After login, a simple desktop shows up, with Conky running in the upper right. A click on the "Unity Installer" desktop icon brings up "Draklive-Install", the Mandriva/Mandrake installer.
Installation is straightforward: the dialogs are kept simple, but offer advanced options for various settings. Only the ACPI setting should be moved to the advanced tab, since most users won't know what it is and are probably not interested in deactivating it.
Speaking of ACPI: the installer detected that my box doesn't support it, but there was no option for APM, so I had to add the "apm=power-off" option manually. Otherwise, shutdown wouldn't power off my box.
It's common for bigger distributions to support only ACPI, but users of small distributions often have to deal with old hardware that doesn't support ACPI. So support for APM shutdown should be considered important (and it's only an apm=power-off in the GRUB options).

Setup

At the first startup, TinyMe asks for a root password and a new user. Afterwards, the login screen is shown.
Configuration is done through the "Configure Your Computer" icon, which will ask you for the root password. It's a little severe to allow only root to configure mouse, keyboard, etc.
The setup dialog is designed nicely, but is incomplete in some places. As an example, the "Set up the printer" icon does nothing. I think it will be fixed for the final release, but it's confusing.
The configuration part is also mentioned on the TODO list on the website, so in the final release you will be able to change backgrounds (which is not possible at the moment, but the changing nature photos are beautiful, so I can live with it).
A small but annoying bug is that the Control Center sometimes refuses to be closed.

System

The graphical package manager is the Smart Package Manager.
Preinstalled applications are Sylpheed for mail, Midori for browsing and AbiWord for text processing.
A spreadsheet application is not installed.
Conky is running, and I found no option to deactivate it. It's not consuming much memory, but the constant screen updates can be annoying over remote connections.

The Good

The system runs at a good speed. Ext4 keeps the boot times down, even on my CF card. There are still lags when loading applications, but the overall speed is great considering the hardware. Idle memory consumption is about 45 MB of RAM, which is very impressive.

The Bad

The Midori browser crashed when loading Google and other pages, as did a post-installed Chrome. Maybe the underlying WebKit engine isn't designed to run on the old CPU. In a guest session in a VirtualBox, Midori didn't have those problems.
Firefox worked fine for me.

The packages are sometimes incompatible. Trying to install a VNC server resulted in exceptions due to strange version conflicts. [Edit: this may be a result of the activation of an additional repository]

Updating the freshly installed system produced some warnings.

Conclusion

There's a lot of potential in this distribution. Speed and memory consumption make it a good choice for older hardware.
There is also hard work to do before the final release: the choice of browser should be rethought (it should be tested on older hardware as well), an APM option would be nice, and the configuration utilities need to be finished.

Friday, July 3, 2009

Building up an Eclipse team update site mirror

The new Eclipse Galileo (3.5) ships with the completed provisioning system p2. It was introduced with Eclipse Ganymede (3.4), but not every feature was finished at that time.
One of these features is the mirroring tool.

It enables mirroring of multiple update sites into one single site, so it's ideal for companies that want to maintain a single team mirror with everything they need.

There's a small page which documents, in a rather rudimentary way, how to use it: p2 repository mirroring

You can mirror two parts separately: metadata and artifacts.
The metadata tells Eclipse's p2 about artifact categories and other information, while the artifacts contain the actual plugins.

So to build a full mirror, you need to run both mirroring applications.

To automate this process, I developed a little script that checks a list of update sites and adds their artifacts and metadata to your mirror.
One problem was that some mirrors of the Mylyn extras had wrong/old metadata: the downloaded artifacts were 3.2.0 artifacts while the metadata pointed to 3.1.1 artifacts. So I chose an explicit mirror server (it is commented out, in the hope that they have fixed it by now).


#Warn for undeclared variables
set -u
#stop on errors
set -e

#######
# Choose where to store things
dst_repo="file:/D:/update_35"
# Choose a name for the mirror
dst_name="My Mirror"


# Mylyn Extras
mylyn_extras=( \
http://download.eclipse.org/tools/mylyn/update/extras \
http://download.eclipse.org/tools/mylyn/update/incubator \
)
#I used it because of wrong metadata on one of the official mirrors
#mylyn_extras=( \
# ftp://mirror.netcologne.de/eclipse//tools/mylyn/update/extras/ \
# ftp://mirror.netcologne.de/eclipse//tools/mylyn/update/incubator/ \
#)

# M2eclipse for Maven integration
m2eclipse=( \
http://m2eclipse.sonatype.org/update-dev/ \
)

# SpringIDE
spring_ide=( \
http://springide.org/updatesite/ \
)

#Subversive integration and connectors
polarion_svn=( \
http://www.polarion.org/projects/subversive/download/eclipse/2.0/update-site/ \
http://community.polarion.com/projects/subversive/download/integrations/update-site/ \
)

#Test NG integration
testng=( \
http://beust.com/eclipse/ \
)

#Put all the mirrors in a single array
mirror_list=( \
${mylyn_extras[*]} \
${m2eclipse[*]} \
${spring_ide[*]} \
${polarion_svn[*]} \
${testng[*]} \
)

function mirror(){

  # Mirror the artifacts (the actual plugins) first ...
  ./eclipse \
  -application org.eclipse.equinox.p2.artifact.repository.mirrorApplication \
  -nosplash \
  -source "$1" \
  -destination "$dst_repo" \
  -destinationName "$dst_name" \
  -verbose \
  -compare

  # ... then the metadata (categories and other information).
  # Quoting is required: dst_name contains a space.
  ./eclipse \
  -application org.eclipse.equinox.p2.metadata.repository.mirrorApplication \
  -nosplash \
  -source "$1" \
  -destination "$dst_repo" \
  -destinationName "$dst_name" \
  -verbose \
  -compare

}
i=0
for aMirror in ${mirror_list[*]}
do
  # i=$((i+1)) instead of ((i++)): with set -e, ((i++)) would abort the
  # script on the first iteration, because it evaluates to 0.
  i=$((i+1))
  echo "Mirroring $aMirror (step $i of ${#mirror_list[*]})"
  mirror "$aMirror"
done
echo "fin"


Feel free to use it.

Tuesday, May 12, 2009

Bad Habits - Static Variables

There are many bad habits programmers can develop over their careers. The possible impact ranges from hardly readable code to serious failures.

Today I want to discuss a very common habit that can cause serious issues, up to security and stability flaws: static variables.
No, I'm not speaking of constant values. It's OK to store, let's say, PI in a static constant. It's also OK to store environment properties and states in a static fashion, as they affect the whole system.

But you should be careful with data that is changing and/or context sensitive!

To clarify what I mean, let's assume someone with that habit is writing a parser that extracts some data from a text. A possible worst-case scenario could look like this:

class AParser {
  public static String data1;
  public static String data2;

  public static void parse(String text){
    data1 = ...//parse data1
    data2 = ...//parse data2
  }
}

So what's wrong with it?
Imagine two threads calling parse at the same time. The result will be unpredictable, as both threads are working on the very same variables and are overwriting each other's results.
It doesn't even need threads to go badly wrong! Every run will set values for some of the variables, but will it do that for every variable, regardless of the circumstances? Most probably the answer is no. And even if it is, the next change to the method will likely introduce such problems.
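
To make the threading issue visible, here is a runnable sketch; the parse body and the artificial pause are assumptions, chosen only to make the interleaving easy to observe:

class AParser {
  public static String data1;
  public static String data2;

  public static void parse(String text) throws InterruptedException {
    data1 = text + "-1";
    Thread.sleep(10); // widens the race window for the demo
    data2 = text + "-2";
  }
}

public class RaceDemo {
  public static void main(String[] args) throws Exception {
    Runnable job = new Runnable() {
      public void run() {
        try {
          String name = Thread.currentThread().getName();
          AParser.parse(name);
          // data1 and data2 may now stem from different threads,
          // e.g. "A read B-1 / A-2":
          System.out.println(name + " read " + AParser.data1 + " / " + AParser.data2);
        } catch (InterruptedException ignored) {
        }
      }
    };
    Thread a = new Thread(job, "A");
    Thread b = new Thread(job, "B");
    a.start();
    b.start();
    a.join();
    b.join();
  }
}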

So what's the solution? Simply removing static?
class AParser {
  private String data1;
  private String data2;

  void parse(String text){
    data1 = ...//parse data1
    data2 = ...//parse data2
  }

  public String getData1() {
    return data1;
  }

  public String getData2() {
    return data2;
  }
}
Is that really better?
The answer is: no!
We just removed the static keyword, leaving the problem untouched: multiple calls to the method of a single instance will still cause the same issues.
When used together with lazy instantiation (the singleton pattern) of the parser, there's no difference from the "real" static approach: there will be only a single instance to store data in. The variables would effectively be static, even if they are not declared as such.
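
For illustration, a sketch of that singleton variant (the class layout is an assumption; the placeholder assignments stand in for the real parsing):

class SingletonParser {
  private static SingletonParser instance;
  private String data1;
  private String data2;

  public static synchronized SingletonParser getInstance() {
    if (instance == null) {
      instance = new SingletonParser();
    }
    return instance;
  }

  public void parse(String text) {
    // still one shared copy for all callers, exactly like static fields:
    data1 = text.trim();        // assumed stand-in for the real parsing
    data2 = text.toUpperCase(); // assumed stand-in for the real parsing
  }
}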

So what's the real problem here?
The major problem is the lack of separation between data and logic, a common shortcoming that leads to static variables.
Another problem is the unnecessarily wide scope of the variables.

So let's create a simple data class which holds the data locally and enables returning multiple values:

class ParsedData {
  String data1;
  String data2;
}

class Parser {

  public static ParsedData parse(String text){
    ParsedData result = new ParsedData();
    result.data1 = ...//Parse data1
    result.data2 = ...//Parse data2
    return result;
  }
}
Now every call to the parse method has its own local data, so it won't affect the data of other calls, even though the method itself is static.
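
To see the difference in action, here is a self-contained demo. The comma-splitting logic is an assumed stand-in for the elided parsing; the point is that concurrent calls no longer interfere, because each call works on its own ParsedData instance.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParsedData {
  String data1;
  String data2;
}

class Parser {
  public static ParsedData parse(String text) {
    ParsedData result = new ParsedData();
    String[] parts = text.split(",", 2); // assumed input format: "a,b"
    result.data1 = parts[0];
    result.data2 = parts.length > 1 ? parts[1] : null;
    return result;
  }
}

public class ParserDemo {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Future<ParsedData> a = pool.submit(new Callable<ParsedData>() {
      public ParsedData call() {
        return Parser.parse("red,green");
      }
    });
    Future<ParsedData> b = pool.submit(new Callable<ParsedData>() {
      public ParsedData call() {
        return Parser.parse("foo,bar");
      }
    });
    // Each future holds its own result: prints "red / bar" every time.
    System.out.println(a.get().data1 + " / " + b.get().data2);
    pool.shutdown();
  }
}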

Conclusion

  • Minimizing the scope of your variables reduces possible side effects.
    Check for every non-local variable whether it is placed correctly.
  • Introduce data classes for data that lives in one context, to make handling it easier and more flexible.

Thursday, May 7, 2009

spring-remoting-cluster

I started the spring-remoting-cluster project as a result of my day job. The initial intention was to spread load across multiple web servers which act as a backend for different server applications.

Our first version was very rough (it looked even worse than the first revision committed to the Google repo). So I pushed it forward to get a code base that is maintainable and where features can be added easily.

One of the first new features was runtime configuration (just call the addUri or removeUri method to add or remove a server on the fly). We created a simple MBean to control it via JMX. It's really cool when you are deploying a new version of the backend server: remove one, deploy the new version, add it again, continue with the next one.
The frontend app doesn't even notice that a server is down.
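
For illustration, here is a hedged sketch of such a rolling deployment, driven from a standard JMX client. The JMX service URL, the MBean ObjectName and the backend URI are all assumptions; the post only says that an MBean exposes addUri/removeUri:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RollingDeploy {
  public static void main(String[] args) throws Exception {
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://frontend-host:9999/jmxrmi"); // assumed
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection mbs = connector.getMBeanServerConnection();
      ObjectName clusterClient =
          new ObjectName("remoting:type=ClusterClient"); // assumed name
      String backend = "http://backend1:8080/service";   // assumed URI

      // Take the backend out of rotation ...
      mbs.invoke(clusterClient, "removeUri",
          new Object[]{backend}, new String[]{"java.lang.String"});
      // ... deploy the new backend version here ...
      // ... and put it back into rotation.
      mbs.invoke(clusterClient, "addUri",
          new Object[]{backend}, new String[]{"java.lang.String"});
    } finally {
      connector.close();
    }
  }
}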

This is a client-side library, so there are some things you should give at least a short thought to:
  • If the connection to one server is lost, this neither means that the server is down, nor that your method wasn't executed properly. If you run the method again on a second instance, it could lead to duplicated data. If you don't, maybe no data will be saved/processed.
  • Load balancing can only be rudimentary. Even the best algorithm can only guess which server will have the most free resources available, especially if there's more than a single client.

I have planned new features for it, but there's a lot of work to do first (one item is documentation, as mentioned in the issue tracker).
Features I'm currently planning:
  • A better load-balancing algorithm.
    A weighted one should be rather simple, but provide much better balancing.
  • Support for Burlap/Hessian, JMS and RMI (as mentioned on the front page)
  • Annotation-based config
    You can already mark a method as a "test" method. It will be used instead of a generic alive-test approach.
So, have fun playing around with it and let me know if you like it and what's missing. I could also use some support in development, so don't be shy if you are interested in helping.

Criticism is welcome!