Tuning the EJB3 Implementation

During my evaluation of highly scalable technologies I ran into some performance problems with the EJB3 implementation. It seems to be quite a common problem.

Yesterday I was getting very disappointing results from the benchmarks of the EJB3 implementation of the sample application. Since then I have found some nice hints for optimizing Hibernate and EJB3 applications and a very good video on EJB3. These helped me understand why my EJBs were not updated properly and, most of all, why I was getting such poor performance.

The performance problem was due to the use of eager fetching on a many-to-many relationship. The relationship in question was friendship between users, which is a many-to-many relationship from the user table to itself. On such a relationship eager fetching propagates transitively, so each time I loaded a user, all the friends were loaded as well, and also the friends of the friends and the friends of… basically the whole table.
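To make the transitive effect concrete, here is a minimal sketch in plain Java. It is not JPA code and not the application's actual model: the class name, method names and the sample friendship data are all made up for illustration. It simply walks the friendship links the way eager fetching would and counts how many users end up loaded for a single lookup.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of the friendship table (hypothetical; no persistence involved).
class FriendGraphFetch {

    // Eager fetching: loading one user pulls in its friends, their friends, and so on.
    // Returns the ids of every user that ends up loaded.
    static Set<Integer> loadEagerly(Map<Integer, List<Integer>> friends, int userId) {
        Set<Integer> loaded = new HashSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(userId);
        while (!pending.isEmpty()) {
            int current = pending.pop();
            if (!loaded.add(current)) {
                continue; // this user was already loaded
            }
            for (int friend : friends.getOrDefault(current, List.<Integer>of())) {
                pending.push(friend);
            }
        }
        return loaded;
    }

    // Lazy fetching: only the requested user is loaded; friends stay unloaded until accessed.
    static Set<Integer> loadLazily(int userId) {
        return Set.of(userId);
    }

    public static void main(String[] args) {
        // Made-up data: a chain of friendships 1-2-3-4 plus an isolated user 5.
        Map<Integer, List<Integer>> friends = Map.of(
            1, List.of(2),
            2, List.of(1, 3),
            3, List.of(2, 4),
            4, List.of(3),
            5, List.<Integer>of());
        System.out.println("eager: " + loadEagerly(friends, 1).size() + " users loaded");
        System.out.println("lazy:  " + loadLazily(1).size() + " user loaded");
    }
}
```

With this sample data, asking for user 1 eagerly loads four users, while the lazy variant loads one. In a real friendship table most users are connected to each other through some chain of friends, which is why a single lookup can end up touching nearly every row.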

By default many-to-many relationships use lazy fetching, but I changed this to eager when I wrote the application. The reason I had eager fetching on this relationship is that when I tried to get the number of friends a user had, my appserver complained that it couldn’t be done with lazy fetching. My mistake was computing this through user.getFriends().size(). What I’m doing now instead is asking the database to give me the number of friends by using a query like “SELECT count(f) FROM user u JOIN u.friends f WHERE…”.

After that I changed some transaction attributes so that read-only operations do not require transactions. Given the semantics of the methods I don’t require a strong level of consistency, so it’s relatively safe to do this. This minimizes delays due to contention and increases the parallelizability (try pronouncing this aloud) of the application.

The result is that my application now runs 10 times faster. I’m getting 79 requests per second if I run the server on an EC2 instance and the client on my machine. This is just a little slower than the 100 requests per second obtained by the in-memory POJO solution, which means that EJB3 is pretty fast. I still have to test this without the communication bottleneck to be sure of the actual performance.

Measuring the Speed of Clouds

I have started a project for benchmarking highly scalable technologies. My plan is to use cloud computing platforms and implement the same application using different stacks of technologies. The aspect I want to evaluate is performance when the number of nodes grows a lot.

Centralized VS Distributed SCM

Distributed Source Control Management systems have become a trend in recent years: Bazaar, Git, Mercurial, SVK. Distributed SCM is a fascinating concept, but how well does it perform in practice? Some say that it leads to a phenomenon called branch proliferation.

Playing with Jazz

Yeah, you read correctly: “Playing with Jazz”. Jazz is the new Application Lifecycle Management software developed by IBM Rational. In other words it is an all-in-one Source Control Management, Continuous Integration and Issue Tracking server. Currently I’m developing a lot of plugins for my client that are meant to integrate Jira, Subversion, Maven, ClearCase, our custom bug tracking software, our custom made overnight testing and reporting, Eclipse IDE etc. Jazz is supposed to provide these features and much more out-of-the-box. I’ve seen the demos and I was really impressed. So, now that it’s open for download, I registered and downloaded the server and the client.

Well, things are not that impressive when you start working with it. The product isn’t that ergonomic. It presents a relatively steep learning curve for the basic functionality.

Ideally an administrator would:

  1. install the server (it should be added automatically as a Windows/*nix service)
  2. start the Jazz service
  3. connect to the web-admin interface
  4. create some users

The developer:

  1. installs the Eclipse Jazz plugin
  2. connects with his username/password
  3. uploads a project or two in the repository

The server:

  1. autodetects how to build and test the projects (Eclipse/Ant/Maven)
  2. starts doing this immediately with a predefined schedule (continuously)
  3. if there are failures it notifies the responsible developer
  4. build and test status are visible in the status bar of the IDE

What happens in reality is not that simple. You need to create a project area, define the process, create a team area (a team area is just a team; the word “Area” was added so that you don’t confuse the representation of a team in Jazz with the people sitting around you), create a development line, create a workspace, create a stream from your workspace to the development line, create an iteration plan, an iteration, a build configuration, a build engine etc.

What I would like to see is a product where the novice user, and initially even the administrator, doesn’t need to know about project areas, iteration plans, development lines or streams. All these are advanced features. The first impression should be of a shared versioned storage where you place your projects and which tells you whether the projects are building and whether the tests are passing. Users should be able to add tasks without having to define an iteration or an iteration plan. And most of all, I don’t like the Team Artifacts view. It’s a heterogeneous collection of entities in the Jazz server, unimaginatively organized into a tree. Integration with Eclipse should be tighter. Eclipse already knows how to build and test my projects. Why do I need to specify those things in two different places?

Don’t get me wrong. I don’t want to say that Jazz is overengineered. Most of the entities involved are necessary for such a product, even multiple project areas and streams. The point is that they are presented in the wrong way. Advanced features should be hidden from the user and the product should be usable out-of-the-box. OK, it took me less than 2 hours to install the server, read the instructions and configure a project, and another hour to troubleshoot the automated build. But there is really no reason why it should take more than 5 minutes.

I really hope that the Jazz developers will take this as constructive criticism and deal with these issues before the release of Jazz. Otherwise they risk losing a multitude of potential customers who need only simple configurations with the most basic process. And it’s from projects like these that you get the most publicity, community involvement and knowledge sharing.