Pentaho and Apatar, My Two Strangely Named Free Data Friends


[Image: kettle_conceptual_model, by rickbradley via Flickr]

I have to admit that cool tools like these will eventually make certain types of programmers worry about job security. But I see them as just that: tools. I have an idea about moving data that will make me money, and then I have to figure out how to get it done. You see, I deal with masses of data for the most part. They make me money.

I mean e-commerce sites and affiliate sites built from entire product catalogs. Some of my ideas about that are in this post. You have to find a way to automate when your smallest product catalog has over 3000 products. You have to save data, analyze that data and use it to pick winners from losers. For the winners, you do SEO work, which at times can be automated too. With that simple concept, I have seen $1,000,000/year sites become $2,000,000/year sites within the space of a year, and $100-a-week affiliate sites top $1000 a week within two weeks.
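To make "pick winners from losers" concrete, here is a minimal Python sketch of one way to do it. It is an illustration, not the method used on the sites mentioned above: the `ProductStats` fields, the revenue-per-click ranking and the `min_clicks` cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ProductStats:
    # Hypothetical per-product numbers saved from earlier runs.
    sku: str
    clicks: int
    revenue: float


def pick_winners(stats, min_clicks=50, top_n=3):
    """Return the SKUs with the best revenue per click.

    Products with fewer than min_clicks clicks are skipped because
    there is not enough traffic to judge them yet.
    """
    threshold = max(min_clicks, 1)  # also avoids dividing by zero
    qualified = [s for s in stats if s.clicks >= threshold]
    ranked = sorted(qualified, key=lambda s: s.revenue / s.clicks, reverse=True)
    return [s.sku for s in ranked[:top_n]]


# Made-up sample data, only for illustration.
catalog = [
    ProductStats("A-100", 400, 120.0),
    ProductStats("B-200", 90, 63.0),
    ProductStats("C-300", 10, 50.0),
    ProductStats("D-400", 250, 25.0),
]

winners = pick_winners(catalog, top_n=2)
print(winners)
```

With a real catalog, the same function would run over thousands of rows, and the SEO work would go to whatever comes out on top.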

I have used PHP for the most part, which is great, because you can run it on a server with almost constant uptime and set up “cron jobs” on that server to run a script whenever it needs to be run. But to tell you the truth, if you are running a one-person show part-time and want to get stuff done, there are quicker ways to build the framework to do the job, especially if all you are doing is moving data around on the internet. There are macro tools, scripting tools and desktop software. But running a scheduled task from the computer you do work on sucks, especially if it is a laptop. I know mine is not always on and I don’t really want it to turn on by itself.

That’s where my two friends come in. They are designed to make the job a bit easier. I did touch on software for transforming and moving data before, and I used Jitterbit for a while, but there was a learning curve and it was not very visual. It might be better now, but I recently ran into these two again. I had checked them out back then too and stopped using them, and they have improved greatly since. They are Java based and use a client-server configuration. That means I can install the editor here on my laptop and the server somewhere on a host that can be cronned. Of course, now I have to venture into Tomcat, which I have always seen hanging around but never found a use for.

I never used flowcharts for the same reason I never wrote “Hello World” programs. I believe in real world exercises. But, after using these two tools, I am not so sure of that. If I am planning software that would be useful to others as well as myself, then I can write the idea with one of these tools and not only test it quicker in the real world, but have my job running. Then I can use the results to map out the software I would write for others. Look at the user interface below to see what I mean. This one is from Pentaho.

Of the two, Pentaho seems to have more of the scripting, regex and programming-like features I need, but it did not come with an installer. Extract the folder and run the Spoon.bat file to start the designer. Apatar came with an installer, for those of you who don't know what a batch file is, and here is a look at designing a data job in its interface.

I have picked Pentaho for the features, but have yet to install the server. You don’t need it to run the scripts. You can run the scripts from the designer. The server is a big dog, 142 MB while still in the tar.gz file, and it will run on Tomcat, which I have never installed. But I have tested some of my scripts, they work, and they will do work for me. And like I said, if any part of what I come up with is commercially viable as desktop software, a PHP script, an online service or a WordPress plugin, I have the idea in front of me, with all the data kinks and twists and turns worked out. I just replace those icons with functions in whichever language I choose.
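As a rough illustration of replacing icons with functions, here is a small Python sketch in which each function stands in for one step on a designer canvas: an input step, a regex cleanup step, a filter step and an output step. The step names, the column names (`sku`, `price`, `stock`) and the sample rows are all made up for the example; this is not how Pentaho or Apatar work internally.

```python
import csv
import io
import re


def read_rows(csv_text):
    """Input step: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean_price(rows):
    """Regex step: keep only digits and the decimal point in 'price'."""
    for row in rows:
        cleaned = dict(row)
        cleaned["price"] = float(re.sub(r"[^\d.]", "", row["price"]))
        yield cleaned


def filter_in_stock(rows):
    """Filter step: drop rows whose 'stock' column is 0."""
    for row in rows:
        if row["stock"].strip() != "0":
            yield row


def write_rows(rows):
    """Output step: write the sku and price columns back out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["sku", "price"], extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return out.getvalue()


def run_job(csv_text):
    """Wire the steps together in the order they would sit on the canvas."""
    return write_rows(filter_in_stock(clean_price(read_rows(csv_text))))


# Hypothetical feed, only for illustration.
SAMPLE = (
    "sku,price,stock\n"
    "A-100,$19.99,4\n"
    'B-200,"$1,250.00",0\n'
    "C-300,USD 5,12\n"
)

print(run_job(SAMPLE))
```

The point is that once the flow works visually, the order of the steps and the quirks of the data are already known, so writing the functions is mostly transcription.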

Great tools.


Written by Stephan Miller, Kansas City Software Engineer and Author
