Sunday, November 23, 2008

My Stat Engine!

For the past 1.5 weeks, I have been exploring the features available in Google App Engine. I got started with this Google product, which is about 7 months old now, by building my own stat engine (which provides stats on page hits) for my blog. Google App Engine looks promising in terms of ease of development, but I have yet to test its real power in terms of availability and scalability.



To build custom visitor maps for your sites:

1) Get the IP of the remote machine (client). I have used a Python CGI script to read the remote address. Typically, any server-side script can be used.

2) Get the location of the IP. This is not needed if you are reporting only page hits. I have used the hostip.info API to get the location; it can provide details down to latitude / longitude. For more on getting the location, refer here. I have used the geocoding capabilities of Google Maps to locate the users. Yahoo Maps Local would probably give much better results.

3) Set up the database with the information you would like to store (IP, location, timestamp, etc.). Use the database of your choice. I have chosen Google's Big Table (as it comes with Google App Engine).

4) Write some server-side script to persist the IP information into the database.

5) Use any visualization tool for displaying the stats. I have used Google Maps to display the visitors' locations. The Google Visualization API is also a good option.
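Steps 1 and 2 can be sketched roughly like this in Python. This is only an illustration, not the code behind my stat engine: the function names are made up, and the "Key: value" response text is an assumed shape for a plain-text geolocation reply, parsed from a hard-coded sample instead of a live request.

```python
import os


def client_ip(environ=None):
    """Return the visitor's address from the CGI environment, or None."""
    if environ is None:
        environ = os.environ
    # REMOTE_ADDR is the standard CGI variable holding the client address.
    return environ.get("REMOTE_ADDR") or None


def parse_location(text):
    """Parse 'Key: value' lines into a dict with float latitude/longitude."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    location = {"country": fields.get("country"), "city": fields.get("city")}
    for name in ("latitude", "longitude"):
        try:
            location[name] = float(fields[name])
        except (KeyError, ValueError):
            location[name] = None  # coordinates are not always available
    return location


# Hypothetical response body (assumed format), used instead of a live lookup.
SAMPLE_RESPONSE = """Country: INDIA (IN)
City: Chennai
Latitude: 13.0833
Longitude: 80.2833
IP: 203.0.113.7
"""

if __name__ == "__main__":
    print(client_ip({"REMOTE_ADDR": "203.0.113.7"}))
    print(parse_location(SAMPLE_RESPONSE))
```

If the lookup fails or returns no coordinates, the hit can still be stored with an empty location, since the location is optional for plain page-hit counts.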

We are done. It's pretty simple. You have to be sure of what kind of statistical information you would like to see, and design your database schema accordingly.
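For steps 3 and 4, here is a minimal sketch of a hit table and the code that persists into it. It uses Python's built-in sqlite3 purely as a stand-in for "the database of your choice"; it is not the Big Table setup I used, and the table, column, and function names are assumptions made for the example.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open the database and create the page_hit table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hit ("
        " ip TEXT NOT NULL,"
        " city TEXT,"
        " latitude REAL,"
        " longitude REAL,"
        " hit_at REAL NOT NULL)"
    )
    return conn


def record_hit(conn, ip, city=None, latitude=None, longitude=None, hit_at=None):
    """Persist one page hit; location fields may be left empty."""
    if hit_at is None:
        hit_at = time.time()  # seconds since the epoch
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO page_hit (ip, city, latitude, longitude, hit_at)"
            " VALUES (?, ?, ?, ?, ?)",
            (ip, city, latitude, longitude, hit_at),
        )


def recent_hits(conn, limit=10):
    """Return (ip, city) for the most recent hits, newest first."""
    rows = conn.execute(
        "SELECT ip, city FROM page_hit ORDER BY hit_at DESC LIMIT ?", (limit,)
    )
    return rows.fetchall()
```

The columns mirror the fields listed in step 3 (IP, location, timestamp); add or drop columns depending on which stats you actually want to display.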

I find it a little difficult to move from traditional relational databases to Google's Big Table. It's indeed a heavy paradigm shift. Google's Big Table is quite different: no database functions like count(), max(), sum(). No joins. A query can fetch only 1000 rows. No group by. In my case, I need to group by location to get the location-based count. That is not possible directly with Google's Big Table. I need to maintain a separate table for the location count and increment it every time a page hit occurs. To my surprise, that is what the Google Analytics schema also looks like. Read this.
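The separate count table boils down to doing the "group by" at write time. Below is a minimal sketch of that idea, assuming a toy in-memory key-value class in place of the real datastore; the class and function names are hypothetical and this is not the App Engine API.

```python
class KeyValueStore:
    """Toy stand-in for a datastore that only supports get and put by key."""

    def __init__(self):
        self._rows = {}

    def get(self, key, default=None):
        return self._rows.get(key, default)

    def put(self, key, value):
        self._rows[key] = value


def count_hit(store, location):
    """Increment the per-location counter when the page hit is recorded."""
    key = "location_count:" + location
    # Read-then-write is not safe with concurrent requests; a real
    # datastore would need to do this inside a transaction.
    store.put(key, store.get(key, 0) + 1)


def hits_for(store, location):
    """Read the precomputed count; no count() or group by is needed."""
    return store.get("location_count:" + location, 0)


if __name__ == "__main__":
    store = KeyValueStore()
    for city in ["Chennai", "Bangalore", "Chennai"]:
        count_hit(store, city)
    print(hits_for(store, "Chennai"))
```

The trade-off is one extra write per page hit in exchange for a cheap single-key read when the stats are displayed.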

If you want to give it a try on your blog, ping me at [email protected]. I shall share the codebase.

P.S.: Scroll down to the bottom of this page to see Stat Engine in action.

3 comments:

  1. This is very nice dude.
    I recommend that you add more features to it and share it with the broader community...

  2. Thanks dude! Shall take it further if time permits...

  3. [Yogesh - Commented offline] I think Google's Big Table is meant for faster search on columns. Aggregate functions are missing deliberately, because those functions would be horrendously slow!

    [Varun] - Yup! Agreed.
