Recent Posts

Pages: [1] 2 3 ... 10
1
Big Data / Harnessing Hadoop for Big Data – Series III
« Last post by admin on January 24, 2012, 01:53:51 AM »
Flytxt conducted ‘Harnessing Hadoop for Big Data – Series III’ at CDAC on 11th January 2012. To view the presentation, click below.

Coexistence or competition - RDBMS and HADOOP
Leveraging open source for big data stack
2
Big Data / Re: Increasing HBase Write Performance
« Last post by Jocelyn on January 05, 2012, 01:35:24 AM »
The big problem that I see is you are running HBase on 2 nodes with a replication factor of 3 (actually in effect just 2 as there are only 2 nodes to replicate to). This means all writes must be replicated to both nodes. HBase really needs at least 5 or so nodes to get going.
It sounds like you are filling up your first region and it is splitting. During the split, once the MemStore fills up, writes will start blocking. You should look into creating your table pre-split into multiple regions, which will give you an even distribution of writes.
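A minimal sketch of how the split points for a pre-split table might be computed, assuming the first byte of each row key is spread roughly uniformly (for example a hashed or salted prefix). The class SplitKeySketch and the method uniformSplitKeys are hypothetical names made up for this illustration. The resulting byte[][] is the shape that the createTable(descriptor, splitKeys) overload of the HBase admin API accepts.

```java
// Hypothetical helper: computes evenly spaced one-byte split points.
final class SplitKeySketch {

    // For n regions we need n - 1 split keys. Assumes the first byte of
    // each row key is spread uniformly over 0x00..0xFF.
    static byte[][] uniformSplitKeys(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [2, 256]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Boundary i sits at i/numRegions of the 256-value key space.
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] key : uniformSplitKeys(4)) {
            System.out.println(String.format("0x%02X", key[0] & 0xFF));
        }
    }
}
```

If your row keys are not uniformly distributed (timestamps, sequential ids), split points like these will not spread the writes evenly, and the keys would have to be chosen from the actual key distribution instead.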
3
Cloud Computing / Re: Web 2.0, for new era Cloud Applications
« Last post by Jocelyn on January 05, 2012, 01:33:27 AM »
The concept of "Web 2.0" began with a conference brainstorming session between O'Reilly and MediaLive International. Dale Dougherty, web pioneer and O'Reilly VP, noted that far from having "crashed", the web was more important than ever, with exciting new applications and sites popping up with surprising regularity. What's more, the companies that had survived the collapse seemed to have some things in common. Could it be that the dot-com collapse marked some kind of turning point for the web, such that a call to action such as "Web 2.0" might make sense? We agreed that it did, and so the Web 2.0 Conference was born.
4
Cloud Computing / Re: Choosing your database – SQL or NoSQL?
« Last post by Jocelyn on January 03, 2012, 01:37:49 AM »
Sometimes an RDBMS is not the best solution. Although there are ways to accommodate user-defined fields (see the XML datatype, the EAV design pattern, or just having spare generic columns), sometimes a schema-free database is a good choice.
However, you need to nail down your requirements before choosing to go with a document database, as you will lose a lot of the power you may be used to with the relational model.
For example:
If you would otherwise have multiple tables in your RDBMS database, you will need to research the features MongoDB affords you to accommodate these needs.
If you will need to query the data in specific ways, again you need to research what MongoDB offers you.
I wouldn't think of NoSQL as a replacement for RDBMS, but rather as a slightly different tool that brings its own set of advantages and disadvantages, making it more suitable for some projects than others.
5
Cloud Computing / Flytxt has great news to share.
« Last post by admin on December 08, 2011, 02:36:32 AM »
Flytxt wins at the 3rd Annual IEEE Cloud Computing Technology and Science Challenge held in Athens. The white paper published was on ‘Mobile Subscriber Fingerprinting: A Big Data Approach’ and was presented by Jobin Wilson and Vikram Garg in Athens. To download the white paper, click here.
6
Big Data / Re: Increasing HBase Persistence Performance
« Last post by sarath.sasidharan on October 12, 2011, 12:47:36 AM »
For those who want more detail on how it works: the TableOutputFormat class extends the basic OutputFormat class, which is used to store data in various locations; this could be a file or a table, depending on which class extends it. TextOutputFormat stores the output in a text file. TableOutputFormat is the API used to store it in a table. It uses the TableRecordWriter to write into a specific HBase table. TableOutputFormat sets auto flush on the table to false, so it takes over the buffer flushing. This helps increase the load/insertion rate into HBase: once auto flush is set to false, the list of puts is sent to the region servers only when the write buffer is full, whereas otherwise every put is sent to the region server as soon as it arrives.
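To make the effect of auto flush concrete, here is a simplified model. It is not HBase code: BufferedWriterSketch and everything in it are hypothetical stand-ins that only count how many round trips to a region server would happen with and without client-side buffering. To keep it short it counts puts rather than bytes, whereas the real write buffer is sized in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a client-side write buffer; not the HBase API.
final class BufferedWriterSketch {
    private final boolean autoFlush;
    private final int bufferCapacity;      // max puts held before a flush
    private final List<String> buffer = new ArrayList<>();
    private int roundTrips = 0;            // simulated calls to the server

    BufferedWriterSketch(boolean autoFlush, int bufferCapacity) {
        this.autoFlush = autoFlush;
        this.bufferCapacity = bufferCapacity;
    }

    void put(String row) {
        buffer.add(row);
        // With auto flush on, every put is sent immediately; otherwise
        // puts are batched until the buffer is full.
        if (autoFlush || buffer.size() >= bufferCapacity) {
            flush();
        }
    }

    // Sends whatever is buffered as one batch; also needed at the end
    // so that a partly filled buffer is not lost.
    void flush() {
        if (!buffer.isEmpty()) {
            roundTrips++;
            buffer.clear();
        }
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        BufferedWriterSketch eager = new BufferedWriterSketch(true, 100);
        BufferedWriterSketch batched = new BufferedWriterSketch(false, 100);
        for (int i = 0; i < 1000; i++) {
            eager.put("row-" + i);
            batched.put("row-" + i);
        }
        eager.flush();
        batched.flush();
        System.out.println(eager.roundTrips() + " vs " + batched.roundTrips());
    }
}
```

With 1000 puts and a buffer of 100, the model records 1000 round trips with auto flush on and 10 with it off, which is the kind of saving the buffering gives you.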
7
Big Data / Increasing HBase Write Performance
« Last post by sarath.sasidharan on October 11, 2011, 06:02:34 AM »
Are you haunted by slow write rates into HBase? Well, then you will surely like this approach.

Problem:

(a) Slow persistence rates into HBase, even though we have enough commodity servers set up.

(b) Low TPS observed while using MapReduce to write data into HBase tables.

Solution:

HBase APIs to the rescue. TableOutputFormat is the class you will need to use for this.

Steps to achieve this:

(a) In the MapReduce code, set the HBase output table by creating a new configuration object and setting the value on it.

If my configuration object is conf, then:

conf.set(TableOutputFormat.OUTPUT_TABLE, "HBASE_TABLE_1");

Here HBASE_TABLE_1 is the table where the data needs to be put.

(b) Next, set the OutputValueClass to Put on the job object you have created.

If job is the job object I have created, then:

job.setOutputValueClass(Put.class);

(c) Once this is done, set the OutputFormatClass of the job object to TableOutputFormat.

job.setOutputFormatClass(TableOutputFormat.class);

These are the configuration related changes which you would need to perform before you submit the job.

Once this is done, create a new Put object and add the required column family, the column and the corresponding values to it.

Next, call the context.write(Text, Put) method, passing the key you need to store and the populated Put object containing the values to be persisted into the table.

Once context.write is called, TableOutputFormat's overridden write method takes the Text object as its key and a Writable object as its value. The value is checked to see whether it is a Put or a Delete object; if it is either of them, the corresponding function is executed, and if not, an exception is thrown.

 tableOutFormat.table.put(new Put((Put) value))

In the line above, tableOutFormat.table resolves to an HTable object, and its put method is called with the Writable value cast to a Put object.
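The type check described above can be sketched roughly as follows. PutOp, DeleteOp and RecordWriterSketch are hypothetical stand-ins made up so the example compiles without HBase on the classpath; only the instanceof dispatch and the exception for any other type mirror the description. The exception type used here is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for HBase's Put and Delete, for illustration only.
final class PutOp {
    final String row;
    PutOp(String row) { this.row = row; }
}

final class DeleteOp {
    final String row;
    DeleteOp(String row) { this.row = row; }
}

// Hypothetical sketch of a record writer that accepts only puts and deletes.
final class RecordWriterSketch {
    // Records what would have been applied to the table.
    final List<String> applied = new ArrayList<>();

    // The key is not used in this sketch; the row travels inside the value.
    void write(String key, Object value) {
        if (value instanceof PutOp) {
            applied.add("put:" + ((PutOp) value).row);
        } else if (value instanceof DeleteOp) {
            applied.add("delete:" + ((DeleteOp) value).row);
        } else {
            // Anything else is rejected (exception type is an assumption).
            throw new IllegalArgumentException("Pass a Delete or a Put");
        }
    }
}
```

The practical consequence is that whatever your mapper or reducer emits as the value has to be a Put or a Delete, otherwise the job fails at write time.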


Once you are through with this, you are ready to check your new performance; you should find an increase in your persistence rate.

Hope this helps  :)



8
Big Data / Learning materials from the Dev2dev workshop - 13th September 2011
« Last post by admin on September 16, 2011, 12:22:12 AM »
Thank you for participating in the dev2dev workshop on 13th September 2011 and being active contributors towards its success.
Based on the public demand for the learning materials shared during the workshop, we have uploaded them to the post here. Let's share, learn and grow together.
Thank you for the support and wish you a Happy Engineers Day!
9
Big Data / Re: Pivot table in Hive
« Last post by jobin.wilson on September 15, 2011, 08:03:22 AM »
Hi Reju,

As I mentioned, the approach has a limitation  :'(. It only helps if the set of values in the column on which you want to pivot is finite and known to you in advance.

Did you consider aggregating the values in Hive and storing the result in a file, then transforming the output file into your required format (outside Hive), and then loading the transformed file back into your destination Hive table?
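A minimal sketch of what the transform step outside Hive could look like, assuming the aggregated output is tab-separated lines of row key, pivot value and aggregate. The class PivotSketch, this three-column layout and the use of empty fields for missing combinations are all assumptions for illustration, not something taken from your setup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: turns (row, column, value) lines into one wide line per row.
final class PivotSketch {

    static List<String> pivot(List<String> longLines) {
        // row key -> (pivot column -> value), keeping first-seen row order
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        TreeSet<String> columns = new TreeSet<>();   // sorted, distinct pivot values
        for (String line : longLines) {
            String[] f = line.split("\t", -1);
            if (f.length != 3) {
                throw new IllegalArgumentException("Expected 3 fields: " + line);
            }
            rows.computeIfAbsent(f[0], k -> new LinkedHashMap<>()).put(f[1], f[2]);
            columns.add(f[1]);
        }
        List<String> out = new ArrayList<>();
        out.add("key\t" + String.join("\t", columns));   // header line
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey());
            for (String c : columns) {
                // Missing combinations become empty fields.
                sb.append('\t').append(e.getValue().getOrDefault(c, ""));
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

The first output line is a header naming the pivoted columns; it would need to be dropped or skipped before loading the file back into Hive. Since this discovers the columns from the data, it does not need them to be known in advance, but everything is held in memory, so it only suits results that are already aggregated down to a manageable size.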

Thanks & Regards
Jobin Wilson

10
Cloud Computing / Web 2.0, for new era Cloud Applications
« Last post by anoop.nair on September 15, 2011, 07:49:57 AM »
Web 2.0 has been a buzzword lately, with a string of portals and applications switching to the Web 2.0 mode, and its relevance has increased manifold with the advent of cloud-based SaaS and PaaS applications.

So what’s the big hoopla about? What does Web 2.0 mean? What new standards has it brought about? Is it here to stay? Well, I’ll try and demystify these.

“Web 2.0”, the name as such, formally came into being in late 2004, when O’Reilly and MediaLive International held a conference about the web. They thought of naming it ‘Web 2.0’, their way of saying the web still mattered. At that time the term meant “the web as a platform”: the web would no longer be a place just to display static content, but a place for richer, user-interactive applications as well. Its meaning and usability stand redefined. There is no one-line definition of Web 2.0, but certain aspects put together make it so. The foremost of them are listed below.

Respect the User: One of the major paradigm changes of Web 2.0 is “Don’t bug the user”. There used to be a time when giving the user more pain (long registration forms, activation links and what not) gave a kind of legitimacy to the website. Not now. “Respect the user” means no more unnecessary pop-ups, no unduly long registration forms, asking only for the necessary information, and definitely no activation links. People have become smarter and their perspectives have changed; they now look for smarter and easier ways of getting things done. Putting too many hassles in their way irritates them.

Clean UI: A clean, neat interface, in which the user doesn’t have to search for information, is a major plus for portals and apps. A neat and uncluttered UI makes users comfortable and makes them more likely to stay and to revisit.

Loads of Scripting: There was a time when JavaScript was considered a security nightmare. By default, the browsers would have scripting disabled. Now, as it turns out, almost every website or portal out there uses JavaScript to bring the right amount of dynamism to its pages. JavaScript itself has evolved and matured, with loads of frameworks, Ext JS being my personal favorite. There are other frameworks as well, for instance MooTools, which has quite a few great widgets and, best of all, is lightweight. jQuery and qooxdoo are other such frameworks. Scripting allows you to make the UI richer and more responsive.

Ajax for All: Fetching only the data that is needed from the server, and retrieving it behind the scenes, has become quite the fashion, with easier Ajax-handling APIs from frameworks like jQuery and Ext JS. Their integration proves to be very useful and easy.

In a nutshell, the web is the place to be, with loads of backend processing, no more applets and no legacy pages. The web is here to offer it all, from data representation to storage to data processing, all done on the cloud so that you use the PC just to access these services. No more heavy-duty installations on your machine.
Just to give you a glimpse of the Web 2.0 fever, I have listed a few Web 2.0 compliant apps and web portals.
http://www.cleartrip.com/ - Travel portal that allows you to book flight and train tickets.
http://demo.group-office.eu - An office utility implementation that helps you share emails, calendars, project files, etc. (User: demo, Pass: demo).
Cloud-based applications like Google’s spreadsheets and Zoho are other examples.