{"id":1785,"date":"2021-04-10T08:43:35","date_gmt":"2021-04-10T08:43:35","guid":{"rendered":"https:\/\/gauravw.com\/blog\/?p=1785"},"modified":"2021-04-10T08:43:41","modified_gmt":"2021-04-10T08:43:41","slug":"big-data-udemy-course-notes","status":"publish","type":"post","link":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/","title":{"rendered":"Big Data Udemy Course Notes"},"content":{"rendered":"<h1 class=\"wp-block-heading\">Hadoop&nbsp;<\/h1>\n\n\n\n<p>The Apache <strong>Hadoop software<\/strong> library is a framework that allows for the <strong>distributed<\/strong> <strong>processing <\/strong>of large data sets across clusters of computers using simple programming models.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1\" alt=\"\"\/><\/figure>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Hadoop Common<\/strong>: The common utilities that support the other Hadoop modules.<\/li><li><strong>Hadoop Distributed File System (HDFS&#x2122;)<\/strong>: A distributed file system that provides high-throughput access to application data.<\/li><li><strong>Hadoop YARN<\/strong>: A framework for job scheduling and cluster resource management.<\/li><li><strong>Hadoop MapReduce<\/strong>: A YARN-based system for parallel processing of large data sets.<\/li><li><a href=\"https:\/\/hadoop.apache.org\/ozone\/\"><strong>Hadoop Ozone<\/strong><\/a>: An object store for Hadoop<\/li><li><a href=\"https:\/\/ambari.apache.org\/\"><strong>Ambari&#x2122;<\/strong><\/a>: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. 
Ambari also provides a dashboard for viewing cluster health such as heatmaps and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.<\/li><li><a href=\"https:\/\/avro.apache.org\/\"><strong>Avro&#x2122;<\/strong><\/a>: A data serialization system.<\/li><li><a href=\"https:\/\/cassandra.apache.org\/\"><strong>Cassandra&#x2122;<\/strong><\/a>: A scalable multi-master database with no single points of failure.<\/li><li><a href=\"https:\/\/chukwa.apache.org\/\"><strong>Chukwa&#x2122;<\/strong><\/a>: A data collection system for managing large distributed systems.<\/li><li><a href=\"https:\/\/hbase.apache.org\/\"><strong>HBase&#x2122;<\/strong><\/a>: A scalable, distributed database that supports structured data storage for large tables.<\/li><li><a href=\"https:\/\/hive.apache.org\/\"><strong>Hive&#x2122;<\/strong><\/a>: A data warehouse infrastructure that provides data summarization and ad hoc querying.<\/li><li><a href=\"https:\/\/mahout.apache.org\/\"><strong>Mahout&#x2122;<\/strong><\/a>: A scalable machine learning and data mining library.<\/li><li><a href=\"https:\/\/pig.apache.org\/\"><strong>Pig&#x2122;<\/strong><\/a>: A high-level data-flow language and execution framework for parallel computation.<\/li><li><a href=\"https:\/\/spark.apache.org\/\"><strong>Spark&#x2122;<\/strong><\/a>: A fast and general compute engine for Hadoop data. 
Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.<\/li><li><a href=\"https:\/\/submarine.apache.org\/\"><strong>Submarine<\/strong><\/a>: A unified AI platform which allows engineers and data scientists to run Machine Learning and Deep Learning workload in distributed clusters.<\/li><li><a href=\"https:\/\/tez.apache.org\/\"><strong>Tez&#x2122;<\/strong><\/a>: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive&#x2122;, Pig&#x2122; and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop&#x2122; MapReduce as the underlying execution engine.<\/li><li><a href=\"https:\/\/zookeeper.apache.org\/\"><strong>ZooKeeper&#x2122;<\/strong><\/a>: A high-performance coordination service for distributed applications.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/vBbIN3Z5z2hqpZNRE9VeDPt0sBnbf8zwZVYgfcMLs-wgfmvMjLsj7pbzECsReruC5mqP4o8VR7sgENV1a7JPiICzz7nxLGw0TP4mZh5CzcSPaYao6vq42p5PiIyME8fQgebS4bBS\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/o6rI9W8PSiPxa6hs7Mybk3-S4ulLKYznxPehW0uC480ppjLVGvODDY9nfYss210toIOmTHLr88LpJaF89cyv8-SpdRDWMOpkOgHWdPu0kgtlbXWpV4ffYWJyTPRpva4HKZkdT9bw\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/w8pT-FvIGkdufuIpuyqWOWLelKMHVkdrlCrClCtAjToUgIAibqDPNPaY4kcA5zoKqAQegVfhwD-WuIl-8e67cLD5TFm_e-iaLrygIrjKB69RhdrOHwRL7pZyo5xMj3D82aZe9p_f\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">HDFS<\/h2>\n\n\n\n<ol class=\"wp-block-list\"><li>Breaks 
files into blocks<\/li><li>Stores the blocks across several commodity computers; each block is replicated more than once to ensure data reliability in case a node goes down.<\/li><\/ol>\n\n\n\n<p>NameNode &#8211; acts as an index of the files (blocks) stored and of the DataNodes that hold them.<br>DataNode &#8211; actually stores the data.<\/p>\n\n\n\n<p>The client node asks the NameNode for a file and then fetches it from the DataNodes.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/LQUbLcYYtvIPS4QP0MnAotwHwaJC4CDpRrNcUReEDYbTM7inmde_8Lgdb8Yi3S7iC6qkpJZVGxiaTfboqRQZT7FZIO0K9ivIosiJ4piACzJO5kbdql93Ro-zTVAg6pZCeMxgCjO_\" alt=\"\"\/><\/figure>\n\n\n\n<p>Writing &#8211; the <strong>client node <\/strong>contacts the <strong>NameNode<\/strong>, which hands back the name of a <strong>DataNode.<\/strong> The client talks to that DataNode, which in turn talks to other DataNodes to get the file saved across them along with replication. 
They all respond back to the client node and to the NameNode.<br><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/TbJixNPa03iQRXn2VK7KC66X3mb2Y2eKHj3kLbnswK65uGNaIcER6ThLcc2JZBRQG3xCJHYXVxUtJEP-uO1esCcv2zBSZvMteVUtsDdaOBMvcJqnozQYUg69fyEhC5mKMEg7maLw\" width=\"365\" height=\"328\"><\/p>\n\n\n\n<p>NameNode Resilience<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/_T9GAm9jnkso6nHag5naUxAM_H84yf7UkFvGXh8hURGm9eeAlwjDVeSu9A5WbNepScD67VC0J1MpG_hUn1c3la_baLgpiq5Vns5_IGlJ9SydoE13hNO1TjwJCuLfopUsz3ILog6C\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/m322uDfUZpMTwsOg6gQIZe--qQbulRE9UfO5Q-8csaCEAE8JETOUN54tOOdN9_ZZVXPDcAfeZhvvQza-YXX9H8MioNjQFDsjaqTA7tet8gbPNF1VppPh0csmC2HtaUEHGDwPiJ6V\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">MapReduce<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/KaJcV5Fqh20yBDKekbMJTzEReMsBJBWVO5peeQI1CmRHnNcc6TRz7LENZl_WNHq91J_waurvYgBmOr3HT5BF8szbH2F121VO4osNC4eECWd0yoEniluvosy3fWYpw__T6qzkt_vF\" alt=\"\"\/><\/figure>\n\n\n\n<ol class=\"wp-block-list\"><li>Mapper &#8211; maps the input to key-value pairs<\/li><li>Shuffle and sort &#8211; groups and sorts the key-value pairs<\/li><li>Reducer &#8211; reduces the values for each key to a single value&nbsp;<\/li><\/ol>\n\n\n\n<p><br>Below is an example that finds the number of movies rated by each user.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/Dcr_ZeUUUWPGuuejMZBo7lvmbzJc0Z_2XbeAyOwt5zrdmuicGhGxpqzoX9bXE2A3mEy8rBb8Trk-YBpvBxbZKR-bdgwyjeYbOhPspW42jVfbYgpb3pXnfaTswRuPOt0gGqtysD1_\" alt=\"\"\/><\/figure>\n\n\n\n<p>What&#8217;s happening in the system: the client node talks to YARN, which talks to the NodeManager, which in turn talks to the node, and they write the output to HDFS. 
If a node goes down, the NodeManager shifts\/restarts the process on another node; if the NodeManager goes down, it is handled by YARN; if YARN goes down, its high-availability (HA) replica takes over (as discussed previously).<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/yfCxCDmqxk7wvYyj__xtBgaGQ-fy2g-GUCpFcsMVlSuQzqra1vC6a9OkuafMP2TepaTe8dPEpQspDTVpJ05NPxIzfjG-MZNb_NGtW80iX_oaapAhq7Y8eOoDfIF3x2-MMxWlShEH\" width=\"624\" height=\"335\"><\/p>\n\n\n\n<p>An example where we want to display the movie ID and the count of ratings for each movie (sorting movies by their number of ratings).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/CWdJM_dW2Kl6Theci6jQeE8UA_ufs52vtej5Nca2srwq4dw-woA83znGi_5yU5iTjIml40mY-s33F4lr7KhlTPaBiRlXNMG1MvaqZx07iqlN88jbEaePBJcsRiBCcz25S-dgl-F-\" alt=\"\"\/><\/figure>\n\n\n\n<p>A second reducer is needed because we want sorted output (by the count of ratings).<strong> The map step only maps, but the framework does the \u201csort and shuffle\u201d on its output according to the key.&nbsp; Note &#8211; the framework treats everything as a string, so we need to zero-pad the numbers for proper sorting.<\/strong><\/p>\n\n\n\n<p><br>With movieId as the key in the mapper, the output would sort by movieId. So, we swap the order of key and value in the reducer. This ensures that &lt;count,movieId&gt; sorts in the order of count.<br>1, abc (assume the movieId is a string)<\/p>\n\n\n\n<p>2, bef, gef<\/p>\n\n\n\n<p>100, ded<\/p>\n\n\n\n<p><br>Now, to display it in sorted order according to the number of ratings, another reducer reverses the key and value; in this step the sort is introduced by the sorting algorithm and not by the framework.<\/p>\n\n\n\n<p>Abc, 1<\/p>\n\n\n\n<p>Bef, 2<\/p>\n\n\n\n<p>Gef, 2<\/p>\n\n\n\n<p>Ded, 100<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Pig<\/h2>\n\n\n\n<p>Pig is a data flow language and execution engine. 
Pig has a scripting language, Pig Latin, in which you can write MapReduce steps\/functions. Pig scripts perform well, and they are easier to write than MapReduce jobs in Python\/Java.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/IV0jNhmhJ1b4dq47MmHr0oef8cS8nXpw6DSyhhdbQrXrtbQovITsCqpd0O_ORbq6KaRwf5tvTFXX5l5Ej4TFvaqo901JpdXimvhhkesPbKXlizhpljsIXn6GB6xK6LDFKsAlSTXb\" alt=\"\"\/><\/figure>\n\n\n\n<p>Pig is built on top of MapReduce and Tez. You can use Pig through 1. Grunt (the interactive shell), 2. Ambari\/Hue, or 3. a script.<\/p>\n\n\n\n<p>Example of getting top rated movies<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/DH__j2AEXnEgcFyR27H9yA_niBBMqXpnRra9TB8sLk_CCkT4DvingruCx_5QZBc7PbBs7cMLVW1BypKIDi9KHmHPaIYhTBLLqTwTnxVgjUQu4fR1ddzsfKSbsV2mXF0rB0F0ypWC\" alt=\"\"\/><\/figure>\n\n\n\n<p>If you execute this with Tez integrated, it runs faster.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Spark<\/h2>\n\n\n\n<p>A fast and general engine for large-scale data processing.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/T-jIAfosJflYskM27PU4xSOU-cpK1aGSfWloZvLh0NiGwg4sLaQm4iE49JnKILhjQQRBhXEJfREV0O0EgruieoIOGsdrrC_RL2Wh-sAYeiCxVWs9G9rrtn6OB5ZbGQkQ2y82O2S1\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/Cx3U7dD3h0IihCC2_hqtltPhkrf0zsRQpdmIxe7RkMcAMGPTP6oUeoGonebInbetwHiy4XAyN9ENFG6xAubhAFCUUlLcpY41ILedlDqQG8DDkuvIAMciYE6V-9tFtwFZ2BUUlqqF\" alt=\"\"\/><\/figure>\n\n\n\n<p>Creating RDDs<\/p>\n\n\n\n<p>RDD is a Resilient Distributed Dataset. 
It is a specialized data set that the Spark framework distributes across various nodes.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/75x4cQ1mPfkfPZfL4kOZYJHS91L94D-2A5YAaoYn4yCFbO_Pa1Sy4hSepzkIxwgJB8aFr2QY0ehN3ecdPIZILhLIRFLzKzK4D2QWBockOAS5Lna4jcQCRLJXuJBIUUkvD4t00Lyj\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Hive<\/h2>\n\n\n\n<p>Apache <strong>Hive<\/strong> is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/I4WppiLQJZSQcJbKx7OpoT7AEjB-b7wPynu-EG7NAu5U4z7wtdpJ7C8_HPp35G9uepyM_oVmMnIoIjathEAz5JO_-REXSy1Lw4h_yg3fgs11QwMZTqiY6oqtmj1IoaFpWSf8q9yq\" alt=\"\"\/><\/figure>\n\n\n\n<p>When to use<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/rEE17CaToUVyag6xn_bLJ4lUKOFqcUwlbgB73Ul8V09aVNW27a8eAK7yVc6-qzv--pSjG3mZOnfDZvy3B6Gm1bpxQIOR8vVzTCnAV27LAJIytynEwQP624tbwvy64l_-Y1znHobX\" alt=\"\"\/><\/figure>\n\n\n\n<p>Why not Hive?<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/wA7kREQH7pYbfLApIQ63re-HQ-kdbFBdtnhnz81NS4R_HZCtzIQC8IBjXm250Ate7MR0ezrB-1tPkTj_ctN01Z_DfhzYRU3J5uABvdR60VANHfG9umzYXXwwg0blSFAEEYKFPSfX\" alt=\"\"\/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">How Does Hive Work?<\/h3>\n\n\n\n<p>Let&#8217;s say you create a table. Hive stores the table data in HDFS as a delimited text file. 
There is no structure information saved here.<\/p>\n\n\n\n<p>Besides this, Hive maintains a \u201cmetastore\u201d which stores the metadata about the table.<\/p>\n\n\n\n<p>Hive uses HQL (HiveQL) to query data.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/knHc7JclLZsAkHtEupwAxPuO_m4ubrmo0i-ERuvxoV3_9Z15ZQDxqNrfCgMG4Sk9UxGPJma-vjob_-Q4TQ1d-VVfRnYvrKq1OryYtsGZRzoYAFAwg3ZI5IyWpkOQytHrvRoFSl4R\" alt=\"\"\/><\/figure>\n\n\n\n<p>LOAD DATA moves data that already sits on the distributed filesystem (Big Data) into Hive. LOAD DATA LOCAL copies data from the local filesystem (non-Big Data).<br><br>Managed table &#8211; Hive takes ownership of the table. So if you drop the table, Hive drops both the data and the structure.<br>External table &#8211; Hive does not take ownership of the data. So if you drop the table, Hive drops the structure (metadata) but <strong>not<\/strong> the data. This is useful when other Big Data applications still need the underlying data.<\/p>\n\n\n\n<p>Partitioning<\/p>\n\n\n\n<p>Hive works very well with partitioning. It manages the partitions separately. For example, address data can be partitioned by country name. This improves query execution time.<\/p>\n\n\n\n<p>Alternative to Hive &#8211; <strong>IMPALA <\/strong>(Cloudera) &#8211; faster than Hive<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">SQOOP<\/h2>\n\n\n\n<p>Ingestion framework. It helps to import from SQL-based DBs into HDFS\/Hive and to export back to SQL as well. It also uses MapReduce. 
When you give it a job, it uses several mappers to get the task executed.<\/p>\n\n\n\n<p>Import data from SQL to HDFS<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/1GoEXwy6O9Oxk6S_8izOL2GCGe1Ufes_OsK2SPrnMJgBIPhRGf6xTywxi4xrEGpFBxKsPykMmKixp5sYX1LJa5PwETptKlTGMcFiE8SOcfkAZ9-BjWky1lkIe7HzJ2Ziifc2VkuO\" alt=\"\"\/><\/figure>\n\n\n\n<p>Add --hive-import to import into Hive<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/WQ3UNEwI39yk06GvlAevxDc0-sJtx7VyoXz-VbtXRfRF4S3Ik7O11MR6DTLD3HTbcgkFs6yEYM_56_PVo01SWaZ8hu850irxlfveDaLcyluI-SFwBKLyH9-AY8sn30aJOJKFe1tV\" alt=\"\"\/><\/figure>\n\n\n\n<p>Scaling up<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/ipflYj-mrM08pKL0BzqQHmdHANLfXBuNI4BCVx5mc5VLUR33Z9MbihgAzbXUaB8y_6pM7iMwBn1KpYHnNKkEo_sCjf_g7B352twKt-nIpXnPDuoOEh5RqHiYpH0OPxenWTnQa7nR\" alt=\"\"\/><\/figure>\n\n\n\n<p>Sample Architecture with NoSQL and BigData<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/xwdspRAo4l7DCNgIsL8ZGc1BjIQA7mzH-Ig_T8gA4BglAPsgxBQP1hxrMv_-micyhge6qHn7m41aLTQJ8lzkhVGnFKZWkfWkFCso-CpOWtJHWPhW-fywRi6EVKSDV_1swtwBbtXj\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">HBase<\/h2>\n\n\n\n<p>NoSQL database modeled on Google\u2019s Bigtable<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/Mtx0lEBf-5hwm2K01xz29T8BOfMuDLdkDJFx27WCdRXOwy2NFRup3ltvZ8tSC6tDn8kgA8FDppwIaAkg3LoAvu821lkVcNIcW55gepWcZZFjpCIIaLu8c3eDv4s96CyQeVhKCLXD\" alt=\"\"\/><\/figure>\n\n\n\n<p>HMaster keeps track of the partitions and of which region server holds what.&nbsp;<\/p>\n\n\n\n<p>ZooKeeper knows about the master and can contact another master if one goes down.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/lh3.googleusercontent.com\/Qca0U2k8kq5-xtJj2-NV-U2gOpPoYD-qdJLRwbw7fNLIqr8an-DNYvhETceWB0itcpgSW2xiQd__8FDazSkMKR5asHlqc7XXtL9LBQHBD8tzoxbqwbvDi6U25QL63q1Csn1fB2BJ\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/Y4pa5j-PTHlMO1kCA745En1BEqdWC7XjuvWtO9z_gsYQrgeoZ_f3oPvjsYlhhlrW80ya1m5mlXcBk0sucoFoNIniZq3n-98y8Zj-p6jxguywIdsgtMcH_Kh59P6DrmbSS9RJtt3r\" alt=\"\"\/><\/figure>\n\n\n\n<p>A column family can store multiple columns.<br>A cell can have multiple versions, distinguished by timestamp.<\/p>\n\n\n\n<p>One can use Java\/Python to populate HBase. If the file already exists on HDFS, one can use Pig to load the data into HBase.<\/p>\n\n\n\n<p>Alternative to HBase &#8211; Accumulo (better security &#8211; cell-based access control)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Cassandra<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/9AZAvOhJ5xUoaS4TCxioSN_hFMLs-Vv9Yn5chaoEY3HlrLepi2MSZ0O85nPPADBXhgyk3Nx6gpDWvSHAgnUg0m1HHb6MFrsszhol-Kmat2ZT_jf60mxJtxoiekS1Vor6ayWZr97S\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/jIoxhzHAfSzzyeIpBeOcgmOrQkY76BOdzhbZAX5a89Ja2I8s62ZJ8GBGubJ88g_XC-KnYXC_nHUViOPg8UD5_g1CNeBmiezyz5jsIqS-cF38HwWNugfWN6voMMI2YBCRmRGarvDP\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/d47E0rOGarMECj_su7sC-xLF3FS0pIgcOov3J_KRliRceJsBebW7G9Z1THiWNxOnSaTtxYydGD7zfgX2nOpJ3e2PxP-mpuvnrZrwzbEyicviGY_eBOS0KCjWVQSODWn_9iIbAssC\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/Qn6a684KBG5pDX_OKM8YqZVl1uCSBinFInVtC8InurP8TTv085Kod8Bl3PnjTCrBSZ7rTAAJfjnXFb2LOB0k5q7VY5iQhegQ2MiEYF1Nc6RSLsOyAyQpb5pdGkUdqe6ENBNTLMHB\" alt=\"\"\/><\/figure>\n\n\n\n<figure 
class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/95fRFISQuBTagnNcbbhOHUuDhjmVM9McgEKA7HJ4GDRsu7EBP1V46gOwShOI372fsAbiybyO6W5KNV4EJMF9Q7RTqYGKlvHkqm8hQBgpQzzUzOK3Do8nC8b5OQMwkS_502vISZty\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/DyMdvbkQ0JFnUU2ubtpW6eu4PuUWnZDxAtmg2MB0rz7UvsOknjNTviZEfaWZqQt8LX1lsxvE5dcKdH5-QkO3WpTPxUylaMExbUdGPo6fBaibkZ27ubEG7dyYwE5ykH4ObFC5wkvt\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">MongoDB&nbsp;<\/h2>\n\n\n\n<p>Managing HuMONGOus data<\/p>\n\n\n\n<p>It favours consistency over availability. It has a document-based data model, so you can store JSON\/XML.<\/p>\n\n\n\n<p>No schema is enforced. A schema can be enforced if you want, but it is not mandatory.<\/p>\n\n\n\n<p>MongoDB Terminology<\/p>\n\n\n\n<p>1. Databases<\/p>\n\n\n\n<p>2. Collections &#8211; Tables<\/p>\n\n\n\n<p>3. Documents &#8211; Rows<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/3Ac6J0WFacULQYlDTnyzPPPgavpsqee8yD0GBe0zBkXOf6qWGXx4MGUWMKvFkAyPFevdY6B5P1H-mQwbg-W2rECSGkzSSKymC1S6VoYExNjZUNzecKM6geHBNg1pJ7CytaFmOR68\" alt=\"\"\/><\/figure>\n\n\n\n<p>Sharding<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/QHjsGgho1mQL67FlS3ESeajCehrqMUKQiMbP4WN3ewoT8m37i-xXFbnr617ZmouTcid1I01tjJBMg7UagPktK3e_9ie0tzZYevc_FVq5KRxHp96Zli5wEx6lOGEPMo4gQZN2q4yX\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/qilv1Eui7nn4-r8kOcgV1K4dg4UpBy5kEJqXw9Rt7LQinyBLTdP7cRLBqSB2Ufb_T8mBq4DkgpIvG-ZIepbPsCa0byhS9wpriTG-7Y1e6dXmxOKzu2rBuhc3Pup4941xFH6c0y1d\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Query Engines&nbsp;<\/h2>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Drill<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/lh4.googleusercontent.com\/dBC-0ySFfd8uei_KJVypy0PXI8O1oUxcH_OlrJuxw66zgZ7gevANxmbHcJ-kvRX0ZyG3D0al20wPyk7H86FQcZdUTKaPIY_J-6PeFPrxTV2y3GLKL423I8K33KBffLMmuifCtViT\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/s9EjHMPDYpQpD2DG0HSNXRha2vW-KsBf649o3_FOlCaGTLMhxII5-ikjR4aSxVanxt7jX1A6hedm2gAsO0EF8kEZ2djovc5CmNo-dAJEhuFCCTU7sfhfJxxDvJlaF4XTAZh_r7xy\" alt=\"\"\/><\/figure>\n\n\n\n<p>It&#8217;s like SQL for the entire ecosystem. You can get data from MongoDB and join it with data from Hive, and so on. There is no learning curve, as it uses standard SQL.<\/p>\n\n\n\n<p>So why would one still want to use Hive, MongoDB, etc. directly? Well, Drill is not very efficient with big joins.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/a_cHaIUpoQ03DaC6xXsSjp9iD2A0xzfloOVcbRRAMKnwkjeDiSNlAdQbAmiBA1MUpM5JwiZSJAEuAAOO6n9x-5TAVk1pH3ZscETZ_DMXdC-LydXW7Q4CtEL0WxhxovNLu_ZHXhL-\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Phoenix<\/h2>\n\n\n\n<p>It&#8217;s a query engine built specifically for HBase. So you can use SQL to query HBase instead of writing a Java\/Python program. It&#8217;s quick. 
It also connects to Pig, etc.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/PZE5ZjrnwjXwwTMgNkwJIcnjifTDBNgdP0ANOopPpADeus2leTCl4L6PIoaMdLwiQu9fRYEwBXzYy8Yd1CRraKUu62eXtuVUnoFLErXZcNg4BYq1xjYrzXsJqrVZ3QQCkNPTqDIU\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/t_9llQ6NuXErutQwe1bUazk78rZ1S8KBQ6_WcqbuC4VQ4v3oC7eEMOXgKAjtr7yXBiPPINOUUz1R4mvEarnQ0ft7DcAo2kvvdlldhHaGhlPW1_5DUIjUWXcyxx5AVQQN88PUHOF5\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/bSJtXZ4JHmQk68xlcffTjAbOTGTA11BDEB71owCjNk3jBlZ_kI3XycDvykdD3QmoM4o--aQnjkQ6Vw81YVsvPErwkR3Z59p-trl4wuzpSunU1cbwUurYgizhPr-39BM2ho8APv7H\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Presto&nbsp;<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/uuxlLMv5tnNdYNSv8UWzG0lm-FDard8O-LhMdD-TEascTd5EAKRIudtUDHiCmyZXFAVq07eaomNor9Z7hUoXm6WhKXh9YAehNXoaOSWm0ceB5NQKi5VVrdEspoTqSuhN-f4ZKWTx\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/AMnlbaxwqXTbaMoS-HBBDndtEx8-BU6KgnPMFFo4xRlQMhGGFZJWto44t7gjg27EnZ-0jzrAHAHAwbj3w_tTntMAukYY4EOGPe67omr0Zcqpmpp53x1rFgn0F9Ck7GjYnSy-z58O\" alt=\"\"\/><\/figure>\n\n\n\n<p>It&#8217;s made by Facebook.&nbsp;<\/p>\n\n\n\n<p>Presto can query <strong>Cassandra <\/strong>too, among others!<\/p>\n\n\n\n<p>Also, the UI is very good compared to Drill.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/BoEnx4PpSJU8rEUdR5tnm_3cvvjIUdVc7Qjn0wKwGy0I5XeaoOGFV9pJyLLzilHnuM-EWUGUMDuSY8BFgbEgnWywQXQkfsTCDR897PW8--sYJPjdqGWkLiUT3e6lz-JQwWXoSd9C\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">YARN &#8211; Yet Another Resource Negotiator<\/h2>\n\n\n\n<figure 
class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/oH4Xe3mHLzRjmFT-3Hxrz2kuuo1sBxN3uUzJRIRz_tcFNlVRCJ41gDCrWeVM1giouLQvYHmMzxiBw8Pa4v_4-T96drxPo--IklH-NqJrmXQGsji3G7Z2L3DjdflaUczTXRw_nHyb\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/88uof2uZqIspx1ACyhRPtRIdZv6CnOwYKRXZ507j_pSfwOl70ZvxAdiii5kIs2azEj9iFrH6sC-2PmJS-AxMY1_My7bibAUKpruNrnkA8PoLYD7_q5PwZqoSi9ovlvt5TG43nsQf\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Tez<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/KdJ6ojdH5u7mxM_xEZP6qfL1qtDc5WuIakWbK0HR_UE3KIn7Aq2M2LMT1Tb6Fanp9EWcLKrcihTwrhfG2ahXmsAFr3XJd0ySE0469Ehn2ZUcREvVJJGyiVyajYg_UsDuFtw4izKq\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/v7P7CE9jQQeaC3KwXXLKhXW0Iw3hL0SbjgTp8uHh3ua-AKMlrtXCAJHPz02_DgyUIxsWUlmb3mzW-1yZHwEdAvSX2mlw4-Vd493n-k1qYtTrwKqlOTV1_5TcrN-KIDgf97NWXxFL\" alt=\"\"\/><\/figure>\n\n\n\n<p>Tez parallelizes steps, skips steps, or removes redundant steps to improve performance over MapReduce.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/-UPbeHWY2J-HgOiHzA5zCgUg2Gs6p2rXZ1NrrXqPPsQfCIVAvoluez8--Y4wuAx_dGCIhfvRPoPW3p4nF1OauJRbK5udAFJcgh3lQ9E-WrKTs7jSc_xXaTZg1AFpMvAaZ1V-hLFo\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/gxjdSZCk9nSg0MdHj5L5myAT5jFuZHH_neibvaMATiwvKl8E2xkJIay0XZJaa7mj5lVKBV2truysrMYa8kb1FWah63UsjyQSMuMLoxhK2hBmi17B0C-wtsbglEtsorhQxpMh_S0w\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Mesos<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/lh6.googleusercontent.com\/2JfRip2akUTleuGujndddFBW5OlN81NQj4Ip5t84rYjpASmB0mIiZOPL59CMw3c-2uy3_5g86ALORIF6waFuB2BXFznLw1-HNhw_dF4ICxVRjJEG6DGp32AfI2djKlTKQxAF0UQS\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/uzGNVN-0oBZbNnnMMVcK3yqJwNANnndOwNSU2NRukyV_Wadskgu4uGgPTrzZKLhwLc2iEa41JYBcThzyME5k8qSTO8rjggZYXjhHs1f55wvT7YUpYwy8h12cqGpMwGtbMone981a\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">ZooKeeper<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/zyo43aBnTntt7LHO2WXZsH7HBFf6SAtkIS8lt4uw2f2mO2orJ4NPHemfTwk1PMuaviXbHEffO5G5tEPigMRFH5sbtkTuLosMcKPZ72ImXzCCwMDa5RR9fpAtyWsj8ChXqdH7Bq-Q\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/aJmUuFf08U-ozffgiM9Bp655_uIIRH6a0GvrhPK4i0NIX4nXdqmQrhFDOfC_sFgWhIpg1iANqqyfAp0R1cO1dN-gqh4_xboJzmp_dwF03wTGtXuaehZCGXaEq3i8gmUOEFX0Llv1\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/dYTPfpV78qFCjf6w5JWqHvPdwLLHNuUY_HMWbdTm3upW7SeVfsAEIsF9IfrMXfAeRQ4Bdj5QCtkYs8dtw1PZ1apAkEL0RP3OOURks1qOuKAO2xcQJSvL05ACuwPJMy-H1Tgfp2J3\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/feAkSt9r8b0YzWaLWLYBpSQ6zJ5loIF3vMx3tyWTQMNIXfPPylu5TAk0uxCOCp4_vQg_BLMROoEtY9EGcf8oErD9evPdzh_LPIzyt_LlEPCeCK8fjt0ir1YDs_vRCmxvq-aSPPnD\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/YM4PrJy0FxPoxzBd2QmFLyJNHdEsjxtgYJ8wymyrWnvDUt2CMV4lEhoQRs8IudoYGwQz09LsadC9IUHApl1Jk4Zm5z6kyA5OsO6F3wCmw9kMt7TYVPMdXt6wgc8aee-5PZP0two5\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/lh4.googleusercontent.com\/4xjgCToaVBj1dDRQ9krHxSF94KnQwKUcEqM7SdjEgYSJjZybDaJfNUIWWIoJwr5cRYqqmVs6CZbb5qf8o_nsYYwCohCuRQatlfmkW7ui1V30vEQ9XgyUfQ0Iyer4ABGvnt7NPxDG\" alt=\"\"\/><\/figure>\n\n\n\n<p>ZooKeeper favours consistency over availability&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Oozie<\/h2>\n\n\n\n<p>It is used to build workflows. Let&#8217;s say you need coordination between various individual jobs such as SQL, Hive, MongoDB, etc.; you can then write an XML file describing the steps to be performed. Each of these steps will be coordinated by Oozie.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/d54wzCuSmROdW3slhH1VW3gqAqiSXetikBvpTNymQ3nJiw7_-A5jvjS3FhrHd83NDTx_x7hoNNeaRrRCOsoH3kdGHu5ARcvmiapV37UeXIi230EV4gSOezJ40av4UCPOd7rO7tR_\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/NGVnwVdMxqhkTnKrHpmaOSs9tYQbsjFYPiYSswMtFiZjIlWtV-TtdDpk3I3MajB9CBKY86snQxKSGyQhvteijiCKRr3DgRNKTfoEYDZeorIc7EU7p3gZmL1o3lktHtc_eiu8QhbU\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/xDCJriU7AGAHkXwpP3UcddvYoC435XlNwEcY2KOTXjUt0VpO36WJBUy8EiyfZL-roGx-gT22B02Pobecba0wX4mS3TyiFMa53a_efYnhq61dOcFhNruORUjCazy-micyfFF0lmLI\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/N0haBnVrDAGTYBSNIWYB28ax7AEJHnr0GqXZYthuSmVAMXtU1OoZPy5QPfaIWabx2WUQ56Axpf35CYs89B8vt4aO6Wb0ZG4ep4vMTkfroRqmNJyN1io59lEXEgrCoMLShoFzHApg\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/NptxvxwR2Gx2rhzjqkkj77s4ikP_lK79ue82jEjEQhoB_upHB3oFGCc_x7Xl2bxrHMmQau6rBkcPgFZEkgG-Qtzg8pdR9x4fLYUU7ezfiQd0isOdTBMdAcDTAoBsKNhzFLpwGSun\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/lh3.googleusercontent.com\/Muube0RnjXjTeDPLud4vJpsYpSQ4q1lCzWc7R_mHqp12Dn0ed-lqpkH2BEgxCB7qAZJ_fVIPUBkI282C8fXor_I8G_rIIUAXrxC3GkdTRyiW7c1T0k6nxWYle5UK1-CrgBBbseVd\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/S8NfS-Fys8szaGoOWuDf34CGxwuZQLF-QOjdwHsl2BeAV0uH7goDN5-pciL5px944c5X3V4e6MTybfX1ZD_kcvu0r4Q86FPITkwIpRgpI2WH3yWWAkwFv-Lk7WK-RLBLtfczOiS3\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Zeppelin<\/h2>\n\n\n\n<p>It provides an interactive way to write BigData programs. It can connect to various technologies and works like a notebook.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/JJWhgtc2TBq0kwMAFF67nZOAiATzNQrjGBAFKcmkGowMumRhWHo77ZKctJtQC5hJnmiGs0pl7F8G2B2eEnwC7ZHhBo2rCY1rxUuAfqS-Fd5AO83gtEfLjOtA1zLGXiYTsRWP4P2t\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">HUE &#8211; Hadoop User Experience&nbsp;<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/qleg6JWPAFv7JpRzPgHo_NyqPxfjw4_JdnNp4jsFpgxfywS5sPTDVykwFZG1TBiBwwY3MIdldtPfCVLY5jImwxX05GW_GQa_JrFmzEcpirriLtqpf9oLqbyq6OqfnxT5yM5Va1HY\" alt=\"\"\/><\/figure>\n\n\n\n<p>It is open source and maintained by Cloudera. 
It has an inbuilt Oozie editor to write workflows.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Kafka&nbsp;<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/--Lqs1g61Y47KbvohLsFsOyNSdip4jwICcWH9aQkbYxn_td2XdhaFdNd90qBFO_MkCzmMKW54mJ8u4g6Jj-Lfqa20JVRdfiIMf-5C9OuH4qNm7TgVElmyNMXndK9vlMwhaqTW3S0\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/_7O0Ntmx8731xKOiVxPCO87aZaJch8rLpih_kO8mJ0BaIqFmOSKIkjqr6kpJKJ1pnYkG1SVIxChsfYERRQZQnvBu3c6smka9F1jpub6UfljNFLjXXrQZyl01Rx_Qtb7Kbc_E_4Tx\" alt=\"\"\/><\/figure>\n\n\n\n<p>Alternative to Kafka &#8211; Amazon Kinesis<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Flume<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/Zmx1Rf1MMIyKbjOyNRsf6uhLiDGbhcyV1SuFK0olXWty3UYK_GMOpuXYiz2T6fEcLZXjIrsTB4G_7I4bBOoumcrrxQ3RNQJnaWzJYyhFtez84NC-EHeocRkj9VZf9mZyhMfNyma1\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/ljR5qJK_9u75BlGa3eC7GwrszdnBrfkj7EoRoaacAzH0XIy_H_vzlL8OUNoYOvyEyRYT3tHvdnBARzxOPdySitv1FSfpQTuUdve7L4SWu8pFZr-RvU03CBC70dTMvmyCS2GwdksL\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/zz2AMSHSvxfNJlL7bOEcRwteZSP14eFLn22RIlXw9LmkZTL6F0YSOqEcKDosB2zL8df82b-ZuXE4M5W0TpKjBk5mEJ6GU9Jb4I-QUV9C9trZyHSZ-yNB-BgFBVV0tqY8HDHVHV9E\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Spark Streaming<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/IUAIQLJKV8VRT6CGkeWsnobx8gV1N34d3m7ornoziC3Eass2hkvozg4KMSDJCztTm9XvZ3ie4F3OarzFOaW55Tloadj8Yw_PAsohNhqyLj07CU5Z2cYGeV004beHdn2d_0Qy1Ov1\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/lh3.googleusercontent.com\/Qqpnv6rRp3Ylslc4Qbgj1_eiYs6JnMLF27PwxAHKspNON3OaFCPVtTMJNXIS17cGitR9K65qFY__q7mPVOY_27ze6KFP2mN6SuLor1DUIce6TORl6ozHvq5_-FMp45yRgKWHqzYn\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/C8geEE4EMmdZjsAI930d5mjPU6trFHmoRTh54Vglh4E4r4YqdFPC-hQ835jERrIUEzvNZAjYtSl2u-KtTyizQHGLsWojpu2FZ8oSpJMDb5mhbuS9w2hU0UIMqbcRKHMguT7VZXLx\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/Nscwh8w08V2DWw6nWJsLqLbd8NC1XiwFRfoMMbM-A9-t6IJMA9EEHpOvRg-Ph5kD2euW_UW0o4u6sllfq9kDk0RvlufrFLHrobkYpXENaXMl4q9_bhg2gnDhJlex3YrNVy3OCEiX\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/WfumuQbAmLGdHVNeNcoUm9mfDLyQm9ozQz_SFL4xs-dQxtxIsfXGBelTMuPY-VdUyLJYyDdekNsYwJKH5xYBaK6mstEh4qGXdGKDeNdqLne4uCxAOBKij6WLowLmpGrwSCoRveV3\" alt=\"\"\/><\/figure>\n\n\n\n<p>Structured Streaming<\/p>\n\n\n\n<p>Uses the Dataset\/DataFrame API instead of DStreams<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/XV0FUoTt7-1jchcudCVNZyz8me9k444VUqui9E1v2rBKnk_4M44lcW9TM6k10cWTZXSqffsSB9MsK1LPstAEZ0mZuOMeic5enpCFYxkHo2D3J-BEvemptGg-AtKaITdQZL68t-tF\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/ME1zFW7p2f26SUJ5z76oq448nUNpSjv_NKdjK0RaT-5KlY5gYdWhsqzMlBadR-SV7LGuojQSHbir0no6FIcERyCqgyqqP0G7YDEEK4ClCgapkslA8hJhw3_CuAAtdV5PjRmsjksc\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Apache Storm<\/h2>\n\n\n\n<p>Storm vs Spark Streaming<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/2UrxVXWwdMRGJm3ccFzvYGWcuaTe2GrtdjzdY370vtBJ8XTGFjEDaoCBB-aPOkjdQvwtUWeWA6AJM50-LBBB4WKyaenWTRR7V8MQPwjfyWln81rr2anW-RMIYvA2wug72bMhY9de\" 
alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/WQhXWOR22QnFoM653gKLKPPkCdzMMZ78UyNUC0IX_cg336QqAnbwl7-cPQjmbzDodYvK-nDn0U8MRmhV7uyMtPhWMtkshqGDqo6XVGEmmQN_7LTQcyP0Lzezszz5o0XuOGnyX3Be\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/GOxhV9yHW-hPySx_YBt8gWkfhGNQZt6_ciF5p4nOp7KlrXQpGm-xpFwf3hPNOu0KzCJ-ZAhDy-7wUdwsGiitwn0J4mhvD_RlXjmj0qZ3aSuYt3DiU8egPRCzY0V96rClGGwsecQg\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/nNp79FbRGQR_y2b-X8wjfiPbdNJs1UP9OWVeNUZ-e1xto6HpdZQcpnes_XM-KQbOmZrF_LVCjeU5fnIRNUhvyaScFernTC3TWM6NabTWuCCEsyK1O8n0MHe74mtX_WaVGv0t6wp8\" alt=\"\"\/><\/figure>\n\n\n\n<p>Sliding windows &#8211; consecutive windows overlap, so part of the data is processed in more than one window. For example, with a 10-second window that slides every second, each window shares 9 seconds with the previous one.<\/p>\n\n\n\n<p>Tumbling windows &#8211; fixed-size windows with no overlap.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Flink<\/h2>\n\n\n\n<p>Can handle both event-based (streaming) and batch processing.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/LoMVzEvSu57wp61zu5N4DZLKJRHkNiq3ZWUlVo42rmI4Cs--HR3Wy04Ki6V2XMGGxzRrH-PZCQ39m-46tM6LLOB3HPMYU1-YIfN9EKhydKNvQTjobQuMk848Je0aDdqXDvgSbDkr\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/XdglGVMqL9Ne92On-AodRZYw003n5zHTSbZ0czT9Orp4sP2QntEykq3N7Tmjx6w0QBiII6B3QBE3AkbW8bpmGCclgRMZlHmYgSpkmpP86QMcKx6KD_RxubHtn-Y6-lSBaBkz1j-v\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/gApets57hPXajpbayQPjn5uimVvAxWPGmeAAUEr9Oq8edwPNcqUlAtJZy84d2s1XDkQYpHP5DTSEvmrnhsePUWy19s6tJWEUe2IfvFzAZRNg_dalMmNYcp9rBn6DsbE7XUaxVpJ0\" alt=\"\"\/><\/figure>\n\n\n\n<figure 
class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/7QTkwEB_GXltggOb0OFBzDxPXYk0VcTnWwvxPRsBOBfA6C-SIfS9n7SGc8V44b-X4yx9-P9CnTi2eyO_6nVXlHWtCoSQdWipatDwXKAdYIYzKqk1ALTi4Vq0FR3OHRDCdzSZwzgQ\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/fDBHV8S7RjDoxwzz8o6TEnyyVB4z74CufcT6pJQvIY7fMMkOspsSxIWfQBppNwkA1zoC7DK6PpLU4Ln5FlbqzCdbfbywqXXprwf8q4fjEXB4CVaF6Kow5YrW7H9jUJVLUCSQ__NH\" alt=\"\"\/><\/figure>\n\n<!--themify_builder_content-->\n<div id=\"themify_builder_content-1785\" data-postid=\"1785\" class=\"themify_builder_content themify_builder_content-1785 themify_builder tf_clear\">\n    <\/div>\n<!--\/themify_builder_content-->","protected":false},"excerpt":{"rendered":"<p>Hadoop&nbsp; The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop Common: The common utilities that support the other Hadoop modules. Hadoop Distributed File System (HDFS&#x2122;): A distributed file system that provides high-throughput access to application data. 
Hadoop [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[97],"tags":[],"class_list":["post-1785","post","type-post","status-publish","format-standard","hentry","category-tech-learnings","has-post-title","has-post-date","has-post-category","has-post-tag","has-post-comment","has-post-author",""],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.3 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Big Data Udemy Course Notes &#187; Gaurav Wadhwani<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Big Data Udemy Course Notes &#187; Gaurav Wadhwani\" \/>\n<meta property=\"og:description\" content=\"Hadoop&nbsp; The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop Common: The common utilities that support the other Hadoop modules. Hadoop Distributed File System (HDFS&#x2122;): A distributed file system that provides high-throughput access to application data. 
Hadoop [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/\" \/>\n<meta property=\"og:site_name\" content=\"Gaurav Wadhwani\" \/>\n<meta property=\"article:published_time\" content=\"2021-04-10T08:43:35+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-04-10T08:43:41+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1\" \/>\n<meta name=\"author\" content=\"Gaurav Wadhwani\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Gaurav Wadhwani\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"25 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/\"},\"author\":{\"name\":\"Gaurav Wadhwani\",\"@id\":\"https:\/\/gauravw.com\/blog\/#\/schema\/person\/9a05a9c3487f35f6b4577c6956cf252e\"},\"headline\":\"Big Data Udemy Course 
Notes\",\"datePublished\":\"2021-04-10T08:43:35+00:00\",\"dateModified\":\"2021-04-10T08:43:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/\"},\"wordCount\":1578,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/gauravw.com\/blog\/#\/schema\/person\/9a05a9c3487f35f6b4577c6956cf252e\"},\"image\":{\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1\",\"articleSection\":[\"Tech Learnings\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/\",\"url\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/\",\"name\":\"Big Data Udemy Course Notes &#187; Gaurav 
Wadhwani\",\"isPartOf\":{\"@id\":\"https:\/\/gauravw.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1\",\"datePublished\":\"2021-04-10T08:43:35+00:00\",\"dateModified\":\"2021-04-10T08:43:41+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#primaryimage\",\"url\":\"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1\",\"contentUrl\":\"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/gauravw.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Big Data Udemy Course Notes\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/gauravw.com\/blog\/#website\",\"url\":\"https:\/\/gauravw.com\/blog\/\",\"name\":\"Gaurav Wadhwani\",\"description\":\"Where I write \/ 
scribble\",\"publisher\":{\"@id\":\"https:\/\/gauravw.com\/blog\/#\/schema\/person\/9a05a9c3487f35f6b4577c6956cf252e\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/gauravw.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/gauravw.com\/blog\/#\/schema\/person\/9a05a9c3487f35f6b4577c6956cf252e\",\"name\":\"Gaurav Wadhwani\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/gauravw.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/88929454012064ffbe95370287faa36b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/88929454012064ffbe95370287faa36b?s=96&d=mm&r=g\",\"caption\":\"Gaurav Wadhwani\"},\"logo\":{\"@id\":\"https:\/\/gauravw.com\/blog\/#\/schema\/person\/image\/\"},\"sameAs\":[\"http:\/\/gauravw.com\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Big Data Udemy Course Notes &#187; Gaurav Wadhwani","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/","og_locale":"en_US","og_type":"article","og_title":"Big Data Udemy Course Notes &#187; Gaurav Wadhwani","og_description":"Hadoop&nbsp; The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop Common: The common utilities that support the other Hadoop modules. Hadoop Distributed File System (HDFS&#x2122;): A distributed file system that provides high-throughput access to application data. 
Hadoop [&hellip;]","og_url":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/","og_site_name":"Gaurav Wadhwani","article_published_time":"2021-04-10T08:43:35+00:00","article_modified_time":"2021-04-10T08:43:41+00:00","og_image":[{"url":"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1","type":"","width":"","height":""}],"author":"Gaurav Wadhwani","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Gaurav Wadhwani","Est. reading time":"25 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#article","isPartOf":{"@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/"},"author":{"name":"Gaurav Wadhwani","@id":"https:\/\/gauravw.com\/blog\/#\/schema\/person\/9a05a9c3487f35f6b4577c6956cf252e"},"headline":"Big Data Udemy Course Notes","datePublished":"2021-04-10T08:43:35+00:00","dateModified":"2021-04-10T08:43:41+00:00","mainEntityOfPage":{"@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/"},"wordCount":1578,"commentCount":0,"publisher":{"@id":"https:\/\/gauravw.com\/blog\/#\/schema\/person\/9a05a9c3487f35f6b4577c6956cf252e"},"image":{"@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#primaryimage"},"thumbnailUrl":"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1","articleSection":["Tech 
Learnings"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/","url":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/","name":"Big Data Udemy Course Notes &#187; Gaurav Wadhwani","isPartOf":{"@id":"https:\/\/gauravw.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#primaryimage"},"image":{"@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#primaryimage"},"thumbnailUrl":"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1","datePublished":"2021-04-10T08:43:35+00:00","dateModified":"2021-04-10T08:43:41+00:00","breadcrumb":{"@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#primaryimage","url":"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1","contentUrl":"https:\/\/lh4.googleusercontent.com\/hl_oFvZyR7esbGxzB_cLxNl4TUl5TaylI980JbL75_iBCO3Yss05AoGMlsAf_iFJvJwWrhDr1Jw0L26-vhiHG7pZo3YHlQEo0qjU-kjLXu2WS60QCDtmnSnY2EfV_BvKgaEaomd1"},{"@type":"BreadcrumbList","@id":"https:\/\/gauravw.com\/blog\/2021\/04\/big-data-udemy-course-notes\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/gauravw.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Big Data 
Udemy Course Notes"}]},{"@type":"WebSite","@id":"https:\/\/gauravw.com\/blog\/#website","url":"https:\/\/gauravw.com\/blog\/","name":"Gaurav Wadhwani","description":"Where I write \/ scribble","publisher":{"@id":"https:\/\/gauravw.com\/blog\/#\/schema\/person\/9a05a9c3487f35f6b4577c6956cf252e"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/gauravw.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/gauravw.com\/blog\/#\/schema\/person\/9a05a9c3487f35f6b4577c6956cf252e","name":"Gaurav Wadhwani","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/gauravw.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/88929454012064ffbe95370287faa36b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/88929454012064ffbe95370287faa36b?s=96&d=mm&r=g","caption":"Gaurav 
Wadhwani"},"logo":{"@id":"https:\/\/gauravw.com\/blog\/#\/schema\/person\/image\/"},"sameAs":["http:\/\/gauravw.com"]}]}},"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"","builder_content":"","_links":{"self":[{"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/posts\/1785","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/comments?post=1785"}],"version-history":[{"count":1,"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/posts\/1785\/revisions"}],"predecessor-version":[{"id":1786,"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/posts\/1785\/revisions\/1786"}],"wp:attachment":[{"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/media?parent=1785"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/categories?post=1785"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gauravw.com\/blog\/wp-json\/wp\/v2\/tags?post=1785"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}