{"id":1169,"date":"2022-06-02T06:42:00","date_gmt":"2022-06-02T06:42:00","guid":{"rendered":"https:\/\/ignos.blog\/apache-spark-integration-into-your-data-warehouse"},"modified":"2023-03-30T14:31:45","modified_gmt":"2023-03-30T14:31:45","slug":"apache-spark-integration-into-your-data-warehouse","status":"publish","type":"post","link":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse","title":{"rendered":"Apache Spark: Integration into Your Data Warehouse"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-introduccion-a-apache-spark\">Introduction to Apache Spark<\/h2>\n\n<p>The official site describes Apache Spark in a single sentence:<\/p>\n\n<blockquote class=\"wp-block-quote is-style-large is-layout-flow wp-block-quote-is-layout-flow\"><p>Apache Spark\u2122 is a multi-language engine for executing data engineering, data science, and machine learning tasks on both single-node machines and clusters.<\/p><cite>Apache Spark Official Site<\/cite><\/blockquote>\n\n<p>Behind this definition lies one of the most active Apache projects, backed by a large <a href=\"https:\/\/spark.apache.org\/\">community of users and developers<\/a>. 
Spark has become one of the most widely used engines for scalable computing.<\/p>\n\n<p>Its adoption has gone hand in hand with features that make it useful in many scenarios:<\/p>\n\n<ul class=\"wp-block-list\"><li>Support for both batch and streaming processing.<\/li><li>Support for multiple languages: Java, Scala, Python, and R.<\/li><li>Libraries that ease the development of specific applications, such as Spark SQL and MLlib.<\/li><\/ul><div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/Spark-Stack-1024x474.png\" alt=\"Apache Spark Stack\" class=\"wp-image-734\" width=\"512\" height=\"237\" srcset=\"https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/Spark-Stack-1024x474.png 1024w, https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/Spark-Stack-300x139.png 300w, https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/Spark-Stack-768x356.png 768w, https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/Spark-Stack-1536x712.png 1536w, https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/Spark-Stack.png 1740w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><figcaption>Apache Spark Stack<\/figcaption><\/figure><\/div>\n<p>From this brief description, it is clear that Spark could be beneficial in many environments, but is it worth integrating in every case?<\/p>\n\n<h2 class=\"wp-block-heading\" id=\"h-integracion\">Integration<\/h2>\n\n<p>In general, there are two motivations for undertaking the integration of Apache Spark into a corporate Data Warehouse:<\/p>\n\n<ul class=\"wp-block-list\"><li>Need for performance improvement.<\/li><li>Need for <a href=\"https:\/\/ignos.blog\/inteligencia-artificial-aplicada-al-bi\">advanced data analysis techniques<\/a>.<\/li><\/ul>\n<p>In both cases, it can be stated that the usefulness of integrating Spark will be associated 
with the volume of data managed in the Data Warehouse. That is, the more information managed, the greater the benefit obtained. For small volumes, simpler solutions can provide the expected benefits while adding less complexity to the DW architecture.<\/p>\n\n<p>If the Data Warehouse contains a large volume of information, integrating Apache Spark is likely to bring benefits. The following sections present the two most common alternatives for this integration:<\/p>\n\n<ul class=\"wp-block-list\"><li>Integrate Spark into an existing Data Warehouse.<\/li><li>Design a new architecture for the Data Warehouse.<\/li><\/ul>\n<h2 class=\"wp-block-heading\" id=\"h-integrar-apache-spark-en-un-dw-existente\">Integrate Apache Spark into an existing DW<\/h2>\n\n<p>This option minimizes both effort and risk, so that benefits in day-to-day use of the system can be achieved in a short period of time. The ease of integration is partly due to the wide range of products that have direct support for Apache Spark, including:<\/p>\n\n<ul class=\"wp-block-list\"><li>Data Science and <a href=\"https:\/\/ignos.blog\/prediccion-de-ventas-usando-machine-learning\">Machine Learning<\/a>: scikit-learn, pandas, TensorFlow, PyTorch, &#8230;<\/li><li>SQL and BI Analysis: Power BI, Tableau, Apache Superset, &#8230;<\/li><li>Storage and infrastructure: MS SQL Server, Cassandra, MongoDB, Kafka, Elasticsearch, Apache Airflow, &#8230;<\/li><\/ul>\n<p>However, integrating Spark into a DW designed for conventional use means that certain advantages may be difficult to achieve. For example, the design of the existing DW may have overlooked storage requirements that Apache Spark needs to run analysis algorithms optimally on multi-node clusters. 
<\/p>\n\n<p>In short, in these cases, the advantages obtained will depend on the original architecture of the DW.<\/p>\n\n<h2 class=\"wp-block-heading\" id=\"h-nuevo-diseno-integranda-apache-spark\">New design integrating Apache Spark<\/h2>\n\n<p>Where a redesign is possible, a new Data Warehouse design that incorporates Apache Spark from the outset offers a better chance of achieving good performance and results.<\/p>\n\n<p>Taking into account the &#8220;use cases&#8221; that the new architecture will accommodate, performance improvements can be obtained by adjusting the configuration of the platform components to the requirements of the machine learning algorithms chosen in the proposed solution. For example, the processing of small batches, typically efficient in Spark, can be separated from complex blocks of information that may be managed with better performance in Hadoop or similar products.<\/p>\n\n<p>A fundamental advantage of designing from scratch is that scalability and elasticity can be included as essential requirements. This allows both simple ETL processes and heavier ML algorithms to run on the same platform without wasting computing resources. With providers that allow the resources of the DW to be scaled dynamically (Snowflake, Azure SQL DW, etc.), costs can be kept low without compromising performance.<\/p>\n\n<h2 class=\"wp-block-heading\" id=\"h-conclusiones\">Conclusion<\/h2>\n\n<p>The scenarios in which Apache Spark can improve the performance or analytical capabilities of a Data Warehouse are varied. 
But it is in the case of systems with a large volume of data where all its benefits can be obtained, especially if the Data Warehouse architecture is designed from the beginning to take into account the requirements specific to Spark.<\/p>\n\n<p class=\"has-text-align-center\"><a class=\"boton\" href=\"https:\/\/bi.ignos.com\/\">More information<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction to Apache Spark As officially defined, Apache Spark can be briefly described in a single sentence as: Apache Spark\u2122 is a multi-language engine for<\/p>\n","protected":false},"author":2,"featured_media":1124,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_themeisle_gutenberg_block_has_review":false,"footnotes":""},"categories":[47,48],"tags":[],"class_list":["post-1169","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bi-en","category-big-data-en"],"yoast_head":"<!-- This site is optimized with 
the Yoast SEO Premium plugin v27.3 (Yoast SEO v27.4) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Apache Spark: Integration into Your Data Warehouse - Ignos Blog<\/title>\n<meta name=\"description\" content=\"In general, there are two motivations for undertaking the integration of Apache Spark into a corporate Data Warehouse:\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Apache Spark: Integration into Your Data Warehouse\" \/>\n<meta property=\"og:description\" content=\"Introduction to Apache Spark As officially defined, Apache Spark can be briefly described in a single sentence as: Apache Spark\u2122 is a multi-language\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse\" \/>\n<meta property=\"og:site_name\" content=\"Ignos Blog\" \/>\n<meta property=\"article:published_time\" content=\"2022-06-02T06:42:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-03-30T14:31:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/apachesparklogo-e1655475818894.png\" \/>\n\t<meta property=\"og:image:width\" content=\"759\" \/>\n\t<meta property=\"og:image:height\" content=\"731\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"ignosblog\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"ignosblog\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse\"},\"author\":{\"name\":\"ignosblog\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#\\\/schema\\\/person\\\/1a5938d22bbcdc65527e74d23e620b69\"},\"headline\":\"Apache Spark: Integration into Your Data Warehouse\",\"datePublished\":\"2022-06-02T06:42:00+00:00\",\"dateModified\":\"2023-03-30T14:31:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse\"},\"wordCount\":700,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ignos.blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/apachesparklogo-e1655475818894.png\",\"articleSection\":[\"BI\",\"Big Data\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse\",\"url\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse\",\"name\":\"Apache Spark: Integration into Your Data Warehouse - Ignos 
Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/ignos.blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/apachesparklogo-e1655475818894.png\",\"datePublished\":\"2022-06-02T06:42:00+00:00\",\"dateModified\":\"2023-03-30T14:31:45+00:00\",\"description\":\"In general, there are two motivations for undertaking the integration of Apache Spark into a corporate Data Warehouse:\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse#primaryimage\",\"url\":\"https:\\\/\\\/ignos.blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/apachesparklogo-e1655475818894.png\",\"contentUrl\":\"https:\\\/\\\/ignos.blog\\\/wp-content\\\/uploads\\\/2022\\\/06\\\/apachesparklogo-e1655475818894.png\",\"width\":759,\"height\":731,\"caption\":\"Apache Spark\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en\\\/apache-spark-integration-into-your-data-warehouse#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Portada\",\"item\":\"https:\\\/\\\/ignos.blog\\\/en\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Apache Spark: Integration into Your Data Warehouse\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#website\",\"url\":\"https:\\\/\\\/ignos.blog\\\/en\",\"name\":\"Ignos Blog\",\"description\":\"Convirtiendo problemas en 
retos. Te ayudamos con Big Data, IA, ML, Odoo, ...\",\"publisher\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/ignos.blog\\\/en?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#organization\",\"name\":\"Ignos\",\"url\":\"https:\\\/\\\/ignos.blog\\\/en\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/ignos.blog\\\/wp-content\\\/uploads\\\/2018\\\/05\\\/cropped-ignos-logotipo-rgb-72ppp-11.jpg\",\"contentUrl\":\"https:\\\/\\\/ignos.blog\\\/wp-content\\\/uploads\\\/2018\\\/05\\\/cropped-ignos-logotipo-rgb-72ppp-11.jpg\",\"width\":396,\"height\":158,\"caption\":\"Ignos\"},\"image\":{\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/es.linkedin.com\\\/company\\\/ignos-estudio-de-ingenier-a-s.l.\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/ignos.blog\\\/en#\\\/schema\\\/person\\\/1a5938d22bbcdc65527e74d23e620b69\",\"name\":\"ignosblog\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Apache Spark: Integration into Your Data Warehouse - Ignos Blog","description":"In general, there are two motivations for undertaking the integration of Apache Spark into a corporate Data Warehouse:","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse","og_locale":"en_US","og_type":"article","og_title":"Apache Spark: Integration into Your Data Warehouse","og_description":"Introduction to Apache Spark As officially defined, Apache Spark can be briefly described in a single sentence as: Apache Spark\u2122 is a multi-language","og_url":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse","og_site_name":"Ignos Blog","article_published_time":"2022-06-02T06:42:00+00:00","article_modified_time":"2023-03-30T14:31:45+00:00","og_image":[{"width":759,"height":731,"url":"https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/apachesparklogo-e1655475818894.png","type":"image\/png"}],"author":"ignosblog","twitter_card":"summary_large_image","twitter_misc":{"Written by":"ignosblog","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse#article","isPartOf":{"@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse"},"author":{"name":"ignosblog","@id":"https:\/\/ignos.blog\/en#\/schema\/person\/1a5938d22bbcdc65527e74d23e620b69"},"headline":"Apache Spark: Integration into Your Data Warehouse","datePublished":"2022-06-02T06:42:00+00:00","dateModified":"2023-03-30T14:31:45+00:00","mainEntityOfPage":{"@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse"},"wordCount":700,"commentCount":0,"publisher":{"@id":"https:\/\/ignos.blog\/en#organization"},"image":{"@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse#primaryimage"},"thumbnailUrl":"https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/apachesparklogo-e1655475818894.png","articleSection":["BI","Big Data"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse","url":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse","name":"Apache Spark: Integration into Your Data Warehouse - Ignos Blog","isPartOf":{"@id":"https:\/\/ignos.blog\/en#website"},"primaryImageOfPage":{"@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse#primaryimage"},"image":{"@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse#primaryimage"},"thumbnailUrl":"https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/apachesparklogo-e1655475818894.png","datePublished":"2022-06-02T06:42:00+00:00","dateModified":"2023-03-30T14:31:45+00:00","description":"In general, there are two motivations for undertaking the integration 
of Apache Spark into a corporate Data Warehouse:","breadcrumb":{"@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse#primaryimage","url":"https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/apachesparklogo-e1655475818894.png","contentUrl":"https:\/\/ignos.blog\/wp-content\/uploads\/2022\/06\/apachesparklogo-e1655475818894.png","width":759,"height":731,"caption":"Apache Spark"},{"@type":"BreadcrumbList","@id":"https:\/\/ignos.blog\/en\/apache-spark-integration-into-your-data-warehouse#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Portada","item":"https:\/\/ignos.blog\/en"},{"@type":"ListItem","position":2,"name":"Apache Spark: Integration into Your Data Warehouse"}]},{"@type":"WebSite","@id":"https:\/\/ignos.blog\/en#website","url":"https:\/\/ignos.blog\/en","name":"Ignos Blog","description":"Convirtiendo problemas en retos. 
Te ayudamos con Big Data, IA, ML, Odoo, ...","publisher":{"@id":"https:\/\/ignos.blog\/en#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ignos.blog\/en?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ignos.blog\/en#organization","name":"Ignos","url":"https:\/\/ignos.blog\/en","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ignos.blog\/en#\/schema\/logo\/image\/","url":"https:\/\/ignos.blog\/wp-content\/uploads\/2018\/05\/cropped-ignos-logotipo-rgb-72ppp-11.jpg","contentUrl":"https:\/\/ignos.blog\/wp-content\/uploads\/2018\/05\/cropped-ignos-logotipo-rgb-72ppp-11.jpg","width":396,"height":158,"caption":"Ignos"},"image":{"@id":"https:\/\/ignos.blog\/en#\/schema\/logo\/image\/"},"sameAs":["https:\/\/es.linkedin.com\/company\/ignos-estudio-de-ingenier-a-s.l."]},{"@type":"Person","@id":"https:\/\/ignos.blog\/en#\/schema\/person\/1a5938d22bbcdc65527e74d23e620b69","name":"ignosblog"}]}},"_links":{"self":[{"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/posts\/1169","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/comments?post=1169"}],"version-history":[{"count":3,"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/posts\/1169\/revisions"}],"predecessor-version":[{"id":1172,"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/posts\/1169\/revisions\/1172"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/media\/1124"}],"wp:attachment":[{"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/media?pare
nt=1169"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/categories?post=1169"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ignos.blog\/en\/wp-json\/wp\/v2\/tags?post=1169"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}