  • Parquet: Open-source columnar format for Hadoop (1 of 6)

    published: 21 Nov 2014
  • Parquet: Open-source columnar format for Hadoop (2 of 6)

    published: 21 Nov 2014
  • Parquet: Open-source columnar format for Hadoop (3 of 6)

    published: 21 Nov 2014
  • Parquet: Open-source columnar format for Hadoop (4 of 6)

    published: 21 Nov 2014
  • Parquet: Open-source columnar format for Hadoop (5 of 6)

    published: 21 Nov 2014
  • Parquet: Open-source columnar format for Hadoop (6 of 6)

    published: 21 Nov 2014
  • Apache Parquet & Apache Spark

    - Overview of Apache Parquet and the key benefits of using it. - Demo of using Apache Spark with Apache Parquet.

    published: 16 Jun 2016
  • Apache Parquet: Parquet file internals and inspecting Parquet file structure

    In this video we will look at the internal structure of the Apache Parquet storage format and use the parquet-tools utility to inspect the contents of a file (a programmatic sketch of the same inspection follows below). Apache Parquet is a columnar storage format available in the Hadoop ecosystem. Related videos: Creating Parquet files using Apache Spark: https://youtu.be/-ra0pGUw7fo | Parquet vs Avro: https://youtu.be/sLuHzdMGFNA
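    As a rough illustration of what such an inspection surfaces, here is a minimal Scala sketch that reads just the footer of a Parquet file with the parquet-hadoop library; the file path is hypothetical, and this is an assumed programmatic equivalent of the parquet-tools inspection, not the exact steps from the video:

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.Path
      import org.apache.parquet.format.converter.ParquetMetadataConverter
      import org.apache.parquet.hadoop.ParquetFileReader
      import scala.collection.JavaConverters._

      // Hypothetical path to a Parquet file written earlier.
      val file = new Path("/tmp/person/part-00000.parquet")

      // Read only the footer (schema plus row-group metadata), not the data pages.
      val footer = ParquetFileReader.readFooter(
        new Configuration(), file, ParquetMetadataConverter.NO_FILTER)

      // The schema stored in the footer.
      println(footer.getFileMetaData.getSchema)

      // Per-row-group stats: row counts, then per-column-chunk codec and size.
      footer.getBlocks.asScala.foreach { block =>
        println(s"rows=${block.getRowCount} bytes=${block.getTotalByteSize}")
        block.getColumns.asScala.foreach { col =>
          println(s"  ${col.getPath} codec=${col.getCodec} bytes=${col.getTotalSize}")
        }
      }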

    published: 22 Apr 2017
  • Parquet Format at Twitter

    Julien Le Dem discusses Parquet, a columnar file format for Hadoop. The performance and compression benefits of using columnar storage formats for storing and processing large amounts of data are well documented in the academic literature as well as in several commercial analytical databases. Parquet supports deeply nested structures and efficient encoding and column compression schemes, and is designed to be compatible with a variety of higher-level type systems. Its integration in most of the Hadoop processing frameworks (Impala, Hive, Pig, Cascading, Crunch, Scalding, Spark, ...) and serialization models (Thrift, Avro, Protocol Buffers, ...) makes it easy to use in existing ETL and processing pipelines, while giving flexibility of choice on the query engine (whether in Java or C++). Join the conversation at http://twitter.com/university

    published: 18 Apr 2014
  • Spark Reading and Writing to Parquet Storage Format

    Spark: Reading and Writing to Parquet Format - Using the Spark DataFrame save capability - Code/approach works on both local HDD and HDFS environments. Related video: Introduction to Apache Spark and Parquet, https://www.youtube.com/watch?v=itm0TINmK9k Code for the demo (a partition-pruned read is sketched below):

      case class Person(name: String, age: Int, sex: String)
      val data = Seq(Person("Jack", 25, "M"), Person("Jill", 25, "F"), Person("Jess", 24, "F"))
      val df = data.toDF()

      import org.apache.spark.sql.SaveMode
      df.select("name", "age", "sex").write.mode(SaveMode.Append).format("parquet").save("/tmp/person")
      df.select("name", "age", "sex").write.partitionBy("sex").mode(SaveMode.Append)
        .format("parquet").save("/tmp/person_partitioned/")

      val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      val dfPerson = sqlContext.read.parquet("/tmp/person")
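    As a follow-up, a minimal sketch (assuming the demo's /tmp paths and a spark-shell session, where sc and the toDF implicits already exist) of reading the partitioned output back with a filter on the partition column, so Spark only scans the matching sex=... directories:

      // Partition pruning: only the sex=F directory is read for this query.
      val women = sqlContext.read.parquet("/tmp/person_partitioned")
        .filter("sex = 'F'")
        .select("name", "age")
      women.show()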

    published: 19 Nov 2016
  • Parquet vs Avro

    In this video we will cover the pros and cons of two popular file formats used in the Hadoop ecosystem, Apache Parquet and Apache Avro. Agenda: where these formats are used; similarities; key considerations when choosing (read vs write characteristics, tooling, schema evolution); general guidelines; scenarios for keeping data in both Parquet and Avro. Avro is a row-based storage format for Hadoop; however, Avro is more than a serialization framework, it is also an IPC framework. Parquet is a column-based storage format for Hadoop. Both are highly optimized (versus plain text), both are self-describing, and both use compression. If your use case typically scans or retrieves all of the fields in a row in each query, Avro is usually the best choice. If your dataset has many columns, and your use case typically involves working with a subset of those columns rather than entire records, Parquet is optimized for that kind of work. Finally, the video covers cases where you may use both file formats (a sketch of writing both is shown below).
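    To make the read-pattern trade-off concrete, here is a minimal sketch of writing the same tiny DataFrame in both formats from a spark-shell session; the paths are hypothetical and the Avro format string assumes the Databricks spark-avro package is on the classpath (neither detail comes from the video):

      // Same tiny dataset written in both formats, for comparison.
      val people = Seq(("Jack", 25, "M"), ("Jill", 25, "F")).toDF("name", "age", "sex")

      // Columnar Parquet: efficient when queries touch a few of many columns.
      people.write.format("parquet").save("/tmp/compare_parquet")

      // Row-based Avro: efficient when queries read whole records.
      people.write.format("com.databricks.spark.avro").save("/tmp/compare_avro")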

    published: 16 Feb 2017
  • Columnar databases, using Parquet as an example

    http://0x1.tv/20170422CC Columnar databases, using Parquet as an example (Leonid Blokhin, SECON-2017)
    * Differences between row-oriented and columnar databases.
    * Apache Parquet: application areas, the advantages it provides, and a comparison with other columnar databases.
    * Apache Spark: application areas, distinguishing features, advantages and disadvantages, and working with Parquet files in the Hadoop File System.
    * RDDs, DataFrames, and Datasets in Apache Spark: why they are needed, how to use them, and what benefits they bring.
    * Mist: using Spark as a service with a REST API

    published: 01 Jul 2017
  • Deep Dive: Spark SQL+DataFrames+Data Sources API+Parquet+Cassandra Connector

    Advanced Apache Spark Meetup, October 6th, 2015. Speaker: Chris Fregly. Location: Spark Technology Center. http://www.meetup.com/Advanced-Apache-Spark-Meetup/events/225715756/ A deep dive into the details of the spark-cassandra-connector. This implementation of the Spark SQL Data Sources API is one of the most advanced and performance-tunable connectors available. Highlights of the spark-cassandra-connector: 1) Token-ring-aware data locality for co-location with Spark Worker nodes 2) Pushdown filter support for optimal performance and participation in the advanced Spark SQL Catalyst query optimizations 3) Spark 1.4 and Spark 1.5 DataFrame support 4) Enables a single Cassandra data store to serve both your transactional and analytics needs (there are pros and cons to this) For more information about the Spark Technology Center: http://www.spark.tc/ Follow us: @apachespark_tc

    published: 14 Oct 2015
  • Infer Hive table schema automatically using Impala and Parquet

    Tip: Infer a table schema automatically using Impala (using CREATE ... LIKE PARQUET). Comparing the Hive vs Impala options:

      -- Option 1: Using Hive - manually spell out the table schema with all the column details
      CREATE EXTERNAL TABLE person (name String, age Int, sex String)
      STORED AS PARQUET
      LOCATION '/tmp/person';

      -- Option 2: Using Impala - automatically infer the table schema from an existing file
      CREATE EXTERNAL TABLE person2
      LIKE PARQUET '/tmp/person/part-r-00000-8a445cfc-eab6-41e6-8c33-40fe8aa6600d.gz.parquet'
      STORED AS PARQUET
      LOCATION '/tmp/person';

    published: 24 Nov 2016
  • Hadoop Tutorials - SQL and file formats

    The Hadoop ecosystem is the leading open-source platform for distributed storage and processing of "big data". The Hadoop platform is available at CERN as a central service provided by the IT department. This tutorial, organized by the IT Hadoop service, aims to introduce the main concepts of Hadoop technology in a practical way and is targeted at those who would like to start using the service for distributed parallel data processing. The main topics covered are: Hadoop architecture and available components; how to perform distributed parallel processing in order to explore and create reports with SQL (with Apache Impala) on example data; using HUE, the Hadoop web UI, for presenting the results in a user-friendly way; and how to format and/or structure data in order to make data processing more efficient, using various data formats/containers and partitioning techniques (Avro, Parquet, HBase). Best practices in this area are also discussed.

    published: 09 Jul 2016
  • UNILIN production process parquet

    Take a look behind the scenes and find out how UNILIN manufactures its parquet hardwood floors. In this 30-minute explanatory movie, you follow a piece of wood as it travels through the factories in the Czech Republic and Malaysia and is transformed from tree trunk into a finished, ready-to-use hardwood floor.

    published: 18 Nov 2015
  • 0603 Open Source Recipes for Chef Deployments of Hadoop

    published: 19 Jun 2014
  • 0605 Efficient Data Storage for Analytics with Parquet 2.0

    published: 23 Jun 2014
  • Hoodie: An Open Source Incremental Processing Framework From Uber

    Recorded at DataEngConf SF '17. Even after a decade, the name "Hadoop" remains synonymous with "big data", even as new options for processing/querying (stream processing, in-memory analytics, interactive SQL) and storage services (S3/Google Cloud/Azure) have emerged and unlocked new possibilities. However, the overall data architecture has become more complex, with more moving parts and specialized systems, leading to duplication of data and strain on usability. In this talk, we argue that by adding some missing blocks to the existing Hadoop stack, we are able to provide similar capabilities right on top of Hadoop, at reduced cost and increased efficiency, greatly simplifying the overall architecture in the process. We will discuss the need for incremental processing primitives on Hadoop, motivating them with some real-world problems from Uber. We will then introduce "Hoodie", an open-source Spark library built at Uber to enable faster data for petabyte-scale data analytics and solve these problems. We will deep-dive into the design and implementation of the system and discuss the core concepts around timeline consistency and the trade-offs between ingest speed and query performance. We contrast Hoodie with similar systems in the space, discuss how it is deployed across the Hadoop ecosystem at Uber, and finally share the technical direction ahead for the project. Speaker: Vinoth Chandar, Uber

    published: 25 Jun 2017
  • Spark + Parquet In Depth: Spark Summit East talk by: Emily Curtin and Robbie Strickland

    published: 14 Feb 2017
  • Apache Kudu and Spark SQL for Fast Analytics on Fast Data (Mike Percy)

    Apache Kudu is a new, open source storage engine for the Hadoop ecosystem that enables extremely high-speed analytics without imposing data-visibility latencies. Using Spark and Kudu, it is now easy to create applications that query and analyze mutable, constantly changing datasets using SQL while getting the impressive query performance that you would normally expect from an immutable columnar data format like Parquet. Kudu delivers this with a fault-tolerant, distributed architecture and a columnar on-disk storage format. This talk provides an introduction to Kudu, presents an overview of how to build a Spark application using Kudu for data storage, and demonstrates using Spark and Kudu together to achieve impressive results in a system that is friendly to both application developers and operations engineers.

    published: 03 Nov 2016
  • Apache Drill SQL Queries on Parquet Data | Whiteboard Walkthrough

    In this Whiteboard Walkthrough, Parth Chandra, Chair of the PMC for the Apache Drill project and a member of the MapR engineering team, describes how the Apache Drill SQL query engine reads data in Parquet format, along with some best practices for getting maximum performance from Parquet (a hypothetical query is sketched below). Additional Apache Drill resources: "Overview Apache Drill's Query Execution Capabilities" Whiteboard Walkthrough video: https://www.mapr.com/blog/big-data-sql-overview-apache-drill-query-execution-capabilities-whiteboard-walkthrough | "SQL Query on Mixed Schema Data Using Apache Drill" blog post: https://www.mapr.com/blog/sql-query-mixed-schema-data-using-apache-drill | Free download of Apache Drill on the MapR sandbox: https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill
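    For flavor, a minimal hypothetical sketch of querying Parquet files in place through Drill's JDBC driver from Scala; the drillbit host, the dfs path (reusing this page's /tmp/person demo data), and the column names are assumptions, not details from the video:

      import java.sql.DriverManager

      // Assumes a drillbit is running locally and drill-jdbc is on the classpath.
      val conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost")
      // Drill queries the Parquet directory directly; no table DDL is needed.
      val rs = conn.createStatement().executeQuery(
        "SELECT name, age FROM dfs.`/tmp/person` WHERE age > 24")
      while (rs.next()) println(s"${rs.getString("name")}: ${rs.getInt("age")}")
      conn.close()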

    published: 12 Oct 2016
  • File Formats

    Table of Contents:
    00:07 - Considerations for Choosing a File Format
    00:55 - Text File Format
    02:17 - SequenceFile Format
    03:20 - Binary Storage Formats
    05:25 - Apache Avro File Format
    06:25 - Columnar Formats
    07:44 - Columnar File Formats: RCFile and ORCFile
    08:47 - Columnar File Layout
    11:11 - Columnar File Formats: Apache Parquet
    12:33 - Essential Points
    13:09 - Thank you

    published: 06 Mar 2017
  • Hardwood Flooring on Stairs: Installing Open Sided Staircase Nosing Tread and Riser from A to Z

    When installing hardwood flooring on stairs, you may face an open-sided staircase. Watch how to install the tread, riser, and stair nosing on it. THINGS I MENTION IN THIS VID: - Stair Tread Gauge - http://amzn.to/2i3sLNG - Dewalt Miter Saw - http://amzn.to/2i7JonF - Dewalt Table Saw - http://amzn.to/2h3aOgy - Pin Nailer 23-Gauge - http://amzn.to/2ibSUds - Compressor - http://amzn.to/2ihgl0X - Rubber Mallet - http://amzn.to/2i7mnBa - Adhesive Gun - http://amzn.to/2hOEoI4 - Construction Adhesive - http://amzn.to/2hOFeEz - Wood Glue - http://amzn.to/2i3vqH4 - Heavy-Duty Utility Knife - http://amz - Scotch Masking Tape - http://amzn.to/2kMnA6p SUBSCRIBE FOR MORE VIDS! http://www.youtube.com/user/mryoucandoityourself?sub_confirmation=1 ALSO FIND ME HERE: https://www.facebook.com/mryoucandoityourself/ https://twitter.com/ovisha

    published: 09 May 2014