Parquet: Open-source columnar format for Hadoop (1 of 6)

  • Duration: 15:01
  • Updated: 21 Nov 2014
  • views: 11979
https://wn.com/Parquet_Open_Source_Columnar_Format_For_Hadoop_(1_Of_6)
Parquet: Open-source columnar format for Hadoop (2 of 6)

  • Duration: 15:01
  • Updated: 21 Nov 2014
  • views: 5198
https://wn.com/Parquet_Open_Source_Columnar_Format_For_Hadoop_(2_Of_6)
Parquet: Open-source columnar format for Hadoop (3 of 6)

  • Duration: 15:01
  • Updated: 21 Nov 2014
  • views: 3537
https://wn.com/Parquet_Open_Source_Columnar_Format_For_Hadoop_(3_Of_6)
Parquet: Open-source columnar format for Hadoop (4 of 6)

  • Duration: 15:01
  • Updated: 21 Nov 2014
  • views: 2335
https://wn.com/Parquet_Open_Source_Columnar_Format_For_Hadoop_(4_Of_6)
Parquet: Open-source columnar format for Hadoop (6 of 6)

  • Duration: 22:02
  • Updated: 21 Nov 2014
  • views: 794
https://wn.com/Parquet_Open_Source_Columnar_Format_For_Hadoop_(6_Of_6)
Parquet: Open-source columnar format for Hadoop (5 of 6)

  • Duration: 15:01
  • Updated: 21 Nov 2014
  • views: 1147
https://wn.com/Parquet_Open_Source_Columnar_Format_For_Hadoop_(5_Of_6)
Spark + Parquet In Depth: Spark Summit East talk by Emily Curtin and Robbie Strickland

  • Duration: 29:50
  • Updated: 14 Feb 2017
  • views: 10622
https://wn.com/Spark_Parquet_In_Depth_Spark_Summit_East_Talk_By_Emily_Curtin_And_Robbie_Strickland
Uwe L. Korn - Efficient and portable DataFrame storage with Apache Parquet

  • Duration: 28:31
  • Updated: 15 May 2017
  • views: 655
Filmed at PyData London 2017 (www.pydata.org).

Description: Apache Parquet is the most widely used columnar data format in the big data processing space and recently gained Pandas support. It leverages various techniques to store data in a CPU- and I/O-efficient way and provides the ability to push queries down to the I/O layer. This talk shows how to use it from Python, details its structure, and presents its portable use with other tools.

Abstract: Since its creation in 2013, Apache Parquet has risen to be the most widely used binary columnar storage format in the big data processing space. While supporting basic attributes of a columnar format, such as reading a subset of columns, it also leverages techniques to store the data efficiently while providing fast access. In addition, the format is structured in such a fashion that, when supplied to a query engine, Parquet provides indexing hints and statistics to quickly skip over chunks of irrelevant data. In recent months, efficient implementations to load and store Parquet files in Python became available, bringing the efficiency of the format to Pandas DataFrames. While this provides a new option to store DataFrames, it especially allows us to share data between Pandas and many other popular systems, such as Apache Spark or Apache Impala. In this talk we will show the performance improvements that Parquet brings, but will also highlight important aspects of the format that make it portable and efficient for queries on large amounts of data. As not all features are yet available in Python, an overview of the upcoming Python-specific improvements, and of how the Parquet format will be extended in general, is given at the end of the talk.

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. We aim to be an accessible, community-driven conference, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
https://wn.com/Uwe_L_Korn_Efficient_And_Portable_Dataframe_Storage_With_Apache_Parquet
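The talk above covers the Python/Pandas side of Parquet, but the column-pruning and predicate push-down behaviour it describes can be sketched with Spark in Scala, the language used by the other demos in this listing. This is only an illustrative sketch, not code from the talk; the /tmp/person path reuses the demo dataset written elsewhere on this page.

// Illustrative sketch (Spark 2.x Scala): reading only a subset of columns and
// filtering lets Parquet skip irrelevant column chunks and row groups.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-pushdown").getOrCreate()
import spark.implicits._

val people = spark.read.parquet("/tmp/person")   // schema comes from the Parquet footer
val adults = people
  .select("name", "age")                         // column pruning: only two columns are decoded
  .filter($"age" >= 18)                          // predicate push-down to the Parquet reader
adults.explain()                                 // the physical plan lists PushedFilters for the scan
adults.show()
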
UNILIN production process parquet

  • Duration: 28:35
  • Updated: 18 Nov 2015
  • views: 16215
Take a look behind the scenes and find out how UNILIN manufactures its parquet hardwood floors. In this 30-minute explanatory film, you follow a piece of wood as it travels through the factories in the Czech Republic and Malaysia and is transformed from tree trunk into a finished, ready-to-use hardwood floor.
https://wn.com/Unilin_Production_Process_Parquet
Apache Parquet & Apache Spark

  • Duration: 13:43
  • Updated: 16 Jun 2016
  • views: 9773
- Overview of Apache Parquet and key benefits of using Apache Parquet.
- Demo of using Apache Spark with Apache Parquet.
https://wn.com/Apache_Parquet_Apache_Spark
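The entry above only lists what the demo covers; as a hedged illustration (not the video's actual code), a minimal Spark (Scala) round trip to Parquet with an explicit compression codec could look like the sketch below. The data, path, and column names are hypothetical.

// Sketch (Spark 2.x Scala): write a DataFrame as snappy-compressed Parquet and read it back.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-demo").getOrCreate()
import spark.implicits._

val sales = Seq(("2016-06-01", "widget", 3), ("2016-06-02", "gadget", 5))
  .toDF("day", "product", "qty")

sales.write
  .option("compression", "snappy")   // per-column-chunk compression
  .parquet("/tmp/sales_parquet")     // hypothetical output path

val back = spark.read.parquet("/tmp/sales_parquet")  // schema is preserved in the file footer
back.show()
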
Parquet vs Avro

  • Duration: 13:28
  • Updated: 16 Feb 2017
  • views: 8730
In this video we cover the pros and cons of two popular file formats used in the Hadoop ecosystem: Apache Parquet and Apache Avro.

Agenda:
- Where these formats are used
- Similarities
- Key considerations when choosing: read vs write characteristics, tooling, schema evolution
- General guidelines
- Scenarios for keeping data in both Parquet and Avro

Avro is a row-based storage format for Hadoop; it is more than a serialisation framework, it is also an IPC framework. Parquet is a column-based storage format for Hadoop. Both are highly optimised compared to plain text, both are self-describing, and both use compression. If your use case typically scans or retrieves all of the fields in a row in each query, Avro is usually the best choice. If your dataset has many columns and your use case typically involves working with a subset of those columns rather than entire records, Parquet is optimized for that kind of work. Finally, the video covers cases where you may keep data in both file formats.
https://wn.com/Parquet_Vs_Avro
Working with parquet files, updates in Hive

  • Duration: 1:03:31
  • Updated: 07 Dec 2017
  • views: 117
This video demonstrates working with Parquet files and updates in Hive. It also covers SCD1 and SCD2 in Hive, with a good explanation of Hive concepts for beginners.
https://wn.com/Working_With_Parquet_Files,_Updates_In_Hive
Parquet Format at Criteo

  • Duration: 9:34
  • Updated: 21 Apr 2014
  • views: 1029
Criteo has petabyte-scale data stored in HDFS, with an analytic stack based on Cascading and Hive. Until recently it was 100% backed by RCFile. In this presentation, Justin Coffey discusses how Criteo migrated to Parquet, along with benchmarks comparing space and time vs RCFile. Join the conversation at http://twitter.com/university
https://wn.com/Parquet_Format_At_Criteo
Apache Parquet 1: Introduction

  • Duration: 3:00
  • Updated: 28 May 2017
  • views: 404
Ramathan moubarek :)
https://wn.com/Apache_Parquet_1_Introduction
Spark Reading and Writing to Parquet Storage Format

  • Duration: 11:28
  • Updated: 19 Nov 2016
  • views: 4386
Spark: Reading and Writing to Parquet Format
- Using the Spark DataFrame save capability
- Code/approach works both on a local HDD and in HDFS environments
Related video: Introduction to Apache Spark and Parquet, https://www.youtube.com/watch?v=itm0TINmK9k

Code for the demo:

case class Person(name: String, age: Int, sex: String)
val data = Seq(Person("Jack", 25, "M"), Person("Jill", 25, "F"), Person("Jess", 24, "F"))
val df = data.toDF()

import org.apache.spark.sql.SaveMode
df.select("name", "age", "sex").write.mode(SaveMode.Append).format("parquet").save("/tmp/person")
df.select("name", "age", "sex").write.partitionBy("sex").mode(SaveMode.Append).format("parquet").save("/tmp/person_partitioned/")

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val dfPerson = sqlContext.read.parquet("/tmp/person")
https://wn.com/Spark_Reading_And_Writing_To_Parquet_Storage_Format
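The demo above uses the older SQLContext entry point. As a hedged side note (not from the video), on Spark 2.x the same read is usually written through SparkSession; a minimal sketch, reusing the demo's output paths:

// Sketch assuming Spark 2.x: reading the demo's Parquet output via SparkSession.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("read-parquet").getOrCreate()
val dfPerson = spark.read.parquet("/tmp/person")                    // plain output
val dfPartitioned = spark.read.parquet("/tmp/person_partitioned")   // the "sex" partition column is recovered from the directory layout
dfPerson.show()
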
Apache Parquet: Parquet file internals and inspecting Parquet file structure

  • Duration: 24:38
  • Updated: 22 Apr 2017
  • views: 5857
In this video we will look at the internal structure of the Apache Parquet storage format and use parquet-tools to inspect the contents of a file. Apache Parquet is a columnar storage format available in the Hadoop ecosystem.
Related videos:
Creating Parquet files using Apache Spark: https://youtu.be/-ra0pGUw7fo
Parquet vs Avro: https://youtu.be/sLuHzdMGFNA
https://wn.com/Apache_Parquet_Parquet_File_Internals_And_Inspecting_Parquet_File_Structure
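For readers who cannot watch the video, a rough idea of the file-footer inspection it describes can be sketched with the parquet-hadoop Java API from Scala. This is an assumption-laden sketch, not the tooling used in the video (which uses the parquet-tools CLI); the file path is hypothetical.

// Sketch: print roughly the footer information that parquet-tools meta/schema show.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import scala.collection.JavaConverters._

val footer = ParquetFileReader.readFooter(new Configuration(), new Path("/tmp/person/part-00000.parquet"))
println(footer.getFileMetaData.getSchema)            // the message schema stored in the footer
for (block <- footer.getBlocks.asScala) {             // one entry per row group
  println(s"row group: ${block.getRowCount} rows, ${block.getTotalByteSize} bytes")
  for (col <- block.getColumns.asScala)                // column chunks with codec and encodings
    println(s"  ${col.getPath}: codec=${col.getCodec}, encodings=${col.getEncodings}")
}
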
#MADMEntrevista - Parquet Courts

  • Duration: 7:02
  • Updated: 14 Apr 2017
  • views: 875
@muitoalemdomicrofone and @movimentomusical teamed up in this playful partnership to bring you a sensational interview with the great @ParquetCourts from New York!
https://wn.com/Madmentrevista_Parquet_Courts
The columnar roadmap: Apache Parquet and Apache Arrow

  • Duration: 42:41
  • Updated: 20 Jun 2017
  • views: 1865
https://wn.com/The_Columnar_Roadmap_Apache_Parquet_And_Apache_Arrow
Infer Hive table schema automatically using Impala and Parquet

  • Duration: 7:58
  • Updated: 24 Nov 2016
  • views: 625
Tip: Infer the table schema automatically using Impala (using CREATE ... LIKE PARQUET). Comparing the Hive vs Impala options:

Option 1: Using Hive - manually build the table schema with all the column details

CREATE EXTERNAL TABLE person (name String, age Int, sex String)
STORED AS PARQUET
LOCATION '/tmp/person';

Option 2: Using Impala - automatically infer the table schema

CREATE EXTERNAL TABLE person2
LIKE PARQUET '/tmp/person/part-r-00000-8a445cfc-eab6-41e6-8c33-40fe8aa6600d.gz.parquet'
STORED AS PARQUET
LOCATION '/tmp/person';
https://wn.com/Infer_Hive_Table_Schema_Automatically_Using_Impala_And_Parquet
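For comparison with the Hive and Impala DDL above, Spark infers the schema from the Parquet footer automatically when reading; a brief hedged sketch (not from the video), pointed at the same /tmp/person directory:

// Sketch (Spark 2.x Scala): no DDL needed, the schema is read from the Parquet footer.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("schema-inference").getOrCreate()
val person = spark.read.parquet("/tmp/person")   // same directory the Hive/Impala tables point at
person.printSchema()                             // prints the inferred schema (name, age, sex for the demo data)
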
Working with Different File Formats - ORC, JSON, Parquet

  • Duration: 11:50
  • Updated: 15 Nov 2017
  • views: 150
In this video lecture we will see which file formats Spark supports out of the box, and how to create a DataFrame from ORC, JSON, and Parquet files.
https://wn.com/Working_With_Different_File_Formats_Orc,_Json,_Parquet
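As a hedged sketch of what the lecture describes (paths and data are hypothetical, not from the video), creating DataFrames from the three formats in Spark (Scala) looks like this:

// Sketch (Spark 2.x Scala): reading ORC, JSON, and Parquet files into DataFrames.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("file-formats").getOrCreate()
val orcDf     = spark.read.orc("/data/events.orc")         // ORC: columnar format, like Parquet
val jsonDf    = spark.read.json("/data/events.json")       // JSON: schema inferred by sampling the records
val parquetDf = spark.read.parquet("/data/events.parquet") // Parquet: schema read from the file footer
orcDf.printSchema(); jsonDf.printSchema(); parquetDf.printSchema()
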
How to Fill a Wooden Floor

  • Duration: 6:30
  • Updated: 19 Jan 2014
  • views: 809195
How To Fill A Wooden Floor: http://www.howtosandafloor.com/how-to-fill-a-wooden-floor/
Find out what products I use here: http://www.howtosandafloor.com/get-floor-refinishing-the-products-i-use-ebook-free/
Depending on your preference, you may want to fill your floor. It can prevent drafts coming up from beneath the floor, and it can help make a floor look much more neat and tidy. Some people prefer to keep the gaps; they believe that filling the floor will make it look fake, like laminate or lino. Each to their own; personally, I say fill it every time. http://www.howtosandafloor.com
https://wn.com/How_To_Fill_A_Wooden_Floor
Apache Drill SQL Queries on Parquet Data | Whiteboard Walkthrough

  • Duration: 10:09
  • Updated: 12 Oct 2016
  • views: 2295
In this Whiteboard Walkthrough, Parth Chandra, chair of the PMC for the Apache Drill project and a member of the MapR engineering team, describes how the Apache Drill SQL query engine reads data in Parquet format and some best practices for getting maximum performance from Parquet.
Additional Apache Drill resources:
"Overview of Apache Drill's Query Execution Capabilities" Whiteboard Walkthrough video: https://www.mapr.com/blog/big-data-sql-overview-apache-drill-query-execution-capabilities-whiteboard-walkthrough
"SQL Query on Mixed Schema Data Using Apache Drill" blog post: https://www.mapr.com/blog/sql-query-mixed-schema-data-using-apache-drill
Free download of Apache Drill on the MapR sandbox: https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill
https://wn.com/Apache_Drill_Sql_Queries_On_Parquet_Data_|_Whiteboard_Walkthrough
#bbuzz 2016: Julien Le Dem - Efficient Data formats for Analytics with Parquet and Arrow

  • Duration: 45:53
  • Updated: 12 Jun 2016
  • views: 438
Find more information here: https://berlinbuzzwords.de/session/efficient-data-formats-analytics-parquet-and-arrow

Hadoop makes it relatively easy to store petabytes of data. However, storing data is not enough; columnar layouts for storage and in-memory execution allow large amounts of data to be analysed very quickly and efficiently. They give multiple applications the ability to share a common data representation and perform operations at full CPU throughput using SIMD and vectorization. For interoperability, row-based encodings (CSV, Thrift, Avro) combined with general-purpose compression algorithms (GZip, LZO, Snappy) are common but inefficient. As discussed extensively in the database literature, a columnar layout with statistics and sorting provides vertical and horizontal partitioning, thus keeping IO to a minimum. Additionally, a number of key big data technologies have, or will soon have, in-memory columnar capabilities; this includes Kudu, Ibis and Drill. Sharing a common in-memory columnar representation allows interoperability without the usual cost of serialization.

Understanding modern CPU architecture is critical to maximizing processing throughput. We'll discuss the advantages of columnar layouts in Parquet and Arrow for in-memory processing, and the data encodings used for storage (dictionary, bit-packing, prefix coding). We'll dissect and explain the design choices that enable us to achieve all three goals of interoperability, space efficiency and query efficiency. In addition, we'll provide an overview of what's coming in Parquet and Arrow in the next year.
https://wn.com/Bbuzz_2016_Julien_Le_Dem_Efficient_Data_Formats_For_Analytics_With_Parquet_And_Arrow
12 Exercise 04 - Convert NYSE Data To Parquet File Format

  • Duration: 18:51
  • Updated: 13 Nov 2017
  • views: 498
Connect with me or follow me at:
https://www.linkedin.com/in/durga0gadiraju
https://www.facebook.com/itversity
https://github.com/dgadiraju
https://www.youtube.com/itversityin
https://twitter.com/itversity
https://wn.com/12_Exercise_04_Convert_Nyse_Data_To_Parquet_File_Format