- What is "Hadoop"?
- Distributed Compute Engine
- Distributed Storage
- Draw a picture of Hadoop and its ecosystem.
- HDFS
- YARN
- MapReduce
- Tez
- Spark
- Hive
- HBase
- ...
- (depends on the audience's projects/environment)
- Shallow Dive (Not Deep Dive)
- HDFS
- NameNode and DataNode
- Replication, not RAID
- Replicas also help data locality
- YARN
- ResourceManager and NodeManager
- Queue and Scheduler
- Process path
- Show the page from gihyo's Hadoop series.
- Hive
- HiveServer2, Hive Metastore
- Where is data of Hive tables?
- Partition
- File formats: plain text and ORC
- Stats, Optimizer, Vectorization
- HBase
- Master, RegionServer and ZooKeeper
- The Master is not involved in normal data access; clients talk to RegionServers directly.
- (TBD)
- Spark
- RDD is ...
- DataFrame
- Spark SQL - memory management (executor-memory and overhead)
- HDFS
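
For the Spark memory management item (executor-memory and overhead), a minimal sketch of how much memory one executor asks YARN for. It assumes Spark's documented default overhead of 10% of executor memory with a 384 MiB floor. The function name, its parameters, and the optional rounding step are hypothetical and only for illustration; the actual rounding depends on the YARN scheduler and its allocation settings.

```python
def executor_container_mib(executor_memory_mib, overhead_mib=None,
                           overhead_factor=0.10, min_overhead_mib=384,
                           yarn_round_to_mib=None):
    """Estimate the YARN container size (MiB) requested for one Spark executor.

    Assumption: default overhead is max(384 MiB, 10% of executor memory),
    unless an explicit overhead is configured.
    """
    if overhead_mib is None:
        # Default overhead: a fraction of executor memory, with a fixed floor.
        overhead_mib = max(min_overhead_mib,
                           int(executor_memory_mib * overhead_factor))
    requested = executor_memory_mib + overhead_mib
    if yarn_round_to_mib:
        # Assumption: the scheduler rounds requests up to a multiple of this
        # value (for example, the configured minimum allocation).
        requested = -(-requested // yarn_round_to_mib) * yarn_round_to_mib
    return requested


if __name__ == "__main__":
    # --executor-memory 4g: heap plus overhead, before any YARN rounding
    print(executor_container_mib(4096))
    # Same request if YARN rounds up to multiples of 1024 MiB
    print(executor_container_mib(4096, yarn_round_to_mib=1024))
```

The point for the talk: the container is larger than `--executor-memory`, so a queue sized exactly to the sum of executor heaps will fit fewer executors than expected.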
Last active
September 4, 2017 09:28