Advantages
- Highly scalable storage platform – it can store and distribute large data sets across hundreds of servers operating in parallel
- Cost-effective – its scale-out architecture lets a company store all of its data for later use at an affordable price
- Flexible – it enables businesses to easily access new data sources and tap into different types of data to generate value from them
- Fast – the storage method is based on a distributed file system that “maps” data wherever it is located on the cluster, so processing can generally run on the same servers that hold the data
- Resilient to failure – data sent to an individual node is also replicated to other nodes in the cluster, meaning that in the event of failure, there is another copy available for use
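The replication behaviour described in the last point can be sketched in a few lines of Python. This is a toy model, not Hadoop code: the `place_blocks` and `surviving_blocks` helpers and the node names are hypothetical, and the round-robin placement is a simplifying assumption (real HDFS uses a rack-aware policy). The 128 MB block size and replication factor of 3 are the HDFS defaults in Hadoop 2 and later.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, default HDFS block size in Hadoop 2+
REPLICATION = 3                 # default HDFS replication factor


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [node, ...]} using simple round-robin placement.

    ASSUMPTION: round-robin stands in for HDFS's real rack-aware policy.
    Replicas are distinct as long as replication <= len(nodes).
    """
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def surviving_blocks(placement, failed_node):
    """Return the blocks that still have at least one replica after a node fails."""
    return {
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
placement = place_blocks(file_size=1024 ** 3, nodes=nodes)  # a 1 GiB file
still_readable = surviving_blocks(placement, failed_node="node2")
print(len(placement), "blocks,", len(still_readable), "readable after node2 fails")
```

Because every block lives on three different nodes, losing any single node leaves at least two copies of each block, which is the property the bullet describes.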
Disadvantages
- Security concerns – Hadoop does not enable encryption at the storage or network level by default
- Vulnerable – it is written almost entirely in Java, a widely used language whose vulnerabilities have been heavily exploited by cybercriminals
- Not fit for small data – because it is designed for high capacity, it cannot efficiently support random reads of small files
- General limitations – Hadoop lacks built-in means of improving the efficiency and reliability of data collection, aggregation, and integration
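One way to see why small files are a poor fit is to count the metadata entries the file system has to track. The sketch below is back-of-the-envelope arithmetic, not a measurement: the `namenode_objects` helper is hypothetical, and the figure of roughly 150 bytes of NameNode memory per file or block is a commonly quoted rule of thumb that is treated here as an assumption.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size in Hadoop 2+
BYTES_PER_OBJECT = 150          # ASSUMPTION: rule-of-thumb NameNode cost per file or block


def namenode_objects(num_files, file_size, block_size=BLOCK_SIZE):
    """Count namespace objects: one per file plus one per block of that file."""
    blocks_per_file = max(1, -(-file_size // block_size))  # ceiling division
    return num_files * (1 + blocks_per_file)


# The same 1 GiB of data, stored two ways.
one_large = namenode_objects(num_files=1, file_size=1024 ** 3)
many_small = namenode_objects(num_files=8192, file_size=128 * 1024)

print("one 1 GiB file:    ", one_large, "objects,", one_large * BYTES_PER_OBJECT, "bytes")
print("8192 128 KiB files:", many_small, "objects,", many_small * BYTES_PER_OBJECT, "bytes")
```

Under these assumptions the same volume of data costs well over a thousand times more metadata when split into small files, before any per-file read overhead is considered.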
Components
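- Hadoop Common – the shared libraries and utilities used by the other modules
- HDFS (Hadoop Distributed File System) – the distributed storage layer
- MapReduce – the programming model for processing large data sets in parallel
- YARN (Yet Another Resource Negotiator) – cluster resource management and job scheduling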
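To make the MapReduce component more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using the classic word-count example. It does not use the Hadoop API: the `map_phase`, `shuffle`, and `reduce_phase` functions are hypothetical names chosen for illustration, and on a real cluster the framework would run the map and reduce steps across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data on Hadoop", "Hadoop stores big data"]  # illustrative input
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Each map call depends only on its own line and each reduce call only on its own key, which is what allows the work to be spread across servers.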
Development tools
- Hadoop Development Tools (HDT) plugin
- Eclipse IDE
- HUE web interface
- Azkaban
- Hortonworks Sandbox