├── .classpath
├── .project
├── README.md
├── html
│   ├── hdfs_sunburst.html
│   └── hdfs_sunburst_small.html
├── pom.xml
├── scripts
│   └── samples.sh
├── src
│   └── main
│       └── java
│           └── com
│               └── github
│                   └── gbraccialli
│                       └── hdfs
│                           ├── DirectoryContentsUtils.java
│                           ├── HDFSConfigUtils.java
│                           └── PathInfo.java
├── target
│   └── gbraccialli-hdfs-utils-with-dependencies.jar
└── zeppelin
    ├── hdfs-d3.json
    └── note.json
/README.md:
--------------------------------------------------------------------------------
### 1- Use Zeppelin notebook
Import the following notebook URL into Zeppelin:

https://raw.githubusercontent.com/gbraccialli/HdfsUtils/master/zeppelin/hdfs-d3.json

### [Live Preview here](https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2dicmFjY2lhbGxpL0hkZnNVdGlscy9tYXN0ZXIvemVwcGVsaW4vbm90ZS5qc29u)
### 2- Build from source, run from the command line, and use the html file
### Building
```sh
git clone https://github.com/gbraccialli/HdfsUtils.git
cd HdfsUtils
mvn clean package
```
### Basic usage
```sh
java -jar target/gbraccialli-hdfs-utils-with-dependencies.jar \
    --path=/ \
    --maxLevelThreshold=-1 \
    --minSizeThreshold=-1 \
    --showFiles=false \
    --verbose=true > out.json
```
### Visualizing
Open html/hdfs_sunburst.html in your browser and point it to the .json file created in the previous step, or copy/paste the json content using the load options on the right.

PS: Chrome has a security constraint that prevents it from loading local files; use one of the following options:
- Use the zeppelin notebook (described above)
- Use Safari
- Enable Chrome local file access: [instructions here](http://stackoverflow.com/questions/18586921/how-to-launch-html-using-chrome-at-allow-file-access-from-files-mode)
- Publish the json on a webserver and use the full URL (see the sketch below)
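For the webserver option, any static file server works; here is a minimal sketch using Python's built-in http.server module (the port and directory are just examples, assuming the out.json produced in the previous step):
```sh
# serve the directory containing out.json on http://localhost:8000 (Python 3)
cd /path/to/dir/with/out.json
python3 -m http.server 8000
# then load http://localhost:8000/out.json from hdfs_sunburst.html
```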

### Command line options:
#### --confDir=
//path-to-conf-dir
//directory containing the hadoop config files, defaults to /etc/hadoop/conf

#### --maxLevelThreshold=
//-1 or a valid int
//max number of directory levels to drill down. -1 means no limit. For example, maxLevelThreshold=3 means the drill down stops after 3 levels of subdirectories.

#### --minSizeThreshold=
//-1 or a valid long
//min number of bytes a directory must contain for the drill down to continue. -1 means no limit. For example, minSizeThreshold=1000000 means only directories larger than 1000000 bytes are drilled down.

#### --showFiles=
//true or false
//whether to show information about individual files. showFiles=false shows only summary information about the files in each directory/subdirectory.

#### --exclude=
//path1,path2,...
//directories to exclude from the drill down. For example, /tmp/,/user/ suppresses information about those directories.

#### --doAs=
//username (hdfs, for example)
//on a non-kerberized cluster you can set the user that performs the hdfs operations; using hdfs avoids permission issues. On a kerberized cluster, grant read access to the user performing this operation (you can use Ranger for this).

#### --verbose=
//true or false
//when true, prints processing info to System.err (does not apply to zeppelin)

#### --path=
//path where the analysis starts

## Special thanks to:
- [Dave Patton](https://github.com/dp1140a), who first created [HDP-Viz](https://github.com/dp1140a/HDP-Viz), where I got inspired and copied lots of code
- [Ali Bajwa](https://github.com/abajwa-hw), who created the [ambari stack for Dave's project](https://github.com/abajwa-hw/hdpviz) (and helped me get it working)
- [David Streever](https://github.com/dstreev), who created (or forked) [hdfs-cli](https://github.com/dstreev/hdfs-cli), from which I also copied lots of code
--------------------------------------------------------------------------------