├── README.md
├── data
│   ├── adj_list
│   ├── dense_link_matrix
│   ├── map
│   ├── sparse_link_matrix
│   └── users
├── docs
│   ├── authority_scores.png
│   ├── auths.png
│   ├── dataset_fetcher.html
│   ├── hits.html
│   ├── hubbiness_scores.png
│   ├── hubs.png
│   └── stats.png
├── requirements.txt
└── src
    ├── dataset_fetcher.py
    └── hits.py
/README.md:
--------------------------------------------------------------------------------
1 | # HITS-Algorithm-implementation
2 |
3 | This project applies the HITS algorithm to the Twitter follower network to find important hubs and authorities: good hubs are users who follow good authorities, and good authorities are users who are followed by good hubs.
4 | In this real-life scenario, a good authority could be a popular music artist, and a good hub could be a music lover who follows many accomplished artists.
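The mutual reinforcement between hubs and authorities can be sketched as a short power iteration. This is a minimal pure-Python illustration, not the repository's `hits.py`. The function name `hits`, the dictionary input format (each user mapped to the users they follow), the fixed iteration count and the toy users are all assumptions made for the example. Scores are rescaled to unit L2 norm after every round.

```python
import math


def hits(follows, iterations=50):
    """Minimal HITS sketch (hypothetical helper, not the repo's hits.py).

    follows: dict mapping each user to the list of users they follow,
    so an edge u -> v means "u follows v". Returns (hub, auth) dicts
    whose score vectors have unit L2 norm.
    """
    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    hub = {u: 1.0 for u in users}
    auth = {u: 1.0 for u in users}
    for _ in range(iterations):
        # Authority of v: sum of the hub scores of the users who follow v.
        auth = {u: 0.0 for u in users}
        for u, friends in follows.items():
            for v in friends:
                auth[v] += hub[u]
        # Hub score of u: sum of the authority scores of the users u follows.
        hub = {u: sum(auth[v] for v in follows.get(u, [])) for u in users}
        # Rescale both vectors to unit L2 norm so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for u in scores:
                scores[u] /= norm
    return hub, auth


# Toy graph (made-up users): two fans follow an artist, one also follows carol.
follows = {
    "alice": ["artist"],
    "bob": ["artist", "carol"],
}
hub, auth = hits(follows)
print(max(auth, key=auth.get))  # artist
print(max(hub, key=hub.get))    # bob
```

Here `artist` ends up as the top authority because both hubs point to it, and `bob` is the top hub because he follows both authorities.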
5 |
6 | Dataset
7 | -------
8 |
9 | The dataset can be viewed as a directed graph. Each node in the graph represents a Twitter user, and an edge from user A to user B means that A is a “follower” of B and B is a “friend” of A.
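With this edge direction, the link matrix and the standard HITS updates can be written as below. The symbols $L$ (link matrix), $\mathbf{h}$ (hub vector) and $\mathbf{a}$ (authority vector) are notation introduced here, and the L2 normalization is an assumption; the README does not state which norm `hits.py` uses.

```latex
L_{ij} =
\begin{cases}
  1 & \text{if user } i \text{ follows user } j \\
  0 & \text{otherwise}
\end{cases}
\qquad
\mathbf{a} \leftarrow \frac{L^{\top}\mathbf{h}}{\lVert L^{\top}\mathbf{h} \rVert_2},
\qquad
\mathbf{h} \leftarrow \frac{L\,\mathbf{a}}{\lVert L\,\mathbf{a} \rVert_2}
```

Repeating the two updates from an all-ones start typically drives $\mathbf{a}$ toward the principal eigenvector of $L^{\top}L$ and $\mathbf{h}$ toward the principal eigenvector of $LL^{\top}$.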
10 |
11 | The graph consists of 500 nodes, with a directed edge between two nodes whenever one user follows the other. When the dataset is first prepared, the graph is stored as an adjacency list; it is then immediately converted to an adjacency matrix for repeated use with the HITS algorithm.
12 | This conversion requires storing a map between user ids and matrix indices.
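The conversion from adjacency list to link matrix can be sketched as follows. This is a hypothetical helper (`to_link_matrices` is not a function from this repository), and the user ids are made up. Because the README describes the map in both directions, the sketch builds both `index_of` (user id to index) and `user_of` (index to user id). The sparse form is a plain dict of non-zero `(row, col)` entries so the example needs only the standard library; the repository's actual on-disk formats are not shown here.

```python
def to_link_matrices(follows):
    """Hypothetical helper: build index maps plus dense and sparse link
    matrices from an adjacency list {follower: [friends]}.

    Entry (i, j) is 1 when the user with index i follows the user with index j.
    """
    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    index_of = {user: i for i, user in enumerate(users)}  # user id -> index
    user_of = {i: user for user, i in index_of.items()}   # index -> user id
    n = len(users)
    dense = [[0] * n for _ in range(n)]  # stores all n * n entries
    sparse = {}                          # stores only non-zero (row, col) entries
    for follower, friends in follows.items():
        for friend in friends:
            i, j = index_of[follower], index_of[friend]
            dense[i][j] = 1
            sparse[(i, j)] = 1
    return index_of, user_of, dense, sparse


follows = {101: [202, 303], 202: [303]}  # made-up user ids
index_of, user_of, dense, sparse = to_link_matrices(follows)
print(dense)   # [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
print(sparse)  # {(0, 1): 1, (0, 2): 1, (1, 2): 1}
```

Keeping `user_of` around lets scores computed per matrix row be reported back as Twitter user ids.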
13 |
14 | File Structure
15 | ---------------
16 | [src](src) (directory) – Contains the Python source files
17 |
18 | /hits.py – Implements the HITS algorithm
19 |
20 | /dataset_fetcher.py – Fetches the dataset using the Twitter API
21 |
22 |
23 | [data](data) (directory) – Contains the data structures saved after fetching the dataset
24 |
25 | /adj_list – Adjacency list representing the fetched dataset
26 |
27 | /dense_link_matrix – Link matrix in dense (non-sparse) representation
28 |
29 | /sparse_link_matrix – Link matrix using sparse representation
30 |
31 | /map – Map from user id to matrix index
32 |
33 | /users – User information
34 |
35 |
36 | [docs](docs) (directory) – Contains the documentation for the various components
37 |
38 | /dataset_fetcher.html – Documentation for dataset_fetcher.py
39 |
40 | /hits.html – Documentation for hits.py
41 |
42 | [requirements.txt](requirements.txt) – Lists the Python dependencies needed to run the code
43 |
44 | Usage:
45 | ------
46 | - Download/Clone this repository
47 | ```bash
48 | git clone https://github.com/nikhil-iyer-97/HITS-Algorithm-implementation.git
49 | ```
50 | - Change the working directory to the cloned repository
51 | ```bash
52 | cd HITS-Algorithm-implementation
53 | ```
54 | - Install dependencies:
55 | ```bash
56 | pip install -r requirements.txt
57 | ```
58 | - Change the working directory to `src`:
59 | ```bash
60 | cd src
61 | ```
62 | Then run `python3 hits.py` to execute the program and display its output.
63 |
64 | Example:
65 | --------
66 | Example hubbiness and authority scores for the first 30 nodes of the graph are shown below:
67 | 
68 | 
69 |
70 | The changes in hub score and authority score for a few selected entities were measured, with the following results:
71 | 
72 | 
73 |
74 | Finally, the algorithm was benchmarked using sparse and dense (normal) matrix implementations for various values of :
75 |
76 | 
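Each HITS round is dominated by matrix-vector products ($L^{\top}\mathbf{h}$ and $L\mathbf{a}$), which is where a sparse representation can pay off: a follower graph usually has far fewer edges than the $n^2$ possible entries. The sketch below is a generic illustration of that trade-off, not a reproduction of this repository's benchmark. The graph size (500 nodes, 2000 random edges), the dict-based sparse format and the function names are assumptions, and the printed timings depend on the machine, so no expected values are given.

```python
import random
import timeit


def dense_matvec(dense, x):
    # Touches every one of the n * n entries, zeros included.
    return [sum(a * b for a, b in zip(row, x)) for row in dense]


def sparse_matvec(sparse, x, n):
    # Touches only the stored non-zero (row, col) entries.
    y = [0.0] * n
    for (i, j), value in sparse.items():
        y[i] += value * x[j]
    return y


n, edges = 500, 2000  # hypothetical sizes, not measurements of the real dataset
random.seed(0)
sparse = {}
while len(sparse) < edges:
    i, j = random.randrange(n), random.randrange(n)
    if i != j:  # skip self-follows
        sparse[(i, j)] = 1
dense = [[0] * n for _ in range(n)]
for (i, j), value in sparse.items():
    dense[i][j] = value

x = [1.0] * n
# Both representations must give the same product before timing them.
assert dense_matvec(dense, x) == sparse_matvec(sparse, x, n)
print("dense :", timeit.timeit(lambda: dense_matvec(dense, x), number=3))
print("sparse:", timeit.timeit(lambda: sparse_matvec(sparse, x, n), number=3))
```

The dense product always costs $n^2$ multiplications, while the sparse one costs one per edge, so the gap widens as the graph grows while staying sparse.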
77 |
78 |
79 |
--------------------------------------------------------------------------------
/data/adj_list:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nikhil-iyer-97/HITS-Algorithm-implementation/dc96f56f927abf760d8ac3fa81d54cd72c9b4468/data/adj_list
--------------------------------------------------------------------------------
/data/dense_link_matrix:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nikhil-iyer-97/HITS-Algorithm-implementation/dc96f56f927abf760d8ac3fa81d54cd72c9b4468/data/dense_link_matrix
--------------------------------------------------------------------------------
/data/map:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nikhil-iyer-97/HITS-Algorithm-implementation/dc96f56f927abf760d8ac3fa81d54cd72c9b4468/data/map
--------------------------------------------------------------------------------
/data/sparse_link_matrix:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nikhil-iyer-97/HITS-Algorithm-implementation/dc96f56f927abf760d8ac3fa81d54cd72c9b4468/data/sparse_link_matrix
--------------------------------------------------------------------------------
/data/users:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nikhil-iyer-97/HITS-Algorithm-implementation/dc96f56f927abf760d8ac3fa81d54cd72c9b4468/data/users
--------------------------------------------------------------------------------
/docs/authority_scores.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nikhil-iyer-97/HITS-Algorithm-implementation/dc96f56f927abf760d8ac3fa81d54cd72c9b4468/docs/authority_scores.png
--------------------------------------------------------------------------------
/docs/auths.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nikhil-iyer-97/HITS-Algorithm-implementation/dc96f56f927abf760d8ac3fa81d54cd72c9b4468/docs/auths.png
--------------------------------------------------------------------------------
/docs/dataset_fetcher.html:
--------------------------------------------------------------------------------
dataset_fetcher (source: /home/ubuntu/Documents/studies/3_1/IR_CS_F469/assn2/IR2/src/dataset_fetcher.py)
Sections: Modules, Classes, Functions
--------------------------------------------------------------------------------
/docs/hits.html:
--------------------------------------------------------------------------------
hits (source: /home/ubuntu/Documents/studies/3_1/IR_CS_F469/assn2/IR2/src/hits.py)
Sections: Modules, Classes, Functions, Data
Data:
ADJ_DIRECTED = 0, ADJ_LOWER = 3, ADJ_MAX = 1, ADJ_MIN = 4, ADJ_PLUS = 5, ADJ_UNDIRECTED = 1, ADJ_UPPER = 2, ALL = 3
BLISS_F = 0, BLISS_FL = 1, BLISS_FLM = 4, BLISS_FM = 3, BLISS_FS = 2, BLISS_FSM = 5
GET_ADJACENCY_BOTH = 2, GET_ADJACENCY_LOWER = 1, GET_ADJACENCY_UPPER = 0, IN = 2, OUT = 1
REWIRING_SIMPLE = 0, REWIRING_SIMPLE_LOOPS = 1, STAR_IN = 1, STAR_MUTUAL = 3, STAR_OUT = 0, STAR_UNDIRECTED = 2, STRONG = 2
TRANSITIVITY_NAN = 0, TRANSITIVITY_ZERO = 1, TREE_IN = 1, TREE_OUT = 0, TREE_UNDIRECTED = 2, WEAK = 1
Nexus = <igraph.remote.nexus.NexusConnection object>
arpack_options = <igraph.ARPACKOptions object>
config = <igraph.configuration.Configuration object>
dbl_epsilon = 2.220446049250313e-16
debug = False
known_colors = {'alice blue': (0.9411764705882353, 0.9725490196078431, 1.0, 1.0), ...}
name = 'write_svg'
palettes = 256-color palettes 'gray', 'heat', 'rainbow', 'red-black-green', 'red-blue', 'red-green', 'red-purple-blue', 'red-yellow-green', 'terrain'
pi = 3.141592653589793