Project Ramblings

Thursday, April 7, 2011

Grid Model

<Apr 7 2011 21:55> Now that my project evaluation got a bit extended, and since I feel like doing a little more, I am looking to simulate greedy routing on Klienberg's grid based model. The challenges here are a little more - greedy routing needs to know the source and destination, and each time I need to forward the packet to the nearest neighbour. The thing is, when we say nearest, it automatically asks for the distance metric that we are using. Klienberg's uses the Euclidean distance between the nodes as the distance metric. This would mean given two nodes of the graph, I should be able to calculate the distance between them. This is quite easy, especially since you are working with python. (See this!). I also have to make a presentation, but the theoretical work for this project was mainly done earlier and so, I can use most of that presentation.

Now, network-x generates the grid graph in an easy to use manner so that the node labels are the co-ordinates of the point. This means calculating the distance between two points is very easy. Even other wise, I could always use the shortest_path_length().

Sunday, April 3, 2011

Simulations on Power law Graphs

<Apr 3 2011 7:07 pm> Now, I'm going to work on simulating a Random Walk Strategy on Power Law Graphs with broadcasting to two levels of neighbours - this is a fair assumption according to Adamic/Adar.
The procedure is simple - You start at an arbitrary node, and each iteration, you forward to a node reached by randomly choosing one of the outgoing edges, and in the process transmitting this information to two levels of its neighbours.
Say if I start at node v1, I choose at random one of the neighbours of v1 and transmit the message to the neighbours of v1 and their neighbours - This type of random walk covers the network in sublinear time(Adamic/Adar). <Apr 3 2011 7:11 pm>

<Apr 3 2011 11:39 pm> Random Walks simulation has been much more interesting. In this case, the visualisation actually allows us to visualise something! What is interesting from these kinds of visualisations is that they help us in understanding the problem better. The graphs and statistics one would obtain for random walks is somewhat similar in spirit to say an example, like the Spread of a file in a P2P network.

This is the completed video. The random walk broadcasts the message to neighbours of the node holding the message and their neighbours - which is a fair assumption in social networks, as one will know his friends and their friends. The random walks are continued till there is graph coverage of 80%.

A small improvement to the above strategy is to choose the node with the highest degree at each step, i.e. to to forward the packet to the highest connected neighbour. Only small modifications are required in the code for this. Check the video and notice how fast it covers the graph in comparison to the Random Walk Video above.

<Apr 4 2011 3:20 am> I want to simulate Greedy Routing on Klienberg's Grid-based model with the inverse square law distribution as well, but due to time constraints, I'm leaving it for the time being. I need to work on my report now. Will come back later.

The Simulations! -1

<Apr 3 2011 10:51 am> Starting with Random Graphs - the only property I wish to make interactive that these in these graphs - if you choose any pairs of nodes at random, there exists a short path between them with high probability. This means that all I need to do is to generate a huge random graph and show that there indeed exists short paths between arbitrarily chosen vertices. I would have loved to make these take inputs - but right now, I'm hard coding the values.

So, on a graph with 100 nodes : -
I'll consider 20 pairs of nodes and simulate the paths between them. In the original python script itself I can print out a table which I can use with gnuplot to generate a graph on the percentage of nodes having short paths. Now these values that I have mentioned are only representational - I'll have to tune these to get a good visualisation. And by "short", I mean path length is poly-logarithmic in the size of the network.

I am using a random regular graph : one problem I have is in choosing the degree of the regular graph . It does not make sense to choose a constant value like say d=3, as that may be too high(making the graph extremely connected and thus, an unrealistic estimate) or too low(for a graph with 20k+ nodes, that will be low). I'm not very sure of it, but for the time being, I have given it to be log(n).

Network-X is too slow in handling a graph with just 1000 nodes(which no: pairs = 10^6), but then I have got the asymptotic peaks below these values, so its ok.
The plot is done using gnuplot - now to export the graph as GML, and to write the code for automatically generating the input python script for GUESS. <Apr 3 2011 12:02 pm>

<Apr 3 2011 4:17 pm> Done! The first video is done.

The video quality is pretty bad :(. I'm working on this right now - And moreover, it is quite difficult to understand anything from this video, but i think this goes along with the reports and help files, so it's ok :-|.

Saturday, April 2, 2011

The Simulations!

<Apr 3 2011 10:30 am> It is the most "happening" time of the project - "The simulations". If you've seen my earlier post, you would know the simulations I have planned out. One disclaimer - I am running all these simulations on artificial generated data - due to time constraints. But I hope to run the same on actual data later(hopefully, just a matter of little bash scripting).

After spending considerable time fighting it out with python modules, networkx, guess(a graph exploratory tool scripted in jython) and mencoder(the encoder I wish to employ to create some handy videos), I think I have things sorted out. This will be a rough outline of how all these tools and scripts will fit together :-

1. I generate artificial graphs(this is good enough, as the models I've considered for the simulations are only Random Graphs, Power Law Graphs and Klienberg's grid-based model) using Network-X in Python.

2. Irrespective of the simulation I am going to do, the input to the GUESS system would be a python(it could've been jython also) script that contains the code to change the color of a single(or multiple if simulating Adamic/Adar's Random Walk with broadcasting) edge, and to export it as a JPEG.

3. The original python script will also output the graph generated in GraphML(GML) format. The problem here is that the node names are simple integers in the range [0,n-1) for a graph with n nodes. GUESS is not fine with it because of reserved keywords or so. A little sed will help here :

sed 's/[0-9][0-9]*/n&/g' $1 > file1

rm $1

mv file1 $1

This means that for each file the original python scripts output, this script will take them in and give the "good" file that GUESS can take as input. Note that GUESS can also take .gdf(which is a GUESS format) files as the database on which to work on. <Apr 3 2011 10:45 am>

Sunday, March 27, 2011

Last One week

<Mon Mar 28th 2011 2:12 a.m.> As part of the project, "Navigability and Searching in Social Networks", we consider the problem of information propagation through a social network.
This is one practical problem in the following settings :-
1. How far does a file spread in a P2P network?
2. How far does word-of-mouth 'rumours' spread?

The aim is to answer these two questions using simulation studies.
Ideally, file spread in a P2P network will have to make use of actual P2P information, so I'm not attempting to simulate this right now.
Word-of-Mouth Rumours are to be studied in the following manner :-
What are the models of Information Diffusion ? 1. The independent cascade model 2. Linear threshold Model
The theory of these two models will be adopted from Klienberg's paper.
The end result of the simulation study will be this :-
On each of the following three types of graphs :
1. Random Graphs
2. Klienberg's grid-model and,
3. Power Law Graphs
Nodes will be chosen at random and using both of the Information Diffusion Models, the spread of a word-of-mouth referral will be simulated,
and statistical properties will be measured.
This tantamounts to a total of 6 video visualizations showing Information DIffusion in different well known settings.
The tools to be used are the Network-X package(Python), GUESS(The Graph Exploratory System) and mencoder(to encode the images as a short video).
Because the earlier visualizations were lost, Random s-t reaachability will be simulated for both the Random Graph and Klienberg's grid model using
Greedy Routing. (2 more video visualizations). Another video visualization on Graph Coverage using Random Walks in Power Law Graphs will also be completed.

The end goals in the next 1 week's time will be :-
1. 9 video visualizations with documentation
2. The final report - Additions from the interim report will be the document on Random Walks in Power Law Graphs and the notes on simulations. <Mon Mar 28th 2011 2:46 a.m.>

Friday, March 18, 2011

Deadlines Ahead

Unfortunate incident - I lost almost all my files from last semester related to the project. It's not much, but I had a good collection of relevant papers categorized very well. And I lost my tex file for my report, and the simulation files, and output jpeg's I had generated. I think I have 10 more days to complete my project and the report. Right now, the work left, in order of priority are:

1. The final report, along with the simulations on Random Graphs, Power Law Graphs, Klienberg's grid-based model and greedy routing algorithm - these I had done earlier, but I don't have them now. A simulation on Random walks in power law graphs is also pending.

2. LiveJournal Dataset - Statistical Properties of the Same - Analysis - This will also be included in the final report.

3. I would like to show a simulation regarding the spread of a file in a P2P network as well - to show the importance of accurate modeling of these networks. But this is at a lower priority.

I am a bit disappointed with the way this project went - I had thought of doing a bit more research, but by the time I kicked off in full swing, and had all the tools in place, a semester got over! And this semester was short - anyway I'm not cribbing. I can safely say I have done a fair amount of work - considering I don't have any team members.

But I have identified few good problems and lines of research i want to pursue after I'm done with my report :

1. Characterization of Social (or Complex Networks) as networks embedded in hyperbolic metric spaces. Klienberg's model considered the nodes to be in a Euclidean metric space, but recent research shows hyperbolic metric spaces achieve greater efficiency for greedy routing - I wanted to do a simulation of the same, and understand this more.

2. Although this characterization is very promising, and researchers have started making hyperbolic maps of the whole Internet at the AS-Level, given a huge network, like say a P2P network, there does not exist any algorithm that can map the nodes to a hyperbolic metric space, so as to facilitate greedy routing.

Such an algorithm will have good consequences.

One problem I am facing is that I have not taken any formal course in Real Analysis or Topology, so metric spaces itself is a new concept for me. I spent quite some time in my project, trying to understand this characterization, but have not been able to proceed much.

I'm a bit tense also. Hope to complete everything in time!

Wednesday, February 16, 2011

More on Power Law graphs

After a considerable amount of reading on Power Law Graphs, and how they often come up in real world network data, I decided I should get my hands dirty by understanding some analysis techniques of these networks. In all of this exploration so far, my aim was to understand how to model complex networks, simulate them, and understand existing routing mechanisms in these. In most of the literature I have explored so far, emphasis on modelling, and routing strategies are simple strategies like Random Walks or Greedy strategies . A very strong line of analysis on Power Law Graphs(and in other graphs with arbitrary degree distributions as well) is using the formalism of Generating Functions suggested by Newman et al.
Adamic/Adar's seminal work on Search in Power Law networks makes use of this. It took me some time to understand the details in depth, and hence I wrote a small document outlining the proof for sublinear scaling of random walks. It is available here.

If you happen to go there, feel free to take a look at my other projects as well!