Depicting E-mail Communication Patterns
Objectives
- Show the highest-volume mail paths in a subset of Enron messages
- Experiment with automatic real-time layout and animation
- Provide the foundation for an interactive mail search tool that, once automatic
node layout improves, will incorporate other capabilities such as time-based and
phrase-frequency analysis
Methods
- Extract sender-receiver pairs from the Enron e-mail archive with a small Java program
- Produce JSON describing the edges of the graph
- Use D3's automatic force-directed layout, allowing for draggable nodes and
physics simulation
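The original extraction program isn't reproduced here. To illustrate the first two steps, here is a minimal Java sketch. It assumes one raw RFC 822-style message per file, as in the maildir-style layout the corpus is commonly distributed in, and it reads only the From and To headers (Cc and Bcc are ignored). The class and method names, the default "maildir" path, and the nodes/links JSON shape are my assumptions, not the original program's. I picked that shape because D3's force layout is normally fed a node list plus a list of source/target links.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch (not the original program): counts From/To pairs and prints D3-style JSON. */
class EdgeExtractor {

    /** Returns {sender, recipient} pairs from the header block of one raw message. */
    static List<String[]> pairs(String raw) {
        // Headers end at the first blank line; unfold continuation lines (RFC 5322).
        String headers = raw.split("\r?\n\r?\n", 2)[0].replaceAll("\r?\n[ \t]+", " ");
        String from = null;
        List<String> to = new ArrayList<>();
        for (String line : headers.split("\r?\n")) {
            if (line.startsWith("From:")) {
                from = line.substring(5).trim().toLowerCase();
            } else if (line.startsWith("To:")) {
                for (String addr : line.substring(3).split(",")) {
                    String a = addr.trim().toLowerCase();
                    if (!a.isEmpty()) to.add(a);
                }
            }
        }
        List<String[]> result = new ArrayList<>();
        if (from != null && !from.isEmpty()) {
            for (String r : to) result.add(new String[] {from, r});
        }
        return result;
    }

    /** Adds one message's pairs to counts keyed as "sender\trecipient". */
    static void addEdges(Map<String, Integer> counts, String raw) {
        for (String[] p : pairs(raw)) counts.merge(p[0] + "\t" + p[1], 1, Integer::sum);
    }

    /** Minimal JSON string quoting (escapes backslashes and double quotes only). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds {"nodes":[{"id":...}],"links":[{"source":...,"target":...,"value":n}]}. */
    static String toJson(Map<String, Integer> counts) {
        TreeSet<String> nodes = new TreeSet<>();
        List<String> links = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] st = e.getKey().split("\t");
            nodes.add(st[0]);
            nodes.add(st[1]);
            links.add("{\"source\":" + quote(st[0]) + ",\"target\":" + quote(st[1])
                    + ",\"value\":" + e.getValue() + "}");
        }
        List<String> nodeObjs = new ArrayList<>();
        for (String n : nodes) nodeObjs.add("{\"id\":" + quote(n) + "}");
        return "{\"nodes\":[" + String.join(",", nodeObjs) + "],\"links\":["
                + String.join(",", links) + "]}";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: a directory tree holding one raw message per file.
        Path root = Paths.get(args.length > 0 ? args[0] : "maildir");
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Path f : files) {
            // ISO-8859-1 maps every byte to a char, so odd bytes never abort the run.
            addEdges(counts, new String(Files.readAllBytes(f), StandardCharsets.ISO_8859_1));
        }
        System.out.println(toJson(counts));
    }
}
```

With Java 11 or later this can be run directly, e.g. java EdgeExtractor.java path/to/maildir > edges.json. Each link's "value" is the message count for that sender-receiver pair, which can drive link width in the visualization. With D3 v4 or later, forceLink's id accessor can resolve the string source/target values against the node ids.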
Challenges
- Obtaining a layout that wasn't a jumbled mess; still needs refinement!
- Filtering out extraneous traffic like daily newsfeeds and system reports to
focus on more meaningful communication
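One way to approach the filtering problem is with a few crude heuristics applied to the edge counts before layout. The Java sketch below assumes a map from a "sender<TAB>recipient" key to a message count, which is my assumption and not necessarily how the original program stores pairs. It drops three kinds of edges: pairs with too few messages, senders who write to very many distinct recipients (broadcast-style mail), and senders whose address contains one of a few keywords. The keyword list and both thresholds are hypothetical placeholders that would need tuning against the actual data. Keyword matching will also misfire on some real names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of heuristic filtering over edge counts keyed as "sender\trecipient". */
class TrafficFilter {
    // Hypothetical keywords suggesting automated senders; tune them against the data.
    static final List<String> AUTOMATED_HINTS =
            List.of("news", "report", "alert", "announce", "noreply", "no-reply");

    /** True if the local part of the address contains one of the hint keywords. */
    static boolean looksAutomated(String address) {
        int at = address.indexOf('@');
        String local = (at >= 0 ? address.substring(0, at) : address).toLowerCase();
        for (String hint : AUTOMATED_HINTS) {
            if (local.contains(hint)) return true;
        }
        return false;
    }

    /**
     * Keeps edges with at least minMessages messages whose sender neither looks
     * automated nor writes to more than maxFanOut distinct recipients.
     */
    static Map<String, Integer> filter(Map<String, Integer> counts, int minMessages, int maxFanOut) {
        // Each key is a distinct (sender, recipient) pair, so counting keys per
        // sender gives that sender's number of distinct recipients.
        Map<String, Integer> fanOut = new TreeMap<>();
        for (String key : counts.keySet()) fanOut.merge(key.split("\t")[0], 1, Integer::sum);

        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String sender = e.getKey().split("\t")[0];
            if (e.getValue() < minMessages) continue;      // too little traffic to matter
            if (fanOut.get(sender) > maxFanOut) continue;  // broadcast-style sender
            if (looksAutomated(sender)) continue;          // likely newsfeed or system report
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }
}
```

For example, filter(counts, 5, 50) would keep only pairs with at least five messages from senders who wrote to no more than 50 distinct people. Both numbers are illustrative, not values taken from the project.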
Findings
- D3 is quite versatile, but one must tune the layout parameters closely
to match the data. It doesn't take many nodes before clutter and layout
stabilization time become unacceptable.
- The Enron Corpus is an amazing resource. Beyond the communication
patterns visualized here, it offers a large body of text that can be
analyzed and browsed in other ways. I'm also experimenting with
phrase-based semantic analysis of message text.
- Some of the messages embody a fascinating narrative of an
organization's transition from boom to bust. At least a few
obscenity-laden messages ensued during the collapse.
- Tracking conversations is challenging because a very high percentage
of messages have no subject. Apparently Enron employees had a hard
time typing into that field.
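Clutter and stabilization time grow quickly with node count. One simple way to keep the force layout manageable, consistent with the goal of showing only the highest-volume paths, is to cap the graph at the N heaviest edges before writing the JSON. The Java sketch below assumes a hypothetical map from a "sender<TAB>recipient" key to a message count. N is a placeholder you would adjust by watching how the layout behaves. It also reports how many nodes the kept edges touch, since node count drives the clutter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: keeps only the N heaviest edges to cap graph size. */
class TopEdges {
    static Map<String, Integer> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Heaviest first; ties broken by key so the result is deterministic.
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        Map<String, Integer> kept = new LinkedHashMap<>();  // preserves descending order
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    /** Number of distinct addresses (graph nodes) touched by the given edges. */
    static int nodeCount(Map<String, Integer> edges) {
        Set<String> nodes = new TreeSet<>();
        for (String key : edges.keySet()) {
            String[] st = key.split("\t");
            nodes.add(st[0]);
            nodes.add(st[1]);
        }
        return nodes.size();
    }
}
```

Capping by edges rather than nodes keeps every drawn node attached to at least one heavy path, so the layout avoids floating singletons. The trade-off is that the resulting node count varies with how concentrated the traffic is.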
See Also
I may experiment with some of the visualization techniques described
in this academic paper.