Depicting E-mail Communication Patterns
- Show the highest-volume mail paths in a subset of Enron messages
- Experiment with automatic real-time layout and animation
- Provide the foundation of an interactive mail search tool that will incorporate
other capabilities like time-based and phrase-frequency analysis,
pending improved automatic layout of nodes
- Extract sender-receiver pairs from the Enron e-mail archive with a small Java program
- Produce JSON describing the edges of the graph
- Use D3's automatic force-directed layout, allowing for draggable nodes and
- Obtaining a layout that wasn't a jumbled mess; still needs refinement!
- Filtering out extraneous traffic like daily newsfeeds and system reports to
focus on more meaningful communication
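The extraction, filtering, and JSON steps listed above can be sketched roughly as follows. This is a minimal illustration, not the actual program used here: the class name, the blocklist addresses, and the "source"/"target"/"value" field names are all assumptions, and the header parsing ignores folded (multi-line) headers that real messages may contain.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class MailEdges {

    // Hypothetical blocklist of automated senders (newsfeeds, system reports).
    // The real filter criteria are not given in this write-up.
    static final Set<String> AUTOMATED =
            Set.of("newsfeed@example.com", "reports@example.com");

    // Returns the value of the first header with the given name, or null.
    // Simplification: folded continuation lines are not handled.
    static String header(String message, String name) {
        for (String line : message.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            if (line.regionMatches(true, 0, name + ":", 0, name.length() + 1)) {
                return line.substring(name.length() + 1).trim();
            }
        }
        return null;
    }

    // Counts messages per sender/recipient pair, keyed as "sender<TAB>recipient".
    static Map<String, Integer> countPairs(List<String> messages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String message : messages) {
            String from = header(message, "From");
            String to = header(message, "To");
            if (from == null || to == null) {
                continue;
            }
            from = from.toLowerCase(Locale.ROOT);
            if (AUTOMATED.contains(from)) {
                continue; // skip extraneous automated traffic
            }
            for (String recipient : to.split(",")) {
                String r = recipient.trim().toLowerCase(Locale.ROOT);
                if (!r.isEmpty()) {
                    counts.merge(from + "\t" + r, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Escapes backslashes and double quotes for use inside a JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Serializes the pair counts as a JSON array of edge objects.
    static String toJson(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] pair = e.getKey().split("\t");
            if (sb.length() > 1) {
                sb.append(",");
            }
            sb.append("{\"source\":\"").append(escape(pair[0]))
              .append("\",\"target\":\"").append(escape(pair[1]))
              .append("\",\"value\":").append(e.getValue()).append("}");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "From: a@example.com\nTo: b@example.com, c@example.com\n\nhello",
                "From: a@example.com\nTo: b@example.com\n\nhello again");
        System.out.println(toJson(countPairs(sample)));
    }
}
```

Keeping a count per pair gives each edge a weight, which is what makes it possible to pick out the highest-volume paths later.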
D3 is quite versatile, but one must tune the layout parameters closely
to match the data. It doesn't take many nodes before clutter
and layout stabilization time become unacceptable.
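One way to keep the node count manageable is to trim the graph to its highest-volume edges before handing it to the layout. The sketch below assumes edge counts are held in a map keyed as "sender<TAB>recipient"; that key format, the class name, and the method names are hypothetical and chosen only for illustration.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class EdgePruner {

    // Keeps the `limit` edges with the largest counts, in descending order.
    static Map<String, Integer> topEdges(Map<String, Integer> counts, int limit) {
        Map<String, Integer> kept = new LinkedHashMap<>();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .limit(limit)
                .forEach(e -> kept.put(e.getKey(), e.getValue()));
        return kept;
    }

    // Counts the distinct addresses that appear at either end of an edge.
    static int nodeCount(Map<String, Integer> edges) {
        Set<String> nodes = new HashSet<>();
        for (String key : edges.keySet()) {
            String[] pair = key.split("\t");
            nodes.add(pair[0]);
            nodes.add(pair[1]);
        }
        return nodes.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a@example.com\tb@example.com", 5);
        counts.put("a@example.com\tc@example.com", 1);
        counts.put("d@example.com\tb@example.com", 9);
        Map<String, Integer> kept = topEdges(counts, 2);
        System.out.println(kept.size() + " edges, " + nodeCount(kept) + " nodes");
    }
}
```

Checking the resulting node count while adjusting the limit is a cheap way to find a graph size that the force layout can still settle in reasonable time.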
The Enron Corpus is an amazing resource. In conjunction with the
communication patterns visualized here, it represents a large body of
text that can be analyzed and perused in other ways. I'm also
experimenting with phrase-based semantic analysis of message text.
Some of the messages embody a fascinating narrative of an
organization's transition from boom to bust. At least a few
obscenity-laden messages ensued during the collapse.
Tracking of conversations is challenging because
a very high percentage of messages contain no subject. Apparently
Enron employees had a hard time typing into that field.
I may experiment with some of the visualization techniques
described in this