zackgalbreath/MailingListData
* Author: Zack Galbreath
* Software used: Mac OS X 10.6, Python 2.6.1, SQLite 3.6.12, PycURL 7.19.0,
BeautifulSoup 3.2.0, Pipermail 0.09 (Mailman edition).
* Project home: git://github.com/zackgalbreath/MailingListData.git
* MD5 checksum of the "official" database file is:
c17555700f3c9f0a019be6a537f83bc6
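To check a local copy of the database against the checksum above, a short
script along these lines should work. This is a minimal sketch: the function
names md5_of_file and matches_official are illustrative and are not part of
this repository.

```python
import hashlib

# Checksum of the "official" database file, as listed in this README.
EXPECTED_MD5 = "c17555700f3c9f0a019be6a537f83bc6"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_official(path):
    """Return True if the file's MD5 equals the checksum in this README."""
    return md5_of_file(path) == EXPECTED_MD5
```

Calling matches_official("MailingListData.db") returns True only for an
unmodified copy of the official file. Re-running collectData.py or editing
the database will normally change the checksum.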
== Contents ==
README: The document you're currently reading
MailingListData.{db,csv,xlsx}: My dataset in SQLite, comma-separated values
(CSV), and Microsoft Excel 2007-2008 formats
collectData.py: Generates the dataset and stores it in an SQLite database
convertSQLiteToCSV.py: Converts the SQLite database to CSV format
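
For readers who want a feel for the SQLite-to-CSV step, here is a generic
sketch using only the Python standard library. It is not the code in
convertSQLiteToCSV.py. The function name export_table_to_csv is hypothetical,
and the table name is left as a parameter because this README does not state
it. The sketch targets Python 3.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, table_name, csv_path):
    """Write every row of `table_name` to `csv_path`, header row first.

    Returns the list of column names that were written as the header.
    """
    connection = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the
        # identifier by hand (doubling any embedded double quotes).
        quoted = '"' + table_name.replace('"', '""') + '"'
        cursor = connection.execute("SELECT * FROM " + quoted)
        header = [column[0] for column in cursor.description]
        # newline="" lets the csv module control line endings itself.
        with open(csv_path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(cursor)
    finally:
        connection.close()
    return header
```

The csv module quotes fields that contain commas or quotation marks, which
matters for free-text columns such as Message_Subject.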
== Usage notes ==
* Running collectData.py will overwrite an existing MailingListData.db.
If you modify the database and wish to keep your changes, rename it or save
it somewhere else before running the script again.
* Similarly, convertSQLiteToCSV.py overwrites MailingListData.csv.
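
One way to guard against the overwrites described above is to copy the file
aside before running either script. The helper below is a minimal sketch; the
name backup_if_present and the ".bak" naming scheme are illustrative choices,
not something the scripts in this repository do.

```python
import os
import shutil
import time


def backup_if_present(path):
    """Copy `path` to a timestamped sibling file.

    Returns the path of the copy, or None if `path` does not exist.
    """
    if not os.path.exists(path):
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = "{0}.{1}.bak".format(path, stamp)
    # copy2 also preserves the modification time where the platform allows.
    shutil.copy2(path, backup_path)
    return backup_path
```

For example, call backup_if_present("MailingListData.db") before
collectData.py and backup_if_present("MailingListData.csv") before
convertSQLiteToCSV.py.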
== Details about the columns ==
* When recording Message_Subject, any tab character was converted to a single
space.
* A Received_Reply of "self only" means that the original author was the only
person to reply to the message.
* Time_of_Day is recorded in EDT using a 24-hour clock.
* Message_Length is the number of characters in the HTML source of Archive_URL.
* Any_Attachments is set to "yes" when we detect a "non-text" attachment that is
not a "pgp-signature". Note that Pipermail considers C++ source code
(MIME-type text/x-c++src) to be a "non-text" attachment.
* You should be able to read each original email in its entirety at its
Archive_URL. This list of URLs was taken from:
http://www.itk.org/pipermail/insight-users/2011-August/thread.html
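
The column rules above can be illustrated with a few small functions. This is
a sketch of the rules as described in this README, not the code in
collectData.py. All function names are hypothetical, the HH:MM output format
for Time_of_Day is an assumption, and attachment_types stands for a list of
the MIME types that Pipermail reported as "non-text" attachments for one
message. The sketch targets Python 3.

```python
from datetime import datetime, timedelta, timezone

# Eastern Daylight Time is four hours behind UTC.
EDT = timezone(timedelta(hours=-4), "EDT")


def normalize_subject(subject):
    """Message_Subject rule: replace each tab character with one space."""
    return subject.replace("\t", " ")


def time_of_day_edt(moment):
    """Time_of_Day rule: 24-hour clock in EDT (HH:MM format is assumed).

    `moment` must be a timezone-aware datetime.
    """
    return moment.astimezone(EDT).strftime("%H:%M")


def message_length(html_source):
    """Message_Length rule: number of characters in the page's HTML source."""
    return len(html_source)


def has_real_attachment(attachment_types):
    """Any_Attachments rule: True if any "non-text" attachment is not a
    pgp-signature. `attachment_types` is a hypothetical list of MIME types.
    """
    return any("pgp-signature" not in mime for mime in attachment_types)
```

Under these rules a message whose only attachment is
application/pgp-signature counts as having no attachments, while one carrying
text/x-c++src counts as having an attachment, matching the Pipermail behavior
noted above.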
== About ==
Data science project, fall 2011.