Add Contacts in Webmail
The current customer database is in a Excel spreadsheet and the marcom folks have to literally copy and paste them one by one into the marcom Outlook a/c in the desktop pc. I am sure there is a better way in doing this, automation is the way to go. I explored both using Outlook as well as the web-based email. With a bit of Googling, it is possible to export the MS Excel to CSV (comma separated variable) and have the CSV file imported into the Outlook. It would be easier to introduce a header in the first row so that the mapping of your Excel data to Outlook contact can be done exactly.
What happen if the person is on leave and the other marcom folks do not have access to the desktop. AFAIK, the Outlook contacts are stored in your local desktop, not in the server. BTW, we are running Sun Messaging Server, not MS Exchange. It would be nice to import all these email addresses into the system so that it can be access anywhere by anyone. However, such functionality is not available in the system.
With many years of Web Scraping experience, it will not be too hard to do that, right? First, I installed LiveHTTPheaders extension in FireFox. Second, I login to the mail server and add a contact to the system. I picked a particular syntax (like this _XXXXX_) for all the values in the fields. This syntax enables me to easily identify the value in the encoded string in the POST data.
I break them into lines for ease of reading
sid=&security=false&ldapurl=pab&filter=cmd%3DPAB_CMD_ADD_ENTRY%7Cnew%3Dgivenname%3A_FIRSTNAME_%0D%0A sn%3A_LASTNAME_%0D%0Acn%3A_DISPLAY_%0D%0Aobjectclass%3Apabperson%0D%0Amemberofpab%3AAddressBookc6e5131%0D%0A mail%3A_EMAIL_%0D%0Atelephonenumber%3A_OFFICE_%0D%0Ahomephone%3A_HOME_%0D%0Amobile%3A_MOBILE_%0D%0A pager%3A_PAGER_%0D%0Afacsimiletelephonenumber%3A_FAX_%0D%0A%0D%0A%7C
If you do a "man ascii" in solaris, you will see the hex character encoding
Hexadecimal - Character 00 NUL 01 SOH 02 STX 03 ETX 04 EOT 05 ENQ 06 ACK 07 BEL 08 BS 09 HT 0A NL 0B VT 0C NP 0D CR 0E SO 0F SI 10 DLE 11 DC1 12 DC2 13 DC3 14 DC4 15 NAK 16 SYN 17 ETB 18 CAN 19 EM 1A SUB 1B ESC 1C FS 1D GS 1E RS 1F US 20 SP 21 ! 22 " 23 # 24 $ 25 % 26 & 27 ' 28 ( 29 ) 2A * 2B + 2C , 2D - 2E . 2F / 30 0 31 1 32 2 33 3 34 4 35 5 36 6 37 7 38 8 39 9 3A : 3B ; 3C < 3D = 3E > 3F ? 40 @ 41 A 42 B 43 C 44 D 45 E 46 F 47 G 48 H 49 I 4A J 4B K 4C L 4D M 4E N 4F O 50 P 51 Q 52 R 53 S 54 T 55 U 56 V 57 W 58 X 59 Y 5A Z 5B [ 5C \ 5D ] 5E ^ 5F _ 60 ` 61 a 62 b 63 c 64 d 65 e 66 f 67 g 68 h 69 i 6A j 6B k 6C l 6D m 6E n 6F o 70 p 71 q 72 r 73 s 74 t 75 u 76 v 77 w 78 x 79 y 7A z 7B { 7C | 7D } 7E ~ 7F DELSomehow the characters in the encoded POST data include non-printable characters, eg %0D, %0A.... So shall I do decoding to find out the name value pairs. The answser is no. What I did was to simply prepend a "$" (dollar) sign in front of the value so that I can dynamically get Tcl to do the substitution.
sid=&security=false&ldapurl=pab&filter=cmd%3DPAB_CMD_ADD_ENTRY%7Cnew%3Dgivenname%3A$_FIRSTNAME_%0D%0A sn%3A$_LASTNAME_%0D%0Acn%3A$_DISPLAY_%0D%0Aobjectclass%3Apabperson%0D%0Amemberofpab%3AAddressBookc6e5131%0D%0A mail%3A$_EMAIL_%0D%0Atelephonenumber%3A$_OFFICE_%0D%0Ahomephone%3A$_HOME_%0D%0Amobile%3A$_MOBILE_%0D%0A pager%3A$_PAGER_%0D%0Afacsimiletelephonenumber%3A$_FAX_%0D%0A%0D%0A%7C
All I have to is to export the Excel data into CSV (I choose tab separated to avoid comma character in the data) and get the program to loop through every line. For each line, all I have to do is to extract the email address, name, ... and set the corresponding variables. The correct POST data will immediately fall into places. BTW, the login process requires to capture the Cookie, which is the session key for the entire process.
Below shows the script and the screen dumps (add contact, livehttpheaders)
#! /usr/bin/tclsh if { $argc != 1 } { puts stderr "Usage: $argv0 <tab-separated-csv>" exit 1 } set tabcsv [lindex $argv 0] proc getCookieHeader { s } { set cookie {} foreach {n v} [set ${s}(meta)] { if { [string equal -nocase {Set-Cookie} $n] } { lappend cookie $v } } set set_cookie {} foreach c $cookie { append set_cookie "[lindex [split $c {;}] 0]; " } return [string range $set_cookie 0 end-2] } set u username; set p password package require http set url "http://some.mail.server" set url_login "$url/login.msc" set url_addcontact "$url/ldap.msc" set url_logout "$url/cmd.msc?sid=&mbox=&cmd=logout&security=false&lang=en" # login set s [http::geturl $url_login -query [http::formatQuery user $u password $p]] set header [getCookieHeader $s] http::cleanup $s set fp [open $tabcsv r] # skip first line gets $fp line while { [gets $fp line] >= 0 } { set linelist [split $line "\t"] set company [lindex $linelist 0] set custname [lindex $linelist 1] set custemail [lindex $linelist 2] if { [string length $custemail]==0 || [llength $linelist]<=3 } { continue } puts $custemail set _FIRSTNAME_ "" set _LASTNAME_ "$custname" set _DISPLAY_ "$company - $custname" set _EMAIL_ "$custemail" set _OFFICE_ "" set _HOME_ "" set _MOBILE_ "" set _PAGER_ "" set _FAX_ "" # add contact set contact_query "sid=&security=false&ldapurl=pab&filter=cmd%3DPAB_CMD_ADD_ENTRY%7Cnew%3Dgivenname%3A$_FIRSTNAME_%0D%0Asn%3A$_LASTNAME_%0D%0Acn%3A$_DISPLAY_%0D%0Aobjectclass%3Apabperson%0D%0Amemberofpab%3AAddressBookc6e5131%0D%0Amail%3A$_EMAIL_%0D%0Atelephonenumber%3A$_OFFICE_%0D%0Ahomephone%3A$_HOME_%0D%0Amobile%3A$_MOBILE_%0D%0Apager%3A$_PAGER_%0D%0Afacsimiletelephonenumber%3A$_FAX_%0D%0A%0D%0A%7C" set s [http::geturl $url_addcontact -headers [list Cookie $header] -query $contact_query] http::cleanup $s } close $fp # logout set s [http::geturl $url_logout -headers [list Cookie $header]] http::cleanup $s
Labels: Tcl, Web Scraping
0 Comments:
Post a Comment
<< Home