├── README.md
└── gmail_read.py


/README.md:
--------------------------------------------------------------------------------
# Using Gmail API and Python to read mail messages

I personally find the Gmail API a bit confusing for beginners. The API's wizard comes in handy for creating the project, getting credentials, and authenticating them. However, the steps to follow after that are not clearly documented.

Therefore, I have created a sample Python script that does the following:
* Go to the Gmail inbox
* Find and read all the unread messages
* Extract details (Date, Sender, Subject, Snippet, Body) and export them to a .csv file / DB
* Mark the messages as Read, so that they are not read again


[Link to the script](https://github.com/abhishekchhibber/Gmail-Api-through-Python/blob/master/gmail_read.py)


Before running this script, the user should get the authentication by following
the [Gmail API link](https://developers.google.com/gmail/api/quickstart/python).
Also, client_secret.json should be saved in the same directory as the script.


The script builds a list of dictionaries, one per message, in the following format:

```
{ 'Sender': '"email.com" ',
  'Subject': 'Lorem ipsum dolor sit ametLorem ipsum dolor sit amet',
  'Date': 'yyyy-mm-dd',
  'Snippet': 'Lorem ipsum dolor sit amet',
  'Message_body': 'Lorem ipsum dolor sit amet'
}
```

The list can be exported as a .csv or into a database.

--------------------------------------------------------------------------------
/gmail_read.py:
--------------------------------------------------------------------------------
'''
Reading GMAIL using Python
    - Abhishek Chhibber
'''

'''
This script does the following:
- Go to the Gmail inbox
- Find and read all the unread messages
- Extract details (Date, Sender, Subject, Snippet, Body) and export them to a .csv file / DB
- Mark the messages as Read, so that they are not read again
'''

'''
Before running this script, the user should get the authentication by following
the link: https://developers.google.com/gmail/api/quickstart/python
Also, client_secret.json should be saved in the same directory as this file
'''

# Importing required libraries
from googleapiclient import discovery  # 'apiclient' is the legacy alias of this package
from httplib2 import Http
from oauth2client import file, client, tools
import base64
from bs4 import BeautifulSoup
import dateutil.parser as parser
import csv


# Creating a storage.json file with authentication details
SCOPES = 'https://www.googleapis.com/auth/gmail.modify'  # we are using modify and not readonly, as we will be marking the messages Read
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))

user_id = 'me'
label_id_one = 'INBOX'
label_id_two = 'UNREAD'

# Getting the unread messages from Inbox
# labelIds can be changed accordingly
# Note: list() returns one page of results; further pages are reachable via 'nextPageToken'
unread_msgs = GMAIL.users().messages().list(userId=user_id, labelIds=[label_id_one, label_id_two]).execute()

# We get a dictionary. Now reading values for the key 'messages'
# (the key is absent when no message matches, so fall back to an empty list)
mssg_list = unread_msgs.get('messages', [])

print("Total unread messages in inbox: ", str(len(mssg_list)))

final_list = []


for mssg in mssg_list:
    temp_dict = {}
    m_id = mssg['id']  # get id of individual message
    message = GMAIL.users().messages().get(userId=user_id, id=m_id).execute()  # fetch the message using API
    payld = message['payload']  # get payload of the message
    headr = payld['headers']  # get header of the payload

    for one in headr:  # getting the Subject, Date and Sender in a single pass
        if one['name'] == 'Subject':
            temp_dict['Subject'] = one['value']
        elif one['name'] == 'Date':
            date_parse = parser.parse(one['value'])
            temp_dict['Date'] = str(date_parse.date())
        elif one['name'] == 'From':
            temp_dict['Sender'] = one['value']

    temp_dict['Snippet'] = message['snippet']  # fetching message snippet

    try:
        # Fetching message body
        mssg_parts = payld['parts']  # fetching the message parts
        part_one = mssg_parts[0]  # fetching first element of the part
        part_body = part_one['body']  # fetching body of the message
        part_data = part_body['data']  # fetching data from the body
        # The data is URL-safe Base64 ('-' and '_' instead of '+' and '/')
        clean_two = base64.urlsafe_b64decode(part_data)
        soup = BeautifulSoup(clean_two, "lxml")
        mssg_body = soup.get_text()  # text content, without the HTML tags
        # mssg_body is a readable form of message body
        # depending on the end user's requirements, it can be further cleaned
        # using regex, beautiful soup, or any other method
        temp_dict['Message_body'] = mssg_body

    except (KeyError, IndexError, ValueError):
        # Single-part message, nested first part, or undecodable data: leave the body out
        pass

    print(temp_dict)
    final_list.append(temp_dict)  # This will create a dictionary item in the final list

    # This will mark the message as read
    GMAIL.users().messages().modify(userId=user_id, id=m_id, body={'removeLabelIds': ['UNREAD']}).execute()


print("Total messages retrieved: ", str(len(final_list)))

'''

The final_list will have dictionaries in the following format:

{ 'Sender': '"email.com" ',
  'Subject': 'Lorem ipsum dolor sit ametLorem ipsum dolor sit amet',
  'Date': 'yyyy-mm-dd',
  'Snippet': 'Lorem ipsum dolor sit amet',
  'Message_body': 'Lorem ipsum dolor sit amet'}


The list can be exported as a .csv or into a database
'''

# exporting the values as .csv
with open('CSV_NAME.csv', 'w', encoding='utf-8', newline='') as csvfile:
    fieldnames = ['Sender', 'Subject', 'Date', 'Snippet', 'Message_body']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, delimiter=',')
    writer.writeheader()
    for val in final_list:
        writer.writerow(val)

--------------------------------------------------------------------------------
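
The script issues a single `messages().list()` call, which returns one page of results. A mailbox with many unread messages may need several pages. The following is a minimal sketch of how the ids could be collected across pages, assuming `service` is the object the script calls `GMAIL`; the helper name `list_all_message_ids` is hypothetical and not part of the original script.

```python
def list_all_message_ids(service, user_id="me", label_ids=("INBOX", "UNREAD")):
    """Collect message ids across every page of a messages.list query.

    `service` is assumed to be a Gmail API service object, such as the
    one the script builds with discovery.build('gmail', 'v1', ...).
    """
    ids = []
    page_token = None
    while True:
        kwargs = {"userId": user_id, "labelIds": list(label_ids)}
        if page_token:
            # Only send pageToken once the API has handed one back.
            kwargs["pageToken"] = page_token
        response = service.users().messages().list(**kwargs).execute()
        # 'messages' is missing from the response when nothing matches.
        ids.extend(m["id"] for m in response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return ids
```

With this helper, the main loop could iterate over `list_all_message_ids(GMAIL)` and fetch each id with `messages().get()` as before.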
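
The Subject, Date and Sender lookups in the script can be isolated into a small function that works on the `headers` list alone, which makes the step easy to try without calling the API. This sketch swaps in the standard library's `email.utils.parsedate_to_datetime` where the script uses `dateutil`, and matches header names case-insensitively. The function name `extract_headers` and the sample values are hypothetical.

```python
from email.utils import parsedate_to_datetime

# Maps lower-cased header names to the keys used in the script's dictionary.
WANTED = {"subject": "Subject", "from": "Sender", "date": "Date"}


def extract_headers(headers):
    """Build the Subject/Sender/Date part of a row from a Gmail 'headers' list."""
    result = {}
    for header in headers:
        key = WANTED.get(header["name"].lower())
        if key is None or key in result:
            continue  # not a header we export, or already seen
        value = header["value"]
        if key == "Date":
            # Keep only the calendar date, formatted as yyyy-mm-dd.
            value = str(parsedate_to_datetime(value).date())
        result[key] = value
    return result


# Hypothetical headers, shaped like message['payload']['headers'].
sample_headers = [
    {"name": "From", "value": '"Jane Doe" <jane@example.com>'},
    {"name": "subject", "value": "Lorem ipsum"},
    {"name": "Date", "value": "Tue, 02 Jan 2018 10:30:00 +0530"},
    {"name": "X-Mailer", "value": "ignored"},
]
row = extract_headers(sample_headers)
print(row)
```

The returned dictionary can be merged into the per-message dictionary with `temp_dict.update(row)`.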
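
The script reads the body from the first element of `payload['parts']` only, so single-part messages and messages whose first part is itself a multipart container end up without a `Message_body`. One possible way to cover those cases is a recursive search over the payload, sketched below. The function names `decode_body_data` and `find_body` and the sample payload are hypothetical. The field names (`mimeType`, `body`, `data`, `parts`) follow the Gmail message payload structure the script already relies on.

```python
import base64


def decode_body_data(data):
    """Decode a base64url string as found in a Gmail body 'data' field."""
    # Restore any '=' padding that may be missing before decoding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def find_body(payload, mime_type="text/plain"):
    """Return the decoded text of the first part matching mime_type, else None."""
    if payload.get("mimeType") == mime_type:
        data = payload.get("body", {}).get("data")
        if data:
            return decode_body_data(data)
    # Multipart payloads nest their children under 'parts'; search depth-first.
    for part in payload.get("parts", []):
        found = find_body(part, mime_type)
        if found is not None:
            return found
    return None


def _encode(raw):
    """Helper for the sample below: base64url-encode bytes without padding."""
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


# Hypothetical payload, shaped like message['payload'] for a nested message.
sample_payload = {
    "mimeType": "multipart/mixed",
    "parts": [
        {
            "mimeType": "multipart/alternative",
            "parts": [
                {"mimeType": "text/plain", "body": {"data": _encode("Héllo, world?".encode("utf-8"))}},
                {"mimeType": "text/html", "body": {"data": _encode(b"<p>Hello</p>")}},
            ],
        },
        {"mimeType": "application/pdf", "body": {"attachmentId": "hypothetical-id"}},
    ],
}

print(find_body(sample_payload))
```

Passing `mime_type="text/html"` returns the HTML alternative instead, which can then be handed to BeautifulSoup as the script does.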