├── requirements.txt
├── pic
    ├── landing_page.png
    ├── fake_news_page.png
    ├── doc_embedding_page.png
    └── word_embedding_page.png
├── data
    ├── word_embedding_demo.csv
    ├── sent_embed_demo.csv
    └── doc_embed_demo.csv
├── landing_page.py
├── src
    ├── sent_embed.py
    ├── fake_news.py
    └── word_embedding.py
├── README.md
├── .gitignore
├── SessionState.py
├── app.py
├── fake_news_classifier_page.py
├── sentence_embedding_page.py
└── word_embedding_page.py


/requirements.txt:
--------------------------------------------------------------------------------
1 | nlu
2 | streamlit
3 | matplotlib
4 | plotly
5 | seaborn
6 | scikit-learn


--------------------------------------------------------------------------------
/pic/landing_page.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexott/nlp_model_selection_app/main/pic/landing_page.png


--------------------------------------------------------------------------------
/pic/fake_news_page.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexott/nlp_model_selection_app/main/pic/fake_news_page.png


--------------------------------------------------------------------------------
/pic/doc_embedding_page.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexott/nlp_model_selection_app/main/pic/doc_embedding_page.png


--------------------------------------------------------------------------------
/pic/word_embedding_page.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alexott/nlp_model_selection_app/main/pic/word_embedding_page.png


--------------------------------------------------------------------------------
/data/word_embedding_demo.csv:
--------------------------------------------------------------------------------
 1 | word, tag
 2 | Unicorns, A
 3 | have, A
 4 | been, A
 5 | sighted, A
 6 | on, A
 7 | Mars!, A
 8 | Trump,B
 9 | to,B
10 | Visit,B
11 | California,B
12 | After,B
13 | Criticism,B
14 | Over,B
15 | Silence,B
16 | on,B
17 | Wildfires,B


--------------------------------------------------------------------------------
/landing_page.py:
--------------------------------------------------------------------------------
 1 | import streamlit as st
 2 | 
 3 | 
 4 | 
 5 | def show(session_state):  
 6 |     """Run this function for showing the fake news section in app
 7 |     """
 8 |     st.write(
 9 |         """
10 |         ![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)
11 | 
12 |         # Welcome to NLU App :spock-hand:
13 | 
14 |         **Select one model form the drop down menu** in order to start.  
15 | 
16 |         Here you can find John Snow Lab NLU showcases. 
17 |         With the freshly released NLU library which gives you 350+ NLP models and 100+ Word Embeddings, you have infinite possibilities to explore your data and gain insights. 
18 |         """
19 |         )
20 | 
21 | 
22 | 
23 | if __name__ == "__main__":
24 |     pass


--------------------------------------------------------------------------------
/data/sent_embed_demo.csv:
--------------------------------------------------------------------------------
 1 | doc,label
 2 | "Story of a man who has unnatural feelings for a pig",0
 3 | "Airport '77 starts as a brand new luxury 747 plane is loaded up with valuable paintings & such belonging to rich businessman Philip Stevens (James Stewart) who is flying them & a bunch of VIP's to his estate in preparation of it being opened to the public as a museum, also on board is Stevens daughter Julie (Kathleen Quinlan) & her son",0
 4 | "This film lacked something I couldn't put my finger on at first: charisma on the part of the leading actress",0
 5 | "Sorry everyone I know this is supposed to be an ""art"" film, but wow, they should have handed out guns at the screening so people could blow their brains out and not watch",0
 6 | "When I was little my parents took me along to the theater to see Interiors",0
 7 | "This movie gets better each time I see it (which is quite often)",1
 8 | "My only complaint is that Brooks should have cast someone else in the lead (I love Mel as a Director and Writer, not so much as a lead)",1
 9 | "Again, Warren was the best actor in the movie, but ""Fume"" and ""Sailor"" both played their parts well",1
10 | "This isn't the comedic Robin Williams, nor is it the quirky/insane Robin Williams of recent thriller fame",1


--------------------------------------------------------------------------------
/src/sent_embed.py:
--------------------------------------------------------------------------------
 1 | import nlu
 2 | import matplotlib.pyplot as plt
 3 | import pandas as pd
 4 | import numpy as np
 5 | pd.set_option('display.max_columns', 500)
 6 | pd.set_option('max_colwidth', 40)
 7 | pd.options.display.float_format = "{:.2f}".format
 8 | 
 9 | 
10 | # Document
11 | data_doc = pd.read_csv("data/doc_embed_demo.csv", sep=",", header=[0], encoding="utf-8", dtype = "unicode")
12 | model_pipe_doc = nlu.load("elmo")
13 | predictions_doc = model_pipe_doc.predict(data_doc[["doc"]], output_level='document', positions=True)
14 | 
15 | predictions_doc.elmo_embeddings.shape # 20 docs
16 | predictions_doc.elmo_embeddings[0].__len__() # 121  # token
17 | predictions_doc.elmo_embeddings[0][0].__len__() # 512 embedding dim
18 | # Document Embedding dim
19 | # (#Docs, #Token, #Embed)
20 | predictions_doc.columns
21 | 
22 | # Sentence
23 | data_sent = pd.read_csv("data/sent_embed_demo.csv", sep =",", header = [0], encoding = "utf-8", dtype = "unicode")
24 | model_pipe_sent = nlu.load("bert")
25 | predictions_sent = model_pipe_sent.predict(data_sent[["doc"]], output_level='sentence', positions=True)
26 | 
27 | predictions_sent.bert_embeddings.shape # 9 sentences
28 | predictions_sent.bert_embeddings[0].__len__() # 10 word
29 | data_sent.doc[0].split().__len__() # 10
30 | predictions_sent.bert_embeddings[0][0].shape # 128 embedding dim
31 | # Sentence Embedding dim
32 | # (9, #Token, 128)
33 | 
34 | predictions_sent.columns
35 | 
36 | 
37 | 
38 | 
39 | 
40 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Word Embedding with (John Snow Lab) NLU
 2 |  NLU has created a powerful API for embeddings (and even some NLP downstream-task like sarcasm detection or sentiment classification) in 1-liner of code. However, at the beginning of each NLP projects, you are facing the issue of selecting the model that fits best to your data structure. This app is designed for selecting and comparing pre-trained NLP models from NLU (John Snow Lab) with own data. Whether your project has word, sentence or document embeddings: upload the data, select some pre-trained models and download the embeddings. 
 3 | 
 4 | ![](pic/landing_page.png)
 5 | 
 6 | This app was build with [Streamlit](https://www.streamlit.io/) and has the sections:  
 7 | 
 8 | **Word embbeding**
 9 | ![](pic/word_embedding_page.png)
10 | 
11 | **Sentance or Document embedding**
12 | ![](pic/doc_embedding_page.png)
13 | 
14 | **Fake News Classifier**
15 | ![](pic/fake_news_page.png)
16 | 
17 | # Get start on your local machine
18 | Getting start with the description below on your go to [John White Lab Installation](https://nlu.johnsnowlabs.com/docs/en/install)
19 | 
20 | ## 1. Java 8
21 | You only need to configure Java 8 on your machine and are good to go! Unless you are on Windows, which requires 1 additional step.
22 | 
23 | * [Setup Java 8 on Windows](https://access.redhat.com/documentation/en-us/openjdk/8/html/getting_started_with_openjdk_8/getting_started_with_openjdk_for_windows)  
24 | * [Setup Java 8 on Linux](https://openjdk.java.net/install/)
25 | * [Setup Java 8 on Mac](https://docs.oracle.com/javase/8/docs/technotes/guides/install/mac_jdk.html)
26 | 
27 | Check you java version
28 | ```bash
29 | $ java -version
30 | # should be Java 8 (Oracle or OpenJDK)
31 | ```
32 | 
33 | 
34 | ## 2. Windows Specific Prerequisites
35 | * Download [winutils.exe](https://github.com/steveloughran/winutils/blob/master/hadoop-2.7.1/bin/winutils.exe)
36 | * Create folder C:\winutils\bin
37 | * Copy winutils.exe inside C:\winutils\bin
38 | * Set environment variable HADOOP_HOME to C:\winutils
39 | 
40 | ## 3 Install NLU
41 | Install `PySpark` based NLU form pip.  
42 | 
43 | *Note: For `nlu<=1.0.2` please use a Python version with version number SMALLER than 3.8* 
44 | 
45 | ```bash
46 | $ pip install nlu
47 | ```
48 |  
49 | 
50 | 
51 | # Credentials and links
52 | * https://github.com/JohnSnowLabs/spark-nlp-workshop 
53 | * https://www.johnsnowlabs.com/spark-nlp-in-action/
54 | * https://www.streamlit.io/
55 | * https://datascienceplus.com/


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | pip-wheel-metadata/
 24 | share/python-wheels/
 25 | *.egg-info/
 26 | .installed.cfg
 27 | *.egg
 28 | MANIFEST
 29 | 
 30 | # PyInstaller
 31 | #  Usually these files are written by a python script from a template
 32 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 33 | *.manifest
 34 | *.spec
 35 | 
 36 | # Installer logs
 37 | pip-log.txt
 38 | pip-delete-this-directory.txt
 39 | 
 40 | # Unit test / coverage reports
 41 | htmlcov/
 42 | .tox/
 43 | .nox/
 44 | .coverage
 45 | .coverage.*
 46 | .cache
 47 | nosetests.xml
 48 | coverage.xml
 49 | *.cover
 50 | *.py,cover
 51 | .hypothesis/
 52 | .pytest_cache/
 53 | 
 54 | # Translations
 55 | *.mo
 56 | *.pot
 57 | 
 58 | # Django stuff:
 59 | *.log
 60 | local_settings.py
 61 | db.sqlite3
 62 | db.sqlite3-journal
 63 | 
 64 | # Flask stuff:
 65 | instance/
 66 | .webassets-cache
 67 | 
 68 | # Scrapy stuff:
 69 | .scrapy
 70 | 
 71 | # Sphinx documentation
 72 | docs/_build/
 73 | 
 74 | # PyBuilder
 75 | target/
 76 | 
 77 | # Jupyter Notebook
 78 | .ipynb_checkpoints
 79 | 
 80 | # IPython
 81 | profile_default/
 82 | ipython_config.py
 83 | 
 84 | # pyenv
 85 | .python-version
 86 | 
 87 | # pipenv
 88 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 89 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 90 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 91 | #   install all needed dependencies.
 92 | #Pipfile.lock
 93 | 
 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 95 | __pypackages__/
 96 | 
 97 | # Celery stuff
 98 | celerybeat-schedule
 99 | celerybeat.pid
100 | 
101 | # SageMath parsed files
102 | *.sage.py
103 | 
104 | # Environments
105 | .env
106 | .venv
107 | env/
108 | venv/
109 | ENV/
110 | env.bak/
111 | venv.bak/
112 | 
113 | # Spyder project settings
114 | .spyderproject
115 | .spyproject
116 | 
117 | # Rope project settings
118 | .ropeproject
119 | 
120 | # mkdocs documentation
121 | /site
122 | 
123 | # mypy
124 | .mypy_cache/
125 | .dmypy.json
126 | dmypy.json
127 | 
128 | # Pyre type checker
129 | .pyre/
130 | 


--------------------------------------------------------------------------------
/SessionState.py:
--------------------------------------------------------------------------------
 1 | 
 2 | import streamlit.report_thread as ReportThread
 3 | from streamlit.server.server import Server
 4 | 
 5 | class SessionState(object):
 6 |     def __init__(self, **kwargs):
 7 |         """A new SessionState object.
 8 |         Parameters
 9 |         ----------
10 |         **kwargs : any
11 |             Default values for the session state.
12 |         Example
13 |         -------
14 |         >>> session_state = SessionState(user_name='', favorite_color='black')
15 |         >>> session_state.user_name = 'Mary'
16 |         ''
17 |         >>> session_state.favorite_color
18 |         'black'
19 |         """
20 |         for key, val in kwargs.items():
21 |             setattr(self, key, val)
22 | 
23 | 
24 | def get(**kwargs):
25 |     """Gets a SessionState object for the current session.
26 |     Creates a new object if necessary.
27 |     Parameters
28 |     ----------
29 |     **kwargs : any
30 |         Default values you want to add to the session state, if we're creating a
31 |         new one.
32 |     Example
33 |     -------
34 |     >>> session_state = get(user_name='', favorite_color='black')
35 |     >>> session_state.user_name
36 |     ''
37 |     >>> session_state.user_name = 'Mary'
38 |     >>> session_state.favorite_color
39 |     'black'
40 |     Since you set user_name above, next time your script runs this will be the
41 |     result:
42 |     >>> session_state = get(user_name='', favorite_color='black')
43 |     >>> session_state.user_name
44 |     'Mary'
45 |     """
46 |     # Hack to get the session object from Streamlit.
47 | 
48 |     ctx = ReportThread.get_report_ctx()
49 | 
50 |     this_session = None
51 | 
52 |     current_server = Server.get_current()
53 |     if hasattr(current_server, '_session_infos'):
54 |         # Streamlit < 0.56
55 |         session_infos = Server.get_current()._session_infos.values()
56 |     else:
57 |         session_infos = Server.get_current()._session_info_by_id.values()
58 | 
59 |     for session_info in session_infos:
60 |         s = session_info.session
61 |         if (
62 |             # Streamlit < 0.54.0
63 |             (hasattr(s, '_main_dg') and s._main_dg == ctx.main_dg)
64 |             or
65 |             # Streamlit >= 0.54.0
66 |             (not hasattr(s, '_main_dg') and s.enqueue == ctx.enqueue)
67 |             or
68 |             # Streamlit >= 0.65.2
69 |             (not hasattr(s, '_main_dg') and s._uploaded_file_mgr == ctx.uploaded_file_mgr)
70 |         ):
71 |             this_session = s
72 | 
73 |     if this_session is None:
74 |         raise RuntimeError(
75 |             "Oh noes. Couldn't get your Streamlit Session object. "
76 |             'Are you doing something fancy with threads?')
77 | 
78 |     # Got the session object! Now let's attach some state into it.
79 | 
80 |     if not hasattr(this_session, '_custom_session_state'):
81 |         this_session._custom_session_state = SessionState(**kwargs)
82 | 
83 |     return this_session._custom_session_state
84 | 


--------------------------------------------------------------------------------
/app.py:
--------------------------------------------------------------------------------
 1 | import nlu
 2 | import streamlit as st
 3 | import pandas as pd
 4 | 
 5 | import landing_page as LandingPage
 6 | import fake_news_classifier_page as FakeNewsClassifierPage
 7 | import SessionState as SessionState
 8 | import word_embedding_page as WordEmbeddingPage
 9 | import sentence_embedding_page as SentenceEmbeddingPage
10 | 
11 | 
12 | # Session State
13 | session_state = SessionState.get(
14 |     fakenews_pipe = None
15 |     ,fakenews_out = pd.DataFrame()
16 |     ,fakenews_is_loaded = False
17 |     ,fakenews_txt = ["Write Here..."]
18 |     ,fakenews_is_predicted = False
19 |     ,fakenews_fig = None
20 |     # Word Embedding
21 |     ,word_embed_input_format = "Copy Paste Text"
22 |     ,word_embed_is_labeled = "Labeled data"
23 |     ,word_embed_selected_model_names = ""
24 |     ,word_embed_loaded_model_names = ""
25 |     ,word_embed_pipe = None
26 |     ,word_embed_txt_input = "The quick brown fox jumps over the lazy dog."
27 |     ,word_embed_txt_out = pd.DataFrame()
28 |     ,word_embed_txt_is_predicted = False
29 |     ,word_embed_csv_input = pd.DataFrame()
30 |     ,word_embed_csv_out = pd.DataFrame()
31 |     ,word_embed_csv_is_predicted = False
32 |     ,word_embed_csv_label_column_name = "-"
33 |     ,word_embed_csv_word_column_name ="-"
34 |     # Sentence Embedding
35 |     ,sent_embed_input_format = "sentence"
36 |     ,sent_embed_selected_model_names = ""
37 |     ,sent_embed_is_labeled = "Labeled data"
38 |     ,sent_embed_loaded_model_names = ""
39 |     ,sent_embed_pipe = None
40 |     ,sent_embed_csv_input = pd.DataFrame()
41 |     ,sent_embed_csv_out = pd.DataFrame()
42 |     ,sent_embed_csv_is_predicted = False
43 |     ,sent_embed_csv_label_column_name = "-"
44 |     ,sent_embed_csv_txt_column_name ="-")
45 | 
46 | 
47 | 
48 | # Consolidate pages
49 | def main():
50 |     """Run this to run the programme
51 |     """
52 |     # Page Setup
53 |     st.set_page_config(
54 |         page_title="NLU Showcase App",
55 |         page_icon=":spock-hand:",
56 |         layout="centered",
57 |         initial_sidebar_state="expanded")
58 | 
59 |     # SIDEBAR
60 |     st.sidebar.title("Navigation")
61 |     mode = st.sidebar.radio("", 
62 |                                 ["Home"
63 |                                 ,"Model Selection: Word Embbeding "
64 |                                 ,"Model Selection: Sentence or Document Embedding"
65 |                                 ,"Pre Trained Model: Fake News Classifier"
66 |                                 # ,"Sarcasm Detection"
67 |                                 # ,"Sentiment Classifier"
68 |                                 # ,"Language Classifier"
69 |                                 ]
70 |                                 )
71 | 
72 |         
73 |     if mode == "Home":
74 |         LandingPage.show(session_state)
75 |     elif mode == "Pre Trained Model: Fake News Classifier":
76 |         FakeNewsClassifierPage.show(session_state)
77 |     elif mode == "Model Selection: Word Embbeding ":
78 |         WordEmbeddingPage.show(session_state)
79 |     elif mode == "Model Selection: Sentence or Document Embedding":
80 |         SentenceEmbeddingPage.show(session_state)
81 |     else:
82 |         None
83 | 
84 |     st.sidebar.write(
85 |         """
86 |         ---------
87 |         # About 
88 |         """)    
89 |     st.sidebar.info(
90 |         "This app is maintained by Dennis Triepke. "
91 |         "You can learn more about me at [linkedin.com](https://www.linkedin.com/in/dennistriepke/)."
92 |     )
93 | 
94 | 
95 | 
96 | if __name__ == "__main__":
97 |     main()


--------------------------------------------------------------------------------
/src/fake_news.py:
--------------------------------------------------------------------------------
 1 | import nlu
 2 | import matplotlib.pyplot as plt
 3 | import pandas as pd
 4 | import numpy as np
 5 | 
 6 | pd.set_option('display.max_columns', 500)
 7 | pd.set_option('max_colwidth', 40)
 8 | pd.options.display.float_format = "{:.2f}".format
 9 | 
10 | 
11 | 
12 | # Fake News
13 | model_sent = nlu.load('en.classify.fakenews')
14 | news = ['Unicorns have been sighted on Mars!'
15 |         ,'5G and Bill Gates cause COVID'
16 |         ,'Trump to Visit California After Criticism Over Silence on Wildfires']
17 | 
18 | # Pasted Text
19 | news = """In one day, nine cases meant to attack President-elect Joe Biden's win in key states were denied or dropped, adding up to a brutal series of losses for the President, who's already lost and refuses to let go. 
20 | Many of the cases are built upon a foundational idea that absentee voting and slight mismanagement of elections invite widespread fraud, which is not proven and state leaders have overwhelming said did not happen in 2020.
21 | In court on Friday: 
22 | 
23 | The Trump campaign lost six cases in Montgomery County and Philadelphia County in Pennsylvania over whether almost 9,000 absentee ballots could be thrown out.
24 | The Trump campaign dropped a lawsuit in Arizona seeking a review by hand of all ballots because Biden's win wouldn't change.
25 | A Republican candidate and voters in Pennsylvania lost a case over absentee ballots that arrived after Election Day, because they didn't have the ability to sue. A case addressing similar issue is still waiting on decisions from the Supreme Court -- which has remained noticeably silent on election disputes since before Election Day.
26 | Pollwatchers in Michigan lost their case to stop the certification of votes in Detroit, and a judge rejected their allegations of fraud.
27 | """
28 | 
29 | import re
30 | def sentence_split(txt):
31 |     txt = re.sub("[.!?]", "[SEP]", str(txt))
32 |     txt = re.sub("\n", "", txt)
33 | 
34 |     return [s for s in txt.split("[SEP]") if len(s) is not 0]
35 | 
36 | news = sentence_split(news)
37 | 
38 | 
39 | df = model_sent.predict(news, output_level="sentence")
40 | df[['fakenews', 'fakenews_confidence','sentence']]
41 | 
42 | 
43 | 
44 | def get_weighted_confidence_scores(df):
45 |     """Function for aggregate over the sentence based fakenews decision apply the weighted mean for each class.
46 |     Weights are the shares of fakes and real sentences in the data
47 |     ---------
48 |     in -> pandas data frame nlu prediction
49 |     out -> fake_confidence, real_confidence as weighted mean of each class
50 |     """
51 |     df = df.reset_index()
52 |     df["fakenews_confidence"] = df.fakenews_confidence.astype(float)
53 | 
54 |     df_agg = df.groupby('fakenews', as_index = True)["fakenews_confidence"] \
55 |         .agg([("mean", np.mean)
56 |             ,("var", np.var)
57 |             ,("count", np.size)
58 |             ,("weights", lambda x: x.size / float(df.__len__())) 
59 |             ]) \
60 |         .reset_index()
61 | 
62 |     df_agg["weighted_mean"] = df_agg["mean"] * df_agg["weights"]
63 |     fake_confidence = df_agg.loc[df_agg.fakenews == "FAKE", "weighted_mean"]
64 |     real_confidence = df_agg.loc[df_agg.fakenews == "REAL", "weighted_mean"]
65 | 
66 |     return float(fake_confidence), float(real_confidence)
67 | 
68 | fake_confidence, real_confidence = get_weighted_confidence_scores(df)
69 | 
70 | 
71 | # The decision is derived from the median of the sentence fakenews certainty factors
72 | if fake_confidence > real_confidence:
73 |     fakenws = "FAKE"
74 | else:
75 |     fakenws = "REAL"
76 | 


--------------------------------------------------------------------------------
/src/word_embedding.py:
--------------------------------------------------------------------------------
  1 | import nlu
  2 | import pandas as pd
  3 | import numpy as np
  4 | 
  5 | import matplotlib.pyplot as plt
  6 | import seaborn as sns
  7 | sns.set_style('darkgrid')
  8 | sns.set_palette('muted')
  9 | sns.set_context("notebook", font_scale=1,rc={"lines.linewidth": 2.5})
 10 | 
 11 | 
 12 | pd.set_option('display.max_columns', 500)
 13 | pd.set_option('max_colwidth', 40)
 14 | pd.options.display.float_format = "{:.2f}".format
 15 | 
 16 | 
 17 | 
 18 | ##################
 19 | # Word embedding #
 20 | ##################
 21 | 
 22 | model_pipe = nlu.load("bert elmo")
 23 | txt = 'Unicorns have been sighted on Mars! Trump to Visit California After Criticism Over Silence on Wildfires'
 24 | predictions = model_pipe.predict(txt, output_level='token', positions=True)
 25 | predictions.head()
 26 | 
 27 | 
 28 | # t-SNE Plot
 29 | from sklearn.manifold import TSNE
 30 | 
 31 | def get_tsne_df(predictions, embd_column, hue_column = None):
 32 |     """ Function for get t-SNE ready df 
 33 |     Cast column to np aray and generate TSNE embedding and store them into DF with label ready for hue plot
 34 |     Some rows contain NONE text as result of preprocessing, thus we have some NA embeddings and drop them
 35 |     
 36 |     Parameters
 37 |     ------------
 38 |     predictions -> nlu prediction output as pandas data frame
 39 |     embd_column -> column name for the embedding column as str 
 40 |     hue_column -> column name for hue. Leave this empty for hue the sentences
 41 |     """
 42 |     predictions.dropna(how='any', inplace=True)
 43 |     # We first create a column of type np array
 44 |     predictions['np_array'] = predictions[embd_column].apply(lambda x: np.array(x))
 45 |     # Make a matrix from the vectors in the np_array column via list comprehension
 46 |     mat = np.matrix([x for x in predictions.np_array])
 47 |     
 48 |     # Fit and transform T-SNE algorithm
 49 |     model = TSNE(n_components=2) #n_components means the lower dimension
 50 |     low_dim_data = model.fit_transform(mat)
 51 | 
 52 |     if hue_column:
 53 |         t_df = pd.DataFrame(low_dim_data, index = predictions[hue_column])
 54 |         t_df.columns = ['x','y']
 55 |     
 56 |     else:
 57 |         t_df = pd.DataFrame(low_dim_data)
 58 |         t_df.columns = ['x', 'y']
 59 | 
 60 |     return t_df 
 61 | 
 62 | 
 63 | 
 64 | 
 65 | 
 66 | 
 67 | 
 68 | # Set subplot
 69 | # n -> c r
 70 | # 1 -> 1 1
 71 | # 2 -> 2 1
 72 | # 3 -> 3 1 
 73 | # 4 -> 2 2
 74 | # 5 -> 3 2
 75 | # 6 -> 3 2
 76 | # for i in range(1,7):
 77 | #     print("# %s" % i,min(i,3), 1 if i <=3 else 2)
 78 | 
 79 | 
 80 | # Infer the embedding column names
 81 | EMBED_COL_NAMES = [c for c in predictions.columns if c.endswith("_embeddings")]
 82 | # EMBED_COL_NAMES.extend(["bert_embeddings", "bert_embeddings"])
 83 | # EMBED_COL_NAMES.extend(["bert_embeddings"])
 84 | n_plots = len(EMBED_COL_NAMES)
 85 | 
 86 | fig, axs = plt.subplots(ncols = 2 if n_plots == 4 else min(n_plots, 3) , nrows = 1 if n_plots <= 3 else 2 )
 87 | 
 88 | subplot_idx_dict = {}
 89 | subplot_idx_dict[2] = [0, 1]
 90 | subplot_idx_dict[3] = [0, 1, 2]
 91 | subplot_idx_dict[4] = [(0,0), (0,1), (1,0), (1,1)]
 92 | subplot_idx_dict[5] = [(0,0), (0,1), (0,2), (1,0), (1,1)]
 93 | subplot_idx_dict[6] = [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2)]
 94 | subplot_idx_list = subplot_idx_dict[n_plots]
 95 | 
 96 | for idx, emb_c in enumerate(EMBED_COL_NAMES):
 97 |     t_embedd = get_tsne_df(predictions, emb_c)
 98 |     if n_plots == 1:
 99 |         ax = axs
100 |     elif n_plots in [2,3]:
101 |         ax = axs[subplot_idx_list[idx]]
102 |     else:
103 |         subpl_r, subpl_c = subplot_idx_list[idx]
104 |         ax = axs[subpl_r][subpl_c]
105 |     
106 |     ax = sns.scatterplot(data = t_embedd, x = 'x', y = 'y', ax = ax)
107 |     ax.set_title('T-SNE {}'.format(emb_c))
108 | 
109 | plt.show()
110 | 
111 | 
112 |  
113 | 


--------------------------------------------------------------------------------
/fake_news_classifier_page.py:
--------------------------------------------------------------------------------
  1 | import nlu
  2 | import streamlit as st
  3 | import pandas as pd
  4 | import numpy as np
  5 | import plotly.express as px
  6 | import re
  7 | 
  8 | 
  9 | def sentence_split(txt):
 10 |     """Function to split srings into sentences
 11 |     
 12 |     Parameters 
 13 |     ---------------
 14 |     txt -> the text to be splitted into sentences
 15 |     
 16 |     Output
 17 |     --------------
 18 |     non empty list object
 19 |     """
 20 |     txt = re.sub("[.!?]", "[SEP]", str(txt))
 21 |     txt = re.sub("\n", "", txt)
 22 | 
 23 |     return [s for s in txt.split("[SEP]") if len(s) is not 0]
 24 | 
 25 | @st.cache
 26 | def get_weighted_confidence_scores(df):
 27 |     """Function for aggregate over the sentence based fakenews decisions and apply the weighted mean.
 28 |     Weights are the shares of fakes and real sentences in the data
 29 |     ---------
 30 |     in -> pandas data frame from nlu prediction 
 31 |     out -> fake_confidence, real_confidence as weighted mean of each class
 32 |     """
 33 |     df = df.reset_index()
 34 |     df["fakenews_confidence"] = df.fakenews_confidence.astype(float)
 35 | 
 36 |     df_agg = df.groupby('fakenews', as_index = True)["fakenews_confidence"] \
 37 |         .agg([("mean confidence score", np.mean)
 38 |             # ,("var confidence score", np.var)
 39 |             # ,("number sentences", np.size)
 40 |             ,("weights", lambda x: x.size / float(df.__len__())) 
 41 |             ]) \
 42 |         .reset_index()
 43 | 
 44 |     df_agg["weighted_mean"] = df_agg["mean confidence score"] * df_agg["weights"]
 45 |     fake_confidence = df_agg.loc[df_agg.fakenews == "FAKE", "weighted_mean"]
 46 |     real_confidence = df_agg.loc[df_agg.fakenews == "REAL", "weighted_mean"]
 47 | 
 48 |     fake_confidence = float(fake_confidence) if fake_confidence.any() else 0.0
 49 |     real_confidence = float(real_confidence) if real_confidence.any() else 0.0
 50 | 
 51 |     return fake_confidence, real_confidence
 52 | 
 53 | 
 54 | @st.cache
 55 | def get_describe_over_fake_real_classes(df):
 56 |     """ This function aggregates over the classes FAKE and REAL and outputs some statistics
 57 |     in ->  pandas data frame from nlu prediction
 58 |     out -> pandas data frame 
 59 |     """
 60 | 
 61 |     df = df.reset_index()
 62 |     df["fakenews_confidence"] = df.fakenews_confidence.astype(float)
 63 | 
 64 |     df_agg = df.groupby('fakenews', as_index = True)["fakenews_confidence"] \
 65 |         .agg([("mean confidence score", np.mean)
 66 |             ,("var confidence score", np.var)
 67 |             ,("number sentences", np.size)
 68 |             ,("weights", lambda x: x.size / float(df.__len__())) 
 69 |             ]) \
 70 |         .reset_index()
 71 | 
 72 |     return df_agg
 73 | 
 74 | 
 75 | 
 76 | def show(session_state):
 77 |     """Run this function for showing the fake news section in the app
 78 |     """
 79 | 
 80 |     NLU_MODEL_NAMES = ["en.classify.fakenews"]
 81 | 
 82 |     # MAIN PAGE
 83 |     st.title("Fake News Classifier :newspaper:")
 84 |     st.info("This is a pre trained language model for fake news detection."
 85 |             "The **fake news classifiers** is an version of the development of [**John Snow Lab**](https://nlu.johnsnowlabs.com/)." 
 86 |             "It uses universal sentence embeddings and was trained with the classifierdl algorithm provided by Spark NLP.")
 87 |     
 88 |     # Load a model
 89 |     st.header("1. Load a model")
 90 |     model_name = st.selectbox("Select model", NLU_MODEL_NAMES)
 91 |     
 92 |     btn_load = st.button("Download the model from AWS", key="btn_load")
 93 |     if btn_load:
 94 |         with st.spinner("Download started this may take some time ..."):
 95 |             session_state.fakenews_pipe = nlu.load(model_name)
 96 |             session_state.fakenews_is_loaded = True
 97 |     
 98 |     if session_state.fakenews_is_loaded:
 99 |         st.success("Download {} done!".format(model_name))
100 | 
101 |         # Get prediction
102 |         st.header("2. Try the algorithm here")
103 |         txt = st.text_area("Enter news text for classification.", ".".join(session_state.fakenews_txt))
104 |         session_state.fakenews_txt = sentence_split(txt)
105 | 
106 |         btn_pred = st.button("Calculate", key="btn_pred")
107 |         if btn_pred:
108 |             with st.spinner("Calculation started ..."):
109 |                 session_state.fakenews_out = session_state.fakenews_pipe.predict(session_state.fakenews_txt)
110 |                 session_state.fakenews_is_predicted = True
111 | 
112 |         if session_state.fakenews_is_predicted:
113 |             st.success("Calculation done!")
114 | 
115 |             # Results
116 |             st.header("Result")
117 |             st.write("DEBUG", session_state.fakenews_out)
118 |             fake_confidence, real_confidence = get_weighted_confidence_scores(session_state.fakenews_out)
119 |             if fake_confidence > real_confidence:
120 |                 fakenews = "FAKE"
121 |                 st.warning("The news are {} with a certainty of {}".format(fakenews, fake_confidence))
122 |             else:
123 |                 fakenews = "REAL"
124 |                 st.info("The news are {} with a certainty of {}".format(fakenews, real_confidence))
125 |             st.write("*Note: the decision is infered from the weighted mean of as FAKE or REAL detected sentences.*")
126 | 
127 |             st.header("Deep Dive")
128 |             st.dataframe(get_describe_over_fake_real_classes(session_state.fakenews_out))
129 |             session_state.fakenews_fig = px.histogram(session_state.fakenews_out, x = "fakenews")
130 |             st.plotly_chart(session_state.fakenews_fig)
131 | 
132 |             
133 |             st.write("**Sentence Embeddings**")
134 |             st.dataframe(session_state.fakenews_out)
135 |     else:
136 |         st.info("No model loaded. Please load first a model!")
137 | 
138 | if __name__ == "__main__":
139 |     pass


--------------------------------------------------------------------------------
/sentence_embedding_page.py:
--------------------------------------------------------------------------------
  1 | import nlu
  2 | import streamlit as st
  3 | import base64
  4 | import time
  5 | from sklearn.manifold import TSNE
  6 | import numpy as np
  7 | import pandas as pd
  8 | 
  9 | import matplotlib.pyplot as plt
 10 | import seaborn as sns
 11 | sns.set_style('darkgrid')   
 12 | sns.set_palette('muted')
 13 | 
 14 | def get_tsne_df(predictions, embd_columns, hue_column = None):
 15 |     """ Function for get t-SNE ready df 
 16 |     Cast column to np aray and generate TSNE embedding and store them into DF with label ready for hue plot
 17 |     Some rows contain NONE text as result of preprocessing, thus we have some NA embeddings and drop them
 18 |     
 19 |     Parameters
 20 |     ------------
 21 |     predictions -> nlu prediction output as pandas data frame
 22 |     embd_columns -> column name for the embedding column as str  
 23 |     hue_column -> column name for hue. Leave this empty for hue the sentences
 24 |     """
 25 |     predictions.dropna(how='any', inplace=True)
 26 |     # We first create a column of type np array
 27 |     predictions['np_array'] = predictions[embd_columns].apply(lambda x: np.array(x))
 28 |     # Make a matrix from the vectors in the np_array column via list comprehension
 29 |     mat = np.matrix([x for x in predictions.np_array])
 30 |     
 31 |     # Fit and transform T-SNE algorithm
 32 |     model = TSNE(n_components=2) #n_components means the lower dimension
 33 |     low_dim_data = model.fit_transform(mat)
 34 | 
 35 |     if hue_column is not None:
 36 |         t_df = pd.DataFrame(low_dim_data, index = pd.factorize(hue_column)[0] )
 37 |         t_df.columns = ['x','y']
 38 |     
 39 |     else:
 40 |         t_df = pd.DataFrame(low_dim_data, index = np.ones(len(low_dim_data)))
 41 |         t_df.columns = ['x', 'y']
 42 | 
 43 |     return t_df
 44 | 
 45 | def get_table_download_link(df):
 46 |     """Generates a link allowing the data in a given panda dataframe to be downloaded
 47 |     in ->  pandas dataframe
 48 |     out -> href string
 49 |     """
 50 |     csv = df.to_csv(index=False)
 51 |     b64 = base64.b64encode(csv.encode()).decode()  # some strings <-> bytes conversions necessary here
 52 |     href = f'<a href="data:file/csv;base64,{b64}" download="Report.csv" >Download csv file</a>' 
 53 | 
 54 |     return href
 55 | 
 56 | 
 57 | def show(session_state):
 58 |     """Run this function for showing the sentence embedding section in the app
 59 |     """
 60 |     NLU_MODEL_NAMES = ["bert", "electra", "elmo", "glove", "xlnet", "albert"]
 61 | 
 62 |     # SIDEBAR
 63 |     st.sidebar.write(
 64 |         """
 65 |         --------------
 66 |         # Setup
 67 |         *Start here to select your project setup.
 68 |         You can choose between document or sentence embeddings and a vast variety of pre trained nlp models*
 69 |         """
 70 |     )
 71 | 
 72 |     st.sidebar.header("Step 1")
 73 |     model_names = st.sidebar.multiselect(
 74 |             "Select one or more models"
 75 |             ,NLU_MODEL_NAMES
 76 |             ,session_state.sent_embed_selected_model_names.split() # Remember selection
 77 |             )
 78 |     session_state.sent_embed_selected_model_names = ' '.join(model_names)
 79 | 
 80 |     
 81 |     st.sidebar.header("Step 2")
 82 |     INPUT_FORMATS = ["sentence", "document"]
 83 |     session_state.sent_embed_input_format = st.sidebar.radio(
 84 |             "Choose between sentence or document input before select calculate."
 85 |             ,INPUT_FORMATS
 86 |             ,index=int(np.where(np.array(INPUT_FORMATS) == session_state.sent_embed_input_format)[0][0]) # Remember selection
 87 |     )
 88 | 
 89 |     # st.sidebar.header("Step 3")
 90 |     # LABELED_OPTIONS = ["Labeled data", "Unlabeled data"]
 91 |     # session_state.sent_embed_is_labeled = st.sidebar.radio(
 92 |     #         "Are the data labeled?"
 93 |     #         ,LABELED_OPTIONS
 94 |     #         ,index=int(np.where(np.array(LABELED_OPTIONS) == session_state.sent_embed_is_labeled)[0][0]) # Remember selection
 95 |     #         )
 96 |     # st.sidebar.write("*Note: Select 'Labeled data' in order to hue the t-sne plot.*")
 97 | 
 98 | 
 99 |     # MAIN PAGE
100 |     st.title("Sentence or Document Embeddings with NLU")
101 |     st.info("This is an comparison of some of the embedding developments of [**John Snow Lab**](https://nlu.johnsnowlabs.com/). \
102 |             Here you can find **BERT**, **ALBERT**, **ELMO**, **ELECTRA**, **XLNET** and **GLOVE** embeddings in one output. "
103 |             "You can download ouput or use the result for NLP model selection "
104 |             )
105 | 
106 |     st.write("""
107 |     ## References
108 |     - [BERT Paper](https://arxiv.org/pdf/1810.04805.pdf)
109 |     - [ALBERT Paper](https://openreview.net/forum?id=H1eA7AEtvS)
110 |     - [ELMO Paper](https://arxiv.org/abs/1802.05365)
111 |     - [ELECTRA Paper](https://arxiv.org/abs/2003.10555)
112 |     - [XLNET Paper](https://arxiv.org/pdf/1906.08237.pdf)
113 |     - [GLOVE Paper](https://nlp.stanford.edu/pubs/glove.pdf)
114 |     """)
115 | 
116 |      # Load the nlu models: show just if
117 |     #   a) at least one model is selected OR
118 |     #   b) at least one model has been loaded
119 |     if session_state.sent_embed_selected_model_names or session_state.sent_embed_loaded_model_names:
120 |         st.header("Load a model")
121 |         btn_load = st.button("Download selected model(s) from AWS", key="btn_load")
122 |         # Case: at least one model is already loaded AND download button is seleced without any model selection
123 |         if btn_load and not session_state.sent_embed_selected_model_names:
124 |             with st.spinner("**Warning**: No model selected. Please select first at least one embedding model from the sidebar!"):
125 |                 time.sleep(3)
126 |             btn_load = False
127 |         # Case: selected model already loaded AND download button is selected
128 |         if btn_load and (session_state.sent_embed_selected_model_names == session_state.sent_embed_loaded_model_names):
129 |             with st.spinner("**Info**: Selected models '{}' already loaded. Stop request.".format(session_state.sent_embed_selected_model_names)):
130 |                 time.sleep(3)
131 |             btn_load = False
132 |         # Case: load selected model
133 |         if btn_load:
134 |             with st.spinner("Download started this may take some minutes ... :coffee:"):
135 |                 session_state.sent_embed_pipe = nlu.load(session_state.sent_embed_selected_model_names)
136 |                 session_state.sent_embed_loaded_model_names = ' '.join(model_names)
137 |                 # Reset results if exist: txt input
138 |                 session_state.sent_embed_csv_input = pd.DataFrame()
139 |                 session_state.sent_embed_csv_out = pd.DataFrame()
140 |                 session_state.sent_embed_csv_is_predicted = False
141 |                 session_state.sent_embed_csv_label_column_name = "-"
142 |                 session_state.sent_embed_csv_txt_column_name = "-"
143 |         
144 |         # Run data input Flow: just if at least one model is loaded; 
145 |         if session_state.sent_embed_loaded_model_names: 
146 |             st.success("**Info**: loaded models are: {} ".format(session_state.sent_embed_loaded_model_names))
147 | 
148 |         
149 |             ########################
150 |             # Flow: csv input flow #
151 |             ########################
152 |             st.header("Get Embeddings from CSV file here!")
153 | 
154 |             uploaded_file = st.file_uploader("Choose a CSV file to upload", type = "csv")
155 |             # st.write("DEBUG:", uploaded_file)
156 |             # st.write("DEBUG:", session_state.sent_embed_csv_input)
157 | 
158 |             # No file selected
159 |             if uploaded_file is None:
160 |                 st.info("Upload a CSV.")
161 |                 # Clear cache: User removed seletced file 
162 |                 if len(session_state.sent_embed_csv_input) > 0:
163 |                     session_state.sent_embed_csv_input = pd.DataFrame()
164 | 
165 |             # After file selection, read CSV one time
166 |             if uploaded_file and len(session_state.sent_embed_csv_input) == 0:
167 |                 session_state.sent_embed_csv_input = pd.read_csv(uploaded_file, sep=",", header=[0], encoding="utf-8", dtype = "unicode")
168 |                 
169 |             # After CSV has been loaded:
170 |             if len(session_state.sent_embed_csv_input) > 0:
171 |                 st.write(session_state.sent_embed_csv_input)
172 | 
173 |                 # Map Column
174 |                 st.write('**Map Column**')
175 |                 COLUMNS_NAMES = ["-"] + session_state.sent_embed_csv_input.columns.tolist()
176 |                 session_state.sent_embed_csv_txt_column_name = st.selectbox("Select text column"
177 |                                                                             ,COLUMNS_NAMES
178 |                                                                             ,index = int(np.where(np.array(COLUMNS_NAMES) == session_state.sent_embed_csv_txt_column_name)[0][0]) # Remember selection
179 |                                                                             )
180 |                 # if session_state.sent_embed_is_labeled == "Labeled data":
181 |                 #     session_state.sent_embed_csv_label_column_name = st.selectbox("Select label column"
182 |                 #                                                                   ,COLUMNS_NAMES
183 |                 #                                                                   ,index = int(np.where(np.array(COLUMNS_NAMES) == session_state.sent_embed_csv_label_column_name)[0][0]) # Remember selection
184 |                 #                                                                   )
185 |                     
186 |                 # Get prediction
187 |                 # NOTE: btn_pred state not cached for single prediction (state will return to false after one time trigger)
188 |                 if session_state.sent_embed_csv_txt_column_name != "-":
189 |                     btn_pred = st.button("Calculate", key = "btn_predict")
190 |                     if btn_pred:
191 |                         session_state.sent_embed_csv_is_predicted = False
192 |                         with st.spinner("Calculation started ... :coffee:"):
193 |                             session_state.sent_embed_csv_out = session_state.sent_embed_pipe.predict(
194 |                                 session_state.sent_embed_csv_input.doc.tolist()
195 |                                 ,output_level = session_state.sent_embed_input_format
196 |                                 ,positions=True)
197 |                             session_state.sent_embed_csv_is_predicted = True
198 |                 
199 |                 if session_state.sent_embed_csv_is_predicted:
200 |                     st.success("Calculation done!")
201 | 
202 |                     # Results
203 |                     st.header("Visualize Embeddings for the first 10 input")
204 |                     st.dataframe(session_state.sent_embed_csv_out.head(10))
205 | 
206 |                     # # Draw Subplots
207 |                     # st.header("t-SNE plot for each embeddings")
208 |                     # predictions = session_state.sent_embed_csv_out 
209 |                     # EMBED_COL_NAMES = [c for c in predictions.columns if c.endswith("_embeddings")] # Infer the embedding column names
210 | 
211 |                     # n_plots = len(EMBED_COL_NAMES) 
212 |                     # fig, axs = plt.subplots(ncols = 2 if n_plots == 4 else min(n_plots, 3) , nrows = 1 if n_plots <= 3 else 2 )
213 |                     # subplot_idx_dict = {}
214 |                     # subplot_idx_dict[2] = [0, 1]
215 |                     # subplot_idx_dict[3] = [0, 1, 2]
216 |                     # subplot_idx_dict[4] = [(0,0), (0,1), (1,0), (1,1)]
217 |                     # subplot_idx_dict[5] = [(0,0), (0,1), (0,2), (1,0), (1,1)]
218 |                     # subplot_idx_dict[6] = [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2)]
219 |                     # for idx, emb_c in enumerate(EMBED_COL_NAMES):
220 |                     #     t_embedd = get_tsne_df(
221 |                     #             predictions = predictions
222 |                     #             ,embd_columns = emb_c
223 |                     #             ,hue_column = session_state.sent_embed_csv_input[session_state.sent_embed_csv_label_column_name] if (session_state.sent_embed_is_labeled == "Labeled data") else None)
224 |                     #     if n_plots == 1:
225 |                     #         ax = axs
226 |                     #     elif n_plots in [2,3]: # 1 row
227 |                     #         ax = axs[subplot_idx_dict[n_plots][idx]]
228 |                     #     else: # row and column
229 |                     #         subpl_r, subpl_c = subplot_idx_dict[n_plots][idx]
230 |                     #         ax = axs[subpl_r][subpl_c]  
231 |                     #     ax = sns.scatterplot(data = t_embedd, x = 'x', y = 'y', ax = ax, c = t_embedd.index.tolist(), s = 100)
232 |                     #     ax.set_title('T-SNE {}'.format(emb_c))
233 |                     # st.pyplot(fig)
234 | 
235 |                     st.header("Download Embedding Table")
236 |                     link = get_table_download_link(session_state.sent_embed_csv_out)
237 |                     st.write(link, unsafe_allow_html = True)
238 | 
239 | 
240 | 


--------------------------------------------------------------------------------
/word_embedding_page.py:
--------------------------------------------------------------------------------
  1 | import nlu
  2 | import streamlit as st
  3 | import base64
  4 | import time
  5 | from sklearn.manifold import TSNE
  6 | import numpy as np
  7 | import pandas as pd
  8 | 
  9 | import matplotlib.pyplot as plt
 10 | import seaborn as sns
 11 | sns.set_style('darkgrid')   
 12 | sns.set_palette('muted')
 13 | 
 14 | 
 15 | def get_tsne_df(predictions, embd_column, hue_column = None):
 16 |     """ Function for get t-SNE ready df 
 17 |     Cast column to np aray and generate TSNE embedding and store them into DF with label ready for hue plot
 18 |     Some rows contain NONE text as result of preprocessing, thus we have some NA embeddings and drop them
 19 |     
 20 |     Parameters
 21 |     ------------
 22 |     predictions -> nlu prediction output as pandas data frame
 23 |     embd_column -> column name for the embedding column as str 
 24 |     hue_column -> column name for hue. Leave this empty for hue the sentences
 25 |     """
 26 |     predictions.dropna(how='any', inplace=True)
 27 |     # We first create a column of type np array
 28 |     predictions['np_array'] = predictions[embd_column].apply(lambda x: np.array(x))
 29 |     # Make a matrix from the vectors in the np_array column via list comprehension
 30 |     mat = np.matrix([x for x in predictions.np_array])
 31 |     
 32 |     # Fit and transform T-SNE algorithm
 33 |     model = TSNE(n_components=2) #n_components means the lower dimension
 34 |     low_dim_data = model.fit_transform(mat)
 35 | 
 36 |     if hue_column is not None:
 37 |         t_df = pd.DataFrame(low_dim_data, index = pd.factorize(hue_column)[0] )
 38 |         t_df.columns = ['x','y']
 39 |     
 40 |     else:
 41 |         t_df = pd.DataFrame(low_dim_data, index = np.ones(len(low_dim_data)))
 42 |         t_df.columns = ['x', 'y']
 43 | 
 44 |     return t_df
 45 | 
 46 | def get_table_download_link(df):
 47 |     """Generates a link allowing the data in a given panda dataframe to be downloaded
 48 |     in ->  pandas dataframe
 49 |     out -> href string
 50 |     """
 51 |     csv = df.to_csv(index=False)
 52 |     b64 = base64.b64encode(csv.encode()).decode()  # some strings <-> bytes conversions necessary here
 53 |     href = f'<a href="data:file/csv;base64,{b64}" download="Report.csv" >Download csv file</a>' 
 54 | 
 55 |     return href
 56 | 
 57 | 
 58 | def show(session_state):
 59 |     """Run this function for showing the word embedding section in the app
 60 |     """
 61 |     NLU_MODEL_NAMES = ["bert", "electra", "elmo", "glove", "xlnet", "albert"]
 62 | 
 63 |     # SIDEBAR
 64 |     st.sidebar.write(
 65 |         """
 66 |         --------------
 67 |         # Setup
 68 |         *Start here to select your project setup.
 69 |         You can choose between a vast variety of models and select either a text input or upload a CSV file with your text.*
 70 |         """
 71 |     )
 72 | 
 73 |     st.sidebar.header("Step 1")
 74 |     model_names = st.sidebar.multiselect(
 75 |             "Select one or more models"
 76 |             ,NLU_MODEL_NAMES
 77 |             ,session_state.word_embed_selected_model_names.split() # Remember selection
 78 |             )
 79 |     session_state.word_embed_selected_model_names = ' '.join(model_names)
 80 |     
 81 |     st.sidebar.header("Step 2")
 82 |     INPUT_FORMATS = ["Copy Paste Text", "Upload CSV File"]
 83 |     session_state.word_embed_input_format = st.sidebar.radio(
 84 |             "Select the input format"
 85 |             ,INPUT_FORMATS
 86 |             ,index=int(np.where(np.array(INPUT_FORMATS) == session_state.word_embed_input_format)[0][0]) # Remember selection
 87 |             )
 88 |     
 89 |     if session_state.word_embed_input_format == "Upload CSV File":
 90 |         st.sidebar.header("Step 3")
 91 |         LABELED_OPTIONS = ["Labeled data", "Unlabeled data"]
 92 |         session_state.word_embed_is_labeled = st.sidebar.radio(
 93 |                 "Are the data labeled?"
 94 |                 ,LABELED_OPTIONS
 95 |                 ,index=int(np.where(np.array(LABELED_OPTIONS) == session_state.word_embed_is_labeled)[0][0]) # Remember selection
 96 |                 )
 97 |         st.sidebar.write("*Note: Select 'Labeled data' in order to hue the t-sne plot.*")
 98 | 
 99 | 
100 | 
101 |     # MAIN PAGE
102 |     st.title("Word Embeddings with NLU")
103 |     st.info("This is an comparison of some of the embedding developments of [**John Snow Lab**](https://nlu.johnsnowlabs.com/). \
104 |             Here you can find **BERT**, **ALBERT**, **ELMO**, **ELECTRA**, **XLNET** and **GLOVE** embeddings in one output. "
105 |             "You can download ouput or use the result for NLP model selection "
106 |             )
107 | 
108 |     st.write("""
109 |     ## References
110 |     - [BERT Paper](https://arxiv.org/pdf/1810.04805.pdf)
111 |     - [ALBERT Paper](https://openreview.net/forum?id=H1eA7AEtvS)
112 |     - [ELMO Paper](https://arxiv.org/abs/1802.05365)
113 |     - [ELECTRA Paper](https://arxiv.org/abs/2003.10555)
114 |     - [XLNET Paper](https://arxiv.org/pdf/1906.08237.pdf)
115 |     - [GLOVE Paper](https://nlp.stanford.edu/pubs/glove.pdf)
116 |     """)
117 | 
118 |     # Load the nlu models: show just if
119 |     #   a) at least one model is selected OR
120 |     #   b) at least one model has been loaded
121 |     if session_state.word_embed_selected_model_names or session_state.word_embed_loaded_model_names:
122 |         st.header("Load a model")
123 |         btn_load = st.button("Download selected model(s) from AWS", key="btn_load")
124 |         # Case: at least one model is already loaded AND download button is seleced without any model selection
125 |         if btn_load and not session_state.word_embed_selected_model_names:
126 |             with st.spinner("**Warning**: No model selected. Please select first at least one embedding model from the sidebar!"):
127 |                 time.sleep(3)
128 |             btn_load = False
129 |         # Case: selected model already loaded AND download button is selected
130 |         if btn_load and (session_state.word_embed_selected_model_names == session_state.word_embed_loaded_model_names):
131 |             with st.spinner("**Info**: Selected models '{}' already loaded. Stop request.".format(session_state.word_embed_selected_model_names)):
132 |                 time.sleep(3)
133 |             btn_load = False
134 |         # Case: load selected model
135 |         if btn_load:
136 |             with st.spinner("Download started this may take some minutes ... :coffee:"):
137 |                 session_state.word_embed_pipe = nlu.load(session_state.word_embed_selected_model_names)
138 |                 session_state.word_embed_loaded_model_names = ' '.join(model_names)
139 |                 # Reset results if exist: csv input
140 |                 session_state.word_embed_csv_input = pd.DataFrame()
141 |                 session_state.word_embed_csv_out = pd.DataFrame()
142 |                 session_state.word_embed_csv_is_predicted = False
143 |                 session_state.word_embed_csv_label_column_name = "-"
144 |                 session_state.word_embed_csv_word_column_name = "-"
145 |                 # Reset results if exist: txt input
146 |                 session_state.word_embed_txt_out = pd.DataFrame()
147 |                 session_state.word_embed_txt_is_predicted = False
148 |         
149 |         # Run data input Flow: just if at least one model is loaded; 
150 |         if session_state.word_embed_loaded_model_names: 
151 |             st.success("**Info**: loaded models are: {} ".format(session_state.word_embed_loaded_model_names))
152 | 
153 |             #########################
154 |             # Flow: text input flow #
155 |             #########################
156 |             if session_state.word_embed_input_format == "Copy Paste Text":
157 |                 st.header("Get Embeddings from Text here!")
158 |                 # Write Here...
159 |                 session_state.word_embed_txt_input = st.text_area("Enter text for embedding here", session_state.word_embed_txt_input)
160 |                 # Get prediction
161 |                 # NOTE: btn_pred state not cached so it returns to false state and not re download every time
162 |                 btn_pred = st.button("Calculate", key = "btn_predict")
163 |                 if btn_pred:
164 |                     with st.spinner("Calculation started ... :coffee:"):
165 |                         session_state.word_embed_txt_out = session_state.word_embed_pipe.predict(session_state.word_embed_txt_input, positions=True, output_level ='token')
166 |                         session_state.word_embed_txt_is_predicted = True
167 |                 
168 |                 if session_state.word_embed_txt_is_predicted:
169 |                     st.success("Calculation done!")
170 | 
171 |                     # Results
172 |                     st.header("Visualize Embeddings for the first 10 Words")
173 |                     st.dataframe(session_state.word_embed_txt_out.head(10))
174 | 
175 |                     st.header("t-SNE plot for each embeddings")
176 |                     predictions = session_state.word_embed_txt_out 
177 |                     EMBED_COL_NAMES = [c for c in predictions.columns if c.endswith("_embeddings")] # Infer the embedding column names
178 | 
179 |                     # Draw Subplots
180 |                     n_plots = len(EMBED_COL_NAMES) 
181 |                     fig, axs = plt.subplots(ncols = 2 if n_plots == 4 else min(n_plots, 3) , nrows = 1 if n_plots <= 3 else 2 )
182 |                     subplot_idx_dict = {}
183 |                     subplot_idx_dict[2] = [0, 1]
184 |                     subplot_idx_dict[3] = [0, 1, 2]
185 |                     subplot_idx_dict[4] = [(0,0), (0,1), (1,0), (1,1)]
186 |                     subplot_idx_dict[5] = [(0,0), (0,1), (0,2), (1,0), (1,1)]
187 |                     subplot_idx_dict[6] = [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2)]
188 |                     for idx, emb_c in enumerate(EMBED_COL_NAMES):
189 |                         t_embedd = get_tsne_df(predictions, emb_c)
190 |                         if n_plots == 1:
191 |                             ax = axs
192 |                         elif n_plots in [2,3]: # 1 row
193 |                             ax = axs[subplot_idx_dict[n_plots][idx]]
194 |                         else: # row and column
195 |                             subpl_r, subpl_c = subplot_idx_dict[n_plots][idx]
196 |                             ax = axs[subpl_r][subpl_c]  
197 |                         ax = sns.scatterplot(data = t_embedd, x = 'x', y = 'y', ax = ax, c = t_embedd.index.tolist(), s = 100)
198 |                         ax.set_title('T-SNE {}'.format(emb_c))
199 |                     st.pyplot(fig)
200 | 
201 |                     st.header("Download Embedding Table")
202 |                     link = get_table_download_link(session_state.word_embed_txt_out)
203 |                     st.write(link, unsafe_allow_html = True)
204 | 
205 |              
206 |             ########################
207 |             # Flow: csv input flow #
208 |             ########################
209 |             else:
210 |                 st.header("Get Embeddings from CSV file here!")
211 | 
212 |                 uploaded_file = st.file_uploader("Choose a CSV file to upload", type = "csv")
213 |                 # st.write("DEBUG:", uploaded_file)
214 |                 # st.write("DEBUG:", session_state.word_embed_csv_input)
215 | 
216 |                 # No file selected
217 |                 if uploaded_file is None:
218 |                     st.info("Upload a CSV.")
219 |                     # Clear cache: User removed seletced file 
220 |                     if len(session_state.word_embed_csv_input) > 0:
221 |                         session_state.word_embed_csv_input = pd.DataFrame()
222 | 
223 |                 # After file selection, read CSV one time
224 |                 if uploaded_file and len(session_state.word_embed_csv_input) == 0:
225 |                     session_state.word_embed_csv_input = pd.read_csv(uploaded_file, sep=",", header=[0], encoding="utf-8", dtype = "unicode")
226 |                     
227 |                 # After CSV has been loaded:
228 |                 if len(session_state.word_embed_csv_input) > 0:
229 |                     st.write(session_state.word_embed_csv_input)
230 | 
231 |                     # Map Column
232 |                     st.write('**Map Column**')
233 |                     COLUMNS_NAMES = ["-"] + session_state.word_embed_csv_input.columns.tolist()
234 |                     session_state.word_embed_csv_word_column_name = st.selectbox("Select word column"
235 |                                                                                  ,COLUMNS_NAMES
236 |                                                                                  ,index = int(np.where(np.array(COLUMNS_NAMES) == session_state.word_embed_csv_word_column_name)[0][0]) # Remember selection
237 |                                                                                  )
238 |                     if session_state.word_embed_is_labeled == "Labeled data":
239 |                         session_state.word_embed_csv_label_column_name = st.selectbox("Select label column"
240 |                                                                                      ,COLUMNS_NAMES
241 |                                                                                      ,index = int(np.where(np.array(COLUMNS_NAMES) == session_state.word_embed_csv_label_column_name)[0][0]) # Remember selection
242 |                                                                                     )
243 |                         
244 |                     # Get prediction
245 |                     # NOTE: btn_pred state not cached for single prediction (state will return to false after one time trigger)
246 |                     if session_state.word_embed_csv_word_column_name != "-":
247 |                         btn_pred2 = st.button("Calculate", key = "btn_predict")
248 |                         if btn_pred2:
249 |                             with st.spinner("Calculation started ... :coffee:"):
250 |                                 session_state.word_embed_csv_out = session_state.word_embed_pipe.predict(session_state.word_embed_csv_input[[session_state.word_embed_csv_word_column_name]], positions=True, output_level ='token')
251 |                                 session_state.word_embed_csv_is_predicted = True
252 |                     
253 |                     if session_state.word_embed_csv_is_predicted:
254 |                         st.success("Calculation done!")
255 | 
256 |                         # Results
257 |                         st.header("Visualize Embeddings for the first 10 Words")
258 |                         st.dataframe(session_state.word_embed_csv_out.head(10))
259 | 
260 |                         st.header("t-SNE plot for each embeddings")
261 |                         predictions = session_state.word_embed_csv_out 
262 |                         EMBED_COL_NAMES = [c for c in predictions.columns if c.endswith("_embeddings")] # Infer the embedding column names
263 | 
264 |                         # Draw Subplots
265 |                         n_plots = len(EMBED_COL_NAMES) 
266 |                         fig, axs = plt.subplots(ncols = 2 if n_plots == 4 else min(n_plots, 3) , nrows = 1 if n_plots <= 3 else 2 )
267 |                         subplot_idx_dict = {}
268 |                         subplot_idx_dict[2] = [0, 1]
269 |                         subplot_idx_dict[3] = [0, 1, 2]
270 |                         subplot_idx_dict[4] = [(0,0), (0,1), (1,0), (1,1)]
271 |                         subplot_idx_dict[5] = [(0,0), (0,1), (0,2), (1,0), (1,1)]
272 |                         subplot_idx_dict[6] = [(0,0), (0,1), (0,2), (1,0), (1,1), (1,2)]
273 |                         for idx, emb_c in enumerate(EMBED_COL_NAMES):
274 |                             t_embedd = get_tsne_df(predictions, emb_c, hue_column = session_state.word_embed_csv_input[session_state.word_embed_csv_label_column_name] if (session_state.word_embed_is_labeled == "Labeled data") else None)
275 |                             if n_plots == 1:
276 |                                 ax = axs
277 |                             elif n_plots in [2,3]: # 1 row
278 |                                 ax = axs[subplot_idx_dict[n_plots][idx]]
279 |                             else: # row and column
280 |                                 subpl_r, subpl_c = subplot_idx_dict[n_plots][idx]
281 |                                 ax = axs[subpl_r][subpl_c]  
282 |                             ax = sns.scatterplot(data = t_embedd, x = 'x', y = 'y', ax = ax, c = t_embedd.index.tolist(), s = 100)
283 |                             ax.set_title('T-SNE {}'.format(emb_c))
284 |                         st.pyplot(fig)
285 | 
286 |                         st.header("Download Embedding Table")
287 |                         link = get_table_download_link(session_state.word_embed_csv_out)
288 |                         st.write(link, unsafe_allow_html = True)
289 | 
290 | 
291 | 


--------------------------------------------------------------------------------
/data/doc_embed_demo.csv:
--------------------------------------------------------------------------------
 1 | doc,label
 2 | "Story of a man who has unnatural feelings for a pig. Starts out with a opening scene that is a terrific example of absurd comedy. A formal orchestra audience is turned into an insane, violent mob by the crazy chantings of it's singers. Unfortunately it stays absurd the WHOLE time with no general narrative eventually making it just too off putting. Even those from the era should be turned off. The cryptic dialogue would make Shakespeare seem easy to a third grader. On a technical level it's better than you might think with some good cinematography by future great Vilmos Zsigmond. Future stars Sally Kirkland and Frederic Forrest can be seen briefly.",0
 3 | "Airport '77 starts as a brand new luxury 747 plane is loaded up with valuable paintings & such belonging to rich businessman Philip Stevens (James Stewart) who is flying them & a bunch of VIP's to his estate in preparation of it being opened to the public as a museum, also on board is Stevens daughter Julie (Kathleen Quinlan) & her son. The luxury jetliner takes off as planned but mid-air the plane is hi-jacked by the co-pilot Chambers (Robert Foxworth) & his two accomplice's Banker (Monte Markham) & Wilson (Michael Pataki) who knock the passengers & crew out with sleeping gas, they plan to steal the valuable cargo & land on a disused plane strip on an isolated island but while making his descent Chambers almost hits an oil rig in the Ocean & loses control of the plane sending it crashing into the sea where it sinks to the bottom right bang in the middle of the Bermuda Triangle. With air in short supply, water leaking in & having flown over 200 miles off course the problems mount for the survivor's as they await help with time fast running out...<br /><br />Also known under the slightly different tile Airport 1977 this second sequel to the smash-hit disaster thriller Airport (1970) was directed by Jerry Jameson & while once again like it's predecessors I can't say Airport '77 is any sort of forgotten classic it is entertaining although not necessarily for the right reasons. Out of the three Airport films I have seen so far I actually liked this one the best, just. It has my favourite plot of the three with a nice mid-air hi-jacking & then the crashing (didn't he see the oil rig?) & sinking of the 747 (maybe the makers were trying to cross the original Airport with another popular disaster flick of the period The Poseidon Adventure (1972)) & submerged is where it stays until the end with a stark dilemma facing those trapped inside, either suffocate when the air runs out or drown as the 747 floods or if any of the doors are opened & it's a decent idea that could have made for a great little disaster flick but bad unsympathetic character's, dull dialogue, lethargic set-pieces & a real lack of danger or suspense or tension means this is a missed opportunity. While the rather sluggish plot keeps one entertained for 108 odd minutes not that much happens after the plane sinks & there's not as much urgency as I thought there should have been. Even when the Navy become involved things don't pick up that much with a few shots of huge ships & helicopters flying about but there's just something lacking here. George Kennedy as the jinxed airline worker Joe Patroni is back but only gets a couple of scenes & barely even says anything preferring to just look worried in the background.<br /><br />The home video & theatrical version of Airport '77 run 108 minutes while the US TV versions add an extra hour of footage including a new opening credits sequence, many more scenes with George Kennedy as Patroni, flashbacks to flesh out character's, longer rescue scenes & the discovery or another couple of dead bodies including the navigator. While I would like to see this extra footage I am not sure I could sit through a near three hour cut of Airport '77. As expected the film has dated badly with horrible fashions & interior design choices, I will say no more other than the toy plane model effects aren't great either. Along with the other two Airport sequels this takes pride of place in the Razzie Award's Hall of Shame although I can think of lots of worse films than this so I reckon that's a little harsh. The action scenes are a little dull unfortunately, the pace is slow & not much excitement or tension is generated which is a shame as I reckon this could have been a pretty good film if made properly.<br /><br />The production values are alright if nothing spectacular. The acting isn't great, two time Oscar winner Jack Lemmon has said since it was a mistake to star in this, one time Oscar winner James Stewart looks old & frail, also one time Oscar winner Lee Grant looks drunk while Sir Christopher Lee is given little to do & there are plenty of other familiar faces to look out for too.<br /><br />Airport '77 is the most disaster orientated of the three Airport films so far & I liked the ideas behind it even if they were a bit silly, the production & bland direction doesn't help though & a film about a sunken plane just shouldn't be this boring or lethargic. Followed by The Concorde ... Airport '79 (1979).",0
 4 | "This film lacked something I couldn't put my finger on at first: charisma on the part of the leading actress. This inevitably translated to lack of chemistry when she shared the screen with her leading man. Even the romantic scenes came across as being merely the actors at play. It could very well have been the director who miscalculated what he needed from the actors. I just don't know.<br /><br />But could it have been the screenplay? Just exactly who was the chef in love with? He seemed more enamored of his culinary skills and restaurant, and ultimately of himself and his youthful exploits, than of anybody or anything else. He never convinced me he was in love with the princess.<br /><br />I was disappointed in this movie. But, don't forget it was nominated for an Oscar, so judge for yourself.",0
 5 | "Sorry everyone,,, I know this is supposed to be an ""art"" film,, but wow, they should have handed out guns at the screening so people could blow their brains out and not watch. Although the scene design and photographic direction was excellent, this story is too painful to watch. The absence of a sound track was brutal. The loooonnnnng shots were too long. How long can you watch two people just sitting there and talking? Especially when the dialogue is two people complaining. I really had a hard time just getting through this film. The performances were excellent, but how much of that dark, sombre, uninspired, stuff can you take? The only thing i liked was Maureen Stapleton and her red dress and dancing scene. Otherwise this was a ripoff of Bergman. And i'm no fan f his either. I think anyone who says they enjoyed 1 1/2 hours of this is,, well, lying.",0
 6 | "When I was little my parents took me along to the theater to see Interiors. It was one of many movies I watched with my parents, but this was the only one we walked out of. Since then I had never seen Interiors until just recently, and I could have lived out the rest of my life without it. What a pretentious, ponderous, and painfully boring piece of 70's wine and cheese tripe. Woody Allen is one of my favorite directors but Interiors is by far the worst piece of crap of his career. In the unmistakable style of Ingmar Berman, Allen gives us a dark, angular, muted, insight in to the lives of a family wrought by the psychological damage caused by divorce, estrangement, career, love, non-love, halitosis, whatever. The film, intentionally, has no comic relief, no music, and is drenched in shadowy pathos. This film style can be best defined as expressionist in nature, using an improvisational method of dialogue to illicit a ""more pronounced depth of meaning and truth"". But Woody Allen is no Ingmar Bergman. The film is painfully slow and dull. But beyond that, I simply had no connection with or sympathy for any of the characters. Instead I felt only contempt for this parade of shuffling, whining, nicotine stained, martyrs in a perpetual quest for identity. Amid a backdrop of cosmopolitan affluence and baked Brie intelligentsia the story looms like a fart in the room. Everyone speaks in affected platitudes and elevated language between cigarettes. Everyone is ""lost"" and ""struggling"", desperate to find direction or understanding or whatever and it just goes on and on to the point where you just want to slap all of them. It's never about resolution, it's only about interminable introspective babble. It is nothing more than a psychological drama taken to an extreme beyond the audience's ability to connect. Woody Allen chose to make characters so immersed in themselves we feel left out. And for that reason I found this movie painfully self indulgent and spiritually draining. I see what he was going for but his insistence on promoting his message through Prozac prose and distorted film techniques jettisons it past the point of relevance. I highly recommend this one if you're feeling a little too happy and need something to remind you of death. Otherwise, let's just pretend this film never happened.",0
 7 | """It appears that many critics find the idea of a Woody Allen drama unpalatable."" And for good reason: they are unbearably wooden and pretentious imitations of Bergman. And let's not kid ourselves: critics were mostly supportive of Allen's Bergman pretensions, Allen's whining accusations to the contrary notwithstanding. What I don't get is this: why was Allen generally applauded for his originality in imitating Bergman, but the contemporaneous Brian DePalma was excoriated for ""ripping off"" Hitchcock in his suspense/horror films? In Robin Wood's view, it's a strange form of cultural snobbery. I would have to agree with that.",0
 8 | "The second attempt by a New York intellectual in less than 10 years to make a ""Swedish"" film - the first being Susan Sontag's ""Brother Carl"" (which was made in Sweden, with Swedish actors, no less!) The results? Oscar Wilde said it best, in reference to Dickens' ""The Old Curiosity Shop"": ""One would have to have a heart of stone not to laugh out loud at the death of Little Nell."" Pretty much the same thing here. ""Interiors"" is chock full of solemnly intoned howlers. (""I'm afraid of my anger."" Looking into the middle distance: ""I don't like who I'm becoming."") The directorial quotations (to use a polite term) from Bergman are close to parody. The incredibly self-involved family keep reminding us of how brilliant and talented they are, to the point of strangulation. (""I read a poem of yours the other day. It was in - I don't know - The New Yorker."" ""Oh. That was an old poem. I reworked it."") Far from not caring about these people, however, I found them quite hilarious. Much of the dialog is exactly like the funny stuff from Allen's earlier films - only he's directed his actors to play the lines straight. Having not cast himself in the movie, he has poor Mary Beth Hurt copy all of his thespian tics, intonations, and neurotic habits, turning her into an embarrassing surrogate (much like Kenneth Branagh in ""Celebrity"").<br /><br />The basic plot - dysfunctional family with quietly domineering mother - seems to be lifted more or less from Bergman's ""Winter Light,"" the basic family melodrama tricked up with a lot of existential angst. It all comes through in the shopworn visual/aural tricks: the deafening scratching of a pencil on paper, the towering surf that dwarfs the people walking on the beach. etc, etc.<br /><br />Allen's later ""serious"" films are less embarrassing, but also far less entertaining. I'll take ""Interiors."" Woody's rarely made a funnier movie.",0
 9 | "I don't know who to blame, the timid writers or the clueless director. It seemed to be one of those movies where so much was paid to the stars (Angie, Charlie, Denise, Rosanna and Jon) that there wasn't enough left to really make a movie. This could have been very entertaining, but there was a veil of timidity, even cowardice, that hung over each scene. Since it got an R rating anyway why was the ubiquitous bubble bath scene shot with a 70-year-old woman and not Angie Harmon? Why does Sheen sleepwalk through potentially hot relationships WITH TWO OF THE MOST BEAUTIFUL AND SEXY ACTRESSES in the world? If they were only looking for laughs why not cast Whoopi Goldberg and Judy Tenuta instead? This was so predictable I was surprised to find that the director wasn't a five year old. What a waste, not just for the viewers but for the actors as well.",0
10 | "This film is mediocre at best. Angie Harmon is as funny as a bag of hammers. Her bitchy demeanor from ""Law and Order"" carries over in a failed attempt at comedy. Charlie Sheen is the only one to come out unscathed in this horrible anti-comedy. The only positive thing to come out of this mess is Charlie and Denise's marriage. Hopefully that effort produces better results.",0
11 | "The film is bad. There is no other way to say it. The story is weak and outdated, especially for this country. I don't think most people know what a ""walker"" is or will really care. I felt as if I was watching a movie from the 70's. The subject was just not believable for the year 2007, even being set in DC. I think this rang true for everyone else who watched it too as the applause were low and quick at the end. Most didn't stay for the Q&A either.<br /><br />I don't think Schrader really thought the film out ahead of time. Many of the scenes seemed to be cut short as if they were never finished or he just didn't know how to finish them. He jumped from one scene to the next and you had to try and figure out or guess what was going on. I really didn't get Woody's (Carter) private life or boyfriend either. What were all the ""artistic"" male bondage and torture pictures (from Iraq prisons) about? What was he thinking? I think it was his very poor attempt at trying to create this dark private subculture life for Woody's character (Car). It didn't work. It didn't even seem to make sense really.<br /><br />The only good thing about this film was Woody Harrelson. He played his character (Car) flawlessly. You really did get a great sense of what a ""walker"" may have been like (say twenty years ago). He was great and most likely will never get recognized for it. <br /><br />As for Lauren, Lily and Kristin... Boring.<br /><br />Don't see it! It is painful! Unless you are a true Harrelson fan.",0
12 | "Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as ""Teachers"". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is ""Teachers"". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!",1
13 | "Homelessness (or Houselessness as George Carlin stated) has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school, work, or vote for the matter. Most people think of the homeless as just a lost cause while worrying about things such as racism, the war on Iraq, pressuring kids to succeed, technology, the elections, inflation, or worrying if they'll be next to end up on the streets.<br /><br />But what if you were given a bet to live on the streets for a month without the luxuries you once had from a home, the entertainment sets, a bathroom, pictures on the wall, a computer, and everything you once treasure to see what it's like to be homeless? That is Goddard Bolt's lesson.<br /><br />Mel Brooks (who directs) who stars as Bolt plays a rich man who has everything in the world until deciding to make a bet with a sissy rival (Jeffery Tambor) to see if he can live in the streets for thirty days without the luxuries; if Bolt succeeds, he can do what he wants with a future project of making more buildings. The bet's on where Bolt is thrown on the street with a bracelet on his leg to monitor his every move where he can't step off the sidewalk. He's given the nickname Pepto by a vagrant after it's written on his forehead where Bolt meets other characters including a woman by the name of Molly (Lesley Ann Warren) an ex-dancer who got divorce before losing her home, and her pals Sailor (Howard Morris) and Fumes (Teddy Wilson) who are already used to the streets. They're survivors. Bolt isn't. He's not used to reaching mutual agreements like he once did when being rich where it's fight or flight, kill or be killed.<br /><br />While the love connection between Molly and Bolt wasn't necessary to plot, I found ""Life Stinks"" to be one of Mel Brooks' observant films where prior to being a comedy, it shows a tender side compared to his slapstick work such as Blazing Saddles, Young Frankenstein, or Spaceballs for the matter, to show what it's like having something valuable before losing it the next day or on the other hand making a stupid bet like all rich people do when they don't know what to do with their money. Maybe they should give it to the homeless instead of using it like Monopoly money.<br /><br />Or maybe this film will inspire you to help others.",1
14 | "Brilliant over-acting by Lesley Ann Warren. Best dramatic hobo lady I have ever seen, and love scenes in clothes warehouse are second to none. The corn on face is a classic, as good as anything in Blazing Saddles. The take on lawyers is also superb. After being accused of being a turncoat, selling out his boss, and being dishonest the lawyer of Pepto Bolt shrugs indifferently ""I'm a lawyer"" he says. Three funny words. Jeffrey Tambor, a favorite from the later Larry Sanders show, is fantastic here too as a mad millionaire who wants to crush the ghetto. His character is more malevolent than usual. The hospital scene, and the scene where the homeless invade a demolition site, are all-time classics. Look for the legs scene and the two big diggers fighting (one bleeds). This movie gets better each time I see it (which is quite often).",1
15 | "This is easily the most underrated film inn the Brooks cannon. Sure, its flawed. It does not give a realistic view of homelessness (unlike, say, how Citizen Kane gave a realistic view of lounge singers, or Titanic gave a realistic view of Italians YOU IDIOTS). Many of the jokes fall flat. But still, this film is very lovable in a way many comedies are not, and to pull that off in a story about some of the most traditionally reviled members of society is truly impressive. Its not The Fisher King, but its not crap, either. My only complaint is that Brooks should have cast someone else in the lead (I love Mel as a Director and Writer, not so much as a lead).",1
16 | "This is not the typical Mel Brooks film. It was much less slapstick than most of his movies and actually had a plot that was followable. Leslie Ann Warren made the movie, she is such a fantastic, under-rated actress. There were some moments that could have been fleshed out a bit more, and some scenes that could probably have been cut to make the room to do so, but all in all, this is worth the price to rent and see it. The acting was good overall, Brooks himself did a good job without his characteristic speaking to directly to the audience. Again, Warren was the best actor in the movie, but ""Fume"" and ""Sailor"" both played their parts well.",1
17 | "This isn't the comedic Robin Williams, nor is it the quirky/insane Robin Williams of recent thriller fame. This is a hybrid of the classic drama without over-dramatization, mixed with Robin's new love of the thriller. But this isn't a thriller, per se. This is more a mystery/suspense vehicle through which Williams attempts to locate a sick boy and his keeper.<br /><br />Also starring Sandra Oh and Rory Culkin, this Suspense Drama plays pretty much like a news report, until William's character gets close to achieving his goal.<br /><br />I must say that I was highly entertained, though this movie fails to teach, guide, inspect, or amuse. It felt more like I was watching a guy (Williams), as he was actually performing the actions, from a third person perspective. In other words, it felt real, and I was able to subscribe to the premise of the story.<br /><br />All in all, it's worth a watch, though it's definitely not Friday/Saturday night fare.<br /><br />It rates a 7.7/10 from...<br /><br />the Fiend :.",1
18 | "Yes its an art... to successfully make a slow paced thriller.<br /><br />The story unfolds in nice volumes while you don't even notice it happening.<br /><br />Fine performance by Robin Williams. The sexuality angles in the film can seem unnecessary and can probably affect how much you enjoy the film. However, the core plot is very engaging. The movie doesn't rush onto you and still grips you enough to keep you wondering. The direction is good. Use of lights to achieve desired affects of suspense and unexpectedness is good.<br /><br />Very nice 1 time watch if you are looking to lay back and hear a thrilling short story!",1
19 | "In this ""critically acclaimed psychological thriller based on true events, Gabriel (Robin Williams), a celebrated writer and late-night talk show host, becomes captivated by the harrowing story of a young listener and his adoptive mother (Toni Collette). When troubling questions arise about this boy's (story), however, Gabriel finds himself drawn into a widening mystery that hides a deadly secretÂ…"" according to film's official synopsis.<br /><br />You really should STOP reading these comments, and watch the film NOW...<br /><br />The ""How did he lose his leg?"" ending, with Ms. Collette planning her new life, should be chopped off, and sent to ""deleted scenes"" land. It's overkill. The true nature of her physical and mental ailments should be obvious, by the time Mr. Williams returns to New York. Possibly, her blindness could be in question - but a revelation could have be made certain in either the ""highway"" or ""video tape"" scenes. The film would benefit from a re-editing - how about a ""director's cut""? <br /><br />Williams and Bobby Cannavale (as Jess) don't seem, initially, believable as a couple. A scene or two establishing their relationship might have helped set the stage. Otherwise, the cast is exemplary. Williams offers an exceptionally strong characterization, and not a ""gay impersonation"". Sandra Oh (as Anna), Joe Morton (as Ashe), and Rory Culkin (Pete Logand) are all perfect.<br /><br />Best of all, Collette's ""Donna"" belongs in the creepy hall of fame. Ms. Oh is correct in saying Collette might be, ""you know, like that guy from 'Psycho'."" There have been several years when organizations giving acting awards seemed to reach for women, due to a slighter dispersion of roles; certainly, they could have noticed Collette with some award consideration. She is that good. And, director Patrick Stettner definitely evokes Hitchcock - he even makes getting a sandwich from a vending machine suspenseful.<br /><br />Finally, writers Stettner, Armistead Maupin, and Terry Anderson deserve gratitude from flight attendants everywhere.<br /><br />******* The Night Listener (1/21/06) Patrick Stettner ~ Robin Williams, Toni Collette, Sandra Oh, Rory Culkin",1
20 | "THE NIGHT LISTENER (2006) **1/2 Robin Williams, Toni Collette, Bobby Cannavale, Rory Culkin, Joe Morton, Sandra Oh, John Cullum, Lisa Emery, Becky Ann Baker. (Dir: Patrick Stettner) <br /><br />Hitchcockian suspenser gives Williams a stand-out low-key performance.<br /><br />What is it about celebrities and fans? What is the near paranoia one associates with the other and why is it almost the norm? <br /><br />In the latest derange fan scenario, based on true events no less, Williams stars as a talk-radio personality named Gabriel No one, who reads stories he's penned over the airwaves and has accumulated an interesting fan in the form of a young boy named Pete Logand (Culkin) who has submitted a manuscript about the travails of his troubled youth to No one's editor Ashe (Morton) who gives it to No one to read for himself. <br /><br />No one is naturally disturbed but ultimately intrigued about the nightmarish existence of Pete being abducted and sexually abused for years until he was finally rescued by a nurse named Donna (Collette giving an excellent performance) who has adopted the boy but her correspondence with No one reveals that Pete is dying from AIDS. Naturally No one wants to meet the fans but is suddenly in doubt to their possibly devious ulterior motives when the seed is planted by his estranged lover Jess (Cannavale) whose sudden departure from their New York City apartment has No one in an emotional tailspin that has only now grown into a tempest in a teacup when he decides to do some investigating into Donna and Pete's backgrounds discovering some truths that he didn't anticipate.<br /><br />Written by Armistead Maupin (who co-wrote the screenplay with his former lover Terry Anderson and the film's novice director Stettner) and based on a true story about a fan's hoax found out has some Hitchcockian moments that run on full tilt like any good old fashioned pot-boiler does. It helps that Williams gives a stand-out, low-key performance as the conflicted good-hearted personality who genuinely wants to believe that his number one fan is in fact real and does love him (the one thing that has escaped his own reality) and has some unsettling dreadful moments with the creepy Collette whose one physical trait I will leave unmentioned but underlines the desperation of her character that can rattle you to the core.<br /><br />However the film runs out of gas and eventually becomes a bit repetitive and predictable despite a finely directed piece of hoodwink and mystery by Stettner, it pays to listen to your own inner voice: be careful of what you hope for.",1
21 | "You know, Robin Williams, God bless him, is constantly shooting himself in the foot lately with all these dumb comedies he has done this decade (with perhaps the exception of ""Death To Smoochy"", which bombed when it came out but is now a cult classic). The dramas he has made lately have been fantastic, especially ""Insomnia"" and ""One Hour Photo"". ""The Night Listener"", despite mediocre reviews and a quick DVD release, is among his best work, period.<br /><br />This is a very chilling story, even though it doesn't include a serial killer or anyone that physically dangerous for that matter. The concept of the film is based on an actual case of fraud that still has yet to be officially confirmed. In high school, I read an autobiography by a child named Anthony Godby Johnson, who suffered horrific abuse and eventually contracted AIDS as a result. I was moved by the story until I read reports online that Johnson may not actually exist. When I saw this movie, the confused feelings that Robin Williams so brilliantly portrayed resurfaced in my mind.<br /><br />Toni Collette probably gives her best dramatic performance too as the ultimately sociopathic ""caretaker"". Her role was a far cry from those she had in movies like ""Little Miss Sunshine"". There were even times she looked into the camera where I thought she was staring right at me. It takes a good actress to play that sort of role, and it's this understated (yet well reviewed) role that makes Toni Collette probably one of the best actresses of this generation not to have even been nominated for an Academy Award (as of 2008). It's incredible that there is at least one woman in this world who is like this, and it's scary too.<br /><br />This is a good, dark film that I highly recommend. Be prepared to be unsettled, though, because this movie leaves you with a strange feeling at the end.",1
22 | 


--------------------------------------------------------------------------------