├── model │   └── yourGGUFhere.md ├── README.md ├── writer.py ├── GemmaWriter.py └── instructions.txt /model/yourGGUFhere.md: -------------------------------------------------------------------------------- 1 | HuggingFace Hub Model Card: https://huggingface.co/DavidAU/Gemma-The-Writer-9B-GGUF 2 | 3 | 4 | Model filename: Gemma-The-Writer-9B-D_AU-Q3_k_s.gguf 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Gemma-The-Writer-9B-GGUF on your Local PC 2 | A full-fledged writing assistant with LlamaCPP and Python on your local PC 3 | 4 | 5 |

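If you prefer to script the model download described in the steps below, a minimal sketch using the `huggingface_hub` package (an extra install, not listed in this repo's dependencies) could look like this:

```python
# Sketch: fetch the GGUF into the model/ subdirectory expected by writer.py.
# Assumes: pip install huggingface_hub  (not part of this repo's requirements)
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="DavidAU/Gemma-The-Writer-9B-GGUF",
    filename="Gemma-The-Writer-9B-D_AU-Q3_k_s.gguf",
    local_dir="model",
)
```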
6 | Run a 9B-parameter writing assistant based on Gemma2 on your PC 7 | - No GPU is required 8 | - text interface in the terminal 9 | - speeds of up to 1.7 tokens per second 10 | - text streaming in the terminal 11 | - generation and speed KPIs 12 | - chat history saved to a log file 13 | 14 | 15 | ### Virtual environment and requirements 16 | Tested with Python 3.11 on Windows 11 17 | 18 | #### Create a `venv` 19 | ``` 20 | python -m venv venv 21 | venv\Scripts\activate 22 | ``` 23 | 24 | #### Install the dependencies 25 | ``` 26 | pip install llama_cpp_python==0.2.85 tiktoken 27 | ``` 28 | 29 | #### Download the model from the repo into a subdirectory called `model` 30 | - because of only 16 GB of RAM, the Q3 quantization is used here 31 | - REPO: [https://huggingface.co/DavidAU/Gemma-The-Writer-9B-GGUF](https://huggingface.co/DavidAU/Gemma-The-Writer-9B-GGUF) 32 | - FileName: Gemma-The-Writer-9B-D_AU-Q3_k_s.gguf 33 | 34 | From the terminal run 35 | ``` 36 | python .\writer.py 37 | ``` 38 | 39 | -------------------------------------------------------------------------------- /writer.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/DavidAU/Gemma-The-Writer-9B-GGUF 3 | # model/Gemma-The-Writer-9B-D_AU-Q3_k_s.gguf 4 | import sys 5 | from time import sleep 6 | import warnings 7 | warnings.filterwarnings(action='ignore') 8 | import datetime 9 | import random 10 | import string 11 | import tiktoken 12 | 13 | encoding = tiktoken.get_encoding("r50k_base") # token counter: len(encoding.encode(yourtext)) 14 | modelname = 'Gemma-The-Writer-9B' 15 | def countTokens(text): 16 | # reuse the module-level tiktoken encoding 17 | numoftokens = len(encoding.encode(text)) 18 | return numoftokens 19 | 20 | def writehistory(filename,text): 21 | with open(filename, 'a', encoding='utf-8') as f: 22 | f.write(text) 23 | f.write('\n') 24 | # the with-block closes the file automatically 25 | 26 | def genRANstring(n): 27 | """ 28 | n = int number of chars to randomize 29 | """ 30 | N = n 31 | res = ''.join(random.choices(string.ascii_uppercase + 32 | string.digits, k=N)) 33 | return res 34 | 35 | # create THE LOG FILE 36 | logfile = f'{genRANstring(5)}_log.txt' 37 | logfilename = logfile 38 | #Write the first 2 messages into the history log 39 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 40 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 41 | 42 | print("\033[95;3;6m") 43 | print("1. Waiting 10 seconds for the API to load...") 44 | from llama_cpp import Llama 45 | llm = Llama( 46 | model_path='model/Gemma-The-Writer-9B-D_AU-Q3_k_s.gguf', 47 | n_gpu_layers=0, 48 | temperature=1.1, 49 | top_p=0.5, 50 | n_ctx=8192, 51 | max_tokens=1500, 52 | repeat_penalty=1.178, 53 | stop=['<eos>'], 54 | verbose=False, 55 | ) 56 | print("2. Model Gemma-The-Writer-9B-D_AU-Q3_k_s.gguf loaded with LlamaCPP...") 57 | print("\033[0m") #reset all 58 | history = [] 59 | print("\033[92;1m") 60 | print(f'📝Logfile: {logfilename}') 61 | counter = 1 62 | while True: 63 | # Reset history after 5 turns 64 | if counter > 5: 65 | history = [] 66 | userinput = "" 67 | print("\033[1;30m") #dark grey 68 | print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit!
to exit the chatroom:") 69 | print("\033[91;1m") #red 70 | lines = sys.stdin.readlines() 71 | for line in lines: 72 | userinput += line # readlines() already keeps the trailing newline 73 | if lines and "quit!" in lines[0].lower(): 74 | print("\033[0mBYE BYE!") 75 | break 76 | history.append({"role": "user", "content": userinput}) 77 | print("\033[92;1m") 78 | # Prepare the assistant message for the chat history 79 | new_message = {"role": "assistant", "content": ""} 80 | # Start the generation loop 81 | full_response = "" 82 | firstround = 0 # currently unused 83 | start = datetime.datetime.now() 84 | print("💻 > ", end="", flush=True) 85 | for chunk in llm.create_chat_completion( 86 | messages=history, 87 | temperature=1.2, 88 | repeat_penalty=1.178, 89 | stop=['<eos>'], 90 | max_tokens=1500, 91 | stream=True,): 92 | try: 93 | if chunk["choices"][0]["delta"]["content"]: 94 | print(chunk["choices"][0]["delta"]["content"], end="", flush=True) 95 | full_response += chunk["choices"][0]["delta"]["content"] 96 | except (KeyError, IndexError): # first/last stream chunks carry no "content" key 97 | pass 98 | new_message["content"] = full_response 99 | history.append(new_message) 100 | counter += 1 101 | delta = datetime.datetime.now() - start 102 | totalseconds = delta.total_seconds() 103 | output = full_response 104 | prompttokens = countTokens(userinput) 105 | assistanttokens = countTokens(output) 106 | totaltokens = prompttokens + assistanttokens 107 | speed = totaltokens/totalseconds 108 | genspeed = assistanttokens/totalseconds 109 | stats = f'''--- 110 | Prompt Tokens: {prompttokens} 111 | Output Tokens: {assistanttokens} 112 | TOTAL Tokens: {totaltokens} 113 | >>>⏱️Inference time: {delta} 114 | >>>🧮Inference speed: {speed:.3f} t/s 115 | >>>🏃‍♂️Generation speed: {genspeed:.3f} t/s 116 | 📝Logfile: {logfilename} 117 | 118 | ''' 119 | print(stats) 120 | writehistory(logfilename,f'''👨‍💻: {userinput} 121 | 💻 > {full_response} 122 | {stats} 123 | 📝Logfile: {logfilename}''') 124 | -------------------------------------------------------------------------------- /GemmaWriter.py: -------------------------------------------------------------------------------- 1 | # Chat with an intelligent assistant in your terminal 2 | # MODEL: https://huggingface.co/DavidAU/Gemma-The-Writer-9B-GGUF 3 | # model/Gemma-The-Writer-9B-D_AU-Q4_k_m.gguf 4 | import sys 5 | from time import sleep 6 | import warnings 7 | warnings.filterwarnings(action='ignore') 8 | import datetime 9 | import random 10 | import string 11 | import tiktoken 12 | import argparse 13 | 14 | #Add a GPU argument to the parser 15 | parser = argparse.ArgumentParser() 16 | parser.add_argument("-g", "--gpu", action="store_true") 17 | 18 | args = parser.parse_args() 19 | GPU = args.gpu 20 | if GPU: 21 | ngpu_layers = 28 22 | print(f'Selected GPU: offloading {ngpu_layers} layers...') 23 | else: 24 | ngpu_layers = 0 25 | print('Loading Model on CPU only......') 26 | 27 | encoding = tiktoken.get_encoding("r50k_base") # token counter: len(encoding.encode(yourtext)) 28 | modelname = 'Gemma-The-Writer-9B' 29 | def countTokens(text): 30 | # reuse the module-level tiktoken encoding 31 | numoftokens = len(encoding.encode(text)) 32 | return numoftokens 33 | 34 | def writehistory(filename,text): 35 | with open(filename, 'a', encoding='utf-8') as f: 36 | f.write(text) 37 | f.write('\n') 38 | # the with-block closes the file automatically 39 | 40 | def genRANstring(n): 41 | """ 42 | n = int number of chars to randomize 43 | """ 44 | N = n 45 | res = ''.join(random.choices(string.ascii_uppercase + 46 | string.digits, k=N)) 47 | return res 48 | 49 | # create THE LOG FILE 50 | logfile = f'{genRANstring(5)}_log.txt' 51 |
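# a random 5-character prefix gives each chat session its own log file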
logfilename = logfile 52 | #Write the first 2 messages into the history log 53 | writehistory(logfilename,f'{str(datetime.datetime.now())}\n\nYour own LocalGPT with 💻 {modelname}\n---\n🧠🫡: You are a helpful assistant.') 54 | writehistory(logfilename,f'💻: How can I assist you today in writing?') 55 | 56 | print("\033[95;3;6m") 57 | print("1. Waiting 10 seconds for the API to load...") 58 | from llama_cpp import Llama 59 | llm = Llama( 60 | model_path='model/Gemma-The-Writer-9B-D_AU-Q4_k_m.gguf', 61 | n_gpu_layers=ngpu_layers, 62 | temperature=1.1, 63 | top_p=0.5, 64 | n_ctx=8192, 65 | max_tokens=1500, 66 | repeat_penalty=1.178, 67 | stop=['<eos>'], 68 | verbose=False, 69 | ) 70 | print("2. Model Gemma-The-Writer-9B-D_AU-Q4_k_m.gguf loaded with LlamaCPP...") 71 | print("\033[0m") #reset all 72 | history = [] 73 | print("\033[92;1m") 74 | print(f'📝Logfile: {logfilename}') 75 | counter = 1 76 | while True: 77 | # Reset history after 5 turns 78 | if counter > 5: 79 | history = [] 80 | userinput = "" 81 | print("\033[1;30m") #dark grey 82 | print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:") 83 | print("\033[91;1m") #red 84 | lines = sys.stdin.readlines() 85 | for line in lines: 86 | userinput += line # readlines() already keeps the trailing newline 87 | if lines and "quit!" in lines[0].lower(): 88 | print("\033[0mBYE BYE!") 89 | break 90 | history.append({"role": "user", "content": userinput}) 91 | print("\033[92;1m") 92 | # Prepare the assistant message for the chat history 93 | new_message = {"role": "assistant", "content": ""} 94 | # Start the generation loop 95 | full_response = "" 96 | firstround = 0 # currently unused 97 | start = datetime.datetime.now() 98 | print("💻 > ", end="", flush=True) 99 | for chunk in llm.create_chat_completion( 100 | messages=history, 101 | temperature=1.2, 102 | repeat_penalty=1.178, 103 | stop=['<eos>'], 104 | max_tokens=1500, 105 | stream=True,): 106 | try: 107 | if chunk["choices"][0]["delta"]["content"]: 108 | print(chunk["choices"][0]["delta"]["content"], end="", flush=True) 109 | full_response += chunk["choices"][0]["delta"]["content"] 110 | except (KeyError, IndexError): # first/last stream chunks carry no "content" key 111 | pass 112 | new_message["content"] = full_response 113 | history.append(new_message) 114 | counter += 1 115 | delta = datetime.datetime.now() - start 116 | totalseconds = delta.total_seconds() 117 | output = full_response 118 | prompttokens = countTokens(userinput) 119 | assistanttokens = countTokens(output) 120 | totaltokens = prompttokens + assistanttokens 121 | speed = totaltokens/totalseconds 122 | genspeed = assistanttokens/totalseconds 123 | stats = f'''--- 124 | Prompt Tokens: {prompttokens} 125 | Output Tokens: {assistanttokens} 126 | TOTAL Tokens: {totaltokens} 127 | >>>⏱️Inference time: {delta} 128 | >>>🧮Inference speed: {speed:.3f} t/s 129 | >>>🏃‍♂️Generation speed: {genspeed:.3f} t/s 130 | 📝Logfile: {logfilename} 131 | 132 | ''' 133 | print(stats) 134 | writehistory(logfilename,f'''👨‍💻: {userinput} 135 | 💻 > {full_response} 136 | {stats} 137 | 📝Logfile: {logfilename}''') 138 | -------------------------------------------------------------------------------- /instructions.txt: -------------------------------------------------------------------------------- 1 | MODEL: https://huggingface.co/DavidAU/Gemma-The-Writer-9B-GGUF 2 | 3 | 4 | Gemma-The-Writer-9B-GGUF 5 | 6 | This is a Gemma2 model merge of the top storytelling / writing models as noted at EQBench, tuned specifically for fiction, story, and writing. 7 | 8 | Due to the high stability and compressed nature of the model, you can also use it for general purposes, including roleplay.
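As the notes just below mention, the model requires the Gemma Instruct template. The `create_chat_completion()` calls in writer.py and GemmaWriter.py apply the chat template embedded in the GGUF automatically; only if you drop down to the raw completion API do you need to format the prompt yourself. A minimal sketch of that manual formatting (`llm` is the `Llama` instance from the scripts above; the turn markers are the standard Gemma2 ones):

```python
# Sketch: manual Gemma instruct formatting with the raw completion API.
# Only needed when bypassing create_chat_completion(), which applies the
# GGUF's built-in chat template for you.
prompt = (
    "<start_of_turn>user\n"
    "Write a six-word story about the sea.<end_of_turn>\n"
    "<start_of_turn>model\n"
)
out = llm(prompt, max_tokens=64, stop=["<end_of_turn>", "<eos>"])
print(out["choices"][0]["text"])
```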
9 | 10 | This model requires the GEMMA Instruct template, and has an 8k context window that is extendable via RoPE to 32k or higher. 11 | 12 | Recommended: rep pen of 1.05 or higher, temp range 0-5. 13 | 14 | Example outputs below. 15 | 16 | 17 | llm_load_print_meta: model type = 9B 18 | llm_load_print_meta: model ftype = Q3_K - Small 19 | llm_load_print_meta: model params = 9.24 B 20 | llm_load_print_meta: model size = 4.03 GiB (3.75 BPW) 21 | llm_load_print_meta: general.name = GemmaTheWriter 22 | llm_load_print_meta: BOS token = 2 '<bos>' 23 | llm_load_print_meta: EOS token = 1 '<eos>' 24 | llm_load_print_meta: UNK token = 3 '<unk>' 25 | llm_load_print_meta: PAD token = 0 '<pad>' 26 | llm_load_print_meta: LF token = 227 '<0x0A>' 27 | llm_load_print_meta: EOT token = 107 '<end_of_turn>' 28 | 29 | 30 | 31 | Examples 32 | Example #1 at q8, Temp:1 33 | Science Fiction: The Last Transmission - Write a story that takes place entirely within a spaceship's cockpit as the sole surviving crew member attempts to send a final message back to Earth before the ship's power runs out. The story should explore themes of isolation, sacrifice, and the importance of human connection in the face of adversity. 800-1000 words. 34 | 35 | Example #2, Q8, rep pen 1.05, Temp:1 36 | In the style of Ernest Hemingway: Write a scene in a larger story about a soldier returning home from war. The scene is set in a bar. It should carry the jadedness of a seasoned combatant but with a wry humour. Write in Hemingway's voice, naturally: Use short, declarative sentences and simple, straightforward language. Include dialogue that is terse and revealing of character. Write with a keen sense of time and place. Capture realness and authenticity. 800-1000 words, past tense. 37 | 38 | Example #3 at q4km, Temp:1 39 | Start a 1000 word scene (vivid horror, 1st person, present tense, include thoughts) with: The skyscraper swayed, as she watched the window in front of her on the 21st floor explode... 40 | 41 | Example #4, Rep pen 1.1, Temp 1, Q8 42 | Epistolary Apocalyptic Survival: Write a story set in a post-apocalyptic world, told through a series of letters or messages left behind by survivors. The story should chronicle the challenges and triumphs of two small groups of survivors as they navigate the dangers of this new world and seek to rebuild their lives. The groups communicate through letters left in a drop box. The story proceeds as a new settlement is discovered. They are approached, and initially given entry on the ostensible promise of a trade deal, but things go sideways. Create distinct characters and write in their voice. The story will be set in New Zealand 30 years after a civilisational collapse. 800-1000 words. 43 | 44 | 45 | 46 | Example #1 at q8, Temp:1 47 | ASSISTANT> The red emergency lights pulsed like dying embers, casting long, grotesque shadows across Captain Elara Vance’s worn features. Her reflection, gaunt and haunted, stared back from the scratched transparisteel viewport, a grim testament to the desolate reality outside - an unending abyss of black velvet studded with indifferent stars. 48 | 49 | 50 | She ran a trembling hand through her tangled hair, its once vibrant auburn now dulled by despair and cosmic dust. The hum of failing life support was a mournful dirge, echoing the dying heartbeat of their vessel, the Icarus. A mere sliver of power remained, enough for one last transmission, a final desperate plea carried on the wings of radio waves back to Earth – her Earth.
51 | 52 | Three days ago, an EMP storm had ripped through them, turning the once humming marvel of interstellar engineering into a tomb adrift. Her crew, vibrant souls who had laughed at comet showers and toasted victories over rogue asteroids just weeks before, were now silent memories etched in her grief-stricken mind. The icy grip of solitude had tightened around her ever since, each passing hour an agonizing countdown to oblivion. 53 | 54 | She looked down at the console, its familiar layout a bittersweet comfort. Years spent mastering this symphony of circuits and controls had culminated in this – her final performance as conductor of hope. On the flickering screen, the message editor blinked insistently, beckoning her with hollow promises. Words danced on her mental canvas, yearning to be unleashed, but they refused to coalesce into anything coherent amidst the storm raging within her. 55 | 56 | Was it arrogance or naiveté that had led them to believe human resilience could conquer even the vast indifference of space? They’d been so sure – pioneers charting new frontiers, ambassadors of a species striving for the stars. Now, she was left with the bitter truth: humanity's reach often outpaced its grasp. 57 | 58 | Elara clenched her jaw, pushing down the rising tide of despair. This wasn't just about her anymore; it was about them – her crew, their families back on Earth who would never know what became of them. Their sacrifice wouldn’t be in vain if she could carry a sliver of their story back home. 59 | 60 | She began typing, each keystroke a battle against the encroaching silence. “This is Captain Elara Vance… Icarus… last transmission…” The words felt hollow, inadequate against the immensity of her loss. But she pressed on, weaving in fragments of their shared dreams – the first terraforming attempt on Proxima b, Dr. Chen’s infectious laughter during a simulated Martian storm, Liam's quiet poetry read under starlight filters. Each memory, a brushstroke painting a portrait of hope against the backdrop of tragedy. 61 | 62 | Hours bled into agonizing minutes as she poured her soul onto the digital canvas. The console hummed with strained energy, mirroring the tremor in her heart. With each dwindling byte, she felt a pang of guilt – was this enough? Would it pierce the cosmic veil and reach the expectant faces on Earth? 63 | 64 | Finally, the last words were typed: "We explored the universe, but found our greatest treasure within ourselves… remember us… remember hope.” She sent the transmission, her fingers lingering on the console as a final echo of their journey resonated through the empty ship. The lights flickered once more, then died, plunging the Icarus into an absolute silence that was both terrifying and strangely comforting – a universal lullaby bidding farewell. 65 | 66 | Elara leaned back against her chair, eyes fixed on the star-dusted void outside. A single tear rolled down her cheek, tracing a path through dust and grime. It wasn’t just about reaching Earth; it was about preserving their legacy, keeping their spirit alive even as the physical vessel crumbled around them. In that final transmission, in that whispered echo of hope carried on the currents of space, she had fulfilled her promise – not to return victorious, but to ensure they were never forgotten. 67 | 68 | 69 | The last captain of Icarus, adrift among stars, found solace in the bittersweet knowledge that even in oblivion, their story would live on. 
70 | 71 | --- 72 | Prompt Tokens: 70 73 | Output Tokens: 895 74 | TOTAL Tokens: 965 75 | --- 76 | >>>Inference speed: 1.640 t/s 77 | >>>Generation speed: 1.521 t/s 78 | 79 | 80 | --------------------------------------------------------------------------------
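The notes in instructions.txt above say the 8k context window is extendable via RoPE. In llama-cpp-python this is exposed through the `rope_freq_scale` argument of `Llama`; below is a minimal, untested sketch, where the 0.25 scale factor is my own assumption for a 4x linear stretch (8192/32768) rather than a setting from the model card:

```python
# Sketch: stretching the 8k-trained window toward 32k with linear RoPE scaling.
# rope_freq_scale=0.25 is an illustrative assumption (8192/32768), not a value
# recommended by the model card; quality may degrade at long range.
from llama_cpp import Llama

llm_long = Llama(
    model_path='model/Gemma-The-Writer-9B-D_AU-Q3_k_s.gguf',
    n_ctx=32768,            # requested context window
    rope_freq_scale=0.25,   # trained_ctx / requested_ctx
    n_gpu_layers=0,
    verbose=False,
)
```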