├── .gitignore
├── LICENSE
├── README.md
├── cmover.py
├── cmover_control.py
├── cmover_control_del.py
├── cmover_control_dirtime.py
├── cmover_del.py
├── cmover_dirtime.py
├── config.py
├── exceptions
├── lustre
│   └── __init__.py
├── memcache.py
├── rabbitmq_init.sh
├── run.sh
├── run_del.sh
├── run_dirtime.sh
├── send_graphite.py
├── send_queue.py
├── send_queue_graphite.py
└── settings.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | rabbitmq
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2016, San Diego Supercomputer Center, University of California San Diego
2 | All rights reserved.
3 | 
4 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
5 | 
6 | 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
7 | 
8 | 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
9 | 
10 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Description
2 | This application clones large LUSTRE filesystems by parallelizing the data copy between worker nodes.
3 | 
4 | In general it's a good idea to copy the data in several passes. Migrating a large filesystem involves HPC cluster downtime, which can be minimised by copying most of the data from the live filesystem first and then finalising the copy during a downtime, when no changes are being made by users.
5 | 
6 | One copy pass performs one of three actions:
7 | 
8 | * Copying the data. Files on the source that differ from those on the target (including files that don't exist on the target yet) are copied. This pass also creates the folders on the target.
9 | 
10 | * Delete. Needed for the final pass, when we want to delete files on the target which were deleted by users from the source after the last migration pass.
11 | 
12 | * Dirs mtime sync. After all changes to files have been made on the target filesystem, this pass synchronizes the target folders' mtimes with those on the source to create an identical copy of the filesystem.
13 | 
14 | _(source: the filesystem with users' data; target: the filesystem the data is being migrated to)_
15 | 
16 | On the first run, while the first-level folders are created on the target filesystem, the folders are alternated between the two MDSes (0 and 1). This is done to better balance the load when two MDS servers are available. Adjust the code if your setup differs.
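For reference, the alternation mirrors the `lfs setdirstripe -i <mds_num> <dir>` call in cmover.py's `createDir()` (used when `MDS_IS_STRIPED` is enabled in config.py). A minimal sketch of the commands the workers effectively run for two first-level folders; the paths are illustrative:

```
lfs setdirstripe -i 0 /mnt/target/projects   # create the first top-level folder on MDT 0
lfs setdirstripe -i 1 /mnt/target/scratch    # the next one is created on MDT 1
```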
17 | 
18 | # Dependencies
19 | * RabbitMQ server
20 | * Python (tested with 2.7)
21 | * Celery python package
22 | * Memcached server
23 | * Lustre filesystems mounted on all mover nodes
24 | * Graphite server for statistics collection
25 | * cloghandler python package
26 | * lustreapi.py from the pcp project ( https://github.com/wtsi-ssg/pcp/blob/master/pcplib/lustreapi.py ) and its required liblustreapi.so module, placed into the "lustre" folder
27 | 
28 | # Configuring
29 | 
30 | All celery task files use the rabbitmq/rabbitmq.conf and rabbitmq/rabbitmq_data_move.conf files to connect to the RabbitMQ server.
31 | 
32 | The rabbitmq/rabbitmq_data_move.conf file should contain the data_move user password. To set the password and create the file, run the rabbitmq_init.sh script on the RabbitMQ server. The file should then be synced to all nodes which will perform the data migration. The rabbitmq/rabbitmq.conf file should contain the RabbitMQ server hostname. The same hostname is used for the memcached server.
33 | 
34 | Most data migration settings can be set in the config.py file:
35 | 
36 | ```USERNAME = 'data_move'```
37 | 
38 | Used as the RabbitMQ virtual host and RabbitMQ username. The password for this user should be in the rabbitmq/rabbitmq_<USERNAME>.conf file (rabbitmq/rabbitmq_data_move.conf by default).
39 | 
40 | ```OLDEST_DATE = 90 * 24 * 60 * 60```
41 | 
42 | Filter files by atime. Files whose atime is older than OLDEST_DATE seconds won't be copied. To disable, set to 0.
43 | 
44 | ```NEWEST_DATE = 0 #24 * 60 * 60```
45 | 
46 | Filter files by mtime. Files whose mtime is newer than NEWEST_DATE seconds won't be copied. To disable, set to 0.
47 | Useful for the initial pass: files with a recent mtime are likely to be changed by a user during the migration. Should be set to 0 for the final pass.
48 | 
49 | ```STATS_ENABLED = True```
50 | 
51 | Enable statistics reporting to memcached
52 | 
53 | ```REPORT_INTERVAL = 30 # seconds```
54 | 
55 | Minimum time interval between stats reports
56 | 
57 | ```source_mount = "/mnt/source"```
58 | 
59 | Source filesystem mount point
60 | 
61 | ```target_mount = "/mnt/target"```
62 | 
63 | Target filesystem mount point
64 | 
65 | # Running
66 | 
67 | To start the regular file copy celery workers on a node, run the run.sh script. The run_del.sh script runs the delete workers, and run_dirtime.sh runs the directory mtime fix workers. To run the same scripts in debug mode, pass the "try_f" (files worker) or "try_d" (dirs worker) parameter to the run.sh and run_del.sh scripts; the workers will then run in the foreground and display all debugging information.
68 | 
69 | By default the scripts run 16 file and 16 dir celery workers, which connect to the RabbitMQ server and wait for a job to perform.
70 | 
71 | Once all workers have been started, an initial message containing the root location of the source filesystem should be added to the queue. To start the file copy pass, run:
72 | 
73 | ```python cmover_control.py start```
74 | 
75 | To start the file delete pass:
76 | 
77 | ```python cmover_control_del.py start```
78 | 
79 | To start the folder mtime fix pass:
80 | 
81 | ```python cmover_control_dirtime.py start```
82 | 
83 | To stop the operation and shut down all the workers, run:
84 | 
85 | ```python cmover_control.py stop```
86 | 
87 | This will stop all the workers in the virtual host.
To send the command to a specific node, modify the cmover_control<_*>.py file and add destination parameter, f.e.: 88 | 89 | ```app.control.broadcast('shutdown', destination=["celery@file_node02", "celery@dir_node02"])``` 90 | 91 | The workers pool on all nodes can be extended by n workers with command: 92 | 93 | ```celery control -A cmover_control pool_grow n``` 94 | 95 | or on a single node: 96 | ```celery control -A cmover_control -d "celery@file_" pool_grow n ``` 97 | 98 | To get the list of current tasks for all nodes: 99 | 100 | ```celery inspect active -A cmover_control``` 101 | 102 | #Monitoring: 103 | 104 | The statistics of data moving process can be collected by a graphite server. The send_graphite.py script collects known fields from memcached service and sends these to graphite. The script can be run periodically with a crontab task at the frequency defined in graphite for the collection. 105 | 106 | -------------------------------------------------------------------------------- /cmover.py: -------------------------------------------------------------------------------- 1 | #!/opt/python/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | import logging 5 | import json 6 | import sys 7 | import stat 8 | import ctypes 9 | import time 10 | import copy 11 | import datetime 12 | 13 | import settings 14 | 15 | from config import * 16 | 17 | from cloghandler import ConcurrentRotatingFileHandler 18 | 19 | from os.path import lexists, relpath, exists, join, dirname, samefile, isfile, \ 20 | islink, isdir, ismount 21 | import os 22 | 23 | from lustre import lustreapi 24 | 25 | import subprocess 26 | 27 | from celery import Celery 28 | from celery.decorators import periodic_task 29 | from billiard import current_process 30 | 31 | import memcache 32 | 33 | 34 | clib = ctypes.CDLL('libc.so.6', use_errno=True) 35 | 36 | logger = logging.getLogger(__name__) 37 | rotateHandler = ConcurrentRotatingFileHandler("/var/log/cmover.log", "a", 128*1024*1024) 38 | formatter = logging.Formatter('%(asctime)s - %(levelname)s [%(filename)s:%(lineno)s - %(funcName)20s()] - %(message)s') 39 | rotateHandler.setFormatter(formatter) 40 | logger.addHandler(rotateHandler) 41 | 42 | with open('rabbitmq/rabbitmq.conf','r') as f: 43 | rabbitmq_server = f.read().rstrip() 44 | 45 | with open('rabbitmq/rabbitmq_%s.conf'%USERNAME,'r') as f: 46 | rabbitmq_password = f.read().rstrip() 47 | 48 | exceptions = [] 49 | if (isfile('exceptions')): 50 | with open('exceptions','r') as f: 51 | exceptions = f.read().splitlines() 52 | app = Celery(USERNAME, broker='amqp://%s:%s@%s/%s'%(USERNAME, rabbitmq_password, rabbitmq_server, USERNAME)) 53 | app.config_from_object(settings) 54 | 55 | class ActionError(Exception): 56 | pass 57 | 58 | class LustreSource(object): 59 | 60 | SPLIT_FILES_CHUNK = 1000 # how many files to parse before passing job to others 61 | 62 | def procDir(self, dir, mds_num): 63 | cur_depth = len(dir.split('/')) 64 | 65 | if(dir != source_mount): 66 | #logger.info("Creating dir %s"%dir) 67 | self.createDir(dir, join(target_mount, 68 | relpath(dir, source_mount)), mds_num) 69 | 70 | cur_files = [] 71 | proc = subprocess.Popen([ 72 | 'lfs', 73 | 'find', 74 | dir, 75 | '-maxdepth', 76 | '1', 77 | '!', 78 | '--type', 79 | 'd'], 80 | stdout=subprocess.PIPE, stderr=subprocess.PIPE) 81 | while True: 82 | line = proc.stdout.readline().rstrip() 83 | if line: 84 | cur_files.append(line) 85 | if(len(cur_files) >= self.SPLIT_FILES_CHUNK): 86 | procFiles.delay(cur_files) 87 | cur_files = [] 88 | else: 89 | if(len(cur_files)): 
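# dispatch the remaining partial chunk (fewer than SPLIT_FILES_CHUNK files) before leaving the loop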
90 | procFiles.delay(cur_files) 91 | cur_files = [] 92 | break 93 | while True: 94 | line = proc.stderr.readline().rstrip() 95 | if line: 96 | logger.error("Got error scanning %s folder: %s"%(dir, line)) 97 | else: 98 | break 99 | 100 | proc = subprocess.Popen([ 101 | 'lfs', 102 | 'find', 103 | dir, 104 | '-maxdepth', 105 | '1', 106 | '-type', 107 | 'd'], 108 | stdout=subprocess.PIPE, stderr=subprocess.PIPE) 109 | cur_mds_num = 0 110 | while True: 111 | line = proc.stdout.readline().rstrip() 112 | if line: 113 | if line != dir: 114 | procDir.delay(line, cur_mds_num) 115 | else: 116 | break 117 | cur_mds_num = 1-cur_mds_num 118 | while True: 119 | line = proc.stderr.readline().rstrip() 120 | if line: 121 | logger.error("Got error scanning %s folder: %s"%(dir, line)) 122 | else: 123 | break 124 | 125 | def createDir(self, sourcedir, destdir, mds_num): 126 | 127 | if not exists(sourcedir): 128 | return 129 | 130 | if not exists(destdir): 131 | level = len(filter(None, destdir.replace(target_mount,'').split("/"))) 132 | if( (not MDS_IS_STRIPED) or (level > 1) ): 133 | os.mkdir(destdir) 134 | else: 135 | subprocess.Popen(['lfs setdirstripe -i %s %s'%(mds_num, destdir)], 136 | shell=True).communicate() 137 | 138 | sstat = self.safestat(sourcedir) 139 | dstat = self.safestat(destdir) 140 | if(sstat.st_mode != dstat.st_mode): 141 | os.chmod(destdir, sstat.st_mode) 142 | if((sstat.st_uid != dstat.st_uid) or (sstat.st_gid != dstat.st_gid)): 143 | os.chown(destdir, sstat.st_uid, sstat.st_gid) 144 | 145 | slayout = lustreapi.getstripe(sourcedir) 146 | dlayout = lustreapi.getstripe(destdir) 147 | if slayout.isstriped() != dlayout.isstriped() or slayout.stripecount != dlayout.stripecount: 148 | lustreapi.setstripe(destdir, stripecount=slayout.stripecount) 149 | 150 | def copyFile(self, src, dst): 151 | try: 152 | logger.debug("looking at %s"%src) 153 | srcstat = self.safestat(src) 154 | mode = srcstat.st_mode 155 | size = srcstat.st_size 156 | blksize = srcstat.st_blksize 157 | 158 | if OLDEST_DATE and srcstat.st_atime + OLDEST_DATE \ 159 | < int(time.time()): 160 | if not stat.S_ISLNK(mode): # copy all symlinks 161 | return 162 | else: 163 | return 164 | 165 | if NEWEST_DATE and srcstat.st_mtime + NEWEST_DATE \ 166 | > int(time.time()): 167 | if not stat.S_ISLNK(mode): # copy all symlinks 168 | return 169 | else: 170 | return 171 | 172 | # regular files 173 | 174 | if stat.S_ISREG(mode): 175 | layout = lustreapi.getstripe(src) 176 | if layout.stripecount < 16: 177 | count = layout.stripecount 178 | else: 179 | count = -1 180 | 181 | done = False 182 | while not done: 183 | try: 184 | if exists(dst): 185 | deststat = self.safestat(dst) 186 | if srcstat.st_size == deststat.st_size \ 187 | and srcstat.st_mtime == deststat.st_mtime \ 188 | and srcstat.st_uid == deststat.st_uid \ 189 | and srcstat.st_gid == deststat.st_gid \ 190 | and srcstat.st_mode == deststat.st_mode: 191 | return 192 | 193 | # file exists; blow it away 194 | 195 | os.remove(dst) 196 | #logger.warn('%s has changed' % dst) 197 | lustreapi.setstripe(dst, stripecount=count) 198 | done = True 199 | except IOError, error: 200 | if error.errno != 17: 201 | raise 202 | logger.warn('Restart %s' % dst) 203 | 204 | copied_data = self.bcopy(src, dst, blksize) 205 | os.chown(dst, srcstat.st_uid, srcstat.st_gid) 206 | os.chmod(dst, srcstat.st_mode) 207 | os.utime(dst, (srcstat.st_atime, srcstat.st_mtime)) 208 | return copied_data 209 | 210 | # symlinks 211 | 212 | if stat.S_ISLNK(mode): 213 | linkto = os.readlink(src) 214 | try: 215 | 
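# create the symlink on the target; errno 17 (EEXIST) below means a link is already there, so it is removed and recreated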
os.symlink(linkto, dst) 216 | except OSError, error: 217 | if error.errno == 17: 218 | os.remove(dst) 219 | os.symlink(linkto, dst) 220 | else: 221 | raise 222 | try: 223 | os.lchown(dst, srcstat.st_uid, srcstat.st_gid) 224 | return 225 | except OSError, error: 226 | raise 227 | 228 | logger.error("Unknown filetype %s"%src) 229 | except: 230 | logger.exception("Error copying file %s"%src) 231 | 232 | def fadviseSeqNoCache(self, fileD): 233 | """Advise the kernel that we are only going to access file-descriptor 234 | fileD once, sequentially.""" 235 | 236 | POSIX_FADV_SEQUENTIAL = 2 237 | POSIX_FADV_DONTNEED = 4 238 | offset = ctypes.c_int64(0) 239 | length = ctypes.c_int64(0) 240 | clib.posix_fadvise(fileD, offset, length, POSIX_FADV_SEQUENTIAL) 241 | clib.posix_fadvise(fileD, offset, length, POSIX_FADV_DONTNEED) 242 | 243 | def bcopy( 244 | self, 245 | src, 246 | dst, 247 | blksize, 248 | ): 249 | 250 | try: 251 | with open(src, 'rb') as infile: 252 | with open(dst, 'wb') as outfile: 253 | self.fadviseSeqNoCache(infile.fileno()) 254 | self.fadviseSeqNoCache(outfile.fileno()) 255 | logger.debug("bcopy %s"%src) 256 | tot_size = 0 257 | while True: 258 | data = infile.read(blksize) 259 | if not data: 260 | break 261 | outfile.write(data) 262 | tot_size += len(data) 263 | return tot_size 264 | except: 265 | logger.exception('Error copying %s'%src) 266 | 267 | def safestat(self, filename, follow_symlink=False): 268 | while True: 269 | try: 270 | if(follow_symlink): 271 | return os.stat(filename) 272 | else: 273 | return os.lstat(filename) 274 | except IOError, error: 275 | if error.errno != 4: 276 | raise 277 | 278 | 279 | 280 | dataPlugin = LustreSource() 281 | 282 | def get_mc_conn(): 283 | mc = memcache.Client(['%s:11211'%rabbitmq_server], debug=0) 284 | return mc 285 | 286 | def isProperDirection(path): 287 | if not path.startswith(source_mount): 288 | raise Exception("Wrong direction param, %s not starts with %s"%(path, source_mount)) 289 | if (not ismount(source_mount)): 290 | logger.error("%s not mounted"%source_mount) 291 | raise Exception("%s not mounted"%source_mount) 292 | if (not ismount(target_mount)): 293 | logger.error("%s not mounted"%target_mount) 294 | raise Exception("%s not mounted"%target_mount) 295 | 296 | def report_files_progress(copied_files, copied_data): 297 | if(not STATS_ENABLED): 298 | return 299 | mc = get_mc_conn() 300 | if(copied_files): 301 | if(not mc.incr("%s.files"%get_worker_name(), "%s"%copied_files)): 302 | mc.set("%s.files"%get_worker_name(), "%s"%copied_files) 303 | 304 | if(copied_data): 305 | if(not mc.incr("%s.data"%get_worker_name(), "%s"%copied_data)): 306 | mc.set("%s.data"%get_worker_name(), "%s"%copied_data) 307 | mc.disconnect_all() 308 | 309 | @app.task(ignore_result=True) 310 | def procDir(dir, mds_num): 311 | isProperDirection(dir.rstrip()) 312 | if(not dir.startswith( tuple(exceptions) )): 313 | dataPlugin.procDir(dir, mds_num) 314 | 315 | if(STATS_ENABLED): 316 | mc = get_mc_conn() 317 | if(not mc.incr("%s.dirs"%get_worker_name()) ): 318 | mc.set("%s.dirs"%get_worker_name(), "1") 319 | mc.disconnect_all() 320 | 321 | @app.task(ignore_result=True) 322 | def procFiles(files): 323 | 324 | copied_files = 0 325 | copied_data = 0 326 | 327 | last_report = 0 328 | 329 | for file in files: 330 | 331 | if(file.startswith( tuple(exceptions) )): 332 | return 333 | 334 | isProperDirection(file) 335 | copied_data_cur = dataPlugin.copyFile(file, 336 | join(target_mount, relpath(file, 337 | source_mount))) 338 | 339 | if(copied_data_cur): 340 | 
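# accumulate per-task counters; report_files_progress() flushes them to memcached and they are reset every REPORT_INTERVAL seconds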
copied_data = copied_data + copied_data_cur 341 | copied_files = copied_files+1 342 | 343 | if(time.time()-last_report > REPORT_INTERVAL): 344 | report_files_progress(copied_files, copied_data) 345 | copied_files = 0 346 | copied_data = 0 347 | last_report = time.time() 348 | 349 | report_files_progress(copied_files, copied_data) 350 | 351 | from cmover import procDir, procFiles 352 | 353 | def get_worker_name(): 354 | return "cmover.%s_%s"%(current_process().initargs[1].split('@')[1],current_process().index) 355 | -------------------------------------------------------------------------------- /cmover_control.py: -------------------------------------------------------------------------------- 1 | #!/opt/python/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | import settings 5 | 6 | from celery import Celery 7 | from celery.decorators import periodic_task 8 | import sys, os.path 9 | from celery.task.control import revoke 10 | 11 | from config import * 12 | 13 | cur_dir = os.path.dirname(os.path.realpath(__file__)) 14 | 15 | with open('%s/rabbitmq/rabbitmq.conf'%cur_dir,'r') as f: 16 | rabbitmq_server = f.read().rstrip() 17 | 18 | with open('%s/rabbitmq/rabbitmq_%s.conf'%(cur_dir, USERNAME),'r') as f: 19 | rabbitmq_password = f.read().rstrip() 20 | 21 | app = Celery(USERNAME, broker='amqp://%s:%s@%s/%s'%(USERNAME, rabbitmq_password, rabbitmq_server, USERNAME)) 22 | app.config_from_object(settings) 23 | 24 | 25 | if __name__ == "__main__": 26 | 27 | if (len(sys.argv)< 2 ): 28 | print "Usage: %s [start|stop]" 29 | sys.exit 30 | 31 | if(str(sys.argv[1]) == "start"): 32 | app.send_task('cmover.procDir', args=["%s"%source_mount,0], kwargs={}) 33 | elif(str(sys.argv[1]) == "stop"): 34 | app.control.broadcast('shutdown') 35 | 36 | else: 37 | print "Wrong argument" 38 | sys.exit 39 | 40 | 41 | -------------------------------------------------------------------------------- /cmover_control_del.py: -------------------------------------------------------------------------------- 1 | #!/opt/python/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | import settings 5 | 6 | from celery import Celery 7 | from celery.decorators import periodic_task 8 | import sys 9 | import os.path 10 | 11 | from config import * 12 | 13 | cur_dir = os.path.dirname(os.path.realpath(__file__)) 14 | 15 | with open('%s/rabbitmq/rabbitmq.conf'%cur_dir,'r') as f: 16 | rabbitmq_server = f.read().rstrip() 17 | 18 | with open('%s/rabbitmq/rabbitmq_%s.conf'%(cur_dir, USERNAME),'r') as f: 19 | rabbitmq_password = f.read().rstrip() 20 | 21 | app = Celery(USERNAME, broker='amqp://%s:%s@%s/%s'%(USERNAME, rabbitmq_password, rabbitmq_server, USERNAME)) 22 | app.config_from_object(settings) 23 | 24 | 25 | if __name__ == "__main__": 26 | 27 | if (len(sys.argv)< 2 ): 28 | print "Usage: %s [start|stop]" 29 | sys.exit 30 | 31 | if(str(sys.argv[1]) == "start"): 32 | app.send_task('cmover_del.procDir', args=["%s"%target_mount], kwargs={}) 33 | elif(str(sys.argv[1]) == "stop"): 34 | app.control.broadcast('shutdown') 35 | 36 | else: 37 | print "Wrong argument" 38 | sys.exit 39 | 40 | 41 | -------------------------------------------------------------------------------- /cmover_control_dirtime.py: -------------------------------------------------------------------------------- 1 | #!/opt/python/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | import settings 5 | 6 | from celery import Celery 7 | from celery.decorators import periodic_task 8 | import sys 9 | 10 | from config import * 11 | 12 | with open('rabbitmq/rabbitmq.conf','r') as f: 13 | rabbitmq_server 
= f.read().rstrip() 14 | 15 | with open('rabbitmq/rabbitmq_%s.conf'%USERNAME,'r') as f: 16 | rabbitmq_password = f.read().rstrip() 17 | 18 | app = Celery(USERNAME, broker='amqp://%s:%s@%s/%s'%(USERNAME, rabbitmq_password, rabbitmq_server, USERNAME)) 19 | app.config_from_object(settings) 20 | 21 | 22 | if __name__ == "__main__": 23 | 24 | if (len(sys.argv)< 2 ): 25 | print "Usage: %s [start|stop]" 26 | sys.exit 27 | 28 | if(str(sys.argv[1]) == "start"): 29 | app.send_task('cmover_dirtime.procDir', args=["%s"%source_mount], kwargs={}) 30 | elif(str(sys.argv[1]) == "stop"): 31 | app.control.broadcast('shutdown') 32 | 33 | else: 34 | print "Wrong argument" 35 | sys.exit 36 | 37 | 38 | -------------------------------------------------------------------------------- /cmover_del.py: -------------------------------------------------------------------------------- 1 | #!/opt/python/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | import logging 5 | import json 6 | import sys 7 | import stat 8 | import ctypes 9 | import time 10 | import copy 11 | import datetime 12 | 13 | import settings 14 | 15 | from config import * 16 | 17 | from cloghandler import ConcurrentRotatingFileHandler 18 | 19 | from random import random 20 | 21 | from os.path import relpath, exists, lexists, join, dirname, samefile, isfile, \ 22 | islink, isdir, ismount 23 | import os 24 | 25 | from lustre import lustreapi 26 | 27 | import subprocess 28 | 29 | from celery import Celery 30 | from celery.decorators import periodic_task 31 | from billiard import current_process 32 | 33 | import memcache 34 | 35 | import shutil 36 | 37 | clib = ctypes.CDLL('libc.so.6', use_errno=True) 38 | 39 | logger = logging.getLogger(__name__) 40 | rotateHandler = ConcurrentRotatingFileHandler("/var/log/cmover_del.log", "a", 128*1024*1024) 41 | formatter = logging.Formatter('%(asctime)s - %(levelname)s [%(filename)s:%(lineno)s - %(funcName)20s()] - %(message)s') 42 | rotateHandler.setFormatter(formatter) 43 | logger.addHandler(rotateHandler) 44 | 45 | REPORT_INTERVAL = 30 # seconds 46 | 47 | with open('rabbitmq/rabbitmq.conf','r') as f: 48 | rabbitmq_server = f.read().rstrip() 49 | 50 | with open('rabbitmq/rabbitmq_%s.conf'%USERNAME,'r') as f: 51 | rabbitmq_password = f.read().rstrip() 52 | 53 | app = Celery(USERNAME, broker='amqp://%s:%s@%s/%s'%(USERNAME, rabbitmq_password, rabbitmq_server, USERNAME)) 54 | app.config_from_object(settings) 55 | 56 | class ActionError(Exception): 57 | pass 58 | 59 | class LustreSource(object): 60 | 61 | SPLIT_FILES_CHUNK = 1000 # how many files to parse before passing job to others 62 | 63 | def procDir(self, dir): 64 | cur_depth = len(dir.split('/')) 65 | 66 | if(dir != target_mount and 67 | not os.path.exists(join(source_mount, 68 | relpath(dir, target_mount)))): 69 | 70 | self.delDir(dir) 71 | return 72 | 73 | cur_files = [] 74 | proc = subprocess.Popen([ 75 | 'lfs', 76 | 'find', 77 | dir, 78 | '-maxdepth', 79 | '1', 80 | '!', 81 | '--type', 82 | 'd'], 83 | stdout=subprocess.PIPE) 84 | while True: 85 | line = proc.stdout.readline().rstrip() 86 | if line: 87 | cur_files.append(line) 88 | if(len(cur_files) >= self.SPLIT_FILES_CHUNK): 89 | procFiles.delay(cur_files) 90 | cur_files = [] 91 | else: 92 | if(len(cur_files)): 93 | procFiles.delay(cur_files) 94 | cur_files = [] 95 | break 96 | proc.communicate() 97 | 98 | proc = subprocess.Popen([ 99 | 'lfs', 100 | 'find', 101 | dir, 102 | '-maxdepth', 103 | '1', 104 | '-type', 105 | 'd'], 106 | stdout=subprocess.PIPE) 107 | while True: 108 | line = 
proc.stdout.readline().rstrip() 109 | if line: 110 | if line != dir: 111 | procDir.delay(line) 112 | else: 113 | break 114 | proc.communicate() 115 | 116 | def delDir(self, dir): 117 | try: 118 | shutil.rmtree(dir) 119 | #logger.warning("Deleting %s"%dir) 120 | except: 121 | logger.exception("Error removing dir %s"%dir) 122 | 123 | def safestat(self, filename): 124 | """lstat sometimes get Interrupted system calls; wrap it up so we can 125 | retry""" 126 | 127 | while True: 128 | try: 129 | statdata = os.lstat(filename) 130 | return statdata 131 | except IOError, error: 132 | if error.errno != 4: 133 | raise 134 | 135 | 136 | 137 | dataPlugin = LustreSource() 138 | 139 | def get_mc_conn(): 140 | mc = memcache.Client(['%s:11211'%rabbitmq_server], debug=0) 141 | return mc 142 | 143 | def isProperDirection(path): 144 | if not path.startswith(target_mount): 145 | raise Exception("Wrong direction param, %s not starts with %s"%(path, target_mount)) 146 | if (not ismount(source_mount)): 147 | logger.error("%s not mounted"%source_mount) 148 | raise Exception("%s not mounted"%source_mount) 149 | if (not ismount(target_mount)): 150 | logger.error("%s not mounted"%target_mount) 151 | raise Exception("%s not mounted"%target_mount) 152 | 153 | 154 | def report_files_progress(copied_files): 155 | if(not STATS_ENABLED): 156 | return 157 | mc = get_mc_conn() 158 | if(copied_files): 159 | if(not mc.incr("%s.files"%get_worker_name(), "%s"%copied_files)): 160 | mc.set("%s.files"%get_worker_name(), "%s"%copied_files) 161 | mc.disconnect_all() 162 | 163 | def safestat(filename, follow_symlink=False): 164 | while True: 165 | try: 166 | if(follow_symlink): 167 | return os.stat(filename) 168 | else: 169 | return os.lstat(filename) 170 | except IOError, error: 171 | if error.errno != 4: 172 | raise 173 | 174 | def is_delete(): 175 | return random() <= (percent_to_delete/100.0) 176 | 177 | def checkFile(src, dst): 178 | try: 179 | if (not lexists(src)) or (percent_to_delete and is_delete()): 180 | os.remove(dst) 181 | #logger.warning("Deleting %s"%dst) 182 | return 183 | 184 | except: 185 | logger.exception("Error removing file %s"%dst) 186 | 187 | 188 | @app.task(ignore_result=True) 189 | def procDir(dir): 190 | isProperDirection(dir.rstrip()) 191 | dataPlugin.procDir(dir) 192 | 193 | if(STATS_ENABLED): 194 | mc = get_mc_conn() 195 | if(not mc.incr("%s.dirs"%get_worker_name()) ): 196 | mc.set("%s.dirs"%get_worker_name(), "1") 197 | mc.disconnect_all() 198 | 199 | @app.task(ignore_result=True) 200 | def procFiles(files): 201 | 202 | copied_files = 0 203 | 204 | last_report = 0 205 | 206 | for file in files: 207 | 208 | isProperDirection(file) 209 | checkFile( 210 | join(source_mount, relpath(file, 211 | target_mount)), 212 | file) 213 | 214 | 215 | copied_files = copied_files+1 216 | 217 | if(time.time()-last_report > REPORT_INTERVAL): 218 | report_files_progress(copied_files) 219 | copied_files = 0 220 | last_report = time.time() 221 | 222 | report_files_progress(copied_files) 223 | 224 | from cmover_del import procDir, procFiles 225 | 226 | def get_worker_name(): 227 | return "cmover.%s_%s"%(current_process().initargs[1].split('@')[1],current_process().index) 228 | 229 | -------------------------------------------------------------------------------- /cmover_dirtime.py: -------------------------------------------------------------------------------- 1 | #!/opt/python/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | import logging 5 | import json 6 | import sys 7 | import stat 8 | import ctypes 9 | import time 10 | 
import copy 11 | import datetime 12 | import shutil 13 | 14 | import settings 15 | 16 | from config import * 17 | 18 | from cloghandler import ConcurrentRotatingFileHandler 19 | 20 | from random import normalvariate 21 | 22 | from os.path import relpath, exists, lexists, join, dirname, samefile, isfile, \ 23 | islink, isdir, ismount 24 | import os 25 | import re 26 | 27 | from lustre import lustreapi 28 | 29 | import subprocess 30 | 31 | from celery import Celery 32 | from celery.decorators import periodic_task 33 | from billiard import current_process 34 | 35 | import memcache 36 | 37 | clib = ctypes.CDLL('libc.so.6', use_errno=True) 38 | 39 | logger = logging.getLogger(__name__) 40 | rotateHandler = ConcurrentRotatingFileHandler("/var/log/cmover_dir.log", "a", 5*1024*1024*1024) 41 | formatter = logging.Formatter('%(asctime)s - %(levelname)s [%(filename)s:%(lineno)s - %(funcName)20s()] - %(message)s') 42 | rotateHandler.setFormatter(formatter) 43 | logger.addHandler(rotateHandler) 44 | 45 | REPORT_INTERVAL = 30 # seconds 46 | 47 | with open('rabbitmq/rabbitmq.conf','r') as f: 48 | rabbitmq_server = f.read().rstrip() 49 | 50 | with open('rabbitmq/rabbitmq_%s.conf'%USERNAME,'r') as f: 51 | rabbitmq_password = f.read().rstrip() 52 | 53 | exceptions = [] 54 | if (isfile('exceptions')): 55 | with open('exceptions','r') as f: 56 | exceptions = f.read().splitlines() 57 | 58 | app = Celery(USERNAME, broker='amqp://%s:%s@%s/%s'%(USERNAME, rabbitmq_password, rabbitmq_server, USERNAME)) 59 | app.config_from_object(settings) 60 | 61 | class ActionError(Exception): 62 | pass 63 | 64 | class LustreSource(object): 65 | 66 | def procDir(self, dir): 67 | cur_depth = len(dir.split('/')) 68 | 69 | if(dir != source_mount): 70 | self.fixDir(dir, join(target_mount, 71 | relpath(dir, source_mount))) 72 | 73 | proc = subprocess.Popen([ 74 | 'lfs', 75 | 'find', 76 | dir, 77 | '-maxdepth', 78 | '1', 79 | '-type', 80 | 'd'], 81 | stdout=subprocess.PIPE, stderr=subprocess.PIPE) 82 | while True: 83 | line = proc.stdout.readline().rstrip() 84 | if line: 85 | if line != dir: 86 | procDir.delay(line) 87 | else: 88 | break 89 | while True: 90 | line = proc.stderr.readline().rstrip() 91 | if line: 92 | logger.error("Got error scanning %s folder: %s"%(dir, line)) 93 | else: 94 | break 95 | 96 | def fixDir(self, sourcedir, destdir): 97 | 98 | if not exists(sourcedir): 99 | return 100 | 101 | if not exists(destdir): 102 | logger.error("Destdir not exist %s"%destdir) 103 | 104 | sstat = self.safestat(sourcedir) 105 | dstat = self.safestat(destdir) 106 | if(sstat.st_atime != dstat.st_atime or sstat.st_mtime != dstat.st_mtime): 107 | os.utime(destdir, (sstat.st_atime, sstat.st_mtime)) 108 | 109 | def safestat(self, filename, follow_symlink=False): 110 | while True: 111 | try: 112 | if(follow_symlink): 113 | return os.stat(filename) 114 | else: 115 | return os.lstat(filename) 116 | except IOError, error: 117 | if error.errno != 4: 118 | raise 119 | 120 | 121 | 122 | dataPlugin = LustreSource() 123 | 124 | def get_mc_conn(): 125 | mc = memcache.Client(['%s:11211'%rabbitmq_server], debug=0) 126 | return mc 127 | 128 | def isProperDirection(path): 129 | if not path.startswith(source_mount): 130 | raise Exception("Wrong direction param, %s not starts with %s"%(path, source_mount)) 131 | if (not ismount(source_mount)): 132 | logger.error("%s not mounted"%source_mount) 133 | raise Exception("%s not mounted"%source_mount) 134 | if (not ismount(target_mount)): 135 | logger.error("%s not mounted"%target_mount) 136 | raise Exception("%s 
not mounted"%target_mount) 137 | 138 | @app.task(ignore_result=True) 139 | def procDir(dir): 140 | isProperDirection(dir.rstrip()) 141 | if(not dir.startswith( tuple(exceptions) )): 142 | dataPlugin.procDir(dir) 143 | 144 | if(STATS_ENABLED): 145 | mc = get_mc_conn() 146 | if(not mc.incr("%s.dirs"%get_worker_name()) ): 147 | mc.set("%s.dirs"%get_worker_name(), "1") 148 | mc.disconnect_all() 149 | 150 | from cmover_dirtime import procDir 151 | 152 | def get_worker_name(): 153 | return "cmover.%s_%s"%(current_process().initargs[1].split('@')[1],current_process().index) 154 | 155 | -------------------------------------------------------------------------------- /config.py: -------------------------------------------------------------------------------- 1 | #!/opt/python/bin/python 2 | # -*- coding: utf-8 -*- 3 | 4 | # used as rabbitmq virtual host and rabbitmq username, reads password from rabbitmq/rabbitmq_.conf file 5 | USERNAME = 'data_move' 6 | 7 | # filter files by atime. Files older than OLDEST_DATE won't be copied. To disable set to 0. 8 | OLDEST_DATE = 0 #90 * 24 * 60 * 60 9 | 10 | # filter files by mtime. Files newer than NEWEST_DATE won't be copied. To disable set to 0. 11 | # Useful for initial passes: the files with recent mtime are likely to change soon. Should be set to 0 for end pass. 12 | NEWEST_DATE = 12*60*60 #24 * 60 * 60 13 | 14 | # Minimal time between reports to memcached 15 | REPORT_INTERVAL = 30 # seconds 16 | 17 | # Send stats to memcached 18 | STATS_ENABLED = True 19 | 20 | # When having 2 MDS servers, we want to assign half of first-level folders to one of them and another half to another one 21 | MDS_IS_STRIPED = False 22 | 23 | # Copy from 24 | source_mount = "/panda" 25 | 26 | # Copy to 27 | target_mount = "/badger" 28 | 29 | # percent of random files to delete 30 | percent_to_delete = 0 31 | -------------------------------------------------------------------------------- /exceptions: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sdsc/lustre-data-mover/30431337749302ebc8e09db256679a2de08316f1/exceptions -------------------------------------------------------------------------------- /lustre/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sdsc/lustre-data-mover/30431337749302ebc8e09db256679a2de08316f1/lustre/__init__.py -------------------------------------------------------------------------------- /memcache.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """client module for memcached (memory cache daemon) 4 | 5 | Overview 6 | ======== 7 | 8 | See U{the MemCached homepage} for more 9 | about memcached. 10 | 11 | Usage summary 12 | ============= 13 | 14 | This should give you a feel for how this module operates:: 15 | 16 | import memcache 17 | mc = memcache.Client(['127.0.0.1:11211'], debug=0) 18 | 19 | mc.set("some_key", "Some value") 20 | value = mc.get("some_key") 21 | 22 | mc.set("another_key", 3) 23 | mc.delete("another_key") 24 | 25 | mc.set("key", "1") # note that the key used for incr/decr must be 26 | # a string. 27 | mc.incr("key") 28 | mc.decr("key") 29 | 30 | The standard way to use memcache with a database is like this: 31 | 32 | key = derive_key(obj) 33 | obj = mc.get(key) 34 | if not obj: 35 | obj = backend_api.get(...) 
36 | mc.set(key, obj) 37 | 38 | # we now have obj, and future passes through this code 39 | # will use the object from the cache. 40 | 41 | Detailed Documentation 42 | ====================== 43 | 44 | More detailed documentation is available in the L{Client} class. 45 | 46 | """ 47 | 48 | from __future__ import print_function 49 | 50 | import binascii 51 | import os 52 | import pickle 53 | import re 54 | import socket 55 | import sys 56 | import threading 57 | import time 58 | import zlib 59 | 60 | import six 61 | 62 | 63 | def cmemcache_hash(key): 64 | return ( 65 | (((binascii.crc32(key) & 0xffffffff) 66 | >> 16) & 0x7fff) or 1) 67 | serverHashFunction = cmemcache_hash 68 | 69 | 70 | def useOldServerHashFunction(): 71 | """Use the old python-memcache server hash function.""" 72 | global serverHashFunction 73 | serverHashFunction = binascii.crc32 74 | 75 | from io import BytesIO 76 | if six.PY2: 77 | try: 78 | unicode 79 | except NameError: 80 | _has_unicode = False 81 | else: 82 | _has_unicode = True 83 | else: 84 | _has_unicode = True 85 | 86 | _str_cls = six.string_types 87 | 88 | valid_key_chars_re = re.compile(b'[\x21-\x7e\x80-\xff]+$') 89 | 90 | 91 | # Original author: Evan Martin of Danga Interactive 92 | __author__ = "Sean Reifschneider " 93 | __version__ = "1.57" 94 | __copyright__ = "Copyright (C) 2003 Danga Interactive" 95 | # http://en.wikipedia.org/wiki/Python_Software_Foundation_License 96 | __license__ = "Python Software Foundation License" 97 | 98 | SERVER_MAX_KEY_LENGTH = 250 99 | # Storing values larger than 1MB requires starting memcached with -I for 100 | # memcached >= 1.4.2 or recompiling for < 1.4.2. If you do, this value can be 101 | # changed by doing "memcache.SERVER_MAX_VALUE_LENGTH = N" after importing this 102 | # module. 103 | SERVER_MAX_VALUE_LENGTH = 1024 * 1024 104 | 105 | 106 | class _Error(Exception): 107 | pass 108 | 109 | 110 | class _ConnectionDeadError(Exception): 111 | pass 112 | 113 | 114 | _DEAD_RETRY = 30 # number of seconds before retrying a dead server. 115 | _SOCKET_TIMEOUT = 3 # number of seconds before sockets timeout. 116 | 117 | 118 | class Client(threading.local): 119 | """Object representing a pool of memcache servers. 120 | 121 | See L{memcache} for an overview. 122 | 123 | In all cases where a key is used, the key can be either: 124 | 1. A simple hashable type (string, integer, etc.). 125 | 2. A tuple of C{(hashvalue, key)}. This is useful if you want 126 | to avoid making this module calculate a hash value. You may 127 | prefer, for example, to keep all of a given user's objects on 128 | the same memcache server, so you could use the user's unique 129 | id as the hash value. 130 | 131 | 132 | @group Setup: __init__, set_servers, forget_dead_hosts, 133 | disconnect_all, debuglog 134 | @group Insertion: set, add, replace, set_multi 135 | @group Retrieval: get, get_multi 136 | @group Integers: incr, decr 137 | @group Removal: delete, delete_multi 138 | @sort: __init__, set_servers, forget_dead_hosts, disconnect_all, 139 | debuglog,\ set, set_multi, add, replace, get, get_multi, 140 | incr, decr, delete, delete_multi 141 | """ 142 | _FLAG_PICKLE = 1 << 0 143 | _FLAG_INTEGER = 1 << 1 144 | _FLAG_LONG = 1 << 2 145 | _FLAG_COMPRESSED = 1 << 3 146 | 147 | _SERVER_RETRIES = 10 # how many times to try finding a free server. 
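    # The cmover* workers in this repository use this client only for simple
    # per-worker counters (see report_files_progress() in cmover.py): incr()
    # returns None when a key does not exist yet, so callers fall back to set().
    # A rough sketch of that pattern -- the host and key names are illustrative:
    #
    #   mc = memcache.Client(['stats-host:11211'], debug=0)
    #   if not mc.incr("cmover.file_node01_0.files", "5"):
    #       mc.set("cmover.file_node01_0.files", "5")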
148 | 149 | # exceptions for Client 150 | class MemcachedKeyError(Exception): 151 | pass 152 | 153 | class MemcachedKeyLengthError(MemcachedKeyError): 154 | pass 155 | 156 | class MemcachedKeyCharacterError(MemcachedKeyError): 157 | pass 158 | 159 | class MemcachedKeyNoneError(MemcachedKeyError): 160 | pass 161 | 162 | class MemcachedKeyTypeError(MemcachedKeyError): 163 | pass 164 | 165 | class MemcachedStringEncodingError(Exception): 166 | pass 167 | 168 | def __init__(self, servers, debug=0, pickleProtocol=0, 169 | pickler=pickle.Pickler, unpickler=pickle.Unpickler, 170 | compressor=zlib.compress, decompressor=zlib.decompress, 171 | pload=None, pid=None, 172 | server_max_key_length=None, server_max_value_length=None, 173 | dead_retry=_DEAD_RETRY, socket_timeout=_SOCKET_TIMEOUT, 174 | cache_cas=False, flush_on_reconnect=0, check_keys=True): 175 | """Create a new Client object with the given list of servers. 176 | 177 | @param servers: C{servers} is passed to L{set_servers}. 178 | @param debug: whether to display error messages when a server 179 | can't be contacted. 180 | @param pickleProtocol: number to mandate protocol used by 181 | (c)Pickle. 182 | @param pickler: optional override of default Pickler to allow 183 | subclassing. 184 | @param unpickler: optional override of default Unpickler to 185 | allow subclassing. 186 | @param pload: optional persistent_load function to call on 187 | pickle loading. Useful for cPickle since subclassing isn't 188 | allowed. 189 | @param pid: optional persistent_id function to call on pickle 190 | storing. Useful for cPickle since subclassing isn't allowed. 191 | @param dead_retry: number of seconds before retrying a 192 | blacklisted server. Default to 30 s. 193 | @param socket_timeout: timeout in seconds for all calls to a 194 | server. Defaults to 3 seconds. 195 | @param cache_cas: (default False) If true, cas operations will 196 | be cached. WARNING: This cache is not expired internally, if 197 | you have a long-running process you will need to expire it 198 | manually via client.reset_cas(), or the cache can grow 199 | unlimited. 200 | @param server_max_key_length: (default SERVER_MAX_KEY_LENGTH) 201 | Data that is larger than this will not be sent to the server. 202 | @param server_max_value_length: (default 203 | SERVER_MAX_VALUE_LENGTH) Data that is larger than this will 204 | not be sent to the server. 205 | @param flush_on_reconnect: optional flag which prevents a 206 | scenario that can cause stale data to be read: If there's more 207 | than one memcached server and the connection to one is 208 | interrupted, keys that mapped to that server will get 209 | reassigned to another. If the first server comes back, those 210 | keys will map to it again. If it still has its data, get()s 211 | can read stale data that was overwritten on another 212 | server. This flag is off by default for backwards 213 | compatibility. 214 | @param check_keys: (default True) If True, the key is checked 215 | to ensure it is the correct length and composed of the right 216 | characters. 
217 | """ 218 | super(Client, self).__init__() 219 | self.debug = debug 220 | self.dead_retry = dead_retry 221 | self.socket_timeout = socket_timeout 222 | self.flush_on_reconnect = flush_on_reconnect 223 | self.set_servers(servers) 224 | self.stats = {} 225 | self.cache_cas = cache_cas 226 | self.reset_cas() 227 | self.do_check_key = check_keys 228 | 229 | # Allow users to modify pickling/unpickling behavior 230 | self.pickleProtocol = pickleProtocol 231 | self.pickler = pickler 232 | self.unpickler = unpickler 233 | self.compressor = compressor 234 | self.decompressor = decompressor 235 | self.persistent_load = pload 236 | self.persistent_id = pid 237 | self.server_max_key_length = server_max_key_length 238 | if self.server_max_key_length is None: 239 | self.server_max_key_length = SERVER_MAX_KEY_LENGTH 240 | self.server_max_value_length = server_max_value_length 241 | if self.server_max_value_length is None: 242 | self.server_max_value_length = SERVER_MAX_VALUE_LENGTH 243 | 244 | # figure out the pickler style 245 | file = BytesIO() 246 | try: 247 | pickler = self.pickler(file, protocol=self.pickleProtocol) 248 | self.picklerIsKeyword = True 249 | except TypeError: 250 | self.picklerIsKeyword = False 251 | 252 | def _encode_key(self, key): 253 | if isinstance(key, tuple): 254 | if isinstance(key[1], six.text_type): 255 | return (key[0], key[1].encode('utf8')) 256 | elif isinstance(key, six.text_type): 257 | return key.encode('utf8') 258 | return key 259 | 260 | def _encode_cmd(self, cmd, key, headers, noreply, *args): 261 | cmd_bytes = cmd.encode() if six.PY3 else cmd 262 | fullcmd = [cmd_bytes, b' ', key] 263 | 264 | if headers: 265 | if six.PY3: 266 | headers = headers.encode() 267 | fullcmd.append(b' ') 268 | fullcmd.append(headers) 269 | 270 | if noreply: 271 | fullcmd.append(b' noreply') 272 | 273 | if args: 274 | fullcmd.append(b' ') 275 | fullcmd.extend(args) 276 | return b''.join(fullcmd) 277 | 278 | def reset_cas(self): 279 | """Reset the cas cache. 280 | 281 | This is only used if the Client() object was created with 282 | "cache_cas=True". If used, this cache does not expire 283 | internally, so it can grow unbounded if you do not clear it 284 | yourself. 285 | """ 286 | self.cas_ids = {} 287 | 288 | def set_servers(self, servers): 289 | """Set the pool of servers used by this client. 290 | 291 | @param servers: an array of servers. 292 | Servers can be passed in two forms: 293 | 1. Strings of the form C{"host:port"}, which implies a 294 | default weight of 1. 295 | 2. Tuples of the form C{("host:port", weight)}, where 296 | C{weight} is an integer weight value. 297 | 298 | """ 299 | self.servers = [_Host(s, self.debug, dead_retry=self.dead_retry, 300 | socket_timeout=self.socket_timeout, 301 | flush_on_reconnect=self.flush_on_reconnect) 302 | for s in servers] 303 | self._init_buckets() 304 | 305 | def get_stats(self, stat_args=None): 306 | """Get statistics from each of the servers. 307 | 308 | @param stat_args: Additional arguments to pass to the memcache 309 | "stats" command. 310 | 311 | @return: A list of tuples ( server_identifier, 312 | stats_dictionary ). The dictionary contains a number of 313 | name/value pairs specifying the name of the status field 314 | and the string value associated with it. The values are 315 | not converted from strings. 
316 | """ 317 | data = [] 318 | for s in self.servers: 319 | if not s.connect(): 320 | continue 321 | if s.family == socket.AF_INET: 322 | name = '%s:%s (%s)' % (s.ip, s.port, s.weight) 323 | elif s.family == socket.AF_INET6: 324 | name = '[%s]:%s (%s)' % (s.ip, s.port, s.weight) 325 | else: 326 | name = 'unix:%s (%s)' % (s.address, s.weight) 327 | if not stat_args: 328 | s.send_cmd('stats') 329 | else: 330 | s.send_cmd('stats ' + stat_args) 331 | serverData = {} 332 | data.append((name, serverData)) 333 | readline = s.readline 334 | while 1: 335 | line = readline() 336 | if not line or line.strip() == 'END': 337 | break 338 | stats = line.split(' ', 2) 339 | serverData[stats[1]] = stats[2] 340 | 341 | return(data) 342 | 343 | def get_slabs(self): 344 | data = [] 345 | for s in self.servers: 346 | if not s.connect(): 347 | continue 348 | if s.family == socket.AF_INET: 349 | name = '%s:%s (%s)' % (s.ip, s.port, s.weight) 350 | elif s.family == socket.AF_INET6: 351 | name = '[%s]:%s (%s)' % (s.ip, s.port, s.weight) 352 | else: 353 | name = 'unix:%s (%s)' % (s.address, s.weight) 354 | serverData = {} 355 | data.append((name, serverData)) 356 | s.send_cmd('stats items') 357 | readline = s.readline 358 | while 1: 359 | line = readline() 360 | if not line or line.strip() == 'END': 361 | break 362 | item = line.split(' ', 2) 363 | # 0 = STAT, 1 = ITEM, 2 = Value 364 | slab = item[1].split(':', 2) 365 | # 0 = items, 1 = Slab #, 2 = Name 366 | if slab[1] not in serverData: 367 | serverData[slab[1]] = {} 368 | serverData[slab[1]][slab[2]] = item[2] 369 | return data 370 | 371 | def flush_all(self): 372 | """Expire all data in memcache servers that are reachable.""" 373 | for s in self.servers: 374 | if not s.connect(): 375 | continue 376 | s.flush() 377 | 378 | def debuglog(self, str): 379 | if self.debug: 380 | sys.stderr.write("MemCached: %s\n" % str) 381 | 382 | def _statlog(self, func): 383 | if func not in self.stats: 384 | self.stats[func] = 1 385 | else: 386 | self.stats[func] += 1 387 | 388 | def forget_dead_hosts(self): 389 | """Reset every host in the pool to an "alive" state.""" 390 | for s in self.servers: 391 | s.deaduntil = 0 392 | 393 | def _init_buckets(self): 394 | self.buckets = [] 395 | for server in self.servers: 396 | for i in range(server.weight): 397 | self.buckets.append(server) 398 | 399 | def _get_server(self, key): 400 | if isinstance(key, tuple): 401 | serverhash, key = key 402 | else: 403 | serverhash = serverHashFunction(key) 404 | 405 | if not self.buckets: 406 | return None, None 407 | 408 | for i in range(Client._SERVER_RETRIES): 409 | server = self.buckets[serverhash % len(self.buckets)] 410 | if server.connect(): 411 | # print("(using server %s)" % server,) 412 | return server, key 413 | serverhash = serverHashFunction(str(serverhash) + str(i)) 414 | return None, None 415 | 416 | def disconnect_all(self): 417 | for s in self.servers: 418 | s.close_socket() 419 | 420 | def delete_multi(self, keys, time=0, key_prefix='', noreply=False): 421 | """Delete multiple keys in the memcache doing just one query. 422 | 423 | >>> notset_keys = mc.set_multi({'a1' : 'val1', 'a2' : 'val2'}) 424 | >>> mc.get_multi(['a1', 'a2']) == {'a1' : 'val1','a2' : 'val2'} 425 | 1 426 | >>> mc.delete_multi(['key1', 'key2']) 427 | 1 428 | >>> mc.get_multi(['key1', 'key2']) == {} 429 | 1 430 | 431 | This method is recommended over iterated regular L{delete}s as 432 | it reduces total latency, since your app doesn't have to wait 433 | for each round-trip of L{delete} before sending the next one. 
434 | 435 | @param keys: An iterable of keys to clear 436 | @param time: number of seconds any subsequent set / update 437 | commands should fail. Defaults to 0 for no delay. 438 | @param key_prefix: Optional string to prepend to each key when 439 | sending to memcache. See docs for L{get_multi} and 440 | L{set_multi}. 441 | @param noreply: optional parameter instructs the server to not send the 442 | reply. 443 | @return: 1 if no failure in communication with any memcacheds. 444 | @rtype: int 445 | """ 446 | 447 | self._statlog('delete_multi') 448 | 449 | server_keys, prefixed_to_orig_key = self._map_and_prefix_keys( 450 | keys, key_prefix) 451 | 452 | # send out all requests on each server before reading anything 453 | dead_servers = [] 454 | 455 | rc = 1 456 | for server in six.iterkeys(server_keys): 457 | bigcmd = [] 458 | write = bigcmd.append 459 | extra = ' noreply' if noreply else '' 460 | if time is not None: 461 | for key in server_keys[server]: # These are mangled keys 462 | write("delete %s %d%s\r\n" % (key, time, extra)) 463 | else: 464 | for key in server_keys[server]: # These are mangled keys 465 | write("delete %s%s\r\n" % (key, extra)) 466 | try: 467 | server.send_cmds(''.join(bigcmd)) 468 | except socket.error as msg: 469 | rc = 0 470 | if isinstance(msg, tuple): 471 | msg = msg[1] 472 | server.mark_dead(msg) 473 | dead_servers.append(server) 474 | 475 | # if noreply, just return 476 | if noreply: 477 | return rc 478 | 479 | # if any servers died on the way, don't expect them to respond. 480 | for server in dead_servers: 481 | del server_keys[server] 482 | 483 | for server, keys in six.iteritems(server_keys): 484 | try: 485 | for key in keys: 486 | server.expect("DELETED") 487 | except socket.error as msg: 488 | if isinstance(msg, tuple): 489 | msg = msg[1] 490 | server.mark_dead(msg) 491 | rc = 0 492 | return rc 493 | 494 | def delete(self, key, time=0, noreply=False): 495 | '''Deletes a key from the memcache. 496 | 497 | @return: Nonzero on success. 498 | @param time: number of seconds any subsequent set / update commands 499 | should fail. Defaults to None for no delay. 500 | @param noreply: optional parameter instructs the server to not send the 501 | reply. 502 | @rtype: int 503 | ''' 504 | return self._deletetouch([b'DELETED', b'NOT_FOUND'], "delete", key, 505 | time, noreply) 506 | 507 | def touch(self, key, time=0, noreply=False): 508 | '''Updates the expiration time of a key in memcache. 509 | 510 | @return: Nonzero on success. 511 | @param time: Tells memcached the time which this value should 512 | expire, either as a delta number of seconds, or an absolute 513 | unix time-since-the-epoch value. See the memcached protocol 514 | docs section "Storage Commands" for more info on . We 515 | default to 0 == cache forever. 516 | @param noreply: optional parameter instructs the server to not send the 517 | reply. 
518 | @rtype: int 519 | ''' 520 | return self._deletetouch([b'TOUCHED'], "touch", key, time, noreply) 521 | 522 | def _deletetouch(self, expected, cmd, key, time=0, noreply=False): 523 | key = self._encode_key(key) 524 | if self.do_check_key: 525 | self.check_key(key) 526 | server, key = self._get_server(key) 527 | if not server: 528 | return 0 529 | self._statlog(cmd) 530 | if time is not None and time != 0: 531 | fullcmd = self._encode_cmd(cmd, key, str(time), noreply) 532 | else: 533 | fullcmd = self._encode_cmd(cmd, key, None, noreply) 534 | 535 | try: 536 | server.send_cmd(fullcmd) 537 | if noreply: 538 | return 1 539 | line = server.readline() 540 | if line and line.strip() in expected: 541 | return 1 542 | self.debuglog('%s expected %s, got: %r' 543 | % (cmd, ' or '.join(expected), line)) 544 | except socket.error as msg: 545 | if isinstance(msg, tuple): 546 | msg = msg[1] 547 | server.mark_dead(msg) 548 | return 0 549 | 550 | def incr(self, key, delta=1, noreply=False): 551 | """Increment value for C{key} by C{delta} 552 | 553 | Sends a command to the server to atomically increment the 554 | value for C{key} by C{delta}, or by 1 if C{delta} is 555 | unspecified. Returns None if C{key} doesn't exist on server, 556 | otherwise it returns the new value after incrementing. 557 | 558 | Note that the value for C{key} must already exist in the 559 | memcache, and it must be the string representation of an 560 | integer. 561 | 562 | >>> mc.set("counter", "20") # returns 1, indicating success 563 | 1 564 | >>> mc.incr("counter") 565 | 21 566 | >>> mc.incr("counter") 567 | 22 568 | 569 | Overflow on server is not checked. Be aware of values 570 | approaching 2**32. See L{decr}. 571 | 572 | @param delta: Integer amount to increment by (should be zero 573 | or greater). 574 | 575 | @param noreply: optional parameter instructs the server to not send the 576 | reply. 577 | 578 | @return: New value after incrementing, no None for noreply or error. 579 | @rtype: int 580 | """ 581 | return self._incrdecr("incr", key, delta, noreply) 582 | 583 | def decr(self, key, delta=1, noreply=False): 584 | """Decrement value for C{key} by C{delta} 585 | 586 | Like L{incr}, but decrements. Unlike L{incr}, underflow is 587 | checked and new values are capped at 0. If server value is 1, 588 | a decrement of 2 returns 0, not -1. 589 | 590 | @param delta: Integer amount to decrement by (should be zero 591 | or greater). 592 | 593 | @param noreply: optional parameter instructs the server to not send the 594 | reply. 595 | 596 | @return: New value after decrementing, or None for noreply or error. 597 | @rtype: int 598 | """ 599 | return self._incrdecr("decr", key, delta, noreply) 600 | 601 | def _incrdecr(self, cmd, key, delta, noreply=False): 602 | key = self._encode_key(key) 603 | if self.do_check_key: 604 | self.check_key(key) 605 | server, key = self._get_server(key) 606 | if not server: 607 | return None 608 | self._statlog(cmd) 609 | fullcmd = self._encode_cmd(cmd, key, str(delta), noreply) 610 | try: 611 | server.send_cmd(fullcmd) 612 | if noreply: 613 | return 614 | line = server.readline() 615 | if line is None or line.strip() == b'NOT_FOUND': 616 | return None 617 | return int(line) 618 | except socket.error as msg: 619 | if isinstance(msg, tuple): 620 | msg = msg[1] 621 | server.mark_dead(msg) 622 | return None 623 | 624 | def add(self, key, val, time=0, min_compress_len=0, noreply=False): 625 | '''Add new key with value. 
626 | 627 | Like L{set}, but only stores in memcache if the key doesn't 628 | already exist. 629 | 630 | @return: Nonzero on success. 631 | @rtype: int 632 | ''' 633 | return self._set("add", key, val, time, min_compress_len, noreply) 634 | 635 | def append(self, key, val, time=0, min_compress_len=0, noreply=False): 636 | '''Append the value to the end of the existing key's value. 637 | 638 | Only stores in memcache if key already exists. 639 | Also see L{prepend}. 640 | 641 | @return: Nonzero on success. 642 | @rtype: int 643 | ''' 644 | return self._set("append", key, val, time, min_compress_len, noreply) 645 | 646 | def prepend(self, key, val, time=0, min_compress_len=0, noreply=False): 647 | '''Prepend the value to the beginning of the existing key's value. 648 | 649 | Only stores in memcache if key already exists. 650 | Also see L{append}. 651 | 652 | @return: Nonzero on success. 653 | @rtype: int 654 | ''' 655 | return self._set("prepend", key, val, time, min_compress_len, noreply) 656 | 657 | def replace(self, key, val, time=0, min_compress_len=0, noreply=False): 658 | '''Replace existing key with value. 659 | 660 | Like L{set}, but only stores in memcache if the key already exists. 661 | The opposite of L{add}. 662 | 663 | @return: Nonzero on success. 664 | @rtype: int 665 | ''' 666 | return self._set("replace", key, val, time, min_compress_len, noreply) 667 | 668 | def set(self, key, val, time=0, min_compress_len=0, noreply=False): 669 | '''Unconditionally sets a key to a given value in the memcache. 670 | 671 | The C{key} can optionally be an tuple, with the first element 672 | being the server hash value and the second being the key. If 673 | you want to avoid making this module calculate a hash value. 674 | You may prefer, for example, to keep all of a given user's 675 | objects on the same memcache server, so you could use the 676 | user's unique id as the hash value. 677 | 678 | @return: Nonzero on success. 679 | @rtype: int 680 | 681 | @param time: Tells memcached the time which this value should 682 | expire, either as a delta number of seconds, or an absolute 683 | unix time-since-the-epoch value. See the memcached protocol 684 | docs section "Storage Commands" for more info on . We 685 | default to 0 == cache forever. 686 | 687 | @param min_compress_len: The threshold length to kick in 688 | auto-compression of the value using the compressor 689 | routine. If the value being cached is a string, then the 690 | length of the string is measured, else if the value is an 691 | object, then the length of the pickle result is measured. If 692 | the resulting attempt at compression yeilds a larger string 693 | than the input, then it is discarded. For backwards 694 | compatability, this parameter defaults to 0, indicating don't 695 | ever try to compress. 696 | 697 | @param noreply: optional parameter instructs the server to not 698 | send the reply. 699 | ''' 700 | return self._set("set", key, val, time, min_compress_len, noreply) 701 | 702 | def cas(self, key, val, time=0, min_compress_len=0, noreply=False): 703 | '''Check and set (CAS) 704 | 705 | Sets a key to a given value in the memcache if it hasn't been 706 | altered since last fetched. (See L{gets}). 707 | 708 | The C{key} can optionally be an tuple, with the first element 709 | being the server hash value and the second being the key. If 710 | you want to avoid making this module calculate a hash value. 
711 | You may prefer, for example, to keep all of a given user's 712 | objects on the same memcache server, so you could use the 713 | user's unique id as the hash value. 714 | 715 | @return: Nonzero on success. 716 | @rtype: int 717 | 718 | @param time: Tells memcached the time which this value should 719 | expire, either as a delta number of seconds, or an absolute 720 | unix time-since-the-epoch value. See the memcached protocol 721 | docs section "Storage Commands" for more info on . We 722 | default to 0 == cache forever. 723 | 724 | @param min_compress_len: The threshold length to kick in 725 | auto-compression of the value using the compressor 726 | routine. If the value being cached is a string, then the 727 | length of the string is measured, else if the value is an 728 | object, then the length of the pickle result is measured. If 729 | the resulting attempt at compression yeilds a larger string 730 | than the input, then it is discarded. For backwards 731 | compatability, this parameter defaults to 0, indicating don't 732 | ever try to compress. 733 | 734 | @param noreply: optional parameter instructs the server to not 735 | send the reply. 736 | ''' 737 | return self._set("cas", key, val, time, min_compress_len, noreply) 738 | 739 | def _map_and_prefix_keys(self, key_iterable, key_prefix): 740 | """Compute the mapping of server (_Host instance) -> list of keys to 741 | stuff onto that server, as well as the mapping of prefixed key 742 | -> original key. 743 | """ 744 | key_prefix = self._encode_key(key_prefix) 745 | # Check it just once ... 746 | key_extra_len = len(key_prefix) 747 | if key_prefix and self.do_check_key: 748 | self.check_key(key_prefix) 749 | 750 | # server (_Host) -> list of unprefixed server keys in mapping 751 | server_keys = {} 752 | 753 | prefixed_to_orig_key = {} 754 | # build up a list for each server of all the keys we want. 755 | for orig_key in key_iterable: 756 | if isinstance(orig_key, tuple): 757 | # Tuple of hashvalue, key ala _get_server(). Caller is 758 | # essentially telling us what server to stuff this on. 759 | # Ensure call to _get_server gets a Tuple as well. 760 | serverhash, key = orig_key 761 | 762 | key = self._encode_key(key) 763 | if not isinstance(key, six.binary_type): 764 | # set_multi supports int / long keys. 765 | key = str(key) 766 | if six.PY3: 767 | key = key.encode('utf8') 768 | bytes_orig_key = key 769 | 770 | # Gotta pre-mangle key before hashing to a 771 | # server. Returns the mangled key. 772 | server, key = self._get_server( 773 | (serverhash, key_prefix + key)) 774 | 775 | orig_key = orig_key[1] 776 | else: 777 | key = self._encode_key(orig_key) 778 | if not isinstance(key, six.binary_type): 779 | # set_multi supports int / long keys. 780 | key = str(key) 781 | if six.PY3: 782 | key = key.encode('utf8') 783 | bytes_orig_key = key 784 | server, key = self._get_server(key_prefix + key) 785 | 786 | # alert when passed in key is None 787 | if orig_key is None: 788 | self.check_key(orig_key, key_extra_len=key_extra_len) 789 | 790 | # Now check to make sure key length is proper ... 
791 | if self.do_check_key: 792 | self.check_key(bytes_orig_key, key_extra_len=key_extra_len) 793 | 794 | if not server: 795 | continue 796 | 797 | if server not in server_keys: 798 | server_keys[server] = [] 799 | server_keys[server].append(key) 800 | prefixed_to_orig_key[key] = orig_key 801 | 802 | return (server_keys, prefixed_to_orig_key) 803 | 804 | def set_multi(self, mapping, time=0, key_prefix='', min_compress_len=0, 805 | noreply=False): 806 | '''Sets multiple keys in the memcache doing just one query. 807 | 808 | >>> notset_keys = mc.set_multi({'key1' : 'val1', 'key2' : 'val2'}) 809 | >>> mc.get_multi(['key1', 'key2']) == {'key1' : 'val1', 810 | ... 'key2' : 'val2'} 811 | 1 812 | 813 | 814 | This method is recommended over regular L{set} as it lowers 815 | the number of total packets flying around your network, 816 | reducing total latency, since your app doesn't have to wait 817 | for each round-trip of L{set} before sending the next one. 818 | 819 | @param mapping: A dict of key/value pairs to set. 820 | 821 | @param time: Tells memcached the time which this value should 822 | expire, either as a delta number of seconds, or an 823 | absolute unix time-since-the-epoch value. See the 824 | memcached protocol docs section "Storage Commands" for 825 | more info on . We default to 0 == cache forever. 826 | 827 | @param key_prefix: Optional string to prepend to each key when 828 | sending to memcache. Allows you to efficiently stuff these 829 | keys into a pseudo-namespace in memcache: 830 | 831 | >>> notset_keys = mc.set_multi( 832 | ... {'key1' : 'val1', 'key2' : 'val2'}, 833 | ... key_prefix='subspace_') 834 | >>> len(notset_keys) == 0 835 | True 836 | >>> mc.get_multi(['subspace_key1', 837 | ... 'subspace_key2']) == {'subspace_key1': 'val1', 838 | ... 'subspace_key2' : 'val2'} 839 | True 840 | 841 | Causes key 'subspace_key1' and 'subspace_key2' to be 842 | set. Useful in conjunction with a higher-level layer which 843 | applies namespaces to data in memcache. In this case, the 844 | return result would be the list of notset original keys, 845 | prefix not applied. 846 | 847 | @param min_compress_len: The threshold length to kick in 848 | auto-compression of the value using the compressor 849 | routine. If the value being cached is a string, then the 850 | length of the string is measured, else if the value is an 851 | object, then the length of the pickle result is 852 | measured. If the resulting attempt at compression yeilds a 853 | larger string than the input, then it is discarded. For 854 | backwards compatability, this parameter defaults to 0, 855 | indicating don't ever try to compress. 856 | 857 | @param noreply: optional parameter instructs the server to not 858 | send the reply. 859 | 860 | @return: List of keys which failed to be stored [ memcache out 861 | of memory, etc. ]. 862 | 863 | @rtype: list 864 | ''' 865 | self._statlog('set_multi') 866 | 867 | server_keys, prefixed_to_orig_key = self._map_and_prefix_keys( 868 | six.iterkeys(mapping), key_prefix) 869 | 870 | # send out all requests on each server before reading anything 871 | dead_servers = [] 872 | notstored = [] # original keys. 
873 | 874 | for server in six.iterkeys(server_keys): 875 | bigcmd = [] 876 | write = bigcmd.append 877 | try: 878 | for key in server_keys[server]: # These are mangled keys 879 | store_info = self._val_to_store_info( 880 | mapping[prefixed_to_orig_key[key]], 881 | min_compress_len) 882 | if store_info: 883 | flags, len_val, val = store_info 884 | headers = "%d %d %d" % (flags, time, len_val) 885 | fullcmd = self._encode_cmd('set', key, headers, 886 | noreply, 887 | b'\r\n', val, b'\r\n') 888 | write(fullcmd) 889 | else: 890 | notstored.append(prefixed_to_orig_key[key]) 891 | server.send_cmds(b''.join(bigcmd)) 892 | except socket.error as msg: 893 | if isinstance(msg, tuple): 894 | msg = msg[1] 895 | server.mark_dead(msg) 896 | dead_servers.append(server) 897 | 898 | # if noreply, just return early 899 | if noreply: 900 | return notstored 901 | 902 | # if any servers died on the way, don't expect them to respond. 903 | for server in dead_servers: 904 | del server_keys[server] 905 | 906 | # short-circuit if there are no servers, just return all keys 907 | if not server_keys: 908 | return(mapping.keys()) 909 | 910 | for server, keys in six.iteritems(server_keys): 911 | try: 912 | for key in keys: 913 | if server.readline() == 'STORED': 914 | continue 915 | else: 916 | # un-mangle. 917 | notstored.append(prefixed_to_orig_key[key]) 918 | except (_Error, socket.error) as msg: 919 | if isinstance(msg, tuple): 920 | msg = msg[1] 921 | server.mark_dead(msg) 922 | return notstored 923 | 924 | def _val_to_store_info(self, val, min_compress_len): 925 | """Transform val to a storable representation. 926 | 927 | Returns a tuple of the flags, the length of the new value, and 928 | the new value itself. 929 | """ 930 | flags = 0 931 | if isinstance(val, six.binary_type): 932 | pass 933 | elif isinstance(val, six.text_type): 934 | val = val.encode('utf-8') 935 | elif isinstance(val, int): 936 | flags |= Client._FLAG_INTEGER 937 | val = '%d' % val 938 | if six.PY3: 939 | val = val.encode('ascii') 940 | # force no attempt to compress this silly string. 941 | min_compress_len = 0 942 | elif six.PY2 and isinstance(val, long): 943 | flags |= Client._FLAG_LONG 944 | val = str(val) 945 | if six.PY3: 946 | val = val.encode('ascii') 947 | # force no attempt to compress this silly string. 948 | min_compress_len = 0 949 | else: 950 | flags |= Client._FLAG_PICKLE 951 | file = BytesIO() 952 | if self.picklerIsKeyword: 953 | pickler = self.pickler(file, protocol=self.pickleProtocol) 954 | else: 955 | pickler = self.pickler(file, self.pickleProtocol) 956 | if self.persistent_id: 957 | pickler.persistent_id = self.persistent_id 958 | pickler.dump(val) 959 | val = file.getvalue() 960 | 961 | lv = len(val) 962 | # We should try to compress if min_compress_len > 0 963 | # and this string is longer than our min threshold. 964 | if min_compress_len and lv > min_compress_len: 965 | comp_val = self.compressor(val) 966 | # Only retain the result if the compression result is smaller 967 | # than the original. 
968 | if len(comp_val) < lv: 969 | flags |= Client._FLAG_COMPRESSED 970 | val = comp_val 971 | 972 | # silently do not store if value length exceeds maximum 973 | if (self.server_max_value_length != 0 and 974 | len(val) > self.server_max_value_length): 975 | return(0) 976 | 977 | return (flags, len(val), val) 978 | 979 | def _set(self, cmd, key, val, time, min_compress_len=0, noreply=False): 980 | key = self._encode_key(key) 981 | if self.do_check_key: 982 | self.check_key(key) 983 | server, key = self._get_server(key) 984 | if not server: 985 | return 0 986 | 987 | def _unsafe_set(): 988 | self._statlog(cmd) 989 | 990 | if cmd == 'cas' and key not in self.cas_ids: 991 | return self._set('set', key, val, time, min_compress_len, 992 | noreply) 993 | 994 | store_info = self._val_to_store_info(val, min_compress_len) 995 | if not store_info: 996 | return(0) 997 | flags, len_val, encoded_val = store_info 998 | 999 | if cmd == 'cas': 1000 | headers = ("%d %d %d %d" 1001 | % (flags, time, len_val, self.cas_ids[key])) 1002 | else: 1003 | headers = "%d %d %d" % (flags, time, len_val) 1004 | fullcmd = self._encode_cmd(cmd, key, headers, noreply, 1005 | b'\r\n', encoded_val) 1006 | 1007 | try: 1008 | server.send_cmd(fullcmd) 1009 | if noreply: 1010 | return True 1011 | return(server.expect(b"STORED", raise_exception=True) 1012 | == b"STORED") 1013 | except socket.error as msg: 1014 | if isinstance(msg, tuple): 1015 | msg = msg[1] 1016 | server.mark_dead(msg) 1017 | return 0 1018 | 1019 | try: 1020 | return _unsafe_set() 1021 | except _ConnectionDeadError: 1022 | # retry once 1023 | try: 1024 | if server._get_socket(): 1025 | return _unsafe_set() 1026 | except (_ConnectionDeadError, socket.error) as msg: 1027 | server.mark_dead(msg) 1028 | return 0 1029 | 1030 | def _get(self, cmd, key): 1031 | key = self._encode_key(key) 1032 | if self.do_check_key: 1033 | self.check_key(key) 1034 | server, key = self._get_server(key) 1035 | if not server: 1036 | return None 1037 | 1038 | def _unsafe_get(): 1039 | self._statlog(cmd) 1040 | 1041 | try: 1042 | cmd_bytes = cmd.encode() if six.PY3 else cmd 1043 | fullcmd = b''.join((cmd_bytes, b' ', key)) 1044 | server.send_cmd(fullcmd) 1045 | rkey = flags = rlen = cas_id = None 1046 | 1047 | if cmd == 'gets': 1048 | rkey, flags, rlen, cas_id, = self._expect_cas_value( 1049 | server, raise_exception=True 1050 | ) 1051 | if rkey and self.cache_cas: 1052 | self.cas_ids[rkey] = cas_id 1053 | else: 1054 | rkey, flags, rlen, = self._expectvalue( 1055 | server, raise_exception=True 1056 | ) 1057 | 1058 | if not rkey: 1059 | return None 1060 | try: 1061 | value = self._recv_value(server, flags, rlen) 1062 | finally: 1063 | server.expect(b"END", raise_exception=True) 1064 | except (_Error, socket.error) as msg: 1065 | if isinstance(msg, tuple): 1066 | msg = msg[1] 1067 | server.mark_dead(msg) 1068 | return None 1069 | 1070 | return value 1071 | 1072 | try: 1073 | return _unsafe_get() 1074 | except _ConnectionDeadError: 1075 | # retry once 1076 | try: 1077 | if server.connect(): 1078 | return _unsafe_get() 1079 | return None 1080 | except (_ConnectionDeadError, socket.error) as msg: 1081 | server.mark_dead(msg) 1082 | return None 1083 | 1084 | def get(self, key): 1085 | '''Retrieves a key from the memcache. 1086 | 1087 | @return: The value or None. 1088 | ''' 1089 | return self._get('get', key) 1090 | 1091 | def gets(self, key): 1092 | '''Retrieves a key from the memcache. Used in conjunction with 'cas'. 1093 | 1094 | @return: The value or None. 
1095 | ''' 1096 | return self._get('gets', key) 1097 | 1098 | def get_multi(self, keys, key_prefix=''): 1099 | '''Retrieves multiple keys from the memcache doing just one query. 1100 | 1101 | >>> success = mc.set("foo", "bar") 1102 | >>> success = mc.set("baz", 42) 1103 | >>> mc.get_multi(["foo", "baz", "foobar"]) == { 1104 | ... "foo": "bar", "baz": 42 1105 | ... } 1106 | 1 1107 | >>> mc.set_multi({'k1' : 1, 'k2' : 2}, key_prefix='pfx_') == [] 1108 | 1 1109 | 1110 | This looks up keys 'pfx_k1', 'pfx_k2', ... . Returned dict 1111 | will just have unprefixed keys 'k1', 'k2'. 1112 | 1113 | >>> mc.get_multi(['k1', 'k2', 'nonexist'], 1114 | ... key_prefix='pfx_') == {'k1' : 1, 'k2' : 2} 1115 | 1 1116 | 1117 | get_mult [ and L{set_multi} ] can take str()-ables like ints / 1118 | longs as keys too. Such as your db pri key fields. They're 1119 | rotored through str() before being passed off to memcache, 1120 | with or without the use of a key_prefix. In this mode, the 1121 | key_prefix could be a table name, and the key itself a db 1122 | primary key number. 1123 | 1124 | >>> mc.set_multi({42: 'douglass adams', 1125 | ... 46: 'and 2 just ahead of me'}, 1126 | ... key_prefix='numkeys_') == [] 1127 | 1 1128 | >>> mc.get_multi([46, 42], key_prefix='numkeys_') == { 1129 | ... 42: 'douglass adams', 1130 | ... 46: 'and 2 just ahead of me' 1131 | ... } 1132 | 1 1133 | 1134 | This method is recommended over regular L{get} as it lowers 1135 | the number of total packets flying around your network, 1136 | reducing total latency, since your app doesn't have to wait 1137 | for each round-trip of L{get} before sending the next one. 1138 | 1139 | See also L{set_multi}. 1140 | 1141 | @param keys: An array of keys. 1142 | 1143 | @param key_prefix: A string to prefix each key when we 1144 | communicate with memcache. Facilitates pseudo-namespaces 1145 | within memcache. Returned dictionary keys will not have this 1146 | prefix. 1147 | 1148 | @return: A dictionary of key/value pairs that were 1149 | available. If key_prefix was provided, the keys in the retured 1150 | dictionary will not have it present. 1151 | ''' 1152 | 1153 | self._statlog('get_multi') 1154 | 1155 | server_keys, prefixed_to_orig_key = self._map_and_prefix_keys( 1156 | keys, key_prefix) 1157 | 1158 | # send out all requests on each server before reading anything 1159 | dead_servers = [] 1160 | for server in six.iterkeys(server_keys): 1161 | try: 1162 | fullcmd = b"get " + b" ".join(server_keys[server]) 1163 | server.send_cmd(fullcmd) 1164 | except socket.error as msg: 1165 | if isinstance(msg, tuple): 1166 | msg = msg[1] 1167 | server.mark_dead(msg) 1168 | dead_servers.append(server) 1169 | 1170 | # if any servers died on the way, don't expect them to respond. 1171 | for server in dead_servers: 1172 | del server_keys[server] 1173 | 1174 | retvals = {} 1175 | for server in six.iterkeys(server_keys): 1176 | try: 1177 | line = server.readline() 1178 | while line and line != b'END': 1179 | rkey, flags, rlen = self._expectvalue(server, line) 1180 | # Bo Yang reports that this can sometimes be None 1181 | if rkey is not None: 1182 | val = self._recv_value(server, flags, rlen) 1183 | # un-prefix returned key. 
1184 | retvals[prefixed_to_orig_key[rkey]] = val 1185 | line = server.readline() 1186 | except (_Error, socket.error) as msg: 1187 | if isinstance(msg, tuple): 1188 | msg = msg[1] 1189 | server.mark_dead(msg) 1190 | return retvals 1191 | 1192 | def _expect_cas_value(self, server, line=None, raise_exception=False): 1193 | if not line: 1194 | line = server.readline(raise_exception) 1195 | 1196 | if line and line[:5] == b'VALUE': 1197 | resp, rkey, flags, len, cas_id = line.split() 1198 | return (rkey, int(flags), int(len), int(cas_id)) 1199 | else: 1200 | return (None, None, None, None) 1201 | 1202 | def _expectvalue(self, server, line=None, raise_exception=False): 1203 | if not line: 1204 | line = server.readline(raise_exception) 1205 | 1206 | if line and line[:5] == b'VALUE': 1207 | resp, rkey, flags, len = line.split() 1208 | flags = int(flags) 1209 | rlen = int(len) 1210 | return (rkey, flags, rlen) 1211 | else: 1212 | return (None, None, None) 1213 | 1214 | def _recv_value(self, server, flags, rlen): 1215 | rlen += 2 # include \r\n 1216 | buf = server.recv(rlen) 1217 | if len(buf) != rlen: 1218 | raise _Error("received %d bytes when expecting %d" 1219 | % (len(buf), rlen)) 1220 | 1221 | if len(buf) == rlen: 1222 | buf = buf[:-2] # strip \r\n 1223 | 1224 | if flags & Client._FLAG_COMPRESSED: 1225 | buf = self.decompressor(buf) 1226 | flags &= ~Client._FLAG_COMPRESSED 1227 | 1228 | if flags == 0: 1229 | # Bare string 1230 | if six.PY3: 1231 | val = buf.decode('utf8') 1232 | else: 1233 | val = buf 1234 | elif flags & Client._FLAG_INTEGER: 1235 | val = int(buf) 1236 | elif flags & Client._FLAG_LONG: 1237 | if six.PY3: 1238 | val = int(buf) 1239 | else: 1240 | val = long(buf) 1241 | elif flags & Client._FLAG_PICKLE: 1242 | try: 1243 | file = BytesIO(buf) 1244 | unpickler = self.unpickler(file) 1245 | if self.persistent_load: 1246 | unpickler.persistent_load = self.persistent_load 1247 | val = unpickler.load() 1248 | except Exception as e: 1249 | self.debuglog('Pickle error: %s\n' % e) 1250 | return None 1251 | else: 1252 | self.debuglog("unknown flags on get: %x\n" % flags) 1253 | raise ValueError('Unknown flags on get: %x' % flags) 1254 | 1255 | return val 1256 | 1257 | def check_key(self, key, key_extra_len=0): 1258 | """Checks sanity of key. 1259 | 1260 | Fails if: 1261 | 1262 | Key length is > SERVER_MAX_KEY_LENGTH (Raises MemcachedKeyLength). 1263 | Contains control characters (Raises MemcachedKeyCharacterError). 
1264 | Is not a string (Raises MemcachedStringEncodingError) 1265 | Is an unicode string (Raises MemcachedStringEncodingError) 1266 | Is not a string (Raises MemcachedKeyError) 1267 | Is None (Raises MemcachedKeyError) 1268 | """ 1269 | if isinstance(key, tuple): 1270 | key = key[1] 1271 | if key is None: 1272 | raise Client.MemcachedKeyNoneError("Key is None") 1273 | if key is '': 1274 | if key_extra_len is 0: 1275 | raise Client.MemcachedKeyNoneError("Key is empty") 1276 | 1277 | # key is empty but there is some other component to key 1278 | return 1279 | 1280 | if not isinstance(key, six.binary_type): 1281 | raise Client.MemcachedKeyTypeError("Key must be a binary string") 1282 | 1283 | if (self.server_max_key_length != 0 and 1284 | len(key) + key_extra_len > self.server_max_key_length): 1285 | raise Client.MemcachedKeyLengthError( 1286 | "Key length is > %s" % self.server_max_key_length 1287 | ) 1288 | if not valid_key_chars_re.match(key): 1289 | raise Client.MemcachedKeyCharacterError( 1290 | "Control/space characters not allowed (key=%r)" % key) 1291 | 1292 | 1293 | class _Host(object): 1294 | 1295 | def __init__(self, host, debug=0, dead_retry=_DEAD_RETRY, 1296 | socket_timeout=_SOCKET_TIMEOUT, flush_on_reconnect=0): 1297 | self.dead_retry = dead_retry 1298 | self.socket_timeout = socket_timeout 1299 | self.debug = debug 1300 | self.flush_on_reconnect = flush_on_reconnect 1301 | if isinstance(host, tuple): 1302 | host, self.weight = host 1303 | else: 1304 | self.weight = 1 1305 | 1306 | # parse the connection string 1307 | m = re.match(r'^(?P<proto>unix):(?P<path>.*)$', host) 1308 | if not m: 1309 | m = re.match(r'^(?P<proto>inet6):' 1310 | r'\[(?P<host>[^\[\]]+)\](:(?P<port>[0-9]+))?$', host) 1311 | if not m: 1312 | m = re.match(r'^(?P<proto>inet):' 1313 | r'(?P<host>[^:]+)(:(?P<port>[0-9]+))?$', host) 1314 | if not m: 1315 | m = re.match(r'^(?P<host>[^:]+)(:(?P<port>[0-9]+))?$', host) 1316 | if not m: 1317 | raise ValueError('Unable to parse connection string: "%s"' % host) 1318 | 1319 | hostData = m.groupdict() 1320 | if hostData.get('proto') == 'unix': 1321 | self.family = socket.AF_UNIX 1322 | self.address = hostData['path'] 1323 | elif hostData.get('proto') == 'inet6': 1324 | self.family = socket.AF_INET6 1325 | self.ip = hostData['host'] 1326 | self.port = int(hostData.get('port') or 11211) 1327 | self.address = (self.ip, self.port) 1328 | else: 1329 | self.family = socket.AF_INET 1330 | self.ip = hostData['host'] 1331 | self.port = int(hostData.get('port') or 11211) 1332 | self.address = (self.ip, self.port) 1333 | 1334 | self.deaduntil = 0 1335 | self.socket = None 1336 | self.flush_on_next_connect = 0 1337 | 1338 | self.buffer = b'' 1339 | 1340 | def debuglog(self, str): 1341 | if self.debug: 1342 | sys.stderr.write("MemCached: %s\n" % str) 1343 | 1344 | def _check_dead(self): 1345 | if self.deaduntil and self.deaduntil > time.time(): 1346 | return 1 1347 | self.deaduntil = 0 1348 | return 0 1349 | 1350 | def connect(self): 1351 | if self._get_socket(): 1352 | return 1 1353 | return 0 1354 | 1355 | def mark_dead(self, reason): 1356 | self.debuglog("MemCache: %s: %s. Marking dead."
% (self, reason)) 1357 | self.deaduntil = time.time() + self.dead_retry 1358 | if self.flush_on_reconnect: 1359 | self.flush_on_next_connect = 1 1360 | self.close_socket() 1361 | 1362 | def _get_socket(self): 1363 | if self._check_dead(): 1364 | return None 1365 | if self.socket: 1366 | return self.socket 1367 | s = socket.socket(self.family, socket.SOCK_STREAM) 1368 | if hasattr(s, 'settimeout'): 1369 | s.settimeout(self.socket_timeout) 1370 | try: 1371 | s.connect(self.address) 1372 | except socket.timeout as msg: 1373 | self.mark_dead("connect: %s" % msg) 1374 | return None 1375 | except socket.error as msg: 1376 | if isinstance(msg, tuple): 1377 | msg = msg[1] 1378 | self.mark_dead("connect: %s" % msg) 1379 | return None 1380 | self.socket = s 1381 | self.buffer = b'' 1382 | if self.flush_on_next_connect: 1383 | self.flush() 1384 | self.flush_on_next_connect = 0 1385 | return s 1386 | 1387 | def close_socket(self): 1388 | if self.socket: 1389 | self.socket.close() 1390 | self.socket = None 1391 | 1392 | def send_cmd(self, cmd): 1393 | if isinstance(cmd, six.text_type): 1394 | cmd = cmd.encode('utf8') 1395 | self.socket.sendall(cmd + b'\r\n') 1396 | 1397 | def send_cmds(self, cmds): 1398 | """cmds already has trailing \r\n's applied.""" 1399 | if isinstance(cmds, six.text_type): 1400 | cmds = cmds.encode('utf8') 1401 | self.socket.sendall(cmds) 1402 | 1403 | def readline(self, raise_exception=False): 1404 | """Read a line and return it. 1405 | 1406 | If "raise_exception" is set, raise _ConnectionDeadError if the 1407 | read fails, otherwise return an empty string. 1408 | """ 1409 | buf = self.buffer 1410 | if self.socket: 1411 | recv = self.socket.recv 1412 | else: 1413 | recv = lambda bufsize: b'' 1414 | 1415 | while True: 1416 | index = buf.find(b'\r\n') 1417 | if index >= 0: 1418 | break 1419 | data = recv(4096) 1420 | if not data: 1421 | # connection close, let's kill it and raise 1422 | self.mark_dead('connection closed in readline()') 1423 | if raise_exception: 1424 | raise _ConnectionDeadError() 1425 | else: 1426 | return '' 1427 | 1428 | buf += data 1429 | self.buffer = buf[index + 2:] 1430 | return buf[:index] 1431 | 1432 | def expect(self, text, raise_exception=False): 1433 | line = self.readline(raise_exception) 1434 | if self.debug and line != text: 1435 | if six.PY3: 1436 | text = text.decode('utf8') 1437 | line = line.decode('utf8', 'replace') 1438 | self.debuglog("while expecting %r, got unexpected response %r" 1439 | % (text, line)) 1440 | return line 1441 | 1442 | def recv(self, rlen): 1443 | self_socket_recv = self.socket.recv 1444 | buf = self.buffer 1445 | while len(buf) < rlen: 1446 | foo = self_socket_recv(max(rlen - len(buf), 4096)) 1447 | buf += foo 1448 | if not foo: 1449 | raise _Error('Read %d bytes, expecting %d, ' 1450 | 'read returned 0 length bytes' % (len(buf), rlen)) 1451 | self.buffer = buf[rlen:] 1452 | return buf[:rlen] 1453 | 1454 | def flush(self): 1455 | self.send_cmd('flush_all') 1456 | self.expect(b'OK') 1457 | 1458 | def __str__(self): 1459 | d = '' 1460 | if self.deaduntil: 1461 | d = " (dead until %d)" % self.deaduntil 1462 | 1463 | if self.family == socket.AF_INET: 1464 | return "inet:%s:%d%s" % (self.address[0], self.address[1], d) 1465 | elif self.family == socket.AF_INET6: 1466 | return "inet6:[%s]:%d%s" % (self.address[0], self.address[1], d) 1467 | else: 1468 | return "unix:%s%s" % (self.address, d) 1469 | 1470 | 1471 | def _doctest(): 1472 | import doctest 1473 | import memcache 1474 | servers = ["127.0.0.1:11211"] 1475 | mc = 
Client(servers, debug=1) 1476 | globs = {"mc": mc} 1477 | return doctest.testmod(memcache, globs=globs) 1478 | 1479 | if __name__ == "__main__": 1480 | failures = 0 1481 | print("Testing docstrings...") 1482 | _doctest() 1483 | print("Running tests:") 1484 | print() 1485 | serverList = [["127.0.0.1:11211"]] 1486 | if '--do-unix' in sys.argv: 1487 | serverList.append([os.path.join(os.getcwd(), 'memcached.socket')]) 1488 | 1489 | for servers in serverList: 1490 | mc = Client(servers, debug=1) 1491 | 1492 | def to_s(val): 1493 | if not isinstance(val, _str_cls): 1494 | return "%s (%s)" % (val, type(val)) 1495 | return "%s" % val 1496 | 1497 | def test_setget(key, val, noreply=False): 1498 | global failures 1499 | print("Testing set/get (noreply=%s) {'%s': %s} ..." 1500 | % (noreply, to_s(key), to_s(val)), end=" ") 1501 | mc.set(key, val, noreply=noreply) 1502 | newval = mc.get(key) 1503 | if newval == val: 1504 | print("OK") 1505 | return 1 1506 | else: 1507 | print("FAIL") 1508 | failures += 1 1509 | return 0 1510 | 1511 | class FooStruct(object): 1512 | 1513 | def __init__(self): 1514 | self.bar = "baz" 1515 | 1516 | def __str__(self): 1517 | return "A FooStruct" 1518 | 1519 | def __eq__(self, other): 1520 | if isinstance(other, FooStruct): 1521 | return self.bar == other.bar 1522 | return 0 1523 | 1524 | test_setget("a_string", "some random string") 1525 | test_setget("a_string_2", "some random string", noreply=True) 1526 | test_setget("an_integer", 42) 1527 | test_setget("an_integer_2", 42, noreply=True) 1528 | if six.PY3: 1529 | ok = test_setget("long", 1 << 30) 1530 | else: 1531 | ok = test_setget("long", long(1 << 30)) 1532 | if ok: 1533 | print("Testing delete ...", end=" ") 1534 | if mc.delete("long"): 1535 | print("OK") 1536 | else: 1537 | print("FAIL") 1538 | failures += 1 1539 | print("Checking results of delete ...", end=" ") 1540 | if mc.get("long") is None: 1541 | print("OK") 1542 | else: 1543 | print("FAIL") 1544 | failures += 1 1545 | print("Testing get_multi ...",) 1546 | print(mc.get_multi(["a_string", "an_integer", "a_string_2", 1547 | "an_integer_2"])) 1548 | 1549 | # removed from the protocol 1550 | # if test_setget("timed_delete", 'foo'): 1551 | # print "Testing timed delete ...", 1552 | # if mc.delete("timed_delete", 1): 1553 | # print("OK") 1554 | # else: 1555 | # print("FAIL") 1556 | # failures += 1 1557 | # print "Checking results of timed delete ..." 
1558 | # if mc.get("timed_delete") is None: 1559 | # print("OK") 1560 | # else: 1561 | # print("FAIL") 1562 | # failures += 1 1563 | 1564 | print("Testing get(unknown value) ...", end=" ") 1565 | print(to_s(mc.get("unknown_value"))) 1566 | 1567 | f = FooStruct() 1568 | test_setget("foostruct", f) 1569 | test_setget("foostruct_2", f, noreply=True) 1570 | 1571 | print("Testing incr ...", end=" ") 1572 | x = mc.incr("an_integer", 1) 1573 | if x == 43: 1574 | print("OK") 1575 | else: 1576 | print("FAIL") 1577 | failures += 1 1578 | 1579 | print("Testing incr (noreply=True) ...", end=" ") 1580 | mc.incr("an_integer_2", 1, noreply=True) 1581 | x = mc.get("an_integer_2") 1582 | if x == 43: 1583 | print("OK") 1584 | else: 1585 | print("FAIL") 1586 | failures += 1 1587 | 1588 | print("Testing decr ...", end=" ") 1589 | x = mc.decr("an_integer", 1) 1590 | if x == 42: 1591 | print("OK") 1592 | else: 1593 | print("FAIL") 1594 | failures += 1 1595 | sys.stdout.flush() 1596 | 1597 | print("Testing decr (noreply=True) ...", end=" ") 1598 | mc.decr("an_integer_2", 1, noreply=True) 1599 | x = mc.get("an_integer_2") 1600 | if x == 42: 1601 | print("OK") 1602 | else: 1603 | print("FAIL") 1604 | failures += 1 1605 | sys.stdout.flush() 1606 | 1607 | # sanity tests 1608 | print("Testing sending spaces...", end=" ") 1609 | sys.stdout.flush() 1610 | try: 1611 | x = mc.set("this has spaces", 1) 1612 | except Client.MemcachedKeyCharacterError as msg: 1613 | print("OK") 1614 | else: 1615 | print("FAIL") 1616 | failures += 1 1617 | 1618 | print("Testing sending control characters...", end=" ") 1619 | try: 1620 | x = mc.set("this\x10has\x11control characters\x02", 1) 1621 | except Client.MemcachedKeyCharacterError as msg: 1622 | print("OK") 1623 | else: 1624 | print("FAIL") 1625 | failures += 1 1626 | 1627 | print("Testing using insanely long key...", end=" ") 1628 | try: 1629 | x = mc.set('a'*SERVER_MAX_KEY_LENGTH, 1) 1630 | x = mc.set('a'*SERVER_MAX_KEY_LENGTH, 1, noreply=True) 1631 | except Client.MemcachedKeyLengthError as msg: 1632 | print("FAIL") 1633 | failures += 1 1634 | else: 1635 | print("OK") 1636 | try: 1637 | x = mc.set('a'*SERVER_MAX_KEY_LENGTH + 'a', 1) 1638 | except Client.MemcachedKeyLengthError as msg: 1639 | print("OK") 1640 | else: 1641 | print("FAIL") 1642 | failures += 1 1643 | 1644 | print("Testing sending a unicode-string key...", end=" ") 1645 | try: 1646 | x = mc.set(unicode('keyhere'), 1) 1647 | except Client.MemcachedStringEncodingError as msg: 1648 | print("OK", end=" ") 1649 | else: 1650 | print("FAIL", end=" ") 1651 | failures += 1 1652 | try: 1653 | x = mc.set((unicode('a')*SERVER_MAX_KEY_LENGTH).encode('utf-8'), 1) 1654 | except Client.MemcachedKeyError: 1655 | print("FAIL", end=" ") 1656 | failures += 1 1657 | else: 1658 | print("OK", end=" ") 1659 | s = pickle.loads('V\\u4f1a\np0\n.') 1660 | try: 1661 | x = mc.set((s * SERVER_MAX_KEY_LENGTH).encode('utf-8'), 1) 1662 | except Client.MemcachedKeyLengthError: 1663 | print("OK") 1664 | else: 1665 | print("FAIL") 1666 | failures += 1 1667 | 1668 | print("Testing using a value larger than the memcached value limit...") 1669 | print('NOTE: "MemCached: while expecting[...]" is normal...') 1670 | x = mc.set('keyhere', 'a'*SERVER_MAX_VALUE_LENGTH) 1671 | if mc.get('keyhere') is None: 1672 | print("OK", end=" ") 1673 | else: 1674 | print("FAIL", end=" ") 1675 | failures += 1 1676 | x = mc.set('keyhere', 'a'*SERVER_MAX_VALUE_LENGTH + 'aaa') 1677 | if mc.get('keyhere') is None: 1678 | print("OK") 1679 | else: 1680 | print("FAIL") 1681 | failures 
+= 1 1682 | 1683 | print("Testing set_multi() with no memcacheds running", end=" ") 1684 | mc.disconnect_all() 1685 | errors = mc.set_multi({'keyhere': 'a', 'keythere': 'b'}) 1686 | if errors != []: 1687 | print("FAIL") 1688 | failures += 1 1689 | else: 1690 | print("OK") 1691 | 1692 | print("Testing delete_multi() with no memcacheds running", end=" ") 1693 | mc.disconnect_all() 1694 | ret = mc.delete_multi({'keyhere': 'a', 'keythere': 'b'}) 1695 | if ret != 1: 1696 | print("FAIL") 1697 | failures += 1 1698 | else: 1699 | print("OK") 1700 | 1701 | if failures > 0: 1702 | print('*** THERE WERE FAILED TESTS') 1703 | sys.exit(1) 1704 | sys.exit(0) 1705 | 1706 | 1707 | # vim: ts=4 sw=4 et : 1708 | -------------------------------------------------------------------------------- /rabbitmq_init.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | admin_pass=$(< /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c26) 3 | /usr/sbin/rabbitmqctl add_user data_move ${admin_pass} 4 | /usr/sbin/rabbitmqctl add_vhost data_move 5 | /usr/sbin/rabbitmqctl set_permissions -p data_move data_move ".*" ".*" ".*" 6 | /usr/sbin/rabbitmqctl set_permissions -p data_move admin ".*" ".*" ".*" 7 | mkdir -p rabbitmq 8 | echo "${admin_pass}" > rabbitmq/rabbitmq_data_move.conf 9 | chmod 400 rabbitmq/rabbitmq_data_move.conf 10 | 11 | hostname > rabbitmq/rabbitmq.conf 12 | -------------------------------------------------------------------------------- /run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIRNAME=`dirname $0` 4 | pushd $DIRNAME > /dev/null 2>&1 5 | 6 | export C_FORCE_ROOT=1 # allow serializing jobs to pickles; json serializer gives errors for filenames in weird encodings 7 | 8 | worker=`hostname | sed s/\.local//g | sed s/-/_/g` 9 | 10 | 11 | if [[ $1 = "try_f" ]]; then 12 | celery -A cmover -n "file_$worker" -c 1 -Q mover.files worker --pidfile=/var/run/celeryf_%n.pid -l debug 13 | elif [[ $1 = "try_d" ]]; then 14 | celery -A cmover -n "dir_$worker" -c 1 -Q mover.dir worker --pidfile=/var/run/celeryd_%n.pid -l debug 15 | else 16 | rm -f /var/log/cmover.lo* 17 | celery -A cmover -n "dir_$worker" -c 16 -Q mover.dir worker --detach --pidfile=/var/run/celeryd_%n.pid -l info 18 | celery -A cmover -n "file_$worker" -c 16 -Q mover.files worker --detach --pidfile=/var/run/celeryf_%n.pid -l info 19 | fi 20 | 21 | 22 | popd > /dev/null 2>&1 23 | -------------------------------------------------------------------------------- /run_del.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIRNAME=`dirname $0` 4 | pushd $DIRNAME > /dev/null 2>&1 5 | 6 | export C_FORCE_ROOT=1 # allow serializing jobs to pickles; json serializer gives errors for filenames in weird encodings 7 | 8 | worker=`hostname | sed s/\.local//g | sed s/-/_/g` 9 | 10 | 11 | if [[ $1 = "try_f" ]]; then 12 | celery -A cmover_del -n "del_file_"$worker -c 1 -Q "mover"$PREFIX"_del.files" worker --pidfile=/var/run/celeryfdel_%n.pid -l debug 13 | elif [[ $1 = "try_d" ]]; then 14 | celery -A cmover_del -n "del_dir_"$worker -c 1 -Q "mover"$PREFIX"_del.dir" worker --pidfile=/var/run/celeryddel_%n.pid -l debug 15 | else 16 | rm -f /var/log/cmover_del.lo* 17 | celery -A cmover_del -n "del_dir_"$worker -c 16 -Q mover_del.dir worker --detach --pidfile=/var/run/celeryddel_%n.pid 18 | celery -A cmover_del -n "del_file_"$worker -c 16 -Q mover_del.files worker --detach --pidfile=/var/run/celeryfdel_%n.pid 19 | 
fi 20 | 21 | popd > /dev/null 2>&1 22 | -------------------------------------------------------------------------------- /run_dirtime.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIRNAME=`dirname $0` 4 | pushd $DIRNAME > /dev/null 2>&1 5 | 6 | export C_FORCE_ROOT=1 # allow serializing jobs to pickles; json serializer gives errors for filenames in weird encodings 7 | 8 | worker=`hostname | sed s/\.local//g | sed s/-/_/g` 9 | 10 | 11 | if [[ $1 = "try" ]]; then 12 | celery -A cmover_dirtime -n "dirtime_dir_$worker" -c 1 -Q mover_dirtime.dir worker --pidfile=/var/run/celeryddir_%n.pid -l debug 13 | else 14 | rm -f /var/log/cmover_dir.lo* 15 | celery -A cmover_dirtime -n "dirtime_dir_$worker" -c 8 -Q mover_dirtime.dir worker --detach --pidfile=/var/run/celeryddir_%n.pid 16 | fi 17 | 18 | popd > /dev/null 2>&1 19 | -------------------------------------------------------------------------------- /send_graphite.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import socket 4 | import time 5 | import memcache 6 | import os.path 7 | 8 | cur_dir = os.path.dirname(os.path.realpath(__file__)) 9 | with open('%s/rabbitmq/rabbitmq.conf'%cur_dir,'r') as f: 10 | rabbitmq_server = f.read().rstrip() 11 | 12 | mc = memcache.Client(['%s:11211'%rabbitmq_server], debug=0) 13 | 14 | CARBON_SERVER = 'graphite.sdsc.edu' 15 | CARBON_PORT = 2003 16 | 17 | sock = socket.socket() 18 | sock.connect((CARBON_SERVER, CARBON_PORT)) 19 | 20 | start_time = int(time.time())-15 21 | 22 | for node in ["mover_7_1", "mover_7_2", "mover_7_3", "mover_7_4"]: # list of hostnames running workers 23 | for j in range(0,17): # max number of workers per node + 1 24 | for type in ["files", "dirs", "data"]: 25 | worker_type = 'file' if type in ['files', 'data'] else 'dir' 26 | worker_name = "cmover.%s_%s_%s.%s"%(worker_type, node, j, type) 27 | res = mc.get(worker_name) 28 | if(res): 29 | mc.delete(worker_name) 30 | message = 'system.hpc.datamove.%s %s %d\n' %(worker_name, res, start_time) # set your own prefix 31 | sock.sendall(message) # comment to test 32 | #print "%s"%message # uncomment to test 33 | 34 | for type in ["files", "dirs"]: 35 | worker_type = 'del_file' if type in ['files', 'data'] else 'del_dir' 36 | worker_name = "cmover.%s_%s_%s.%s"%(worker_type, node, j, type) 37 | res = mc.get(worker_name) 38 | if(res): 39 | mc.delete(worker_name) 40 | message = 'system.hpc.datamove.%s %s %d\n' %(worker_name, res, start_time) # set your own prefix 41 | sock.sendall(message) # comment to test 42 | #print "%s"%message # uncomment to test 43 | sock.close() 44 | -------------------------------------------------------------------------------- /send_queue.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import socket 4 | import time 5 | import subprocess 6 | import memcache 7 | import os.path 8 | 9 | cur_dir = os.path.dirname(os.path.realpath(__file__)) 10 | with open('%s/rabbitmq/rabbitmq.conf'%cur_dir,'r') as f: 11 | rabbitmq_server = f.read().rstrip() 12 | 13 | mc = memcache.Client(['%s:11211'%rabbitmq_server], debug=0) 14 | 15 | rabbit_stats = subprocess.Popen(["/usr/sbin/rabbitmqctl", "list_queues", "-p", "data_move"], stdout=subprocess.PIPE, stderr=subprocess.PIPE) 16 | out, err = rabbit_stats.communicate() 17 | for stat in out.splitlines(): 18 | if(stat.startswith("mover")): 19 | (key, value) = stat.split() 20 | mc.set(key, value) 21 | 
#print ("%s %s"%(key,value)) 22 | -------------------------------------------------------------------------------- /send_queue_graphite.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import socket 4 | import time 5 | import subprocess 6 | import memcache 7 | import os.path 8 | 9 | cur_dir = os.path.dirname(os.path.realpath(__file__)) 10 | with open('%s/rabbitmq/rabbitmq.conf'%cur_dir,'r') as f: 11 | rabbitmq_server = f.read().rstrip() 12 | 13 | CARBON_SERVER = 'graphite.sdsc.edu' 14 | CARBON_PORT = 2003 15 | 16 | sock = socket.socket() 17 | sock.connect((CARBON_SERVER, CARBON_PORT)) 18 | 19 | start_time = int(time.time()) 20 | 21 | mc = memcache.Client(['%s:11211'%rabbitmq_server], debug=0) 22 | 23 | for key in ("mover.dir", "mover.files", "mover_del.dir", "mover_del.files"): 24 | value = mc.get(key) 25 | if(value): 26 | message = 'system.hpc.datamove.bobcat_queue.%s %s %d\n' %(key, value, start_time) 27 | sock.sendall(message) # comment to test 28 | #print message 29 | sock.close() 30 | -------------------------------------------------------------------------------- /settings.py: -------------------------------------------------------------------------------- 1 | #CELERY_ACCEPT_CONTENT = ['pickle'] 2 | #CELERY_TASK_SERIALIZER = ['pickle'] 3 | #CELERY_ACCEPT_CONTENT = ['json'] 4 | #CELERY_TASK_SERIALIZER = 'json' 5 | #CELERY_RESULT_SERIALIZER = 'json' 6 | 7 | CELERY_ROUTES = { 8 | 'cmover_del.procFiles': {'queue': 'mover_del.files', 'delivery_mode':1}, 9 | 'cmover_del.procDir': {'queue': 'mover_del.dir', 'delivery_mode':1}, 10 | 'cmover.procFiles': {'queue': 'mover.files', 'delivery_mode':1}, 11 | 'cmover.procDir': {'queue': 'mover.dir', 'delivery_mode':1}, 12 | 'cmover_dirtime.procDir': {'queue': 'mover_dirtime.dir', 'delivery_mode':1} 13 | } 14 | 15 | CELERY_ACKS_LATE=True 16 | CELERY_IGNORE_RESULT=True 17 | 18 | --------------------------------------------------------------------------------