Hide keyboard shortcuts

Hot-keys on this page

r m x p   toggle line displays

j k   next/prev highlighted chunk

0   (zero) top of page

1   (one) first highlighted chunk

1#!/usr/bin/env python 

2 

3# noinspection HttpUrlsUsage 

4""" 

5camcops_server/cc_modules/cc_export.py 

6 

7=============================================================================== 

8 

9 Copyright (C) 2012-2020 Rudolf Cardinal (rudolf@pobox.com). 

10 

11 This file is part of CamCOPS. 

12 

13 CamCOPS is free software: you can redistribute it and/or modify 

14 it under the terms of the GNU General Public License as published by 

15 the Free Software Foundation, either version 3 of the License, or 

16 (at your option) any later version. 

17 

18 CamCOPS is distributed in the hope that it will be useful, 

19 but WITHOUT ANY WARRANTY; without even the implied warranty of 

20 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 

21 GNU General Public License for more details. 

22 

23 You should have received a copy of the GNU General Public License 

24 along with CamCOPS. If not, see <https://www.gnu.org/licenses/>. 

25 

26=============================================================================== 

27 

28.. _ActiveMQ: https://activemq.apache.org/ 

29.. _AMQP: https://www.amqp.org/ 

30.. _APScheduler: https://apscheduler.readthedocs.io/ 

31.. _Celery: https://www.celeryproject.org/ 

32.. _Dramatiq: https://dramatiq.io/ 

33.. _RabbitMQ: https://www.rabbitmq.com/ 

34.. _Redis: https://redis.io/ 

35.. _ZeroMQ: https://zeromq.org/ 

36 

37**Export and research dump functions.** 

38 

39Export design: 

40 

41*WHICH RECORDS TO SEND?* 

42 

43The most powerful mechanism is not to have a sending queue (which would then 

44require careful multi-instance locking), but to have a "sent" log. That way: 

45 

46- A record needs sending if it's not in the sent log (for an appropriate 

47 recipient). 

48- You can add a new recipient and the system will know about the (new) 

49 backlog automatically. 

50- You can specify criteria, e.g. don't upload records before 1/1/2014, and 

51 modify that later, and it would catch up with the backlog. 

52- Successes and failures are logged in the same table. 

53- Multiple recipients are handled with ease. 

54- No need to alter database.pl code that receives from tablets. 

55- Can run with a simple cron job. 

56 

57*LOCKING* 

58 

59- Don't use database locking: 

60 https://blog.engineyard.com/2011/5-subtle-ways-youre-using-mysql-as-a-queue-and-why-itll-bite-you 

61- Locking via UNIX lockfiles: 

62 

63 - https://pypi.python.org/pypi/lockfile 

64 - http://pythonhosted.org/lockfile/ (which also works on Windows) 

65 

66 - On UNIX, ``lockfile`` uses ``LinkLockFile``: 

67 https://github.com/smontanaro/pylockfile/blob/master/lockfile/linklockfile.py 

68 

69*MESSAGE QUEUE AND BACKEND* 

70 

71Thoughts as of 2018-12-22. 

72 

73- See https://www.fullstackpython.com/task-queues.html. Also http://queues.io/; 

74 https://stackoverflow.com/questions/731233/activemq-or-rabbitmq-or-zeromq-or. 

75 

76- The "default" is Celery_, with ``celery beat`` for scheduling, via an 

77 AMQP_ broker like RabbitMQ_. 

78 

79 - Downside: no longer supported under Windows as of Celery 4. 

80 

81 - There are immediate bugs when running the demo code with Celery 4.2.1, 

82 fixed by setting the environment variable ``set 

83 FORKED_BY_MULTIPROCESSING=1`` before running the worker; see 

84 https://github.com/celery/celery/issues/4178 and 

85 https://github.com/celery/celery/pull/4078. 

86 

87 - Downside: backend is complex; e.g. Erlang dependency of RabbitMQ. 

88 

89 - Celery also supports Redis_, but Redis_ doesn't support Windows directly 

90 (except the Windows Subsystem for Linux in Windows 10+). 

91 

92- Another possibility is Dramatiq_ with APScheduler_. 

93 

94 - Of note, APScheduler_ can use an SQLAlchemy database table as its job 

95 store, which might be good. 

96 - Dramatiq_ uses RabbitMQ_ or Redis_. 

97 - Dramatiq_ 1.4.0 (2018-11-25) installs cleanly under Windows. Use ``pip 

98 install --upgrade "dramatic[rabbitmq, watch]"`` (i.e. with double quotse, 

99 not the single quotes it suggests, which don't work under Windows). 

100 - However, the basic example (https://dramatiq.io/guide.html) fails under 

101 Windows; when you fire up ``dramatic count_words`` (even with ``--processes 

102 1 --threads 1``) it crashes with an error from ``ForkingPickler`` in 

103 ``multiprocessing.reduction``, i.e. 

104 https://docs.python.org/3/library/multiprocessing.html#windows. It also 

105 emits a ``PermissionError: [WinError 5] Access is denied``. This is 

106 discussed a bit at https://github.com/Bogdanp/dramatiq/issues/75; 

107 https://github.com/Bogdanp/dramatiq/blob/master/docs/source/changelog.rst. 

108 The changelog suggests 1.4.0 should work, but it doesn't. 

109 

110- Worth some thought about ZeroMQ_, which is a very different sort of thing. 

111 Very cross-platform. Needs work to guard against message loss (i.e. messages 

112 are unreliable by default). Dynamic "special socket" style. 

113 

114- Possibly also ActiveMQ_. 

115 

116- OK; so speed is not critical but we want message reliability, for it to work 

117 under Windows, and decent Python bindings with job scheduling. 

118 

119 - OUT: Redis (not Windows easily), ZeroMQ (fast but not by default reliable), 

120 ActiveMQ (few Python frameworks?). 

121 - REMAINING for message handling: RabbitMQ. 

122 - Python options therefore: Celery (but Windows not officially supported from 

123 4+); Dramatiq (but Windows also not very well supported and seems a bit 

124 bleeding-edge). 

125 

126- This is looking like a mess from the Windows perspective. 

127 

128- An alternative is just to use the database, of course. 

129 

130 - https://softwareengineering.stackexchange.com/questions/351449/message-queue-database-vs-dedicated-mq 

131 - http://mikehadlow.blogspot.com/2012/04/database-as-queue-anti-pattern.html 

132 - https://blog.jooq.org/2014/09/26/using-your-rdbms-for-messaging-is-totally-ok/ 

133 - https://stackoverflow.com/questions/13005410/why-do-we-need-message-brokers-like-rabbitmq-over-a-database-like-postgresql 

134 - https://www.quora.com/What-is-the-best-practice-using-db-tables-or-message-queues-for-moderation-of-content-approved-by-humans 

135 

136- Let's take a step back and summarize the problem. 

137 

138 - Many web threads may upload tasks. This should trigger a prompt export for 

139 all push recipients. 

140 - Whichever way we schedule a backend task job, it should be as the 

141 combination of recipient, basetable, task PK. (That way, if one recipient 

142 fails, the others can proceed independently.) 

143 - Every job should check that it's not been completed already (in case of 

144 accidental job restarts), i.e. is idempotent as far as we can make it. 

145 - How should this interact with the non-push recipients? 

146 - We should use the same locking method for push and non-push recipients. 

147 - We should make the locking granular and use file locks -- for example, for 

148 each task/recipient combination (or each whole-database export for a given 

149 recipient). 

150 

151""" # noqa 

152 

153import logging 

154import os 

155import sqlite3 

156import tempfile 

157from typing import (Dict, List, Generator, Optional, 

158 Tuple, Type, TYPE_CHECKING, Union) 

159 

160from cardinal_pythonlib.classes import gen_all_subclasses 

161from cardinal_pythonlib.datetimefunc import ( 

162 format_datetime, 

163 get_now_localtz_pendulum, 

164 get_tz_local, 

165 get_tz_utc, 

166) 

167from cardinal_pythonlib.email.sendmail import CONTENT_TYPE_TEXT 

168from cardinal_pythonlib.fileops import relative_filename_within_dir 

169from cardinal_pythonlib.json.serialize import register_for_json 

170from cardinal_pythonlib.logs import BraceStyleAdapter 

171from cardinal_pythonlib.pyramid.responses import ( 

172 OdsResponse, 

173 SqliteBinaryResponse, 

174 TextAttachmentResponse, 

175 XlsxResponse, 

176 ZipResponse, 

177) 

178from cardinal_pythonlib.sizeformatter import bytes2human 

179from cardinal_pythonlib.sqlalchemy.session import get_safe_url_from_engine 

180import lockfile 

181from pendulum import DateTime as Pendulum, Duration, Period 

182from pyramid.httpexceptions import HTTPBadRequest 

183from pyramid.renderers import render_to_response 

184from pyramid.response import Response 

185from sqlalchemy.engine import create_engine 

186from sqlalchemy.engine.result import ResultProxy 

187from sqlalchemy.orm import Session as SqlASession, sessionmaker 

188from sqlalchemy.sql.expression import text 

189from sqlalchemy.sql.schema import Column, MetaData, Table 

190from sqlalchemy.sql.sqltypes import Text 

191 

192from camcops_server.cc_modules.cc_audit import audit 

193from camcops_server.cc_modules.cc_constants import DateFormat 

194from camcops_server.cc_modules.cc_dump import copy_tasks_and_summaries 

195from camcops_server.cc_modules.cc_email import Email 

196from camcops_server.cc_modules.cc_exportmodels import ( 

197 ExportedTask, 

198 ExportRecipient, 

199 gen_tasks_having_exportedtasks, 

200 get_collection_for_export, 

201) 

202from camcops_server.cc_modules.cc_forms import UserDownloadDeleteForm 

203from camcops_server.cc_modules.cc_pyramid import Routes, ViewArg, ViewParam 

204from camcops_server.cc_modules.cc_simpleobjects import TaskExportOptions 

205from camcops_server.cc_modules.cc_sqlalchemy import sql_from_sqlite_database 

206from camcops_server.cc_modules.cc_task import Task 

207from camcops_server.cc_modules.cc_tsv import TsvCollection, TsvPage 

208from camcops_server.cc_modules.celery import ( 

209 create_user_download, 

210 email_basic_dump, 

211 export_task_backend, 

212) 

213 

214if TYPE_CHECKING: 

215 from camcops_server.cc_modules.cc_request import CamcopsRequest 

216 from camcops_server.cc_modules.cc_taskcollection import TaskCollection 

217 

218log = BraceStyleAdapter(logging.getLogger(__name__)) 

219 

220 

221# ============================================================================= 

222# Constants 

223# ============================================================================= 

224 

225INFOSCHEMA_PAGENAME = "_camcops_information_schema_columns" 

226 

227 

228# ============================================================================= 

229# Export tasks from the back end 

230# ============================================================================= 

231 

232def print_export_queue(req: "CamcopsRequest", 

233 recipient_names: List[str] = None, 

234 all_recipients: bool = False, 

235 via_index: bool = True, 

236 pretty: bool = False) -> None: 

237 """ 

238 Called from the command line. 

239 

240 Shows tasks that would be exported. 

241 

242 Args: 

243 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

244 recipient_names: list of export recipient names (as per the config 

245 file) 

246 all_recipients: use all recipients? 

247 via_index: use the task index (faster)? 

248 pretty: use ``str(task)`` not ``repr(task)`` (prettier, slower because 

249 it has to query the patient) 

250 """ 

251 recipients = req.get_export_recipients( 

252 recipient_names=recipient_names, 

253 all_recipients=all_recipients, 

254 save=False 

255 ) 

256 if not recipients: 

257 log.warning("No export recipients") 

258 return 

259 for recipient in recipients: 

260 log.info("Tasks to be exported for recipient: {}", recipient) 

261 collection = get_collection_for_export(req, recipient, 

262 via_index=via_index) 

263 for task in collection.gen_tasks_by_class(): 

264 print( 

265 f"{recipient.recipient_name}: " 

266 f"{str(task) if pretty else repr(task)}" 

267 ) 

268 

269 

270def export(req: "CamcopsRequest", 

271 recipient_names: List[str] = None, 

272 all_recipients: bool = False, 

273 via_index: bool = True, 

274 schedule_via_backend: bool = False) -> None: 

275 """ 

276 Called from the command line. 

277 

278 Exports all relevant tasks (pending incremental exports, or everything if 

279 applicable) for specified export recipients. 

280 

281 Obtains a file lock, then iterates through all recipients. 

282 

283 Args: 

284 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

285 recipient_names: list of export recipient names (as per the config 

286 file) 

287 all_recipients: use all recipients? 

288 via_index: use the task index (faster)? 

289 schedule_via_backend: schedule jobs via the backend instead? 

290 """ 

291 recipients = req.get_export_recipients(recipient_names=recipient_names, 

292 all_recipients=all_recipients) 

293 if not recipients: 

294 log.warning("No export recipients") 

295 return 

296 

297 for recipient in recipients: 

298 log.info("Exporting to recipient: {}", recipient) 

299 if recipient.using_db(): 

300 if schedule_via_backend: 

301 raise NotImplementedError() # todo: implement whole-database export via Celery backend # noqa 

302 else: 

303 export_whole_database(req, recipient, via_index=via_index) 

304 else: 

305 # Non-database recipient. 

306 export_tasks_individually( 

307 req, recipient, 

308 via_index=via_index, schedule_via_backend=schedule_via_backend) 

309 

310 

311def export_whole_database(req: "CamcopsRequest", 

312 recipient: ExportRecipient, 

313 via_index: bool = True) -> None: 

314 """ 

315 Exports to a database. 

316 

317 Holds a recipient-specific file lock in the process. 

318 

319 Args: 

320 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

321 recipient: an :class:`camcops_server.cc_modules.cc_exportmodels.ExportRecipient` 

322 via_index: use the task index (faster)? 

323 """ # noqa 

324 cfg = req.config 

325 lockfilename = cfg.get_export_lockfilename_db( 

326 recipient_name=recipient.recipient_name) 

327 try: 

328 with lockfile.FileLock(lockfilename, timeout=0): # doesn't wait 

329 collection = get_collection_for_export(req, recipient, 

330 via_index=via_index) 

331 dst_engine = create_engine(recipient.db_url, 

332 echo=recipient.db_echo) 

333 log.info("Exporting to database: {}", 

334 get_safe_url_from_engine(dst_engine)) 

335 dst_session = sessionmaker(bind=dst_engine)() # type: SqlASession 

336 task_generator = gen_tasks_having_exportedtasks(collection) 

337 export_options = TaskExportOptions( 

338 include_blobs=recipient.db_include_blobs, 

339 db_patient_id_per_row=recipient.db_patient_id_per_row, 

340 db_make_all_tables_even_empty=True, 

341 db_include_summaries=recipient.db_add_summaries, 

342 ) 

343 copy_tasks_and_summaries( 

344 tasks=task_generator, 

345 dst_engine=dst_engine, 

346 dst_session=dst_session, 

347 export_options=export_options, 

348 req=req, 

349 ) 

350 dst_session.commit() 

351 except lockfile.AlreadyLocked: 

352 log.warning("Export logfile {!r} already locked by another process; " 

353 "aborting", lockfilename) 

354 

355 

356def export_tasks_individually(req: "CamcopsRequest", 

357 recipient: ExportRecipient, 

358 via_index: bool = True, 

359 schedule_via_backend: bool = False) -> None: 

360 """ 

361 Exports all necessary tasks for a recipient. 

362 

363 Args: 

364 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

365 recipient: an :class:`camcops_server.cc_modules.cc_exportmodels.ExportRecipient` 

366 via_index: use the task index (faster)? 

367 schedule_via_backend: schedule jobs via the backend instead? 

368 """ # noqa 

369 collection = get_collection_for_export(req, recipient, via_index=via_index) 

370 if schedule_via_backend: 

371 recipient_name = recipient.recipient_name 

372 for task_or_index in collection.gen_all_tasks_or_indexes(): 

373 if isinstance(task_or_index, Task): 

374 basetable = task_or_index.tablename 

375 task_pk = task_or_index.pk 

376 else: 

377 basetable = task_or_index.task_table_name 

378 task_pk = task_or_index.task_pk 

379 log.info("Submitting background job to export task {}.{} to {}", 

380 basetable, task_pk, recipient_name) 

381 export_task_backend.delay( 

382 recipient_name=recipient_name, 

383 basetable=basetable, 

384 task_pk=task_pk 

385 ) 

386 else: 

387 for task in collection.gen_tasks_by_class(): 

388 # Do NOT use this to check the working of export_task_backend(): 

389 # export_task_backend(recipient.recipient_name, task.tablename, task.pk) # noqa 

390 # ... it will deadlock at the database (because we're already 

391 # within a query of some sort, I presume) 

392 export_task(req, recipient, task) 

393 

394 

395def export_task(req: "CamcopsRequest", 

396 recipient: ExportRecipient, 

397 task: Task) -> None: 

398 """ 

399 Exports a single task, checking that it remains valid to do so. 

400 

401 Args: 

402 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

403 recipient: an :class:`camcops_server.cc_modules.cc_exportmodels.ExportRecipient` 

404 task: a :class:`camcops_server.cc_modules.cc_task.Task` 

405 """ # noqa 

406 

407 # Double-check it's OK! Just in case, for example, an old backend task has 

408 # persisted, or someone's managed to get an iffy back-end request in some 

409 # other way. 

410 if not recipient.is_task_suitable(task): 

411 # Warning will already have been emitted (by is_task_suitable). 

412 return 

413 

414 cfg = req.config 

415 lockfilename = cfg.get_export_lockfilename_task( 

416 recipient_name=recipient.recipient_name, 

417 basetable=task.tablename, 

418 pk=task.pk, 

419 ) 

420 dbsession = req.dbsession 

421 try: 

422 with lockfile.FileLock(lockfilename, timeout=0): # doesn't wait 

423 # We recheck the export status once we hold the lock, in case 

424 # multiple jobs are competing to export it. 

425 if ExportedTask.task_already_exported( 

426 dbsession=dbsession, 

427 recipient_name=recipient.recipient_name, 

428 basetable=task.tablename, 

429 task_pk=task.pk): 

430 log.info("Task {!r} already exported to recipient {!r}; " 

431 "ignoring", task, recipient) 

432 # Not a warning; it's normal to see these because it allows the 

433 # client API to skip some checks for speed. 

434 return 

435 # OK; safe to export now. 

436 et = ExportedTask(recipient, task) 

437 dbsession.add(et) 

438 et.export(req) 

439 dbsession.commit() # so the ExportedTask is visible to others ASAP 

440 except lockfile.AlreadyLocked: 

441 log.warning("Export logfile {!r} already locked by another process; " 

442 "aborting", lockfilename) 

443 

444 

445# ============================================================================= 

446# Helpers for task collection export functions 

447# ============================================================================= 

448 

449def gen_audited_tasks_for_task_class( 

450 collection: "TaskCollection", 

451 cls: Type[Task], 

452 audit_descriptions: List[str]) -> Generator[Task, None, None]: 

453 """ 

454 Generates tasks from a collection, for a given task class, simultaneously 

455 adding to an audit description. Used for user-triggered downloads. 

456 

457 Args: 

458 collection: a :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection` 

459 cls: the task class to generate 

460 audit_descriptions: list of strings to be modified 

461 

462 Yields: 

463 :class:`camcops_server.cc_modules.cc_task.Task` objects 

464 """ # noqa 

465 pklist = [] # type: List[int] 

466 for task in collection.tasks_for_task_class(cls): 

467 pklist.append(task.pk) 

468 yield task 

469 audit_descriptions.append( 

470 f"{cls.__tablename__}: " 

471 f"{','.join(str(pk) for pk in pklist)}" 

472 ) 

473 

474 

475def gen_audited_tasks_by_task_class( 

476 collection: "TaskCollection", 

477 audit_descriptions: List[str]) -> Generator[Task, None, None]: 

478 """ 

479 Generates tasks from a collection, across task classes, simultaneously 

480 adding to an audit description. Used for user-triggered downloads. 

481 

482 Args: 

483 collection: a :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection` 

484 audit_descriptions: list of strings to be modified 

485 

486 Yields: 

487 :class:`camcops_server.cc_modules.cc_task.Task` objects 

488 """ # noqa 

489 for cls in collection.task_classes(): 

490 for task in gen_audited_tasks_for_task_class(collection, cls, 

491 audit_descriptions): 

492 yield task 

493 

494 

495def get_information_schema_query(req: "CamcopsRequest") -> ResultProxy: 

496 """ 

497 Returns an SQLAlchemy query object that fetches the 

498 INFORMATION_SCHEMA.COLUMNS information from our source database. 

499 

500 This is not sensitive; there is no data, just structure/comments. 

501 """ 

502 # Find our database name 

503 # https://stackoverflow.com/questions/53554458/sqlalchemy-get-database-name-from-engine 

504 dbname = req.engine.url.database 

505 # Query the information schema for our database. 

506 # https://docs.sqlalchemy.org/en/13/core/sqlelement.html#sqlalchemy.sql.expression.text # noqa 

507 query = text(""" 

508 SELECT * 

509 FROM information_schema.columns 

510 WHERE table_schema = :dbname 

511 """).bindparams(dbname=dbname) 

512 result_proxy = req.dbsession.execute(query) 

513 return result_proxy 

514 

515 

516def get_information_schema_tsv_page( 

517 req: "CamcopsRequest", 

518 page_name: str = INFOSCHEMA_PAGENAME) -> TsvPage: 

519 """ 

520 Returns the server database's ``INFORMATION_SCHEMA.COLUMNS`` table as a 

521 :class:`camcops_server.cc_modules.cc_tsv.TsvPage``. 

522 """ 

523 result_proxy = get_information_schema_query(req) 

524 return TsvPage.from_resultproxy(page_name, result_proxy) 

525 

526 

527def write_information_schema_to_dst( 

528 req: "CamcopsRequest", 

529 dst_session: SqlASession, 

530 dest_table_name: str = INFOSCHEMA_PAGENAME) -> None: 

531 """ 

532 Writes the server's information schema to a separate database session 

533 (which will be an SQLite database being created for download). 

534 

535 There must be no open transactions (i.e. please COMMIT before you call 

536 this function), since we need to create a table. 

537 """ 

538 # 1. Read the structure of INFORMATION_SCHEMA.COLUMNS itself. 

539 # https://stackoverflow.com/questions/21770829/sqlalchemy-copy-schema-and-data-of-subquery-to-another-database # noqa 

540 src_engine = req.engine 

541 dst_engine = dst_session.bind 

542 metadata = MetaData(bind=dst_engine) 

543 table = Table( 

544 "columns", # table name; see also "schema" argument 

545 metadata, # "load with the destination metadata" 

546 # Override some specific column types by hand, or they'll fail as 

547 # SQLAlchemy fails to reflect the MySQL LONGTEXT type properly: 

548 Column("COLUMN_DEFAULT", Text), 

549 Column("COLUMN_TYPE", Text), 

550 Column("GENERATION_EXPRESSION", Text), 

551 autoload=True, # "read (reflect) structure from the database" 

552 autoload_with=src_engine, # "read (reflect) structure from the source" 

553 schema="information_schema" # schema 

554 ) 

555 # 2. Write that structure to our new database. 

556 table.name = dest_table_name # create it with a different name 

557 table.schema = "" # we don't have a schema in the destination database 

558 table.create(dst_engine) # CREATE TABLE 

559 # 3. Fetch data. 

560 query = get_information_schema_query(req) 

561 # 4. Write the data. 

562 for row in query: 

563 dst_session.execute(table.insert(row)) 

564 # 5. COMMIT 

565 dst_session.commit() 

566 

567 

568# ============================================================================= 

569# Convert task collections to different export formats for user download 

570# ============================================================================= 

571 

572@register_for_json 

573class DownloadOptions(object): 

574 """ 

575 Represents options for the process of the user downloading tasks. 

576 """ 

577 DELIVERY_MODES = [ 

578 ViewArg.DOWNLOAD, 

579 ViewArg.EMAIL, 

580 ViewArg.IMMEDIATELY, 

581 ] 

582 

583 def __init__(self, 

584 user_id: int, 

585 viewtype: str, 

586 delivery_mode: str, 

587 spreadsheet_sort_by_heading: bool = False, 

588 db_include_blobs: bool = False, 

589 db_patient_id_per_row: bool = False, 

590 include_information_schema_columns: bool = True) -> None: 

591 """ 

592 Args: 

593 user_id: 

594 ID of the user creating the request (may be needed to pass to 

595 the back-end) 

596 viewtype: 

597 file format for receiving data (e.g. XLSX, SQLite) 

598 delivery_mode: 

599 method of delivery (e.g. immediate, e-mail) 

600 spreadsheet_sort_by_heading: 

601 (For spreadsheets.) 

602 Sort columns within each page by heading name? 

603 db_include_blobs: 

604 (For database downloads.) 

605 Include BLOBs? 

606 db_patient_id_per_row: 

607 (For database downloads.) 

608 Denormalize by include the patient ID in all rows of 

609 patient-related tables? 

610 include_information_schema_columns: 

611 Include descriptions of the columns provided? 

612 """ 

613 assert delivery_mode in self.DELIVERY_MODES 

614 self.user_id = user_id 

615 self.viewtype = viewtype 

616 self.delivery_mode = delivery_mode 

617 self.spreadsheet_sort_by_heading = spreadsheet_sort_by_heading 

618 self.db_include_blobs = db_include_blobs 

619 self.db_patient_id_per_row = db_patient_id_per_row 

620 self.include_information_schema_columns = include_information_schema_columns # noqa 

621 

622 

623class TaskCollectionExporter(object): 

624 """ 

625 Class to provide tasks for user download. 

626 """ 

627 

628 def __init__(self, 

629 req: "CamcopsRequest", 

630 collection: "TaskCollection", 

631 options: DownloadOptions): 

632 """ 

633 Args: 

634 req: 

635 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

636 collection: 

637 a :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection` 

638 options: 

639 :class:`DownloadOptions` governing the download 

640 """ # noqa 

641 self.req = req 

642 self.collection = collection 

643 self.options = options 

644 

645 @property 

646 def viewtype(self) -> str: 

647 raise NotImplementedError("Exporter needs to implement 'viewtype'") 

648 

649 @property 

650 def file_extension(self) -> str: 

651 raise NotImplementedError( 

652 "Exporter needs to implement 'file_extension'" 

653 ) 

654 

655 def get_filename(self) -> str: 

656 """ 

657 Returns the filename for the download. 

658 """ 

659 timestamp = format_datetime(self.req.now, DateFormat.FILENAME) 

660 return f"CamCOPS_dump_{timestamp}.{self.file_extension}" 

661 

662 def immediate_response(self, req: "CamcopsRequest") -> Response: 

663 """ 

664 Returns either a :class:`Response` with the data, or a 

665 :class:`Response` saying how the user will obtain their data later. 

666 

667 Args: 

668 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

669 """ 

670 if self.options.delivery_mode == ViewArg.EMAIL: 

671 self.schedule_email() 

672 return render_to_response( 

673 "email_scheduled.mako", 

674 dict(), 

675 request=req 

676 ) 

677 elif self.options.delivery_mode == ViewArg.DOWNLOAD: 

678 self.schedule_download() 

679 return render_to_response( 

680 "download_scheduled.mako", 

681 dict(), 

682 request=req 

683 ) 

684 else: # ViewArg.IMMEDIATELY 

685 return self.download_now() 

686 

687 def download_now(self) -> Response: 

688 """ 

689 Download the data dump in the selected format 

690 """ 

691 filename, body = self.to_file() 

692 return self.get_data_response(body=body, filename=filename) 

693 

694 def schedule_email(self) -> None: 

695 """ 

696 Schedule the export asynchronously and e-mail the logged in user 

697 when done 

698 """ 

699 email_basic_dump.delay(self.collection, self.options) 

700 

701 def send_by_email(self) -> None: 

702 """ 

703 Send the data dump by e-mail to the logged in user 

704 """ 

705 _ = self.req.gettext 

706 config = self.req.config 

707 

708 filename, body = self.to_file() 

709 email_to = self.req.user.email 

710 email = Email( 

711 # date: automatic 

712 from_addr=config.email_from, 

713 to=email_to, 

714 subject=_("CamCOPS research data dump"), 

715 body=_("The research data dump you requested is attached."), 

716 content_type=CONTENT_TYPE_TEXT, 

717 charset="utf8", 

718 attachments_binary=[(filename, body)], 

719 ) 

720 email.send( 

721 host=config.email_host, 

722 username=config.email_host_username, 

723 password=config.email_host_password, 

724 port=config.email_port, 

725 use_tls=config.email_use_tls, 

726 ) 

727 

728 if email.sent: 

729 log.info(f"Research dump emailed to {email_to}") 

730 else: 

731 log.error( 

732 f"Failed to email research dump to {email_to}" 

733 ) 

734 

735 def schedule_download(self) -> None: 

736 """ 

737 Schedule a background export to a file that the user can download 

738 later. 

739 """ 

740 create_user_download.delay(self.collection, self.options) 

741 

742 def create_user_download_and_email(self) -> None: 

743 """ 

744 Creates a user download, and e-mails the user to let them know. 

745 """ 

746 _ = self.req.gettext 

747 config = self.req.config 

748 

749 download_dir = self.req.user_download_dir 

750 space = self.req.user_download_bytes_available 

751 filename, contents = self.to_file() 

752 size = len(contents) 

753 

754 if size > space: 

755 # Not enough space 

756 total_permitted = self.req.user_download_bytes_permitted 

757 msg = _( 

758 "You do not have enough space to create this download. " 

759 "You are allowed {total_permitted} bytes and you are have " 

760 "{space} bytes free. This download would need {size} bytes." 

761 ).format(total_permitted=total_permitted, space=space, size=size) 

762 else: 

763 # Create file 

764 fullpath = os.path.join(download_dir, filename) 

765 try: 

766 with open(fullpath, "wb") as f: 

767 f.write(contents) 

768 # Success 

769 log.info(f"Created user download: {fullpath}") 

770 msg = _( 

771 "The research data dump you requested is ready to be " 

772 "downloaded. You will find it in your download area. " 

773 "It is called %s" 

774 ) % filename 

775 except Exception as e: 

776 # Some other error 

777 msg = _( 

778 "Failed to create file {filename}. Error was: {message}" 

779 ).format(filename=filename, message=e) 

780 

781 # E-mail the user, if they have an e-mail address 

782 email_to = self.req.user.email 

783 if email_to: 

784 email = Email( 

785 # date: automatic 

786 from_addr=config.email_from, 

787 to=email_to, 

788 subject=_("CamCOPS research data dump"), 

789 body=msg, 

790 content_type=CONTENT_TYPE_TEXT, 

791 charset="utf8", 

792 ) 

793 email.send( 

794 host=config.email_host, 

795 username=config.email_host_username, 

796 password=config.email_host_password, 

797 port=config.email_port, 

798 use_tls=config.email_use_tls, 

799 ) 

800 

801 def get_data_response(self, body: bytes, filename: str) -> Response: 

802 raise NotImplementedError("Exporter needs to implement 'get_response'") 

803 

804 def to_file(self) -> Tuple[str, bytes]: 

805 """ 

806 Returns the tuple ``filename, file_contents``. 

807 """ 

808 return self.get_filename(), self.get_file_body() 

809 

810 def get_file_body(self) -> bytes: 

811 """ 

812 Returns binary data to be stored as a file. 

813 """ 

814 raise NotImplementedError("Exporter needs to implement 'get_file_body'") 

815 

816 def get_tsv_collection(self) -> TsvCollection: 

817 """ 

818 Converts the collection of tasks to a collection of spreadsheet-style 

819 data. Also audits the request as a basic data dump. 

820 

821 Returns: 

822 a :class:`camcops_server.cc_modules.cc_tsv.TsvCollection` object 

823 """ # noqa 

824 audit_descriptions = [] # type: List[str] 

825 # Task may return >1 file for TSV output (e.g. for subtables). 

826 tsvcoll = TsvCollection() 

827 # Iterate through tasks, creating the TSV collection 

828 for cls in self.collection.task_classes(): 

829 for task in gen_audited_tasks_for_task_class(self.collection, cls, 

830 audit_descriptions): 

831 tsv_pages = task.get_tsv_pages(self.req) 

832 tsvcoll.add_pages(tsv_pages) 

833 

834 if self.options.include_information_schema_columns: 

835 info_schema_page = get_information_schema_tsv_page(self.req) 

836 tsvcoll.add_page(info_schema_page) 

837 

838 tsvcoll.sort_pages() 

839 if self.options.spreadsheet_sort_by_heading: 

840 tsvcoll.sort_headings_within_all_pages() 

841 

842 audit(self.req, f"Basic dump: {'; '.join(audit_descriptions)}") 

843 

844 return tsvcoll 

845 

846 

847class OdsExporter(TaskCollectionExporter): 

848 """ 

849 Converts a set of tasks to an OpenOffice ODS file. 

850 """ 

851 file_extension = "ods" 

852 viewtype = ViewArg.ODS 

853 

854 def get_file_body(self) -> bytes: 

855 return self.get_tsv_collection().as_ods() 

856 

857 def get_data_response(self, body: bytes, filename: str) -> Response: 

858 return OdsResponse(body=body, filename=filename) 

859 

860 

861class RExporter(TaskCollectionExporter): 

862 """ 

863 Converts a set of tasks to an R script. 

864 """ 

865 file_extension = "R" 

866 viewtype = ViewArg.R 

867 

868 def __init__(self, *args, **kwargs) -> None: 

869 super().__init__(*args, **kwargs) 

870 self.encoding = "utf-8" 

871 

872 def get_file_body(self) -> bytes: 

873 return self.get_r_script().encode(self.encoding) 

874 

875 def get_r_script(self) -> str: 

876 return self.get_tsv_collection().as_r() 

877 

878 def get_data_response(self, body: bytes, filename: str) -> Response: 

879 filename = self.get_filename() 

880 r_script = self.get_r_script() 

881 return TextAttachmentResponse(body=r_script, filename=filename) 

882 

883 

884class TsvZipExporter(TaskCollectionExporter): 

885 """ 

886 Converts a set of tasks to a set of TSV (tab-separated value) file, (one 

887 per table) in a ZIP file. 

888 """ 

889 file_extension = "zip" 

890 viewtype = ViewArg.TSV_ZIP 

891 

892 def get_file_body(self) -> bytes: 

893 return self.get_tsv_collection().as_zip() 

894 

895 def get_data_response(self, body: bytes, filename: str) -> Response: 

896 return ZipResponse(body=body, filename=filename) 

897 

898 

899class XlsxExporter(TaskCollectionExporter): 

900 """ 

901 Converts a set of tasks to an Excel XLSX file. 

902 """ 

903 file_extension = "xlsx" 

904 viewtype = ViewArg.XLSX 

905 

906 def get_file_body(self) -> bytes: 

907 return self.get_tsv_collection().as_xlsx() 

908 

909 def get_data_response(self, body: bytes, filename: str) -> Response: 

910 return XlsxResponse(body=body, filename=filename) 

911 

912 

913class SqliteExporter(TaskCollectionExporter): 

914 """ 

915 Converts a set of tasks to an SQLite binary file. 

916 """ 

917 file_extension = "sqlite" 

918 viewtype = ViewArg.SQLITE 

919 

920 def get_export_options(self) -> TaskExportOptions: 

921 return TaskExportOptions( 

922 include_blobs=self.options.db_include_blobs, 

923 db_include_summaries=True, 

924 db_make_all_tables_even_empty=True, # debatable, but more consistent! # noqa 

925 db_patient_id_per_row=self.options.db_patient_id_per_row, 

926 ) 

927 

928 def get_sqlite_data(self, as_text: bool) -> Union[bytes, str]: 

929 """ 

930 Returns data as a binary SQLite database, or SQL text to create it. 

931 

932 Args: 

933 as_text: textual SQL, rather than binary SQLite? 

934 

935 Returns: 

936 ``bytes`` or ``str``, according to ``as_text`` 

937 """ 

938 # --------------------------------------------------------------------- 

939 # Create memory file, dumper, and engine 

940 # --------------------------------------------------------------------- 

941 

942 # This approach failed: 

943 # 

944 # memfile = io.StringIO() 

945 # 

946 # def dump(querysql, *multiparams, **params): 

947 # compsql = querysql.compile(dialect=engine.dialect) 

948 # memfile.write("{};\n".format(compsql)) 

949 # 

950 # engine = create_engine('{dialect}://'.format(dialect=dialect_name), 

951 # strategy='mock', executor=dump) 

952 # dst_session = sessionmaker(bind=engine)() # type: SqlASession 

953 # 

954 # ... you get the error 

955 # AttributeError: 'MockConnection' object has no attribute 'begin' 

956 # ... which is fair enough. 

957 # 

958 # Next best thing: SQLite database. 

959 # Two ways to deal with it: 

960 # (a) duplicate our C++ dump code (which itself duplicate the SQLite 

961 # command-line executable's dump facility), then create the 

962 # database, dump it to a string, serve the string; or 

963 # (b) offer the binary SQLite file. 

964 # Or... (c) both. 

965 # Aha! pymysqlite.iterdump does this for us. 

966 # 

967 # If we create an in-memory database using create_engine('sqlite://'), 

968 # can we get the binary contents out? Don't think so. 

969 # 

970 # So we should first create a temporary on-disk file, then use that. 

971 

972 # --------------------------------------------------------------------- 

973 # Make temporary file (one whose filename we can know). 

974 # --------------------------------------------------------------------- 

975 # We use tempfile.mkstemp() for security, or NamedTemporaryFile, 

976 # which is a bit easier. However, you can't necessarily open the file 

977 # again under all OSs, so that's no good. The final option is 

978 # TemporaryDirectory, which is secure and convenient. 

979 # 

980 # https://docs.python.org/3/library/tempfile.html 

981 # https://security.openstack.org/guidelines/dg_using-temporary-files-securely.html # noqa 

982 # https://stackoverflow.com/questions/3924117/how-to-use-tempfile-namedtemporaryfile-in-python # noqa 

983 db_basename = "temp.sqlite3" 

984 with tempfile.TemporaryDirectory() as tmpdirname: 

985 db_filename = os.path.join(tmpdirname, db_basename) 

986 # --------------------------------------------------------------------- 

987 # Make SQLAlchemy session 

988 # --------------------------------------------------------------------- 

989 url = "sqlite:///" + db_filename 

990 engine = create_engine(url, echo=False) 

991 dst_session = sessionmaker(bind=engine)() # type: SqlASession 

992 # --------------------------------------------------------------------- 

993 # Iterate through tasks, creating tables as we need them. 

994 # --------------------------------------------------------------------- 

995 audit_descriptions = [] # type: List[str] 

996 task_generator = gen_audited_tasks_by_task_class(self.collection, 

997 audit_descriptions) 

998 # --------------------------------------------------------------------- 

999 # Next bit very tricky. We're trying to achieve several things: 

1000 # - a copy of part of the database structure 

1001 # - a copy of part of the data, with relationships intact 

1002 # - nothing sensitive (e.g. full User records) going through 

1003 # - adding new columns for Task objects offering summary values 

1004 # - Must treat tasks all together, because otherwise we will insert 

1005 # duplicate dependency objects like Group objects. 

1006 # --------------------------------------------------------------------- 

1007 copy_tasks_and_summaries(tasks=task_generator, 

1008 dst_engine=engine, 

1009 dst_session=dst_session, 

1010 export_options=self.get_export_options(), 

1011 req=self.req) 

1012 dst_session.commit() 

1013 if self.options.include_information_schema_columns: 

1014 # Must have committed before we do this: 

1015 write_information_schema_to_dst(self.req, dst_session) 

1016 # --------------------------------------------------------------------- 

1017 # Audit 

1018 # --------------------------------------------------------------------- 

1019 audit(self.req, f"SQL dump: {'; '.join(audit_descriptions)}") 

1020 # --------------------------------------------------------------------- 

1021 # Fetch file contents, either as binary, or as SQL 

1022 # --------------------------------------------------------------------- 

1023 if as_text: 

1024 # SQL text 

1025 connection = sqlite3.connect(db_filename) # type: sqlite3.Connection # noqa 

1026 sql_text = sql_from_sqlite_database(connection) 

1027 connection.close() 

1028 return sql_text 

1029 else: 

1030 # SQLite binary 

1031 with open(db_filename, 'rb') as f: 

1032 binary_contents = f.read() 

1033 return binary_contents 

1034 

1035 def get_file_body(self) -> bytes: 

1036 return self.get_sqlite_data(as_text=False) 

1037 

1038 def get_data_response(self, body: bytes, filename: str) -> Response: 

1039 return SqliteBinaryResponse(body=body, filename=filename) 

1040 

1041 

1042class SqlExporter(SqliteExporter): 

1043 """ 

1044 Converts a set of tasks to the textual SQL needed to create an SQLite file. 

1045 """ 

1046 file_extension = "sql" 

1047 viewtype = ViewArg.SQL 

1048 

1049 def __init__(self, *args, **kwargs) -> None: 

1050 super().__init__(*args, **kwargs) 

1051 self.encoding = "utf-8" 

1052 

1053 def get_file_body(self) -> bytes: 

1054 return self.get_sql().encode(self.encoding) 

1055 

1056 def get_sql(self) -> str: 

1057 """ 

1058 Returns SQL text representing the SQLite database. 

1059 """ 

1060 return self.get_sqlite_data(as_text=True) 

1061 

1062 def download_now(self) -> Response: 

1063 """ 

1064 Download the data dump in the selected format 

1065 """ 

1066 filename = self.get_filename() 

1067 sql_text = self.get_sql() 

1068 return TextAttachmentResponse(body=sql_text, filename=filename) 

1069 

1070 def get_data_response(self, body: bytes, filename: str) -> Response: 

1071 """ 

1072 Unused. 

1073 """ 

1074 pass 

1075 

1076 

1077# Create mapping from "viewtype" to class. 

1078# noinspection PyTypeChecker 

1079DOWNLOADER_CLASSES = {} # type: Dict[str, Type[TaskCollectionExporter]] 

1080for _cls in gen_all_subclasses(TaskCollectionExporter): # type: Type[TaskCollectionExporter] # noqa 

1081 # noinspection PyTypeChecker 

1082 DOWNLOADER_CLASSES[_cls.viewtype] = _cls 

1083 

1084 

1085def make_exporter(req: "CamcopsRequest", 

1086 collection: "TaskCollection", 

1087 options: DownloadOptions) -> TaskCollectionExporter: 

1088 """ 

1089 

1090 Args: 

1091 req: 

1092 a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

1093 collection: 

1094 a 

1095 :class:`camcops_server.cc_modules.cc_taskcollection.TaskCollection` 

1096 options: 

1097 :class:`camcops_server.cc_modules.cc_export.DownloadOptions` 

1098 governing the download 

1099 

1100 Returns: 

1101 a :class:`BasicTaskCollectionExporter` 

1102 

1103 Raises: 

1104 :exc:`HTTPBadRequest` if the arguments are bad 

1105 """ 

1106 _ = req.gettext 

1107 if options.delivery_mode not in DownloadOptions.DELIVERY_MODES: 

1108 raise HTTPBadRequest( 

1109 f"{_('Bad delivery mode:')} {options.delivery_mode!r} " 

1110 f"({_('permissible:')} " 

1111 f"{DownloadOptions.DELIVERY_MODES!r})") 

1112 try: 

1113 downloader_class = DOWNLOADER_CLASSES[options.viewtype] 

1114 except KeyError: 

1115 raise HTTPBadRequest( 

1116 f"{_('Bad output type:')} {options.viewtype!r} " 

1117 f"({_('permissible:')} {DOWNLOADER_CLASSES.keys()!r})") 

1118 return downloader_class( 

1119 req=req, 

1120 collection=collection, 

1121 options=options 

1122 ) 

1123 

1124 

1125# ============================================================================= 

1126# Represent files for users to download 

1127# ============================================================================= 

1128 

1129class UserDownloadFile(object): 

1130 """ 

1131 Represents a file that has been generated for the user to download. 

1132 

1133 Test code: 

1134 

1135 .. code-block:: python 

1136 

1137 from camcops_server.cc_modules.cc_export import UserDownloadFile 

1138 x = UserDownloadFile("/etc/hosts") 

1139 

1140 print(x.when_last_modified) # should match output of: ls -l /etc/hosts 

1141 

1142 many = UserDownloadFile.from_directory_scan("/etc") 

1143 

1144 """ 

1145 def __init__(self, filename: str, directory: str = "", 

1146 permitted_lifespan_min: float = 0, 

1147 req: "CamcopsRequest" = None) -> None: 

1148 """ 

1149 Args: 

1150 filename: filename relative to ``directory`` 

1151 directory: directory 

1152 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

1153 

1154 Notes: 

1155 

1156 - The Unix ``ls`` command shows timestamps in the current timezone. 

1157 Try ``TZ=utc ls -l <filename>`` or ``TZ="America/New_York" ls -l 

1158 <filename>`` to see this. 

1159 - The underlying timestamp is the time (in seconds) since the Unix 

1160 "epoch", which is 00:00:00 UTC on 1 Jan 1970 

1161 (https://en.wikipedia.org/wiki/Unix_time). 

1162 """ 

1163 self.filename = filename 

1164 self.directory = directory 

1165 self.permitted_lifespan_min = permitted_lifespan_min 

1166 self.req = req 

1167 

1168 self.basename = os.path.basename(filename) 

1169 _, self.extension = os.path.splitext(filename) 

1170 if directory: 

1171 self.fullpath = os.path.join(directory, filename) 

1172 else: 

1173 self.fullpath = filename 

1174 try: 

1175 self.statinfo = os.stat(self.fullpath) 

1176 self.exists = True 

1177 except FileNotFoundError: 

1178 self.statinfo = None # type: Optional[os.stat_result] 

1179 self.exists = False 

1180 

1181 # ------------------------------------------------------------------------- 

1182 # Size 

1183 # ------------------------------------------------------------------------- 

1184 

1185 @property 

1186 def size(self) -> Optional[int]: 

1187 """ 

1188 Size of the file, in bytes. Returns ``None`` if the file does not 

1189 exist. 

1190 """ 

1191 return self.statinfo.st_size if self.exists else None 

1192 

1193 @property 

1194 def size_str(self) -> str: 

1195 """ 

1196 Returns a pretty-format string describing the file's size. 

1197 """ 

1198 size_bytes = self.size 

1199 if size_bytes is None: 

1200 return "" 

1201 return bytes2human(size_bytes) 

1202 

1203 # ------------------------------------------------------------------------- 

1204 # Timing 

1205 # ------------------------------------------------------------------------- 

1206 

1207 @property 

1208 def when_last_modified(self) -> Optional[Pendulum]: 

1209 """ 

1210 Returns the file's modification time, or ``None`` if it doesn't exist. 

1211 

1212 (Creation time is harder! See 

1213 https://stackoverflow.com/questions/237079/how-to-get-file-creation-modification-date-times-in-python.) 

1214 """ # noqa 

1215 if not self.exists: 

1216 return None 

1217 # noinspection PyTypeChecker 

1218 creation = Pendulum.fromtimestamp(self.statinfo.st_mtime, 

1219 tz=get_tz_utc()) # type: Pendulum 

1220 # ... gives the correct time in the UTC timezone 

1221 # ... note that utcfromtimestamp() gives a time without a timezone, 

1222 # which is unhelpful! 

1223 # We would like this to display in the current timezone: 

1224 return creation.in_timezone(get_tz_local()) 

1225 

1226 @property 

1227 def when_last_modified_str(self) -> str: 

1228 """ 

1229 Returns a formatted string with the file's modification time. 

1230 """ 

1231 w = self.when_last_modified 

1232 if not w: 

1233 return "" 

1234 return format_datetime(w, DateFormat.ISO8601_HUMANIZED_TO_SECONDS) 

1235 

1236 @property 

1237 def time_left(self) -> Optional[Duration]: 

1238 """ 

1239 Returns the amount of time that this file has left to live before 

1240 the server will delete it. Returns ``None`` if the file does not exist. 

1241 """ 

1242 if not self.exists: 

1243 return None 

1244 now = get_now_localtz_pendulum() 

1245 death = ( 

1246 self.when_last_modified + 

1247 Duration(minutes=self.permitted_lifespan_min) 

1248 ) 

1249 remaining = death - now # type: Period 

1250 # Note that Period is a subclass of Duration, but its __str__() 

1251 # method is different. Duration maps __str__() to in_words(), but 

1252 # Period maps __str__() to __repr__(). 

1253 return remaining 

1254 

1255 @property 

1256 def time_left_str(self) -> str: 

1257 """ 

1258 A string version of :meth:`time_left`. 

1259 """ 

1260 t = self.time_left 

1261 if not t: 

1262 return "" 

1263 return t.in_words() # Duration and Period do nice formatting 

1264 

1265 def older_than(self, when: Pendulum) -> bool: 

1266 """ 

1267 Was the file created before the specified time? 

1268 """ 

1269 m = self.when_last_modified 

1270 if not m: 

1271 return False 

1272 return m < when 

1273 

1274 # ------------------------------------------------------------------------- 

1275 # Deletion 

1276 # ------------------------------------------------------------------------- 

1277 

1278 @property 

1279 def delete_form(self) -> str: 

1280 """ 

1281 Returns HTML for a form to delete this file. 

1282 """ 

1283 if not self.req: 

1284 return "" 

1285 dest_url = self.req.route_url(Routes.DELETE_FILE) 

1286 form = UserDownloadDeleteForm( 

1287 request=self.req, 

1288 action=dest_url 

1289 ) 

1290 appstruct = {ViewParam.FILENAME: self.filename} 

1291 rendered_form = form.render(appstruct) 

1292 return rendered_form 

1293 

1294 def delete(self) -> None: 

1295 """ 

1296 Deletes the file. Does not raise an exception if the file does not 

1297 exist. 

1298 """ 

1299 try: 

1300 os.remove(self.fullpath) 

1301 log.info(f"Deleted file: {self.fullpath}") 

1302 except OSError: 

1303 pass 

1304 

1305 # ------------------------------------------------------------------------- 

1306 # Downloading 

1307 # ------------------------------------------------------------------------- 

1308 

1309 @property 

1310 def download_url(self) -> str: 

1311 """ 

1312 Returns a URL to download this file. 

1313 """ 

1314 if not self.req: 

1315 return "" 

1316 querydict = { 

1317 ViewParam.FILENAME: self.filename 

1318 } 

1319 return self.req.route_url(Routes.DOWNLOAD_FILE, _query=querydict) 

1320 

1321 @property 

1322 def contents(self) -> Optional[bytes]: 

1323 """ 

1324 The file contents. May raise :exc:`OSError` if the read fails. 

1325 """ 

1326 if not self.exists: 

1327 return None 

1328 with open(self.fullpath, "rb") as f: 

1329 return f.read() 

1330 

1331 # ------------------------------------------------------------------------- 

1332 # Bulk creation 

1333 # ------------------------------------------------------------------------- 

1334 

1335 @classmethod 

1336 def from_directory_scan( 

1337 cls, directory: str, 

1338 permitted_lifespan_min: float = 0, 

1339 req: "CamcopsRequest" = None) -> List["UserDownloadFile"]: 

1340 """ 

1341 Scans the directory and returns a list of :class:`UserDownloadFile` 

1342 objects, one for each file in the directory. 

1343 

1344 For each object, ``directory`` is the root directory (our parameter 

1345 here), and ``filename`` is the filename RELATIVE to that. 

1346 

1347 Args: 

1348 directory: directory to scan 

1349 permitted_lifespan_min: lifespan for each file 

1350 req: a :class:`camcops_server.cc_modules.cc_request.CamcopsRequest` 

1351 """ 

1352 results = [] # type: List[UserDownloadFile] 

1353 # Imagine directory == "/etc": 

1354 for root, dirs, files in os.walk(directory): 

1355 # ... then root might at times be "/etc/apache2" 

1356 for f in files: 

1357 fullpath = os.path.join(root, f) 

1358 relative_filename = relative_filename_within_dir( 

1359 fullpath, directory) 

1360 results.append(UserDownloadFile( 

1361 filename=relative_filename, 

1362 directory=directory, 

1363 permitted_lifespan_min=permitted_lifespan_min, 

1364 req=req 

1365 )) 

1366 return results