Coverage for C:\leo.repo\leo-editor\leo\core\leoAst.py : 99%

1# -*- coding: utf-8 -*-
2#@+leo-ver=5-thin
3#@+node:ekr.20141012064706.18389: * @file leoAst.py
4#@@first
5# This file is part of Leo: https://leoeditor.com
6# Leo's copyright notice is based on the MIT license: http://leoeditor.com/license.html
7#@+<< docstring >>
8#@+node:ekr.20200113081838.1: ** << docstring >> (leoAst.py)
9"""
10leoAst.py: This file does not depend on Leo in any way.
12The classes in this file unify python's token-based and ast-based worlds by
13creating two-way links between tokens in the token list and ast nodes in
14the parse tree. For more details, see the "Overview" section below.
17**Stand-alone operation**
19usage:
20 leoAst.py --help
21 leoAst.py [--fstringify | --fstringify-diff | --orange | --orange-diff] PATHS
22 leoAst.py --py-cov [ARGS]
23 leoAst.py --pytest [ARGS]
24 leoAst.py --unittest [ARGS]
26examples:
27 --py-cov "-f TestOrange"
28 --pytest "-f TestOrange"
29 --unittest TestOrange
31positional arguments:
32 PATHS directory or list of files
34optional arguments:
35 -h, --help show this help message and exit
36 --fstringify leonine fstringify
37 --fstringify-diff show fstringify diff
38 --orange leonine Black
39 --orange-diff show orange diff
40 --py-cov run pytest --cov on leoAst.py
41 --pytest run pytest on leoAst.py
42 --unittest run unittest on leoAst.py
45**Overview**
47leoAst.py unifies python's token-oriented and ast-oriented worlds.
49leoAst.py defines classes that create two-way links between tokens
50created by python's tokenize module and parse tree nodes created by
51python's ast module:
53The Token Order Generator (TOG) class quickly creates the following
54links:
56- An *ordered* children array from each ast node to its children.
58- A parent link from each ast.node to its parent.
60- Two-way links between tokens in the token list, a list of Token
61 objects, and the ast nodes in the parse tree:
63 - For each token, token.node contains the ast.node "responsible" for
64 the token.
66 - For each ast node, node.first_i and node.last_i are indices into
67 the token list. These indices give the range of tokens that can be
68 said to be "generated" by the ast node.
70Once the TOG class has inserted parent/child links, the Token Order
71Traverser (TOT) class traverses trees annotated with parent/child
72links extremely quickly.
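For example, here is a minimal sketch of typical use. (The file name and
contents are illustrative; the helper functions are defined later in this file.)

    contents = read_file('example.py')
    tokens = make_tokens(contents)
    tree = parse_ast(contents)
    list(TokenOrderGenerator().create_links(tokens, tree, file_name='example.py'))
    # Now tokens[i].node is the ast node "responsible" for tokens[i], and
    # node.first_i / node.last_i give the range of tokens generated by that node.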
75**Applicability and importance**
77Many python developers will find asttokens meets all their needs.
78asttokens is well documented and easy to use. Nevertheless, two-way
79links are significant additions to python's tokenize and ast modules:
81- Links from tokens to nodes are assigned to the nearest possible ast
82 node, not the nearest statement, as in asttokens. Links can easily
83 be reassigned, if desired.
85- The TOG and TOT classes are intended to be the foundation of tools
86 such as fstringify and black.
88- The TOG class solves real problems, such as:
89 https://stackoverflow.com/questions/16748029/
91**Known bug**
93This file has no known bugs *except* for Python version 3.8.
95For Python 3.8, syncing tokens will fail for function calls such as:
97 f(1, x=2, *[3, 4], y=5)
99that is, for calls in which keyword arguments appear before non-keyword arguments.
101There are no plans to fix this bug. The workaround is to use Python version
1023.9 or above.
105**Figures of merit**
107Simplicity: The code consists primarily of a set of generators, one
108for every kind of ast node.
110Speed: The TOG creates two-way links between tokens and ast nodes in
111roughly the time taken by python's tokenize.tokenize and ast.parse
112library methods. This is substantially faster than the asttokens,
113black or fstringify tools. The TOT class traverses trees annotated
114with parent/child links even more quickly.
116Memory: The TOG class makes no significant demands on python's
117resources. Generators add nothing to python's call stack.
118TOG.node_stack is the only variable-length data. This stack resides in
119python's heap, so its length is unimportant. In the worst case, it
120might contain a few thousand entries. The TOT class uses no
121variable-length data at all.
123**Links**
125Leo...
126Ask for help: https://groups.google.com/forum/#!forum/leo-editor
127Report a bug: https://github.com/leo-editor/leo-editor/issues
128leoAst.py docs: http://leoeditor.com/appendices.html#leoast-py
130Other tools...
131asttokens: https://pypi.org/project/asttokens
132black: https://pypi.org/project/black/
133fstringify: https://pypi.org/project/fstringify/
135Python modules...
136tokenize.py: https://docs.python.org/3/library/tokenize.html
137ast.py: https://docs.python.org/3/library/ast.html
139**Studying this file**
141I strongly recommend that you use Leo when studying this code so that you
142will see the file's intended outline structure.
144Without Leo, you will see only special **sentinel comments** that create
145Leo's outline structure. These comments have the form::
147 `#@<comment-kind>:<user-id>.<timestamp>.<number>: <outline-level> <headline>`
148"""
149#@-<< docstring >>
150#@+<< imports >>
151#@+node:ekr.20200105054219.1: ** << imports >> (leoAst.py)
152import argparse
153import ast
154import codecs
155import difflib
156import glob
157import io
158import os
159import re
160import sys
161import textwrap
162import tokenize
163import traceback
164from typing import List, Optional
165#@-<< imports >>
166v1, v2, junk1, junk2, junk3 = sys.version_info
167py_version = (v1, v2)
169# Async tokens exist only in Python 3.5 and 3.6.
170# https://docs.python.org/3/library/token.html
171has_async_tokens = (3, 5) <= py_version <= (3, 6)
173# has_position_only_params = (v1, v2) >= (3, 8)
174#@+others
175#@+node:ekr.20191226175251.1: ** class LeoGlobals
176#@@nosearch
179class LeoGlobals: # pragma: no cover
180 """
181 Simplified version of functions in leoGlobals.py.
182 """
184 total_time = 0.0 # For unit testing.
186 #@+others
187 #@+node:ekr.20191226175903.1: *3* LeoGlobals.callerName
188 def callerName(self, n):
189 """Get the function name from the call stack."""
190 try:
191 f1 = sys._getframe(n)
192 code1 = f1.f_code
193 return code1.co_name
194 except Exception:
195 return ''
196 #@+node:ekr.20191226175426.1: *3* LeoGlobals.callers
197 def callers(self, n=4):
198 """
199 Return a string containing a comma-separated list of the callers
200 of the function that called g.callers.
201 """
202 i, result = 2, []
203 while True:
204 s = self.callerName(n=i)
205 if s:
206 result.append(s)
207 if not s or len(result) >= n:
208 break
209 i += 1
210 return ','.join(reversed(result))
211 #@+node:ekr.20191226190709.1: *3* leoGlobals.es_exception & helper
212 def es_exception(self, full=True):
213 typ, val, tb = sys.exc_info()
214 for line in traceback.format_exception(typ, val, tb):
215 print(line)
216 fileName, n = self.getLastTracebackFileAndLineNumber()
217 return fileName, n
218 #@+node:ekr.20191226192030.1: *4* LeoGlobals.getLastTracebackFileAndLineNumber
219 def getLastTracebackFileAndLineNumber(self):
220 typ, val, tb = sys.exc_info()
221 if typ == SyntaxError:
222 # IndentationError is a subclass of SyntaxError.
223 # SyntaxError *does* have 'filename' and 'lineno' attributes.
224 return val.filename, val.lineno # type:ignore
225 #
226 # Data is a list of tuples, one per stack entry.
227 # The tuples have the form (filename, lineNumber, functionName, text).
228 data = traceback.extract_tb(tb)
229 item = data[-1] # Get the item at the top of the stack.
230 filename, n, functionName, text = item
231 return filename, n
232 #@+node:ekr.20200220065737.1: *3* LeoGlobals.objToString
233 def objToString(self, obj, tag=None):
234 """Simplified version of g.printObj."""
235 result = []
236 if tag:
237 result.append(f"{tag}...")
238 if isinstance(obj, str):
239 obj = g.splitLines(obj)
240 if isinstance(obj, list):
241 result.append('[')
242 for z in obj:
243 result.append(f" {z!r}")
244 result.append(']')
245 elif isinstance(obj, tuple):
246 result.append('(')
247 for z in obj:
248 result.append(f" {z!r}")
249 result.append(')')
250 else:
251 result.append(repr(obj))
252 result.append('')
253 return '\n'.join(result)
254 #@+node:ekr.20191226190425.1: *3* LeoGlobals.plural
255 def plural(self, obj):
256 """Return "s" or "" depending on n."""
257 if isinstance(obj, (list, tuple, str)):
258 n = len(obj)
259 else:
260 n = obj
261 return '' if n == 1 else 's'
262 #@+node:ekr.20191226175441.1: *3* LeoGlobals.printObj
263 def printObj(self, obj, tag=None):
264 """Simplified version of g.printObj."""
265 print(self.objToString(obj, tag))
266 #@+node:ekr.20191226190131.1: *3* LeoGlobals.splitLines
267 def splitLines(self, s):
268 """Split s into lines, preserving the number of lines and
269 the endings of all lines, including the last line."""
270 # g.stat()
271 if s:
272 return s.splitlines(True)
273 # This is a Python string function!
274 return []
275 #@+node:ekr.20191226190844.1: *3* LeoGlobals.toEncodedString
276 def toEncodedString(self, s, encoding='utf-8'):
277 """Convert unicode string to an encoded string."""
278 if not isinstance(s, str):
279 return s
280 try:
281 s = s.encode(encoding, "strict")
282 except UnicodeError:
283 s = s.encode(encoding, "replace")
284 print(f"toEncodedString: Error converting {s!r} to {encoding}")
285 return s
286 #@+node:ekr.20191226190006.1: *3* LeoGlobals.toUnicode
287 def toUnicode(self, s, encoding='utf-8'):
288 """Convert bytes to unicode if necessary."""
289 tag = 'g.toUnicode'
290 if isinstance(s, str):
291 return s
292 if not isinstance(s, bytes):
293 print(f"{tag}: bad s: {s!r}")
294 return ''
295 b: bytes = s
296 try:
297 s2 = b.decode(encoding, 'strict')
298 except (UnicodeDecodeError, UnicodeError):
299 s2 = b.decode(encoding, 'replace')
300 print(f"{tag}: unicode error. encoding: {encoding!r}, s2:\n{s2!r}")
301 g.trace(g.callers())
302 except Exception:
303 g.es_exception()
304 print(f"{tag}: unexpected error! encoding: {encoding!r}, s2:\n{s2!r}")
305 g.trace(g.callers())
306 return s2
307 #@+node:ekr.20191226175436.1: *3* LeoGlobals.trace
308 def trace(self, *args):
309 """Print a tracing message."""
310 # Compute the caller name.
311 try:
312 f1 = sys._getframe(1)
313 code1 = f1.f_code
314 name = code1.co_name
315 except Exception:
316 name = ''
317 print(f"{name}: {' '.join(str(z) for z in args)}")
318 #@+node:ekr.20191226190241.1: *3* LeoGlobals.truncate
319 def truncate(self, s, n):
320 """Return s truncated to n characters."""
321 if len(s) <= n:
322 return s
323 s2 = s[: n - 3] + f"...({len(s)})"
324 return s2 + '\n' if s.endswith('\n') else s2
325 #@-others
326#@+node:ekr.20200702114522.1: ** leoAst.py: top-level commands
327#@+node:ekr.20200702114557.1: *3* command: fstringify_command
328def fstringify_command(files):
329 """
330 Entry point for --fstringify.
332 Fstringify the given files, overwriting them in place.
333 """
334 for filename in files: # pragma: no cover
335 if os.path.exists(filename):
336 print(f"fstringify {filename}")
337 Fstringify().fstringify_file_silent(filename)
338 else:
339 print(f"file not found: {filename}")
340#@+node:ekr.20200702121222.1: *3* command: fstringify_diff_command
341def fstringify_diff_command(files):
342 """
343 Entry point for --fstringify-diff.
345 Print the diff that would be produced by fstringify.
346 """
347 for filename in files: # pragma: no cover
348 if os.path.exists(filename):
349 print(f"fstringify-diff {filename}")
350 Fstringify().fstringify_file_diff(filename)
351 else:
352 print(f"file not found: {filename}")
353#@+node:ekr.20200702115002.1: *3* command: orange_command
354def orange_command(files):
356 for filename in files: # pragma: no cover
357 if os.path.exists(filename):
358 print(f"orange {filename}")
359 Orange().beautify_file(filename)
360 else:
361 print(f"file not found: {filename}")
362#@+node:ekr.20200702121315.1: *3* command: orange_diff_command
363def orange_diff_command(files):
365 for filename in files: # pragma: no cover
366 if os.path.exists(filename):
367 print(f"orange-diff {filename}")
368 Orange().beautify_file_diff(filename)
369 else:
370 print(f"file not found: {filename}")
371#@+node:ekr.20160521104628.1: ** leoAst.py: top-level utils
372if 1: # pragma: no cover
373 #@+others
374 #@+node:ekr.20200702102239.1: *3* function: main (leoAst.py)
375 def main():
376 """Run commands specified by sys.argv."""
377 description = textwrap.dedent("""\
378 leo-editor/leo/unittests/core/test_leoAst.py contains unit tests (100% coverage).
379 """)
380 parser = argparse.ArgumentParser(description=description, formatter_class=argparse.RawTextHelpFormatter)
381 parser.add_argument('PATHS', nargs='*', help='directory or list of files')
382 group = parser.add_mutually_exclusive_group(required=False) # Don't require any args.
383 add = group.add_argument
384 add('--fstringify', dest='f', action='store_true', help='leonine fstringify')
385 add('--fstringify-diff', dest='fd', action='store_true', help='show fstringify diff')
386 add('--orange', dest='o', action='store_true', help='leonine Black')
387 add('--orange-diff', dest='od', action='store_true', help='show orange diff')
388 args = parser.parse_args()
389 files = args.PATHS
390 if len(files) == 1 and os.path.isdir(files[0]):
391 files = glob.glob(f"{files[0]}{os.sep}*.py")
392 if args.f:
393 fstringify_command(files)
394 if args.fd:
395 fstringify_diff_command(files)
396 if args.o:
397 orange_command(files)
398 if args.od:
399 orange_diff_command(files)
400 #@+node:ekr.20200107114409.1: *3* functions: reading & writing files
401 #@+node:ekr.20200218071822.1: *4* function: regularize_nls
402 def regularize_nls(s):
403 """Regularize newlines within s."""
404 return s.replace('\r\n', '\n').replace('\r', '\n')
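    # For example: regularize_nls('a\r\nb\rc') returns 'a\nb\nc'.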
405 #@+node:ekr.20200106171502.1: *4* function: get_encoding_directive
406 # This is the pattern in PEP 263.
407 encoding_pattern = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')
409 def get_encoding_directive(bb):
410 """
411 Get the encoding from the encoding directive at the start of a file.
413 bb: The bytes of the file.
415 Returns the codec name, or 'UTF-8'.
417 Adapted from pyzo. Copyright 2008 to 2020 by Almar Klein.
418 """
419 for line in bb.split(b'\n', 2)[:2]:
420 # Try to make line a string
421 try:
422 line2 = line.decode('ASCII').strip()
423 except Exception:
424 continue
425 # Does the line match the PEP 263 pattern?
426 m = encoding_pattern.match(line2)
427 if not m:
428 continue
429 # Is it a known encoding? Correct the name if it is.
430 try:
431 c = codecs.lookup(m.group(1))
432 return c.name
433 except Exception:
434 pass
435 return 'UTF-8'
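    # Sketch of the expected behavior (the exact canonical name comes from codecs.lookup):
    #   get_encoding_directive(b'# -*- coding: latin-1 -*-\nx = 1\n')  --> 'iso8859-1'
    #   get_encoding_directive(b'x = 1\n')                             --> 'UTF-8'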
436 #@+node:ekr.20200103113417.1: *4* function: read_file
437 def read_file(filename, encoding='utf-8'):
438 """
439 Return the contents of the file with the given name.
440 Print an error message and return None on error.
441 """
442 tag = 'read_file'
443 try:
444 # Translate all newlines to '\n'.
445 with open(filename, 'r', encoding=encoding) as f:
446 s = f.read()
447 return regularize_nls(s)
448 except Exception:
449 print(f"{tag}: can not read {filename}")
450 return None
451 #@+node:ekr.20200106173430.1: *4* function: read_file_with_encoding
452 def read_file_with_encoding(filename):
453 """
454 Read the file with the given name, returning (e, s), where:
456 s is the string, converted to unicode, or '' if there was an error.
458 e is the encoding of s, computed in the following order:
460 - The BOM encoding if the file starts with a BOM mark.
461 - The encoding given in the # -*- coding: utf-8 -*- line.
463 - 'utf-8'.
464 """
465 # First, read the file.
466 tag = 'read_with_encoding'
467 try:
468 with open(filename, 'rb') as f:
469 bb = f.read()
470 except Exception:
471 print(f"{tag}: can not read {filename}")
472 if not bb:
473 return 'UTF-8', ''
474 # Look for the BOM.
475 e, bb = strip_BOM(bb)
476 if not e:
477 # Python's encoding comments override everything else.
478 e = get_encoding_directive(bb)
479 s = g.toUnicode(bb, encoding=e)
480 s = regularize_nls(s)
481 return e, s
482 #@+node:ekr.20200106174158.1: *4* function: strip_BOM
483 def strip_BOM(bb):
484 """
485 bb must be the bytes contents of a file.
487 If bb starts with a BOM (Byte Order Mark), return (e, bb2), where:
489 - e is the encoding implied by the BOM.
490 - bb2 is bb, stripped of the BOM.
492 If there is no BOM, return (None, bb)
493 """
494 assert isinstance(bb, bytes), bb.__class__.__name__
495 table = (
496 # Test longer BOMs first.
497 (4, 'utf-32', codecs.BOM_UTF32_BE),
498 (4, 'utf-32', codecs.BOM_UTF32_LE),
499 (3, 'utf-8', codecs.BOM_UTF8),
500 (2, 'utf-16', codecs.BOM_UTF16_BE),
501 (2, 'utf-16', codecs.BOM_UTF16_LE),
502 )
503 for n, e, bom in table:
504 assert len(bom) == n
505 if bom == bb[: len(bom)]:
506 return e, bb[len(bom) :]
507 return None, bb
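    # Sketch of the expected behavior:
    #   strip_BOM(codecs.BOM_UTF8 + b'x = 1\n')  --> ('utf-8', b'x = 1\n')
    #   strip_BOM(b'x = 1\n')                    --> (None, b'x = 1\n')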
508 #@+node:ekr.20200103163100.1: *4* function: write_file
509 def write_file(filename, s, encoding='utf-8'):
510 """
511 Write the string s to the file whose name is given.
513 Handle all exceptions.
515 Before calling this function, the caller should ensure
516 that the file actually has been changed.
517 """
518 try:
519 # Write the file with platform-dependent newlines.
520 with open(filename, 'w', encoding=encoding) as f:
521 f.write(s)
522 except Exception as e:
523 g.trace(f"Error writing {filename}\n{e}")
524 #@+node:ekr.20200113154120.1: *3* functions: tokens
525 #@+node:ekr.20191223093539.1: *4* function: find_anchor_token
526 def find_anchor_token(node, global_token_list):
527 """
528 Return the anchor_token for node, a token such that token.node == node.
530 The search starts at node, then tries the usual child nodes.
531 """
533 node1 = node
535 def anchor_token(node):
536 """Return the anchor token in node.token_list"""
537 # Careful: some tokens in the token list may have been killed.
538 for token in get_node_token_list(node, global_token_list):
539 if is_ancestor(node1, token):
540 return token
541 return None
543 # This table only has to cover fields for ast.Nodes that
544 # won't have any associated token.
546 fields = (
547 # Common...
548 'elt', 'elts', 'body', 'value',
549 # Less common...
550 'dims', 'ifs', 'names', 's',
551 'test', 'values', 'targets',
552 )
553 while node:
554 # First, try the node itself.
555 token = anchor_token(node)
556 if token:
557 return token
558 # Second, try the most common nodes w/o token_lists:
559 if isinstance(node, ast.Call):
560 node = node.func
561 elif isinstance(node, ast.Tuple):
562 node = node.elts # type:ignore
563 # Finally, try all other nodes.
564 else:
565 # This will be used rarely.
566 for field in fields:
567 node = getattr(node, field, None)
568 if node:
569 token = anchor_token(node)
570 if token:
571 return token
572 else:
573 break
574 return None
575 #@+node:ekr.20191231160225.1: *4* function: find_paren_token (changed signature)
576 def find_paren_token(i, global_token_list):
577 """Return i of the next paren token, starting at tokens[i]."""
578 while i < len(global_token_list):
579 token = global_token_list[i]
580 if token.kind == 'op' and token.value in '()':
581 return i
582 if is_significant_token(token):
583 break
584 i += 1
585 return None
586 #@+node:ekr.20200113110505.4: *4* function: get_node_tokens_list
587 def get_node_token_list(node, global_tokens_list):
588 """
589 global_tokens_list must be the global list of tokens.
590 Return the tokens assigned to the node, or [].
591 """
592 i = getattr(node, 'first_i', None)
593 j = getattr(node, 'last_i', None)
594 return [] if i is None else global_tokens_list[i : j + 1]
595 #@+node:ekr.20191124123830.1: *4* function: is_significant & is_significant_token
596 def is_significant(kind, value):
597 """
598 Return True if (kind, value) represent a token that can be used for
599 syncing generated tokens with the token list.
600 """
601 # Making 'endmarker' significant ensures that all tokens are synced.
602 return (
603 kind in ('async', 'await', 'endmarker', 'name', 'number', 'string') or
604 kind == 'op' and value not in ',;()')
606 def is_significant_token(token):
607 """Return True if the given token is a syncronizing token"""
608 return is_significant(token.kind, token.value)
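    # For example: is_significant('name', 'x') and is_significant('op', '+') are True,
    # but is_significant('op', ',') and is_significant('comment', '# x') are False.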
609 #@+node:ekr.20191224093336.1: *4* function: match_parens
610 def match_parens(filename, i, j, tokens):
611 """Match parens in tokens[i:j]. Return the new j."""
612 if j >= len(tokens):
613 return len(tokens)
614 # Calculate paren level...
615 level = 0
616 for n in range(i, j + 1):
617 token = tokens[n]
618 if token.kind == 'op' and token.value == '(':
619 level += 1
620 if token.kind == 'op' and token.value == ')':
621 if level == 0:
622 break
623 level -= 1
624 # Find matching ')' tokens *after* j.
625 if level > 0:
626 while level > 0 and j + 1 < len(tokens):
627 token = tokens[j + 1]
628 if token.kind == 'op' and token.value == ')':
629 level -= 1
630 elif token.kind == 'op' and token.value == '(':
631 level += 1
632 elif is_significant_token(token):
633 break
634 j += 1
635 if level != 0: # pragma: no cover.
636 line_n = tokens[i].line_number
637 raise AssignLinksError(
638 f"\n"
639 f"Unmatched parens: level={level}\n"
640 f" file: {filename}\n"
641 f" line: {line_n}\n")
642 return j
643 #@+node:ekr.20191223053324.1: *4* function: tokens_for_node
644 def tokens_for_node(filename, node, global_token_list):
645 """Return the list of all tokens descending from node."""
646 # Find any token descending from node.
647 token = find_anchor_token(node, global_token_list)
648 if not token:
649 if 0: # A good trace for debugging.
650 print('')
651 g.trace('===== no tokens', node.__class__.__name__)
652 return []
653 assert is_ancestor(node, token)
654 # Scan backward.
655 i = first_i = token.index
656 while i >= 0:
657 token2 = global_token_list[i - 1]
658 if getattr(token2, 'node', None):
659 if is_ancestor(node, token2):
660 first_i = i - 1
661 else:
662 break
663 i -= 1
664 # Scan forward.
665 j = last_j = token.index
666 while j + 1 < len(global_token_list):
667 token2 = global_token_list[j + 1]
668 if getattr(token2, 'node', None):
669 if is_ancestor(node, token2):
670 last_j = j + 1
671 else:
672 break
673 j += 1
674 last_j = match_parens(filename, first_i, last_j, global_token_list)
675 results = global_token_list[first_i : last_j + 1]
676 return results
677 #@+node:ekr.20200101030236.1: *4* function: tokens_to_string
678 def tokens_to_string(tokens):
679 """Return the string represented by the list of tokens."""
680 if tokens is None:
681 # This indicates an internal error.
682 print('')
683 g.trace('===== token list is None ===== ')
684 print('')
685 return ''
686 return ''.join([z.to_string() for z in tokens])
687 #@+node:ekr.20191231072039.1: *3* functions: utils...
688 # General utility functions on tokens and nodes.
689 #@+node:ekr.20191119085222.1: *4* function: obj_id
690 def obj_id(obj):
691 """Return the last four digits of id(obj), for dumps & traces."""
692 return str(id(obj))[-4:]
693 #@+node:ekr.20191231060700.1: *4* function: op_name
694 #@@nobeautify
696 # https://docs.python.org/3/library/ast.html
698 _op_names = {
699 # Binary operators.
700 'Add': '+',
701 'BitAnd': '&',
702 'BitOr': '|',
703 'BitXor': '^',
704 'Div': '/',
705 'FloorDiv': '//',
706 'LShift': '<<',
707 'MatMult': '@', # Python 3.5.
708 'Mod': '%',
709 'Mult': '*',
710 'Pow': '**',
711 'RShift': '>>',
712 'Sub': '-',
713 # Boolean operators.
714 'And': ' and ',
715 'Or': ' or ',
716 # Comparison operators
717 'Eq': '==',
718 'Gt': '>',
719 'GtE': '>=',
720 'In': ' in ',
721 'Is': ' is ',
722 'IsNot': ' is not ',
723 'Lt': '<',
724 'LtE': '<=',
725 'NotEq': '!=',
726 'NotIn': ' not in ',
727 # Context operators.
728 'AugLoad': '<AugLoad>',
729 'AugStore': '<AugStore>',
730 'Del': '<Del>',
731 'Load': '<Load>',
732 'Param': '<Param>',
733 'Store': '<Store>',
734 # Unary operators.
735 'Invert': '~',
736 'Not': ' not ',
737 'UAdd': '+',
738 'USub': '-',
739 }
741 def op_name(node):
742 """Return the print name of an operator node."""
743 class_name = node.__class__.__name__
744 assert class_name in _op_names, repr(class_name)
745 return _op_names[class_name].strip()
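    # For example: op_name(ast.Add()) returns '+' and op_name(ast.NotIn()) returns 'not in'.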
746 #@+node:ekr.20200107114452.1: *3* node/token creators...
747 #@+node:ekr.20200103082049.1: *4* function: make_tokens
748 def make_tokens(contents):
749 """
750 Return a list (not a generator) of Token objects corresponding to the
751 list of 5-tuples generated by tokenize.tokenize.
753 Perform consistency checks and handle all exceptions.
754 """
756 def check(contents, tokens):
757 result = tokens_to_string(tokens)
758 ok = result == contents
759 if not ok:
760 print('\nRound-trip check FAILS')
761 print('Contents...\n')
762 g.printObj(contents)
763 print('\nResult...\n')
764 g.printObj(result)
765 return ok
767 try:
768 five_tuples = tokenize.tokenize(
769 io.BytesIO(contents.encode('utf-8')).readline)
770 except Exception:
771 print('make_tokens: exception in tokenize.tokenize')
772 g.es_exception()
773 return None
774 tokens = Tokenizer().create_input_tokens(contents, five_tuples)
775 assert check(contents, tokens)
776 return tokens
777 #@+node:ekr.20191027075648.1: *4* function: parse_ast
778 def parse_ast(s):
779 """
780 Parse string s, catching & reporting all exceptions.
781 Return the ast node, or None.
782 """
784 def oops(message):
785 print('')
786 print(f"parse_ast: {message}")
787 g.printObj(s)
788 print('')
790 try:
791 s1 = g.toEncodedString(s)
792 tree = ast.parse(s1, filename='before', mode='exec')
793 return tree
794 except IndentationError:
795 oops('Indentation Error')
796 except SyntaxError:
797 oops('Syntax Error')
798 except Exception:
799 oops('Unexpected Exception')
800 g.es_exception()
801 return None
802 #@+node:ekr.20191231110051.1: *3* node/token dumpers...
803 #@+node:ekr.20191027074436.1: *4* function: dump_ast
804 def dump_ast(ast, tag='dump_ast'):
805 """Utility to dump an ast tree."""
806 g.printObj(AstDumper().dump_ast(ast), tag=tag)
807 #@+node:ekr.20191228095945.4: *4* function: dump_contents
808 def dump_contents(contents, tag='Contents'):
809 print('')
810 print(f"{tag}...\n")
811 for i, z in enumerate(g.splitLines(contents)):
812 print(f"{i+1:<3} ", z.rstrip())
813 print('')
814 #@+node:ekr.20191228095945.5: *4* function: dump_lines
815 def dump_lines(tokens, tag='Token lines'):
816 print('')
817 print(f"{tag}...\n")
818 for z in tokens:
819 if z.line.strip():
820 print(z.line.rstrip())
821 else:
822 print(repr(z.line))
823 print('')
824 #@+node:ekr.20191228095945.7: *4* function: dump_results
825 def dump_results(tokens, tag='Results'):
826 print('')
827 print(f"{tag}...\n")
828 print(tokens_to_string(tokens))
829 print('')
830 #@+node:ekr.20191228095945.8: *4* function: dump_tokens
831 def dump_tokens(tokens, tag='Tokens'):
832 print('')
833 print(f"{tag}...\n")
834 if not tokens:
835 return
836 print("Note: values shown are repr(value) *except* for 'string' tokens.")
837 tokens[0].dump_header()
838 for i, z in enumerate(tokens):
839 # Confusing.
840 # if (i % 20) == 0: z.dump_header()
841 print(z.dump())
842 print('')
843 #@+node:ekr.20191228095945.9: *4* function: dump_tree
844 def dump_tree(tokens, tree, tag='Tree'):
845 print('')
846 print(f"{tag}...\n")
847 print(AstDumper().dump_tree(tokens, tree))
848 #@+node:ekr.20200107040729.1: *4* function: show_diffs
849 def show_diffs(s1, s2, filename=''):
850 """Print diffs between strings s1 and s2."""
851 lines = list(difflib.unified_diff(
852 g.splitLines(s1),
853 g.splitLines(s2),
854 fromfile=f"Old {filename}",
855 tofile=f"New {filename}",
856 ))
857 print('')
858 tag = f"Diffs for {filename}" if filename else 'Diffs'
859 g.printObj(lines, tag=tag)
860 #@+node:ekr.20191223095408.1: *3* node/token nodes...
861 # Functions that associate tokens with nodes.
862 #@+node:ekr.20200120082031.1: *4* function: find_statement_node
863 def find_statement_node(node):
864 """
865 Return the nearest statement node.
866 Return None if node has only Module for a parent.
867 """
868 if isinstance(node, ast.Module):
869 return None
870 parent = node
871 while parent:
872 if is_statement_node(parent):
873 return parent
874 parent = parent.parent
875 return None
876 #@+node:ekr.20191223054300.1: *4* function: is_ancestor
877 def is_ancestor(node, token):
878 """Return True if node is an ancestor of token."""
879 t_node = token.node
880 if not t_node:
881 assert token.kind == 'killed', repr(token)
882 return False
883 while t_node:
884 if t_node == node:
885 return True
886 t_node = t_node.parent
887 return False
888 #@+node:ekr.20200120082300.1: *4* function: is_long_statement
889 def is_long_statement(node):
890 """
891 Return True if node is an instance of a node that might be split into
892 shorter lines.
893 """
894 return isinstance(node, (
895 ast.Assign, ast.AnnAssign, ast.AsyncFor, ast.AsyncWith, ast.AugAssign,
896 ast.Call, ast.Delete, ast.ExceptHandler, ast.For, ast.Global,
897 ast.If, ast.Import, ast.ImportFrom,
898 ast.Nonlocal, ast.Return, ast.While, ast.With, ast.Yield, ast.YieldFrom))
899 #@+node:ekr.20200120110005.1: *4* function: is_statement_node
900 def is_statement_node(node):
901 """Return True if node is a top-level statement."""
902 return is_long_statement(node) or isinstance(node, (
903 ast.Break, ast.Continue, ast.Pass, ast.Try))
904 #@+node:ekr.20191231082137.1: *4* function: nearest_common_ancestor
905 def nearest_common_ancestor(node1, node2):
906 """
907 Return the nearest common ancestor node for the given nodes.
909 The nodes must have parent links.
910 """
912 def parents(node):
913 aList = []
914 while node:
915 aList.append(node)
916 node = node.parent
917 return list(reversed(aList))
919 result = None
920 parents1 = parents(node1)
921 parents2 = parents(node2)
922 while parents1 and parents2:
923 parent1 = parents1.pop(0)
924 parent2 = parents2.pop(0)
925 if parent1 == parent2:
926 result = parent1
927 else:
928 break
929 return result
930 #@+node:ekr.20191225061516.1: *3* node/token replacers...
931 # Functions that replace tokens or nodes.
932 #@+node:ekr.20191231162249.1: *4* function: add_token_to_token_list
933 def add_token_to_token_list(token, node):
934 """Insert token in the proper location of node.token_list."""
935 if getattr(node, 'first_i', None) is None:
936 node.first_i = node.last_i = token.index
937 else:
938 node.first_i = min(node.first_i, token.index)
939 node.last_i = max(node.last_i, token.index)
940 #@+node:ekr.20191225055616.1: *4* function: replace_node
941 def replace_node(new_node, old_node):
942 """Replace new_node by old_node in the parse tree."""
943 parent = old_node.parent
944 new_node.parent = parent
945 new_node.node_index = old_node.node_index
946 children = parent.children
947 i = children.index(old_node)
948 children[i] = new_node
949 fields = getattr(old_node, '_fields', None)
950 if fields:
951 for field in fields:
952 value = getattr(old_node, field)
953 if value == old_node:
954 setattr(old_node, field, new_node)
955 break
956 #@+node:ekr.20191225055626.1: *4* function: replace_token
957 def replace_token(token, kind, value):
958 """Replace kind and value of the given token."""
959 if token.kind in ('endmarker', 'killed'):
960 return
961 token.kind = kind
962 token.value = value
963 token.node = None # Should be filled later.
964 #@-others
965#@+node:ekr.20191027072910.1: ** Exception classes
966class AssignLinksError(Exception):
967 """Assigning links to ast nodes failed."""
970class AstNotEqual(Exception):
971 """The two given AST's are not equivalent."""
974class FailFast(Exception):
975 """Abort tests in TestRunner class."""
976#@+node:ekr.20141012064706.18390: ** class AstDumper
977class AstDumper: # pragma: no cover
978 """A class supporting various kinds of dumps of ast nodes."""
979 #@+others
980 #@+node:ekr.20191112033445.1: *3* dumper.dump_tree & helper
981 def dump_tree(self, tokens, tree):
982 """Briefly show a tree, properly indented."""
983 self.tokens = tokens
984 result = [self.show_header()]
985 self.dump_tree_and_links_helper(tree, 0, result)
986 return ''.join(result)
987 #@+node:ekr.20191125035321.1: *4* dumper.dump_tree_and_links_helper
988 def dump_tree_and_links_helper(self, node, level, result):
989 """Return the list of lines in result."""
990 if node is None:
991 return
992 # Let block.
993 indent = ' ' * 2 * level
994 children: List[ast.AST] = getattr(node, 'children', [])
995 node_s = self.compute_node_string(node, level)
996 # Dump...
997 if isinstance(node, (list, tuple)):
998 for z in node:
999 self.dump_tree_and_links_helper(z, level, result)
1000 elif isinstance(node, str):
1001 result.append(f"{indent}{node.__class__.__name__:>8}:{node}\n")
1002 elif isinstance(node, ast.AST):
1003 # Node and parent.
1004 result.append(node_s)
1005 # Children.
1006 for z in children:
1007 self.dump_tree_and_links_helper(z, level + 1, result)
1008 else:
1009 result.append(node_s)
1010 #@+node:ekr.20191125035600.1: *3* dumper.compute_node_string & helpers
1011 def compute_node_string(self, node, level):
1012 """Return a string summarizing the node."""
1013 indent = ' ' * 2 * level
1014 parent = getattr(node, 'parent', None)
1015 node_id = getattr(node, 'node_index', '??')
1016 parent_id = getattr(parent, 'node_index', '??')
1017 parent_s = f"{parent_id:>3}.{parent.__class__.__name__} " if parent else ''
1018 class_name = node.__class__.__name__
1019 descriptor_s = f"{node_id}.{class_name}: " + self.show_fields(
1020 class_name, node, 30)
1021 tokens_s = self.show_tokens(node, 70, 100)
1022 lines = self.show_line_range(node)
1023 full_s1 = f"{parent_s:<16} {lines:<10} {indent}{descriptor_s} "
1024 node_s = f"{full_s1:<62} {tokens_s}\n"
1025 return node_s
1026 #@+node:ekr.20191113223424.1: *4* dumper.show_fields
1027 def show_fields(self, class_name, node, truncate_n):
1028 """Return a string showing interesting fields of the node."""
1029 val = ''
1030 if class_name == 'JoinedStr':
1031 values = node.values
1032 assert isinstance(values, list)
1033 # Str tokens may represent *concatenated* strings.
1034 results = []
1035 fstrings, strings = 0, 0
1036 for z in values:
1037 assert isinstance(z, (ast.FormattedValue, ast.Str))
1038 if isinstance(z, ast.Str):
1039 results.append(z.s)
1040 strings += 1
1041 else:
1042 results.append(z.__class__.__name__)
1043 fstrings += 1
1044 val = f"{strings} str, {fstrings} f-str"
1045 elif class_name == 'keyword':
1046 if isinstance(node.value, ast.Str):
1047 val = f"arg={node.arg}..Str.value.s={node.value.s}"
1048 elif isinstance(node.value, ast.Name):
1049 val = f"arg={node.arg}..Name.value.id={node.value.id}"
1050 else:
1051 val = f"arg={node.arg}..value={node.value.__class__.__name__}"
1052 elif class_name == 'Name':
1053 val = f"id={node.id!r}"
1054 elif class_name == 'NameConstant':
1055 val = f"value={node.value!r}"
1056 elif class_name == 'Num':
1057 val = f"n={node.n}"
1058 elif class_name == 'Starred':
1059 if isinstance(node.value, ast.Str):
1060 val = f"s={node.value.s}"
1061 elif isinstance(node.value, ast.Name):
1062 val = f"id={node.value.id}"
1063 else:
1064 val = f"s={node.value.__class__.__name__}"
1065 elif class_name == 'Str':
1066 val = f"s={node.s!r}"
1067 elif class_name in ('AugAssign', 'BinOp', 'BoolOp', 'UnaryOp'): # IfExp
1068 name = node.op.__class__.__name__
1069 val = f"op={_op_names.get(name, name)}"
1070 elif class_name == 'Compare':
1071 ops = ','.join([op_name(z) for z in node.ops])
1072 val = f"ops='{ops}'"
1073 else:
1074 val = ''
1075 return g.truncate(val, truncate_n)
1076 #@+node:ekr.20191114054726.1: *4* dumper.show_line_range
1077 def show_line_range(self, node):
1079 token_list = get_node_token_list(node, self.tokens)
1080 if not token_list:
1081 return ''
1082 min_ = min([z.line_number for z in token_list])
1083 max_ = max([z.line_number for z in token_list])
1084 return f"{min_}" if min_ == max_ else f"{min_}..{max_}"
1085 #@+node:ekr.20191113223425.1: *4* dumper.show_tokens
1086 def show_tokens(self, node, n, m, show_cruft=False):
1087 """
1088 Return a string showing node.token_list.
1090 Split the result if n + len(result) > m
1091 """
1092 token_list = get_node_token_list(node, self.tokens)
1093 result = []
1094 for z in token_list:
1095 val = None
1096 if z.kind == 'comment':
1097 if show_cruft:
1098 val = g.truncate(z.value, 10) # Short is good.
1099 result.append(f"{z.kind}.{z.index}({val})")
1100 elif z.kind == 'name':
1101 val = g.truncate(z.value, 20)
1102 result.append(f"{z.kind}.{z.index}({val})")
1103 elif z.kind == 'newline':
1104 # result.append(f"{z.kind}.{z.index}({z.line_number}:{len(z.line)})")
1105 result.append(f"{z.kind}.{z.index}")
1106 elif z.kind == 'number':
1107 result.append(f"{z.kind}.{z.index}({z.value})")
1108 elif z.kind == 'op':
1109 if z.value not in ',()' or show_cruft:
1110 result.append(f"{z.kind}.{z.index}({z.value})")
1111 elif z.kind == 'string':
1112 val = g.truncate(z.value, 30)
1113 result.append(f"{z.kind}.{z.index}({val})")
1114 elif z.kind == 'ws':
1115 if show_cruft:
1116 result.append(f"{z.kind}.{z.index}({len(z.value)})")
1117 else:
1118 # Indent, dedent, encoding, etc.
1119 # Don't put a blank.
1120 continue
1121 if result and result[-1] != ' ':
1122 result.append(' ')
1123 #
1124 # split the line if it is too long.
1125 # g.printObj(result, tag='show_tokens')
1126 if 1:
1127 return ''.join(result)
1128 line, lines = [], []
1129 for r in result:
1130 line.append(r)
1131 if n + len(''.join(line)) >= m:
1132 lines.append(''.join(line))
1133 line = []
1134 lines.append(''.join(line))
1135 pad = '\n' + ' ' * n
1136 return pad.join(lines)
1137 #@+node:ekr.20191110165235.5: *3* dumper.show_header
1138 def show_header(self):
1139 """Return a header string, but only the fist time."""
1140 return (
1141 f"{'parent':<16} {'lines':<10} {'node':<34} {'tokens'}\n"
1142 f"{'======':<16} {'=====':<10} {'====':<34} {'======'}\n")
1143 #@+node:ekr.20141012064706.18392: *3* dumper.dump_ast & helper
1144 annotate_fields = False
1145 include_attributes = False
1146 indent_ws = ' '
1148 def dump_ast(self, node, level=0):
1149 """
1150 Dump an ast tree. Adapted from ast.dump.
1151 """
1152 sep1 = '\n%s' % (self.indent_ws * (level + 1))
1153 if isinstance(node, ast.AST):
1154 fields = [(a, self.dump_ast(b, level + 1)) for a, b in self.get_fields(node)]
1155 if self.include_attributes and node._attributes:
1156 fields.extend([(a, self.dump_ast(getattr(node, a), level + 1))
1157 for a in node._attributes])
1158 if self.annotate_fields:
1159 aList = ['%s=%s' % (a, b) for a, b in fields]
1160 else:
1161 aList = [b for a, b in fields]
1162 name = node.__class__.__name__
1163 sep = '' if len(aList) <= 1 else sep1
1164 return '%s(%s%s)' % (name, sep, sep1.join(aList))
1165 if isinstance(node, list):
1166 sep = sep1
1167 return 'LIST[%s]' % ''.join(
1168 ['%s%s' % (sep, self.dump_ast(z, level + 1)) for z in node])
1169 return repr(node)
1170 #@+node:ekr.20141012064706.18393: *4* dumper.get_fields
1171 def get_fields(self, node):
1173 return (
1174 (a, b) for a, b in ast.iter_fields(node)
1175 if a not in ['ctx',] and b not in (None, [])
1176 )
1177 #@-others
1178#@+node:ekr.20191227170628.1: ** TOG classes...
1179#@+node:ekr.20191113063144.1: *3* class TokenOrderGenerator
1180class TokenOrderGenerator:
1181 """
1182 A class that traverses ast (parse) trees in token order.
1184 Overview: https://github.com/leo-editor/leo-editor/issues/1440#issue-522090981
1186 Theory of operation:
1187 - https://github.com/leo-editor/leo-editor/issues/1440#issuecomment-573661883
1188 - http://leoeditor.com/appendices.html#tokenorder-classes-theory-of-operation
1190 How to: http://leoeditor.com/appendices.html#tokenorder-class-how-to
1192 Project history: https://github.com/leo-editor/leo-editor/issues/1440#issuecomment-574145510
1193 """
1195 n_nodes = 0 # The number of nodes that have been visited.
1196 #@+others
1197 #@+node:ekr.20200103174914.1: *4* tog: Init...
1198 #@+node:ekr.20191228184647.1: *5* tog.balance_tokens
1199 def balance_tokens(self, tokens):
1200 """
1201 TOG.balance_tokens.
1203 Insert two-way links between matching paren tokens.
1204 """
1205 count, stack = 0, []
1206 for token in tokens:
1207 if token.kind == 'op':
1208 if token.value == '(':
1209 count += 1
1210 stack.append(token.index)
1211 if token.value == ')':
1212 if stack:
1213 index = stack.pop()
1214 tokens[index].matching_paren = token.index
1215 tokens[token.index].matching_paren = index
1216 else: # pragma: no cover
1217 g.trace(f"unmatched ')' at index {token.index}")
1218 if stack: # pragma: no cover
1219 g.trace("unmatched '(' at {','.join(stack)}")
1220 return count
1221 #@+node:ekr.20191113063144.4: *5* tog.create_links
1222 def create_links(self, tokens, tree, file_name=''):
1223 """
1224 A generator that creates two-way links between the given tokens and the ast tree.
1226 Callers should call this generator with list(tog.create_links(...))
1228 The sync_token method creates the links and verifies that the resulting
1229 tree traversal generates exactly the given tokens in exact order.
1231 tokens: the list of Token instances for the input.
1232 Created by make_tokens().
1233 tree: the ast tree for the input.
1234 Created by parse_ast().
1235 """
1236 #
1237 # Init all ivars.
1238 self.file_name = file_name # For tests.
1239 self.level = 0 # Python indentation level.
1240 self.node = None # The node being visited.
1241 self.tokens = tokens # The immutable list of input tokens.
1242 self.tree = tree # The tree of ast.AST nodes.
1243 #
1244 # Traverse the tree.
1245 try:
1246 while True:
1247 next(self.visitor(tree))
1248 except StopIteration:
1249 pass
1250 #
1251 # Ensure that all tokens are patched.
1252 self.node = tree
1253 yield from self.gen_token('endmarker', '')
1254 #@+node:ekr.20191229071733.1: *5* tog.init_from_file
1255 def init_from_file(self, filename): # pragma: no cover
1256 """
1257 Create the tokens and ast tree for the given file.
1258 Create links between tokens and the parse tree.
1259 Return (contents, encoding, tokens, tree).
1260 """
1261 self.level = 0
1262 self.filename = filename
1263 encoding, contents = read_file_with_encoding(filename)
1264 if not contents:
1265 return None, None, None, None
1266 self.tokens = tokens = make_tokens(contents)
1267 self.tree = tree = parse_ast(contents)
1268 list(self.create_links(tokens, tree))
1269 return contents, encoding, tokens, tree
1270 #@+node:ekr.20191229071746.1: *5* tog.init_from_string
1271 def init_from_string(self, contents, filename): # pragma: no cover
1272 """
1273 Tokenize, parse and create links in the contents string.
1275 Return (tokens, tree).
1276 """
1277 self.filename = filename
1278 self.level = 0
1279 self.tokens = tokens = make_tokens(contents)
1280 self.tree = tree = parse_ast(contents)
1281 list(self.create_links(tokens, tree))
1282 return tokens, tree
1283 #@+node:ekr.20191223052749.1: *4* tog: Traversal...
1284 #@+node:ekr.20191113063144.3: *5* tog.begin_visitor
1285 begin_end_stack: List[str] = []
1286 node_index = 0 # The index into the node_stack.
1287 node_stack: List[ast.AST] = [] # The stack of parent nodes.
1289 def begin_visitor(self, node):
1290 """Enter a visitor."""
1291 # Update the stats.
1292 self.n_nodes += 1
1293 # Do this first, *before* updating self.node.
1294 node.parent = self.node
1295 if self.node:
1296 children = getattr(self.node, 'children', []) # type:ignore
1297 children.append(node)
1298 self.node.children = children
1299 # Inject the node_index field.
1300 assert not hasattr(node, 'node_index'), g.callers()
1301 node.node_index = self.node_index
1302 self.node_index += 1
1303 # begin_visitor and end_visitor must be paired.
1304 self.begin_end_stack.append(node.__class__.__name__)
1305 # Push the previous node.
1306 self.node_stack.append(self.node)
1307 # Update self.node *last*.
1308 self.node = node
1309 #@+node:ekr.20200104032811.1: *5* tog.end_visitor
1310 def end_visitor(self, node):
1311 """Leave a visitor."""
1312 # begin_visitor and end_visitor must be paired.
1313 entry_name = self.begin_end_stack.pop()
1314 assert entry_name == node.__class__.__name__, f"{entry_name!r} {node.__class__.__name__}"
1315 assert self.node == node, (repr(self.node), repr(node))
1316 # Restore self.node.
1317 self.node = self.node_stack.pop()
1318 #@+node:ekr.20200110162044.1: *5* tog.find_next_significant_token
1319 def find_next_significant_token(self):
1320 """
1321 Scan from *after* self.tokens[px] looking for the next significant
1322 token.
1324 Return the token, or None. Never change self.px.
1325 """
1326 px = self.px + 1
1327 while px < len(self.tokens):
1328 token = self.tokens[px]
1329 px += 1
1330 if is_significant_token(token):
1331 return token
1332 # This will never happen, because the endmarker token is significant.
1333 return None # pragma: no cover
1334 #@+node:ekr.20191121180100.1: *5* tog.gen*
1335 # Useful wrappers...
1337 def gen(self, z):
1338 yield from self.visitor(z)
1340 def gen_name(self, val):
1341 yield from self.visitor(self.sync_name(val)) # type:ignore
1343 def gen_op(self, val):
1344 yield from self.visitor(self.sync_op(val)) # type:ignore
1346 def gen_token(self, kind, val):
1347 yield from self.visitor(self.sync_token(kind, val)) # type:ignore
1348 #@+node:ekr.20191113063144.7: *5* tog.sync_token & set_links
1349 px = -1 # Index of the previously synced token.
1351 def sync_token(self, kind, val):
1352 """
1353 Sync to a token whose kind & value are given. The token need not be
1354 significant, but it must be guaranteed to exist in the token list.
1356 The checks in this method constitute a strong, ever-present, unit test.
1358 Scan the tokens *after* px, looking for a token T matching (kind, val).
1359 raise AssignLinksError if a significant token is found that doesn't match T.
1360 Otherwise:
1361 - Create two-way links between all assignable tokens between px and T.
1362 - Create two-way links between T and self.node.
1363 - Advance by updating self.px to point to T.
1364 """
1365 node, tokens = self.node, self.tokens
1366 assert isinstance(node, ast.AST), repr(node)
1367 # g.trace(
1368 # f"px: {self.px:2} "
1369 # f"node: {node.__class__.__name__:<10} "
1370 # f"kind: {kind:>10}: val: {val!r}")
1371 #
1372 # Step one: Look for token T.
1373 old_px = px = self.px + 1
1374 while px < len(self.tokens):
1375 token = tokens[px]
1376 if (kind, val) == (token.kind, token.value):
1377 break # Success.
1378 if kind == token.kind == 'number':
1379 val = token.value
1380 break # Benign: use the token's value, a string, instead of a number.
1381 if is_significant_token(token): # pragma: no cover
1382 line_s = f"line {token.line_number}:"
1383 val = str(val) # for g.truncate.
1384 raise AssignLinksError(
1385 f" file: {self.filename}\n"
1386 f"{line_s:>12} {token.line.strip()}\n"
1387 f"Looking for: {kind}.{g.truncate(val, 40)!r}\n"
1388 f" found: {token.kind}.{token.value!r}\n"
1389 f"token.index: {token.index}\n")
1390 # Skip the insignificant token.
1391 px += 1
1392 else: # pragma: no cover
1393 val = str(val) # for g.truncate.
1394 raise AssignLinksError(
1395 f" file: {self.filename}\n"
1396 f"Looking for: {kind}.{g.truncate(val, 40)}\n"
1397 f" found: end of token list")
1398 #
1399 # Step two: Assign *secondary* links only for newline tokens.
1400 # Ignore all other non-significant tokens.
1401 while old_px < px:
1402 token = tokens[old_px]
1403 old_px += 1
1404 if token.kind in ('comment', 'newline', 'nl'):
1405 self.set_links(node, token)
1406 #
1407 # Step three: Set links in the found token.
1408 token = tokens[px]
1409 self.set_links(node, token)
1410 #
1411 # Step four: Advance.
1412 self.px = px
1413 #@+node:ekr.20191125120814.1: *6* tog.set_links
1414 last_statement_node = None
1416 def set_links(self, node, token):
1417 """Make two-way links between token and the given node."""
1418 # Don't bother assigning comment, comma, paren, ws and endmarker tokens.
1419 if token.kind == 'comment':
1420 # Append the comment to node.comment_list.
1421 comment_list = getattr(node, 'comment_list', []) # type:ignore
1422 node.comment_list = comment_list + [token]
1423 return
1424 if token.kind in ('endmarker', 'ws'):
1425 return
1426 if token.kind == 'op' and token.value in ',()':
1427 return
1428 # *Always* remember the last statement.
1429 statement = find_statement_node(node)
1430 if statement:
1431 self.last_statement_node = statement # type:ignore
1432 assert not isinstance(self.last_statement_node, ast.Module)
1433 if token.node is not None: # pragma: no cover
1434 line_s = f"line {token.line_number}:"
1435 raise AssignLinksError(
1436 f" file: {self.filename}\n"
1437 f"{line_s:>12} {token.line.strip()}\n"
1438 f"token index: {self.px}\n"
1439 f"token.node is not None\n"
1440 f" token.node: {token.node.__class__.__name__}\n"
1441 f" callers: {g.callers()}")
1442 # Assign newlines to the previous statement node, if any.
1443 if token.kind in ('newline', 'nl'):
1444 # Set an *auxiliary* link for the split/join logic.
1445 # Do *not* set token.node!
1446 token.statement_node = self.last_statement_node
1447 return
1448 if is_significant_token(token):
1449 # Link the token to the ast node.
1450 token.node = node # type:ignore
1451 # Add the token to node's token_list.
1452 add_token_to_token_list(token, node)
1453 #@+node:ekr.20191124083124.1: *5* tog.sync_name and sync_op
1454 # It's valid for these to return None.
1456 def sync_name(self, val):
1457 aList = val.split('.')
1458 if len(aList) == 1:
1459 self.sync_token('name', val)
1460 else:
1461 for i, part in enumerate(aList):
1462 self.sync_token('name', part)
1463 if i < len(aList) - 1:
1464 self.sync_op('.')
1466 def sync_op(self, val):
1467 """
1468 Sync to the given operator.
1470 val may be '(' or ')' *only* if the parens *will* actually exist in the
1471 token list.
1472 """
1473 self.sync_token('op', val)
1474 #@+node:ekr.20191113081443.1: *5* tog.visitor (calls begin/end_visitor)
1475 def visitor(self, node):
1476 """Given an ast node, return a *generator* from its visitor."""
1477 # This saves a lot of tests.
1478 trace = False
1479 if node is None:
1480 return
1481 if trace: # pragma: no cover
1482 # Keep this trace. It's useful.
1483 cn = node.__class__.__name__ if node else ' '
1484 caller1, caller2 = g.callers(2).split(',')
1485 g.trace(f"{caller1:>15} {caller2:<14} {cn}")
1486 # More general, more convenient.
1487 if isinstance(node, (list, tuple)):
1488 for z in node or []:
1489 if isinstance(z, ast.AST):
1490 yield from self.visitor(z)
1491 else: # pragma: no cover
1492 # Some fields may contain ints or strings.
1493 assert isinstance(z, (int, str)), z.__class__.__name__
1494 return
1495 # We *do* want to crash if the visitor doesn't exist.
1496 method = getattr(self, 'do_' + node.__class__.__name__)
1497 # Allow begin/end visitor to be generators.
1498 self.begin_visitor(node)
1499 yield from method(node)
1500 self.end_visitor(node)
1501 #@+node:ekr.20191113063144.13: *4* tog: Visitors...
1502 #@+node:ekr.20191113063144.32: *5* tog.keyword: not called!
1503 # keyword arguments supplied to call (NULL identifier for **kwargs)
1505 # keyword = (identifier? arg, expr value)
1507 def do_keyword(self, node): # pragma: no cover
1508 """A keyword arg in an ast.Call."""
1509 # This should never be called.
1510 # tog.handle_call_arguments calls self.gen(kwarg_arg.value) instead.
1511 filename = getattr(self, 'filename', '<no file>')
1512 raise AssignLinksError(
1513 f"file: {filename}\n"
1514 f"do_keyword should never be called\n"
1515 f"{g.callers(8)}")
1516 #@+node:ekr.20191113063144.14: *5* tog: Contexts
1517 #@+node:ekr.20191113063144.28: *6* tog.arg
1518 # arg = (identifier arg, expr? annotation)
1520 def do_arg(self, node):
1521 """This is one argument of a list of ast.Function or ast.Lambda arguments."""
1522 yield from self.gen_name(node.arg)
1523 annotation = getattr(node, 'annotation', None)
1524 if annotation is not None:
1525 yield from self.gen_op(':')
1526 yield from self.gen(node.annotation)
1527 #@+node:ekr.20191113063144.27: *6* tog.arguments
1528 # arguments = (
1529 # arg* posonlyargs, arg* args, arg? vararg, arg* kwonlyargs,
1530 # expr* kw_defaults, arg? kwarg, expr* defaults
1531 # )
1533 def do_arguments(self, node):
1534 """Arguments to ast.Function or ast.Lambda, **not** ast.Call."""
1535 #
1536 # No need to generate commas anywhere below.
1537 #
1538 # Let block. Some fields may not exist pre Python 3.8.
1539 n_plain = len(node.args) - len(node.defaults)
1540 posonlyargs = getattr(node, 'posonlyargs', []) # type:ignore
1541 vararg = getattr(node, 'vararg', None)
1542 kwonlyargs = getattr(node, 'kwonlyargs', []) # type:ignore
1543 kw_defaults = getattr(node, 'kw_defaults', []) # type:ignore
1544 kwarg = getattr(node, 'kwarg', None)
1545 if 0:
1546 g.printObj(ast.dump(node.vararg) if node.vararg else 'None', tag='node.vararg')
1547 g.printObj([ast.dump(z) for z in node.args], tag='node.args')
1548 g.printObj([ast.dump(z) for z in node.defaults], tag='node.defaults')
1549 g.printObj([ast.dump(z) for z in posonlyargs], tag='node.posonlyargs')
1550 g.printObj([ast.dump(z) for z in kwonlyargs], tag='kwonlyargs')
1551 g.printObj([ast.dump(z) if z else 'None' for z in kw_defaults], tag='kw_defaults')
1552 # 1. Sync the position-only args.
1553 if posonlyargs:
1554 for n, z in enumerate(posonlyargs):
1555 # g.trace('pos-only', ast.dump(z))
1556 yield from self.gen(z)
1557 yield from self.gen_op('/')
1558 # 2. Sync all args.
1559 for i, z in enumerate(node.args):
1560 yield from self.gen(z)
1561 if i >= n_plain:
1562 yield from self.gen_op('=')
1563 yield from self.gen(node.defaults[i - n_plain])
1564 # 3. Sync the vararg.
1565 if vararg:
1566 # g.trace('vararg', ast.dump(vararg))
1567 yield from self.gen_op('*')
1568 yield from self.gen(vararg)
1569 # 4. Sync the keyword-only args.
1570 if kwonlyargs:
1571 if not vararg:
1572 yield from self.gen_op('*')
1573 for n, z in enumerate(kwonlyargs):
1574 # g.trace('keyword-only', ast.dump(z))
1575 yield from self.gen(z)
1576 val = kw_defaults[n]
1577 if val is not None:
1578 yield from self.gen_op('=')
1579 yield from self.gen(val)
1580 # 5. Sync the kwarg.
1581 if kwarg:
1582 # g.trace('kwarg', ast.dump(kwarg))
1583 yield from self.gen_op('**')
1584 yield from self.gen(kwarg)
1586 #@+node:ekr.20191113063144.15: *6* tog.AsyncFunctionDef
1587 # AsyncFunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list,
1588 # expr? returns)
1590 def do_AsyncFunctionDef(self, node):
1592 if node.decorator_list:
1593 for z in node.decorator_list:
1594 # '@%s\n'
1595 yield from self.gen_op('@')
1596 yield from self.gen(z)
1597 # 'async def (%s): -> %s\n'
1598 # 'async def %s(%s):\n'
1599 async_token_type = 'async' if has_async_tokens else 'name'
1600 yield from self.gen_token(async_token_type, 'async')
1601 yield from self.gen_name('def')
1602 yield from self.gen_name(node.name) # A string
1603 yield from self.gen_op('(')
1604 yield from self.gen(node.args)
1605 yield from self.gen_op(')')
1606 returns = getattr(node, 'returns', None)
1607 if returns is not None:
1608 yield from self.gen_op('->')
1609 yield from self.gen(node.returns)
1610 yield from self.gen_op(':')
1611 self.level += 1
1612 yield from self.gen(node.body)
1613 self.level -= 1
1614 #@+node:ekr.20191113063144.16: *6* tog.ClassDef
1615 def do_ClassDef(self, node, print_body=True):
1617 for z in node.decorator_list or []:
1618 # @{z}\n
1619 yield from self.gen_op('@')
1620 yield from self.gen(z)
1621 # class name(bases):\n
1622 yield from self.gen_name('class')
1623 yield from self.gen_name(node.name) # A string.
1624 if node.bases:
1625 yield from self.gen_op('(')
1626 yield from self.gen(node.bases)
1627 yield from self.gen_op(')')
1628 yield from self.gen_op(':')
1629 # Body...
1630 self.level += 1
1631 yield from self.gen(node.body)
1632 self.level -= 1
1633 #@+node:ekr.20191113063144.17: *6* tog.FunctionDef
1634 # FunctionDef(
1635 # identifier name, arguments args,
1636 # stmt* body,
1637 # expr* decorator_list,
1638 # expr? returns,
1639 # string? type_comment)
1641 def do_FunctionDef(self, node):
1643 # Guards...
1644 returns = getattr(node, 'returns', None)
1645 # Decorators...
1646 # @{z}\n
1647 for z in node.decorator_list or []:
1648 yield from self.gen_op('@')
1649 yield from self.gen(z)
1650 # Signature...
1651 # def name(args): -> returns\n
1652 # def name(args):\n
1653 yield from self.gen_name('def')
1654 yield from self.gen_name(node.name) # A string.
1655 yield from self.gen_op('(')
1656 yield from self.gen(node.args)
1657 yield from self.gen_op(')')
1658 if returns is not None:
1659 yield from self.gen_op('->')
1660 yield from self.gen(node.returns)
1661 yield from self.gen_op(':')
1662 # Body...
1663 self.level += 1
1664 yield from self.gen(node.body)
1665 self.level -= 1
1666 #@+node:ekr.20191113063144.18: *6* tog.Interactive
1667 def do_Interactive(self, node): # pragma: no cover
1669 yield from self.gen(node.body)
1670 #@+node:ekr.20191113063144.20: *6* tog.Lambda
1671 def do_Lambda(self, node):
1673 yield from self.gen_name('lambda')
1674 yield from self.gen(node.args)
1675 yield from self.gen_op(':')
1676 yield from self.gen(node.body)
1677 #@+node:ekr.20191113063144.19: *6* tog.Module
1678 def do_Module(self, node):
1680 # Encoding is a non-syncing statement.
1681 yield from self.gen(node.body)
1682 #@+node:ekr.20191113063144.21: *5* tog: Expressions
1683 #@+node:ekr.20191113063144.22: *6* tog.Expr
1684 def do_Expr(self, node):
1685 """An outer expression."""
1686 # No need to put parentheses.
1687 yield from self.gen(node.value)
1688 #@+node:ekr.20191113063144.23: *6* tog.Expression
1689 def do_Expression(self, node): # pragma: no cover
1690 """An inner expression."""
1691 # No need to put parentheses.
1692 yield from self.gen(node.body)
1693 #@+node:ekr.20191113063144.24: *6* tog.GeneratorExp
1694 def do_GeneratorExp(self, node):
1696 # '<gen %s for %s>' % (elt, ','.join(gens))
1697 # No need to put parentheses or commas.
1698 yield from self.gen(node.elt)
1699 yield from self.gen(node.generators)
1700 #@+node:ekr.20210321171703.1: *6* tog.NamedExpr
1701 # NamedExpr(expr target, expr value)
1703 def do_NamedExpr(self, node): # Python 3.8+
1705 yield from self.gen(node.target)
1706 yield from self.gen_op(':=')
1707 yield from self.gen(node.value)
1708 #@+node:ekr.20191113063144.26: *5* tog: Operands
1709 #@+node:ekr.20191113063144.29: *6* tog.Attribute
1710 # Attribute(expr value, identifier attr, expr_context ctx)
1712 def do_Attribute(self, node):
1714 yield from self.gen(node.value)
1715 yield from self.gen_op('.')
1716 yield from self.gen_name(node.attr) # A string.
1717 #@+node:ekr.20191113063144.30: *6* tog.Bytes
1718 def do_Bytes(self, node):
1720 """
1721 It's invalid to mix bytes and non-bytes literals, so just
1722 advancing to the next 'string' token suffices.
1723 """
1724 token = self.find_next_significant_token()
1725 yield from self.gen_token('string', token.value)
1726 #@+node:ekr.20191113063144.33: *6* tog.comprehension
1727 # comprehension = (expr target, expr iter, expr* ifs, int is_async)
1729 def do_comprehension(self, node):
1731 # No need to put parentheses.
1732 yield from self.gen_name('for') # #1858.
1733 yield from self.gen(node.target) # A name
1734 yield from self.gen_name('in')
1735 yield from self.gen(node.iter)
1736 for z in node.ifs or []:
1737 yield from self.gen_name('if')
1738 yield from self.gen(z)
1739 #@+node:ekr.20191113063144.34: *6* tog.Constant
1740 def do_Constant(self, node): # pragma: no cover
1741 """
1743 https://greentreesnakes.readthedocs.io/en/latest/nodes.html
1745 A constant. The value attribute holds the Python object it represents.
1746 This can be simple types such as a number, string or None, but also
1747 immutable container types (tuples and frozensets) if all of their
1748 elements are constant.
1749 """
1751 # Support Python 3.8.
1752 if node.value is None or isinstance(node.value, bool):
1753 # Weird: return a name!
1754 yield from self.gen_token('name', repr(node.value))
1755 elif node.value == Ellipsis:
1756 yield from self.gen_op('...')
1757 elif isinstance(node.value, str):
1758 yield from self.do_Str(node)
1759 elif isinstance(node.value, (int, float)):
1760 yield from self.gen_token('number', repr(node.value))
1761 elif isinstance(node.value, bytes):
1762 yield from self.do_Bytes(node)
1763 elif isinstance(node.value, tuple):
1764 yield from self.do_Tuple(node)
1765 elif isinstance(node.value, frozenset):
1766 yield from self.do_Set(node)
1767 else:
1768 # Unknown type.
1769 g.trace('----- Oops -----', repr(node.value), g.callers())
1770 #@+node:ekr.20191113063144.35: *6* tog.Dict
1771 # Dict(expr* keys, expr* values)
1773 def do_Dict(self, node):
1775 assert len(node.keys) == len(node.values)
1776 yield from self.gen_op('{')
1777 # No need to put commas.
1778 for i, key in enumerate(node.keys):
1779 key, value = node.keys[i], node.values[i]
1780 yield from self.gen(key) # a Str node.
1781 yield from self.gen_op(':')
1782 if value is not None:
1783 yield from self.gen(value)
1784 yield from self.gen_op('}')
1785 #@+node:ekr.20191113063144.36: *6* tog.DictComp
1786 # DictComp(expr key, expr value, comprehension* generators)
1788 # d2 = {val: key for key, val in d}
1790 def do_DictComp(self, node):
1792 yield from self.gen_token('op', '{')
1793 yield from self.gen(node.key)
1794 yield from self.gen_op(':')
1795 yield from self.gen(node.value)
1796 for z in node.generators or []:
1797 yield from self.gen(z)
1798 yield from self.gen_token('op', '}')
1799 #@+node:ekr.20191113063144.37: *6* tog.Ellipsis
1800 def do_Ellipsis(self, node): # pragma: no cover (Does not exist for python 3.8+)
1802 yield from self.gen_op('...')
1803 #@+node:ekr.20191113063144.38: *6* tog.ExtSlice
1804 # https://docs.python.org/3/reference/expressions.html#slicings
1806 # ExtSlice(slice* dims)
1808 def do_ExtSlice(self, node): # pragma: no cover (deprecated)
1810 # ','.join(node.dims)
1811 for i, z in enumerate(node.dims):
1812 yield from self.gen(z)
1813 if i < len(node.dims) - 1:
1814 yield from self.gen_op(',')
1815 #@+node:ekr.20191113063144.40: *6* tog.Index
1816 def do_Index(self, node): # pragma: no cover (deprecated)
1818 yield from self.gen(node.value)
1819 #@+node:ekr.20191113063144.39: *6* tog.FormattedValue: not called!
1820 # FormattedValue(expr value, int? conversion, expr? format_spec)
1822 def do_FormattedValue(self, node): # pragma: no cover
1823 """
1824 This node represents the *components* of a *single* f-string.
1826 Happily, JoinedStr nodes *also* represent *all* f-strings,
1827 so the TOG should *never* visit this node!
1828 """
1829 filename = getattr(self, 'filename', '<no file>')
1830 raise AssignLinksError(
1831 f"file: {filename}\n"
1832 f"do_FormattedValue should never be called")
1834 # This code has no chance of being useful...
1836 # conv = node.conversion
1837 # spec = node.format_spec
1838 # yield from self.gen(node.value)
1839 # if conv is not None:
1840 # yield from self.gen_token('number', conv)
1841 # if spec is not None:
1842 # yield from self.gen(node.format_spec)
1843 #@+node:ekr.20191113063144.41: *6* tog.JoinedStr & helpers
1844 # JoinedStr(expr* values)
1846 def do_JoinedStr(self, node):
1847 """
1848 JoinedStr nodes represent at least one f-string and all other strings
1849 concatenated to it.
1851 Analyzing JoinedStr.values would be extremely tricky, for reasons that
1852 need not be explained here.
1854 Instead, we get the tokens *from the token list itself*!
1855 """
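# Editor's note: an illustrative sketch, not part of the original source.
# For the source line:
#     s = f"x = {x}" ' suffix'
# the parse tree contains a single ast.JoinedStr node; this visitor simply
# re-yields the two consecutive 'string' tokens exactly as tokenize produced them.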
1856 for z in self.get_concatenated_string_tokens():
1857 yield from self.gen_token(z.kind, z.value)
1858 #@+node:ekr.20191113063144.42: *6* tog.List
1859 def do_List(self, node):
1861 # No need to put commas.
1862 yield from self.gen_op('[')
1863 yield from self.gen(node.elts)
1864 yield from self.gen_op(']')
1865 #@+node:ekr.20191113063144.43: *6* tog.ListComp
1866 # ListComp(expr elt, comprehension* generators)
1868 def do_ListComp(self, node):
1870 yield from self.gen_op('[')
1871 yield from self.gen(node.elt)
1872 for z in node.generators:
1873 yield from self.gen(z)
1874 yield from self.gen_op(']')
1875 #@+node:ekr.20191113063144.44: *6* tog.Name & NameConstant
1876 def do_Name(self, node):
1878 yield from self.gen_name(node.id)
1880 def do_NameConstant(self, node): # pragma: no cover (Does not exist in Python 3.8+)
1882 yield from self.gen_name(repr(node.value))
1884 #@+node:ekr.20191113063144.45: *6* tog.Num
1885 def do_Num(self, node): # pragma: no cover (Does not exist in Python 3.8+)
1887 yield from self.gen_token('number', node.n)
1888 #@+node:ekr.20191113063144.47: *6* tog.Set
1889 # Set(expr* elts)
1891 def do_Set(self, node):
1893 yield from self.gen_op('{')
1894 yield from self.gen(node.elts)
1895 yield from self.gen_op('}')
1896 #@+node:ekr.20191113063144.48: *6* tog.SetComp
1897 # SetComp(expr elt, comprehension* generators)
1899 def do_SetComp(self, node):
1901 yield from self.gen_op('{')
1902 yield from self.gen(node.elt)
1903 for z in node.generators or []:
1904 yield from self.gen(z)
1905 yield from self.gen_op('}')
1906 #@+node:ekr.20191113063144.49: *6* tog.Slice
1907 # slice = Slice(expr? lower, expr? upper, expr? step)
1909 def do_Slice(self, node):
1911 lower = getattr(node, 'lower', None)
1912 upper = getattr(node, 'upper', None)
1913 step = getattr(node, 'step', None)
1914 if lower is not None:
1915 yield from self.gen(lower)
1916 # Always put the colon between upper and lower.
1917 yield from self.gen_op(':')
1918 if upper is not None:
1919 yield from self.gen(upper)
1920 # Put the second colon if it exists in the token list.
1921 if step is None:
1922 token = self.find_next_significant_token()
1923 if token and token.value == ':':
1924 yield from self.gen_op(':')
1925 else:
1926 yield from self.gen_op(':')
1927 yield from self.gen(step)
1928 #@+node:ekr.20191113063144.50: *6* tog.Str & helper
1929 def do_Str(self, node):
1930 """This node represents a string constant."""
1931 # This loop is necessary to handle string concatenation.
1932 for z in self.get_concatenated_string_tokens():
1933 yield from self.gen_token(z.kind, z.value)
1934 #@+node:ekr.20200111083914.1: *7* tog.get_concatenated_tokens
1935 def get_concatenated_string_tokens(self):
1936 """
1937 Return the next 'string' token and all 'string' tokens concatenated to
1938 it. *Never* update self.px here.
1939 """
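# Editor's note: an illustrative sketch, not part of the original source.
# For the source line:
#     s = 'a' "b" f'c'
# this helper returns the three consecutive 'string' tokens ('a', "b", f'c');
# the scan stops at the first 'op' token or other significant token.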
1940 trace = False
1941 tag = 'tog.get_concatenated_string_tokens'
1942 i = self.px
1943 # First, find the next significant token. It should be a string.
1944 i, token = i + 1, None
1945 while i < len(self.tokens):
1946 token = self.tokens[i]
1947 i += 1
1948 if token.kind == 'string':
1949 # Rescan the string.
1950 i -= 1
1951 break
1952 # An error.
1953 if is_significant_token(token): # pragma: no cover
1954 break
1955 # Raise an error if we didn't find the expected 'string' token.
1956 if not token or token.kind != 'string': # pragma: no cover
1957 if not token:
1958 token = self.tokens[-1]
1959 filename = getattr(self, 'filename', '<no filename>')
1960 raise AssignLinksError(
1961 f"\n"
1962 f"{tag}...\n"
1963 f"file: {filename}\n"
1964 f"line: {token.line_number}\n"
1965 f" i: {i}\n"
1966 f"expected 'string' token, got {token!s}")
1967 # Accumulate string tokens.
1968 assert self.tokens[i].kind == 'string'
1969 results = []
1970 while i < len(self.tokens):
1971 token = self.tokens[i]
1972 i += 1
1973 if token.kind == 'string':
1974 results.append(token)
1975 elif token.kind == 'op' or is_significant_token(token):
1976 # Any significant token *or* any op will halt string concatenation.
1977 break
1978 # 'ws', 'nl', 'newline', 'comment', 'indent', 'dedent', etc.
1979 # The (significant) 'endmarker' token ensures we will have a result.
1980 assert results
1981 if trace: # pragma: no cover
1982 g.printObj(results, tag=f"{tag}: Results")
1983 return results
1984 #@+node:ekr.20191113063144.51: *6* tog.Subscript
1985 # Subscript(expr value, slice slice, expr_context ctx)
1987 def do_Subscript(self, node):
1989 yield from self.gen(node.value)
1990 yield from self.gen_op('[')
1991 yield from self.gen(node.slice)
1992 yield from self.gen_op(']')
1993 #@+node:ekr.20191113063144.52: *6* tog.Tuple
1994 # Tuple(expr* elts, expr_context ctx)
1996 def do_Tuple(self, node):
1998 # Do not call gen_op for parens or commas here.
1999 # They do not necessarily exist in the token list!
2000 yield from self.gen(node.elts)
2001 #@+node:ekr.20191113063144.53: *5* tog: Operators
2002 #@+node:ekr.20191113063144.55: *6* tog.BinOp
2003 def do_BinOp(self, node):
2005 op_name_ = op_name(node.op)
2006 yield from self.gen(node.left)
2007 yield from self.gen_op(op_name_)
2008 yield from self.gen(node.right)
2009 #@+node:ekr.20191113063144.56: *6* tog.BoolOp
2010 # BoolOp(boolop op, expr* values)
2012 def do_BoolOp(self, node):
2014 # op.join(node.values)
2015 op_name_ = op_name(node.op)
2016 for i, z in enumerate(node.values):
2017 yield from self.gen(z)
2018 if i < len(node.values) - 1:
2019 yield from self.gen_name(op_name_)
2020 #@+node:ekr.20191113063144.57: *6* tog.Compare
2021 # Compare(expr left, cmpop* ops, expr* comparators)
2023 def do_Compare(self, node):
2025 assert len(node.ops) == len(node.comparators)
2026 yield from self.gen(node.left)
2027 for i, z in enumerate(node.ops):
2028 op_name_ = op_name(node.ops[i])
2029 if op_name_ in ('not in', 'is not'):
2030 for z in op_name_.split(' '):
2031 yield from self.gen_name(z)
2032 elif op_name_.isalpha():
2033 yield from self.gen_name(op_name_)
2034 else:
2035 yield from self.gen_op(op_name_)
2036 yield from self.gen(node.comparators[i])
2037 #@+node:ekr.20191113063144.58: *6* tog.UnaryOp
2038 def do_UnaryOp(self, node):
2040 op_name_ = op_name(node.op)
2041 if op_name_.isalpha():
2042 yield from self.gen_name(op_name_)
2043 else:
2044 yield from self.gen_op(op_name_)
2045 yield from self.gen(node.operand)
2046 #@+node:ekr.20191113063144.59: *6* tog.IfExp (ternary operator)
2047 # IfExp(expr test, expr body, expr orelse)
2049 def do_IfExp(self, node):
2051 #'%s if %s else %s'
2052 yield from self.gen(node.body)
2053 yield from self.gen_name('if')
2054 yield from self.gen(node.test)
2055 yield from self.gen_name('else')
2056 yield from self.gen(node.orelse)
2057 #@+node:ekr.20191113063144.60: *5* tog: Statements
2058 #@+node:ekr.20191113063144.83: *6* tog.Starred
2059 # Starred(expr value, expr_context ctx)
2061 def do_Starred(self, node):
2062 """A starred argument to an ast.Call"""
2063 yield from self.gen_op('*')
2064 yield from self.gen(node.value)
2065 #@+node:ekr.20191113063144.61: *6* tog.AnnAssign
2066 # AnnAssign(expr target, expr annotation, expr? value, int simple)
2068 def do_AnnAssign(self, node):
2070 # f'{node.target}:{node.annotation}={node.value}\n'
2071 yield from self.gen(node.target)
2072 yield from self.gen_op(':')
2073 yield from self.gen(node.annotation)
2074 if node.value is not None: # #1851
2075 yield from self.gen_op('=')
2076 yield from self.gen(node.value)
2077 #@+node:ekr.20191113063144.62: *6* tog.Assert
2078 # Assert(expr test, expr? msg)
2080 def do_Assert(self, node):
2082 # Guards...
2083 msg = getattr(node, 'msg', None)
2084 # No need to put parentheses or commas.
2085 yield from self.gen_name('assert')
2086 yield from self.gen(node.test)
2087 if msg is not None:
2088 yield from self.gen(node.msg)
2089 #@+node:ekr.20191113063144.63: *6* tog.Assign
2090 def do_Assign(self, node):
2092 for z in node.targets:
2093 yield from self.gen(z)
2094 yield from self.gen_op('=')
2095 yield from self.gen(node.value)
2096 #@+node:ekr.20191113063144.64: *6* tog.AsyncFor
2097 def do_AsyncFor(self, node):
2099 # The def line...
2100 # Py 3.8 changes the kind of token.
2101 async_token_type = 'async' if has_async_tokens else 'name'
2102 yield from self.gen_token(async_token_type, 'async')
2103 yield from self.gen_name('for')
2104 yield from self.gen(node.target)
2105 yield from self.gen_name('in')
2106 yield from self.gen(node.iter)
2107 yield from self.gen_op(':')
2108 # Body...
2109 self.level += 1
2110 yield from self.gen(node.body)
2111 # Else clause...
2112 if node.orelse:
2113 yield from self.gen_name('else')
2114 yield from self.gen_op(':')
2115 yield from self.gen(node.orelse)
2116 self.level -= 1
2117 #@+node:ekr.20191113063144.65: *6* tog.AsyncWith
2118 def do_AsyncWith(self, node):
2120 async_token_type = 'async' if has_async_tokens else 'name'
2121 yield from self.gen_token(async_token_type, 'async')
2122 yield from self.do_With(node)
2123 #@+node:ekr.20191113063144.66: *6* tog.AugAssign
2124 # AugAssign(expr target, operator op, expr value)
2126 def do_AugAssign(self, node):
2128 # '%s%s=%s\n'
2129 op_name_ = op_name(node.op)
2130 yield from self.gen(node.target)
2131 yield from self.gen_op(op_name_ + '=')
2132 yield from self.gen(node.value)
2133 #@+node:ekr.20191113063144.67: *6* tog.Await
2134 # Await(expr value)
2136 def do_Await(self, node):
2138 #'await %s\n'
2139 async_token_type = 'await' if has_async_tokens else 'name'
2140 yield from self.gen_token(async_token_type, 'await')
2141 yield from self.gen(node.value)
2142 #@+node:ekr.20191113063144.68: *6* tog.Break
2143 def do_Break(self, node):
2145 yield from self.gen_name('break')
2146 #@+node:ekr.20191113063144.31: *6* tog.Call & helpers
2147 # Call(expr func, expr* args, keyword* keywords)
2149 # Python 3 ast.Call nodes do not have 'starargs' or 'kwargs' fields.
2151 def do_Call(self, node):
2153 # The calls to gen_op(')') and gen_op('(') do nothing by default.
2154 # Subclasses might handle them in an overridden tog.set_links.
2155 yield from self.gen(node.func)
2156 yield from self.gen_op('(')
2157 # No need to generate any commas.
2158 yield from self.handle_call_arguments(node)
2159 yield from self.gen_op(')')
2160 #@+node:ekr.20191204114930.1: *7* tog.arg_helper
2161 def arg_helper(self, node):
2162 """
2163 Yield the node, with a special case for strings.
2164 """
2165 if isinstance(node, str):
2166 yield from self.gen_token('name', node)
2167 else:
2168 yield from self.gen(node)
2169 #@+node:ekr.20191204105506.1: *7* tog.handle_call_arguments
2170 def handle_call_arguments(self, node):
2171 """
2172 Generate arguments in the correct order.
2174 Call(expr func, expr* args, keyword* keywords)
2176 https://docs.python.org/3/reference/expressions.html#calls
2178 Warning: This code will fail on Python 3.8 only for calls
2179 containing kwargs in unexpected places.
2180 """
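# Editor's note: an illustrative sketch, not part of the original source.
# On Python 3.9+ the call f(1, x=2, *[3, 4], y=5) is handled by merging
# node.args and node.keywords and sorting them by (lineno, col_offset),
# so the generated tokens follow the source order: 1, x=2, *[3, 4], y=5.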
2181 # *args: in node.args[]: Starred(value=Name(id='args'))
2182 # *[a, 3]: in node.args[]: Starred(value=List(elts=[Name(id='a'), Num(n=3)])
2183 # **kwargs: in node.keywords[]: keyword(arg=None, value=Name(id='kwargs'))
2184 #
2185 # Scan args for *name or *List
2186 args = node.args or []
2187 keywords = node.keywords or []
2189 def get_pos(obj):
2190 line1 = getattr(obj, 'lineno', None)
2191 col1 = getattr(obj, 'col_offset', None)
2192 return line1, col1, obj
2194 def sort_key(aTuple):
2195 line, col, obj = aTuple
2196 return line * 1000 + col
2198 if 0:
2199 g.printObj([ast.dump(z) for z in args], tag='args')
2200 g.printObj([ast.dump(z) for z in keywords], tag='keywords')
2202 if py_version >= (3, 9):
2203 places = [get_pos(z) for z in args + keywords]
2204 places.sort(key=sort_key)
2205 ordered_args = [z[2] for z in places]
2206 for z in ordered_args:
2207 if isinstance(z, ast.Starred):
2208 yield from self.gen_op('*')
2209 yield from self.gen(z.value)
2210 elif isinstance(z, ast.keyword):
2211 if getattr(z, 'arg', None) is None:
2212 yield from self.gen_op('**')
2213 yield from self.arg_helper(z.value)
2214 else:
2215 yield from self.arg_helper(z.arg)
2216 yield from self.gen_op('=')
2217 yield from self.arg_helper(z.value)
2218 else:
2219 yield from self.arg_helper(z)
2220 else: # pragma: no cover
2221 #
2222 # Legacy code: May fail for Python 3.8
2223 #
2224 # Scan args for *arg and *[...]
2225 kwarg_arg = star_arg = None
2226 for z in args:
2227 if isinstance(z, ast.Starred):
2228 if isinstance(z.value, ast.Name): # *Name.
2229 star_arg = z
2230 args.remove(z)
2231 break
2232 elif isinstance(z.value, (ast.List, ast.Tuple)): # *[...]
2233 # star_list = z
2234 break
2235 raise AttributeError(f"Invalid * expression: {ast.dump(z)}") # pragma: no cover
2236 # Scan keywords for **name.
2237 for z in keywords:
2238 if hasattr(z, 'arg') and z.arg is None:
2239 kwarg_arg = z
2240 keywords.remove(z)
2241 break
2242 # Sync the plain arguments.
2243 for z in args:
2244 yield from self.arg_helper(z)
2245 # Sync the keyword args.
2246 for z in keywords:
2247 yield from self.arg_helper(z.arg)
2248 yield from self.gen_op('=')
2249 yield from self.arg_helper(z.value)
2250 # Sync the * arg.
2251 if star_arg:
2252 yield from self.arg_helper(star_arg)
2253 # Sync the ** kwarg.
2254 if kwarg_arg:
2255 yield from self.gen_op('**')
2256 yield from self.gen(kwarg_arg.value)
2257 #@+node:ekr.20191113063144.69: *6* tog.Continue
2258 def do_Continue(self, node):
2260 yield from self.gen_name('continue')
2261 #@+node:ekr.20191113063144.70: *6* tog.Delete
2262 def do_Delete(self, node):
2264 # No need to put commas.
2265 yield from self.gen_name('del')
2266 yield from self.gen(node.targets)
2267 #@+node:ekr.20191113063144.71: *6* tog.ExceptHandler
2268 def do_ExceptHandler(self, node):
2270 # Except line...
2271 yield from self.gen_name('except')
2272 if getattr(node, 'type', None):
2273 yield from self.gen(node.type)
2274 if getattr(node, 'name', None):
2275 yield from self.gen_name('as')
2276 yield from self.gen_name(node.name)
2277 yield from self.gen_op(':')
2278 # Body...
2279 self.level += 1
2280 yield from self.gen(node.body)
2281 self.level -= 1
2282 #@+node:ekr.20191113063144.73: *6* tog.For
2283 def do_For(self, node):
2285 # The def line...
2286 yield from self.gen_name('for')
2287 yield from self.gen(node.target)
2288 yield from self.gen_name('in')
2289 yield from self.gen(node.iter)
2290 yield from self.gen_op(':')
2291 # Body...
2292 self.level += 1
2293 yield from self.gen(node.body)
2294 # Else clause...
2295 if node.orelse:
2296 yield from self.gen_name('else')
2297 yield from self.gen_op(':')
2298 yield from self.gen(node.orelse)
2299 self.level -= 1
2300 #@+node:ekr.20191113063144.74: *6* tog.Global
2301 # Global(identifier* names)
2303 def do_Global(self, node):
2305 yield from self.gen_name('global')
2306 for z in node.names:
2307 yield from self.gen_name(z)
2308 #@+node:ekr.20191113063144.75: *6* tog.If & helpers
2309 # If(expr test, stmt* body, stmt* orelse)
2311 def do_If(self, node):
2312 #@+<< do_If docstring >>
2313 #@+node:ekr.20191122222412.1: *7* << do_If docstring >>
2314 """
2315 The parse trees for the following are identical!
2317 if 1: if 1:
2318 pass pass
2319 else: elif 2:
2320 if 2: pass
2321 pass
2323 So there is *no* way for the 'if' visitor to disambiguate the above two
2324 cases from the parse tree alone.
2326 Instead, we scan the tokens list for the next 'if', 'else' or 'elif' token.
2327 """
2328 #@-<< do_If docstring >>
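# Editor's note: an illustrative sketch, not part of the original source.
# When the source actually reads "elif 2:", node.orelse holds a nested ast.If
# and the next significant token is 'elif', so this visitor emits no 'else'
# and simply recurses; the nested do_If then syncs the 'elif' token itself.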
2329 # Use the next significant token to distinguish between 'if' and 'elif'.
2330 token = self.find_next_significant_token()
2331 yield from self.gen_name(token.value)
2332 yield from self.gen(node.test)
2333 yield from self.gen_op(':')
2334 #
2335 # Body...
2336 self.level += 1
2337 yield from self.gen(node.body)
2338 self.level -= 1
2339 #
2340 # Else and elif clauses...
2341 if node.orelse:
2342 self.level += 1
2343 token = self.find_next_significant_token()
2344 if token.value == 'else':
2345 yield from self.gen_name('else')
2346 yield from self.gen_op(':')
2347 yield from self.gen(node.orelse)
2348 else:
2349 yield from self.gen(node.orelse)
2350 self.level -= 1
2351 #@+node:ekr.20191113063144.76: *6* tog.Import & helper
2352 def do_Import(self, node):
2354 yield from self.gen_name('import')
2355 for alias in node.names:
2356 yield from self.gen_name(alias.name)
2357 if alias.asname:
2358 yield from self.gen_name('as')
2359 yield from self.gen_name(alias.asname)
2360 #@+node:ekr.20191113063144.77: *6* tog.ImportFrom
2361 # ImportFrom(identifier? module, alias* names, int? level)
2363 def do_ImportFrom(self, node):
2365 yield from self.gen_name('from')
2366 for i in range(node.level):
2367 yield from self.gen_op('.')
2368 if node.module:
2369 yield from self.gen_name(node.module)
2370 yield from self.gen_name('import')
2371 # No need to put commas.
2372 for alias in node.names:
2373 if alias.name == '*': # #1851.
2374 yield from self.gen_op('*')
2375 else:
2376 yield from self.gen_name(alias.name)
2377 if alias.asname:
2378 yield from self.gen_name('as')
2379 yield from self.gen_name(alias.asname)
2380 #@+node:ekr.20191113063144.78: *6* tog.Nonlocal
2381 # Nonlocal(identifier* names)
2383 def do_Nonlocal(self, node):
2385 # nonlocal %s\n' % ','.join(node.names))
2386 # No need to put commas.
2387 yield from self.gen_name('nonlocal')
2388 for z in node.names:
2389 yield from self.gen_name(z)
2390 #@+node:ekr.20191113063144.79: *6* tog.Pass
2391 def do_Pass(self, node):
2393 yield from self.gen_name('pass')
2394 #@+node:ekr.20191113063144.81: *6* tog.Raise
2395 # Raise(expr? exc, expr? cause)
2397 def do_Raise(self, node):
2399 # No need to put commas.
2400 yield from self.gen_name('raise')
2401 exc = getattr(node, 'exc', None)
2402 cause = getattr(node, 'cause', None)
2403 tback = getattr(node, 'tback', None)
2404 yield from self.gen(exc)
2405 if cause:
2406 yield from self.gen_name('from') # #2446.
2407 yield from self.gen(cause)
2408 yield from self.gen(tback)
2409 #@+node:ekr.20191113063144.82: *6* tog.Return
2410 def do_Return(self, node):
2412 yield from self.gen_name('return')
2413 yield from self.gen(node.value)
2414 #@+node:ekr.20191113063144.85: *6* tog.Try
2415 # Try(stmt* body, excepthandler* handlers, stmt* orelse, stmt* finalbody)
2417 def do_Try(self, node):
2419 # Try line...
2420 yield from self.gen_name('try')
2421 yield from self.gen_op(':')
2422 # Body...
2423 self.level += 1
2424 yield from self.gen(node.body)
2425 yield from self.gen(node.handlers)
2426 # Else...
2427 if node.orelse:
2428 yield from self.gen_name('else')
2429 yield from self.gen_op(':')
2430 yield from self.gen(node.orelse)
2431 # Finally...
2432 if node.finalbody:
2433 yield from self.gen_name('finally')
2434 yield from self.gen_op(':')
2435 yield from self.gen(node.finalbody)
2436 self.level -= 1
2437 #@+node:ekr.20191113063144.88: *6* tog.While
2438 def do_While(self, node):
2440 # While line...
2442 # 'while %s:\n'
2442 yield from self.gen_name('while')
2443 yield from self.gen(node.test)
2444 yield from self.gen_op(':')
2445 # Body...
2446 self.level += 1
2447 yield from self.gen(node.body)
2448 # Else clause...
2449 if node.orelse:
2450 yield from self.gen_name('else')
2451 yield from self.gen_op(':')
2452 yield from self.gen(node.orelse)
2453 self.level -= 1
2454 #@+node:ekr.20191113063144.89: *6* tog.With
2455 # With(withitem* items, stmt* body)
2457 # withitem = (expr context_expr, expr? optional_vars)
2459 def do_With(self, node):
2461 expr: Optional[ast.AST] = getattr(node, 'context_expression', None)
2462 items: List[ast.AST] = getattr(node, 'items', [])
2463 yield from self.gen_name('with')
2464 yield from self.gen(expr)
2465 # No need to put commas.
2466 for item in items:
2467 yield from self.gen(item.context_expr) # type:ignore
2468 optional_vars = getattr(item, 'optional_vars', None)
2469 if optional_vars is not None:
2470 yield from self.gen_name('as')
2471 yield from self.gen(item.optional_vars) # type:ignore
2472 # End the line.
2473 yield from self.gen_op(':')
2474 # Body...
2475 self.level += 1
2476 yield from self.gen(node.body)
2477 self.level -= 1
2478 #@+node:ekr.20191113063144.90: *6* tog.Yield
2479 def do_Yield(self, node):
2481 yield from self.gen_name('yield')
2482 if hasattr(node, 'value'):
2483 yield from self.gen(node.value)
2484 #@+node:ekr.20191113063144.91: *6* tog.YieldFrom
2485 # YieldFrom(expr value)
2487 def do_YieldFrom(self, node):
2489 yield from self.gen_name('yield')
2490 yield from self.gen_name('from')
2491 yield from self.gen(node.value)
2492 #@-others
2493#@+node:ekr.20191226195813.1: *3* class TokenOrderTraverser
2494class TokenOrderTraverser:
2495 """
2496 Traverse an ast tree using the parent/child links created by the
2497 TokenOrderInjector class.
2498 """
2499 #@+others
2500 #@+node:ekr.20191226200154.1: *4* TOT.traverse
2501 def traverse(self, tree):
2502 """
2503 Call visit, in token order, for all nodes in tree.
2505 Recursion is not allowed.
2507 The code follows p.moveToThreadNext exactly.
2508 """
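# Editor's note: a usage sketch, not part of the original source.
# After a TokenOrderGenerator (or TokenOrderInjector) run has added parent,
# children and node_index fields, something like:
#     last = TokenOrderTraverser().traverse(tree)
# calls self.visit(node) once per node, in token order, without recursion.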
2510 def has_next(i, node, stack):
2511 """Return True if stack[i] is a valid child of node.parent."""
2512 # g.trace(node.__class__.__name__, stack)
2513 parent = node.parent
2514 return bool(parent and parent.children and i < len(parent.children))
2516 # Update stats
2518 self.last_node_index = -1 # For visit
2519 # The stack contains child indices.
2520 node, stack = tree, [0]
2521 seen = set()
2522 while node and stack:
2523 # g.trace(
2524 # f"{node.node_index:>3} "
2525 # f"{node.__class__.__name__:<12} {stack}")
2526 # Visit the node.
2527 assert node.node_index not in seen, node.node_index
2528 seen.add(node.node_index)
2529 self.visit(node)
2530 # if p.v.children: p.moveToFirstChild()
2531 children: List[ast.AST] = getattr(node, 'children', [])
2532 if children:
2533 # Move to the first child.
2534 stack.append(0)
2535 node = children[0]
2536 # g.trace(' child:', node.__class__.__name__, stack)
2537 continue
2538 # elif p.hasNext(): p.moveToNext()
2539 stack[-1] += 1
2540 i = stack[-1]
2541 if has_next(i, node, stack):
2542 node = node.parent.children[i]
2543 continue
2544 # else...
2545 # p.moveToParent()
2546 node = node.parent
2547 stack.pop()
2548 # while p:
2549 while node and stack:
2550 # if p.hasNext():
2551 stack[-1] += 1
2552 i = stack[-1]
2553 if has_next(i, node, stack):
2554 # Move to the next sibling.
2555 node = node.parent.children[i]
2556 break # Found.
2557 # p.moveToParent()
2558 node = node.parent
2559 stack.pop()
2560 # not found.
2561 else:
2562 break # pragma: no cover
2563 return self.last_node_index
2564 #@+node:ekr.20191227160547.1: *4* TOT.visit
2565 def visit(self, node):
2567 self.last_node_index += 1
2568 assert self.last_node_index == node.node_index, (
2569 self.last_node_index, node.node_index)
2570 #@-others
2571#@+node:ekr.20200107165250.1: *3* class Orange
2572class Orange:
2573 """
2574 A flexible and powerful beautifier for Python.
2575 Orange is the new black.
2577 *Important*: This is predominantly a *token*-based beautifier.
2578 However, orange.colon and orange.possible_unary_op use the parse
2579 tree to provide context that would otherwise be difficult to
2580 deduce.
2581 """
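# Editor's note: a usage sketch, not part of the original source.
#     changed = Orange(settings={'tab_width': 4}).beautify_file(path)
# beautify_file returns True only when the beautified result differs from
# the original contents (after normalizing line endings).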
2582 # This switch is really a comment. It will always be false.
2583 # It marks the code that simulates the operation of the black tool.
2584 black_mode = False
2586 # Patterns...
2587 nobeautify_pat = re.compile(r'\s*#\s*pragma:\s*no\s*beautify\b|#\s*@@nobeautify')
2589 # Patterns from FastAtRead class, specialized for python delims.
2590 node_pat = re.compile(r'^(\s*)#@\+node:([^:]+): \*(\d+)?(\*?) (.*)$') # @node
2591 start_doc_pat = re.compile(r'^\s*#@\+(at|doc)?(\s.*?)?$') # @doc or @
2592 at_others_pat = re.compile(r'^(\s*)#@(\+|-)others\b(.*)$') # @others
2594 # Doc parts end with @c or a node sentinel. Specialized for python.
2595 end_doc_pat = re.compile(r"^\s*#@(@(c(ode)?)|([+]node\b.*))$")
2596 #@+others
2597 #@+node:ekr.20200107165250.2: *4* orange.ctor
2598 def __init__(self, settings=None):
2599 """Ctor for Orange class."""
2600 if settings is None:
2601 settings = {}
2602 valid_keys = (
2603 'allow_joined_strings',
2604 'max_join_line_length',
2605 'max_split_line_length',
2606 'orange',
2607 'tab_width',
2608 )
2609 # For mypy...
2610 self.kind: str = ''
2611 # Default settings...
2612 self.allow_joined_strings = False # EKR's preference.
2613 self.max_join_line_length = 88
2614 self.max_split_line_length = 88
2615 self.tab_width = 4
2616 # Override from settings dict...
2617 for key in settings: # pragma: no cover
2618 value = settings.get(key)
2619 if key in valid_keys and value is not None:
2620 setattr(self, key, value)
2621 else:
2622 g.trace(f"Unexpected setting: {key} = {value!r}")
2623 #@+node:ekr.20200107165250.51: *4* orange.push_state
2624 def push_state(self, kind, value=None):
2625 """Append a state to the state stack."""
2626 state = ParseState(kind, value)
2627 self.state_stack.append(state)
2628 #@+node:ekr.20200107165250.8: *4* orange: Entries
2629 #@+node:ekr.20200107173542.1: *5* orange.beautify (main token loop)
2630 def oops(self): # pragma: no cover
2631 g.trace(f"Unknown kind: {self.kind}")
2633 def beautify(self, contents, filename, tokens, tree, max_join_line_length=None, max_split_line_length=None):
2634 """
2635 The main line. Create output tokens and return the result as a string.
2636 """
2637 # Config overrides
2638 if max_join_line_length is not None:
2639 self.max_join_line_length = max_join_line_length
2640 if max_split_line_length is not None:
2641 self.max_split_line_length = max_split_line_length
2642 # State vars...
2643 self.curly_brackets_level = 0 # Number of unmatched '{' tokens.
2644 self.decorator_seen = False # Set by do_name for do_op.
2645 self.in_arg_list = 0 # > 0 if in an arg list of a def.
2646 self.level = 0 # Set only by do_indent and do_dedent.
2647 self.lws = '' # Leading whitespace.
2648 self.paren_level = 0 # Number of unmatched '(' tokens.
2649 self.square_brackets_stack: List[bool] = [] # A stack of bools, for self.word().
2650 self.state_stack: List["ParseState"] = [] # Stack of ParseState objects.
2651 self.val = None # The input token's value (a string).
2652 self.verbatim = False # True: don't beautify.
2653 #
2654 # Init output list and state...
2655 self.code_list: List[Token] = [] # The list of output tokens.
2656 self.code_list_index = 0 # The token's index.
2657 self.tokens = tokens # The list of input tokens.
2658 self.tree = tree
2659 self.add_token('file-start', '')
2660 self.push_state('file-start')
2661 for i, token in enumerate(tokens):
2662 self.token = token
2663 self.kind, self.val, self.line = token.kind, token.value, token.line
2664 if self.verbatim:
2665 self.do_verbatim()
2666 else:
2667 func = getattr(self, f"do_{token.kind}", self.oops)
2668 func()
2669 # Any post pass would go here.
2670 return tokens_to_string(self.code_list)
2671 #@+node:ekr.20200107172450.1: *5* orange.beautify_file (entry)
2672 def beautify_file(self, filename): # pragma: no cover
2673 """
2674 Orange: Beautify the given external file.
2676 Return True if the file was changed.
2677 """
2678 tag = 'beautify-file'
2679 self.filename = filename
2680 tog = TokenOrderGenerator()
2681 contents, encoding, tokens, tree = tog.init_from_file(filename)
2682 if not contents or not tokens or not tree:
2683 print(f"{tag}: Can not beautify: {filename}")
2684 return False
2685 # Beautify.
2686 results = self.beautify(contents, filename, tokens, tree)
2687 # Something besides newlines must change.
2688 if regularize_nls(contents) == regularize_nls(results):
2689 print(f"{tag}: Unchanged: {filename}")
2690 return False
2691 if 0: # This obscures more important error messages.
2692 # Show the diffs.
2693 show_diffs(contents, results, filename=filename)
2694 # Write the results
2695 print(f"{tag}: Wrote {filename}")
2696 write_file(filename, results, encoding=encoding)
2697 return True
2698 #@+node:ekr.20200107172512.1: *5* orange.beautify_file_diff (entry)
2699 def beautify_file_diff(self, filename): # pragma: no cover
2700 """
2701 Orange: Print the diffs that would result from the orange-file command.
2703 Return True if the file would be changed.
2704 """
2705 tag = 'diff-beautify-file'
2706 self.filename = filename
2707 tog = TokenOrderGenerator()
2708 contents, encoding, tokens, tree = tog.init_from_file(filename)
2709 if not contents or not tokens or not tree:
2710 print(f"{tag}: Can not beautify: {filename}")
2711 return False
2712 # Beautify.
2713 results = self.beautify(contents, filename, tokens, tree)
2714 # Something besides newlines must change.
2715 if regularize_nls(contents) == regularize_nls(results):
2716 print(f"{tag}: Unchanged: {filename}")
2717 return False
2718 # Show the diffs.
2719 show_diffs(contents, results, filename=filename)
2720 return True
2721 #@+node:ekr.20200107165250.13: *4* orange: Input token handlers
2722 #@+node:ekr.20200107165250.14: *5* orange.do_comment
2723 in_doc_part = False
2725 def do_comment(self):
2726 """Handle a comment token."""
2727 val = self.val
2728 #
2729 # Leo-specific code...
2730 if self.node_pat.match(val):
2731 # Clear per-node state.
2732 self.in_doc_part = False
2733 self.verbatim = False
2734 self.decorator_seen = False
2735 # Do *not* clear other state, which may persist across @others.
2736 # self.curly_brackets_level = 0
2737 # self.in_arg_list = 0
2738 # self.level = 0
2739 # self.lws = ''
2740 # self.paren_level = 0
2741 # self.square_brackets_stack = []
2742 # self.state_stack = []
2743 else:
2744 # Keep track of verbatim mode.
2745 if self.beautify_pat.match(val):
2746 self.verbatim = False
2747 elif self.nobeautify_pat.match(val):
2748 self.verbatim = True
2749 # Keep track of @doc parts, to honor the convention for splitting lines.
2750 if self.start_doc_pat.match(val):
2751 self.in_doc_part = True
2752 if self.end_doc_pat.match(val):
2753 self.in_doc_part = False
2754 #
2755 # General code: Generate the comment.
2756 self.clean('blank')
2757 entire_line = self.line.lstrip().startswith('#')
2758 if entire_line:
2759 self.clean('hard-blank')
2760 self.clean('line-indent')
2761 # #1496: No further munging needed.
2762 val = self.line.rstrip()
2763 else:
2764 # Exactly two spaces before trailing comments.
2765 val = ' ' + self.val.rstrip()
2766 self.add_token('comment', val)
2767 #@+node:ekr.20200107165250.15: *5* orange.do_encoding
2768 def do_encoding(self):
2769 """
2770 Handle the encoding token.
2771 """
2772 pass
2773 #@+node:ekr.20200107165250.16: *5* orange.do_endmarker
2774 def do_endmarker(self):
2775 """Handle an endmarker token."""
2776 # Ensure exactly one blank at the end of the file.
2777 self.clean_blank_lines()
2778 self.add_token('line-end', '\n')
2779 #@+node:ekr.20200107165250.18: *5* orange.do_indent & do_dedent & helper
2780 def do_dedent(self):
2781 """Handle dedent token."""
2782 self.level -= 1
2783 self.lws = self.level * self.tab_width * ' '
2784 self.line_indent()
2785 if self.black_mode: # pragma: no cover (black)
2786 state = self.state_stack[-1]
2787 if state.kind == 'indent' and state.value == self.level:
2788 self.state_stack.pop()
2789 state = self.state_stack[-1]
2790 if state.kind in ('class', 'def'):
2791 self.state_stack.pop()
2792 self.handle_dedent_after_class_or_def(state.kind)
2794 def do_indent(self):
2795 """Handle indent token."""
2796 new_indent = self.val
2797 old_indent = self.level * self.tab_width * ' '
2798 if new_indent > old_indent:
2799 self.level += 1
2800 elif new_indent < old_indent: # pragma: no cover (defensive)
2801 g.trace('\n===== can not happen', repr(new_indent), repr(old_indent))
2802 self.lws = new_indent
2803 self.line_indent()
2804 #@+node:ekr.20200220054928.1: *6* orange.handle_dedent_after_class_or_def
2805 def handle_dedent_after_class_or_def(self, kind): # pragma: no cover (black)
2806 """
2807 Insert blank lines after a class or def as the result of a 'dedent' token.
2809 Normal comment lines may precede the 'dedent'.
2810 Insert the blank lines *before* such comment lines.
2811 """
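# Editor's note: an illustrative sketch, not part of the original source.
# Given source like:
#     def spam():
#         pass
#     # trailing comment
#     eggs = 1
# the requested blank line is inserted *before* the under-indented trailing
# comment, so the comment stays attached to the following code.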
2812 #
2813 # Compute the tail.
2814 i = len(self.code_list) - 1
2815 tail: List[Token] = []
2816 while i > 0:
2817 t = self.code_list.pop()
2818 i -= 1
2819 if t.kind == 'line-indent':
2820 pass
2821 elif t.kind == 'line-end':
2822 tail.insert(0, t)
2823 elif t.kind == 'comment':
2824 # Only underindented single-line comments belong in the tail.
2825 # @+node comments must never be in the tail.
2826 single_line = self.code_list[i].kind in ('line-end', 'line-indent')
2827 lws = len(t.value) - len(t.value.lstrip())
2828 underindent = lws <= len(self.lws)
2829 if underindent and single_line and not self.node_pat.match(t.value):
2830 # A single-line comment.
2831 tail.insert(0, t)
2832 else:
2833 self.code_list.append(t)
2834 break
2835 else:
2836 self.code_list.append(t)
2837 break
2838 #
2839 # Remove leading 'line-end' tokens from the tail.
2840 while tail and tail[0].kind == 'line-end':
2841 tail = tail[1:]
2842 #
2843 # Put the newlines *before* the tail.
2844 # For Leo, always use 1 blank line.
2845 n = 1 # n = 2 if kind == 'class' else 1
2846 # Retain the token (intention) for debugging.
2847 self.add_token('blank-lines', n)
2848 for i in range(0, n + 1):
2849 self.add_token('line-end', '\n')
2850 if tail:
2851 self.code_list.extend(tail)
2852 self.line_indent()
2853 #@+node:ekr.20200107165250.20: *5* orange.do_name
2854 def do_name(self):
2855 """Handle a name token."""
2856 name = self.val
2857 if self.black_mode and name in ('class', 'def'): # pragma: no cover (black)
2858 # Handle newlines before and after 'class' or 'def'
2859 self.decorator_seen = False
2860 state = self.state_stack[-1]
2861 if state.kind == 'decorator':
2862 # Always do this, regardless of @bool clean-blank-lines.
2863 self.clean_blank_lines()
2864 # Suppress split/join.
2865 self.add_token('hard-newline', '\n')
2866 self.add_token('line-indent', self.lws)
2867 self.state_stack.pop()
2868 else:
2869 # Always do this, regardless of @bool clean-blank-lines.
2870 self.blank_lines(2 if name == 'class' else 1)
2871 self.push_state(name)
2872 self.push_state('indent', self.level)
2873 # For trailing lines after inner classes/defs.
2874 self.word(name)
2875 return
2876 #
2877 # Leo mode...
2878 if name in ('class', 'def'):
2879 self.word(name)
2880 elif name in (
2881 'and', 'elif', 'else', 'for', 'if', 'in', 'not', 'not in', 'or', 'while'
2882 ):
2883 self.word_op(name)
2884 else:
2885 self.word(name)
2886 #@+node:ekr.20200107165250.21: *5* orange.do_newline & do_nl
2887 def do_newline(self):
2888 """Handle a regular newline."""
2889 self.line_end()
2891 def do_nl(self):
2892 """Handle a continuation line."""
2893 self.line_end()
2894 #@+node:ekr.20200107165250.22: *5* orange.do_number
2895 def do_number(self):
2896 """Handle a number token."""
2897 self.blank()
2898 self.add_token('number', self.val)
2899 #@+node:ekr.20200107165250.23: *5* orange.do_op
2900 def do_op(self):
2901 """Handle an op token."""
2902 val = self.val
2903 if val == '.':
2904 self.clean('blank')
2905 prev = self.code_list[-1]
2906 # #2495: Special case for 'from .'
2907 if prev.kind == 'word' and prev.value == 'from':
2908 self.blank()
2909 self.add_token('op', val)
2910 self.blank()
2911 else:
2912 self.add_token('op-no-blanks', val)
2913 elif val == '@':
2914 if self.black_mode: # pragma: no cover (black)
2915 if not self.decorator_seen:
2916 self.blank_lines(1)
2917 self.decorator_seen = True
2918 self.clean('blank')
2919 self.add_token('op-no-blanks', val)
2920 self.push_state('decorator')
2921 elif val == ':':
2922 # Treat slices differently.
2923 self.colon(val)
2924 elif val in ',;':
2925 # Pep 8: Avoid extraneous whitespace immediately before
2926 # comma, semicolon, or colon.
2927 self.clean('blank')
2928 self.add_token('op', val)
2929 self.blank()
2930 elif val in '([{':
2931 # Pep 8: Avoid extraneous whitespace immediately inside
2932 # parentheses, brackets or braces.
2933 self.lt(val)
2934 elif val in ')]}':
2935 # Ditto.
2936 self.rt(val)
2937 elif val == '=':
2938 # Pep 8: Don't use spaces around the = sign when used to indicate
2939 # a keyword argument or a default parameter value.
2940 if self.paren_level:
2941 self.clean('blank')
2942 self.add_token('op-no-blanks', val)
2943 else:
2944 self.blank()
2945 self.add_token('op', val)
2946 self.blank()
2947 elif val in '~+-':
2948 self.possible_unary_op(val)
2949 elif val == '*':
2950 self.star_op()
2951 elif val == '**':
2952 self.star_star_op()
2953 else:
2954 # Pep 8: always surround binary operators with a single space.
2955 # '==','+=','-=','*=','**=','/=','//=','%=','!=','<=','>=','<','>',
2956 # '^','~','*','**','&','|','/','//',
2957 # Pep 8: If operators with different priorities are used,
2958 # consider adding whitespace around the operators with the lowest priority(ies).
2959 self.blank()
2960 self.add_token('op', val)
2961 self.blank()
2962 #@+node:ekr.20200107165250.24: *5* orange.do_string
2963 def do_string(self):
2964 """Handle a 'string' token."""
2965 # Careful: continued strings may contain '\r'
2966 val = regularize_nls(self.val)
2967 self.add_token('string', val)
2968 self.blank()
2969 #@+node:ekr.20200210175117.1: *5* orange.do_verbatim
2970 beautify_pat = re.compile(
2971 r'#\s*pragma:\s*beautify\b|#\s*@@beautify|#\s*@\+node|#\s*@[+-]others|#\s*@[+-]<<')
2973 def do_verbatim(self):
2974 """
2975 Handle one token in verbatim mode.
2976 End verbatim mode when the appropriate comment is seen.
2977 """
2978 kind = self.kind
2979 #
2980 # Careful: tokens may contain '\r'
2981 val = regularize_nls(self.val)
2982 if kind == 'comment':
2983 if self.beautify_pat.match(val):
2984 self.verbatim = False
2985 val = val.rstrip()
2986 self.add_token('comment', val)
2987 return
2988 if kind == 'indent':
2989 self.level += 1
2990 self.lws = self.level * self.tab_width * ' '
2991 if kind == 'dedent':
2992 self.level -= 1
2993 self.lws = self.level * self.tab_width * ' '
2994 self.add_token('verbatim', val)
2995 #@+node:ekr.20200107165250.25: *5* orange.do_ws
2996 def do_ws(self):
2997 """
2998 Handle the "ws" pseudo-token.
3000 Put the whitespace only if it ends with a backslash-newline.
3001 """
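# Editor's note: an illustrative sketch, not part of the original source.
# For a continuation such as:
#     x = 1 + \
#         2
# the 'ws' token containing the backslash-newline is preserved verbatim as an
# 'op-no-blanks' output token, so the line continuation survives beautification.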
3002 val = self.val
3003 # Handle backslash-newline.
3004 if '\\\n' in val:
3005 self.clean('blank')
3006 self.add_token('op-no-blanks', val)
3007 return
3008 # Handle start-of-line whitespace.
3009 prev = self.code_list[-1]
3010 inner = self.paren_level or self.square_brackets_stack or self.curly_brackets_level
3011 if prev.kind == 'line-indent' and inner:
3012 # Retain the indent that won't be cleaned away.
3013 self.clean('line-indent')
3014 self.add_token('hard-blank', val)
3015 #@+node:ekr.20200107165250.26: *4* orange: Output token generators
3016 #@+node:ekr.20200118145044.1: *5* orange.add_line_end
3017 def add_line_end(self):
3018 """Add a line-end request to the code list."""
3019 # This may be called from do_name as well as do_newline and do_nl.
3020 assert self.token.kind in ('newline', 'nl'), self.token.kind
3021 self.clean('blank') # Important!
3022 self.clean('line-indent')
3023 t = self.add_token('line-end', '\n')
3024 # Distinguish between kinds of 'line-end' tokens.
3025 t.newline_kind = self.token.kind
3026 return t
3027 #@+node:ekr.20200107170523.1: *5* orange.add_token
3028 def add_token(self, kind, value):
3029 """Add an output token to the code list."""
3030 tok = Token(kind, value)
3031 tok.index = self.code_list_index # For debugging only.
3032 self.code_list_index += 1
3033 self.code_list.append(tok)
3034 return tok
3035 #@+node:ekr.20200107165250.27: *5* orange.blank
3036 def blank(self):
3037 """Add a blank request to the code list."""
3038 prev = self.code_list[-1]
3039 if prev.kind not in (
3040 'blank',
3041 'blank-lines',
3042 'file-start',
3043 'hard-blank', # Unique to orange.
3044 'line-end',
3045 'line-indent',
3046 'lt',
3047 'op-no-blanks',
3048 'unary-op',
3049 ):
3050 self.add_token('blank', ' ')
3051 #@+node:ekr.20200107165250.29: *5* orange.blank_lines (black only)
3052 def blank_lines(self, n): # pragma: no cover (black)
3053 """
3054 Add a request for n blank lines to the code list.
3055 Multiple blank-lines requests yield at least the maximum of all requests.
3056 """
3057 self.clean_blank_lines()
3058 prev = self.code_list[-1]
3059 if prev.kind == 'file-start':
3060 self.add_token('blank-lines', n)
3061 return
3062 for i in range(0, n + 1):
3063 self.add_token('line-end', '\n')
3064 # Retain the token (intention) for debugging.
3065 self.add_token('blank-lines', n)
3066 self.line_indent()
3067 #@+node:ekr.20200107165250.30: *5* orange.clean
3068 def clean(self, kind):
3069 """Remove the last item of token list if it has the given kind."""
3070 prev = self.code_list[-1]
3071 if prev.kind == kind:
3072 self.code_list.pop()
3073 #@+node:ekr.20200107165250.31: *5* orange.clean_blank_lines
3074 def clean_blank_lines(self):
3075 """
3076 Remove all vestiges of previous blank lines.
3078 Return True if any of the cleaned 'line-end' tokens represented "hard" newlines.
3079 """
3080 cleaned_newline = False
3081 table = ('blank-lines', 'line-end', 'line-indent')
3082 while self.code_list[-1].kind in table:
3083 t = self.code_list.pop()
3084 if t.kind == 'line-end' and getattr(t, 'newline_kind', None) != 'nl':
3085 cleaned_newline = True
3086 return cleaned_newline
3087 #@+node:ekr.20200107165250.32: *5* orange.colon
3088 def colon(self, val):
3089 """Handle a colon."""
3091 def is_expr(node):
3092 """True if node is any expression other than += number."""
3093 if isinstance(node, (ast.BinOp, ast.Call, ast.IfExp)):
3094 return True
3095 return isinstance(
3096 node, ast.UnaryOp) and not isinstance(node.operand, ast.Num)
3098 node = self.token.node
3099 self.clean('blank')
3100 if not isinstance(node, ast.Slice):
3101 self.add_token('op', val)
3102 self.blank()
3103 return
3104 # A slice.
3105 lower = getattr(node, 'lower', None)
3106 upper = getattr(node, 'upper', None)
3107 step = getattr(node, 'step', None)
3108 if any(is_expr(z) for z in (lower, upper, step)):
3109 prev = self.code_list[-1]
3110 if prev.value not in '[:':
3111 self.blank()
3112 self.add_token('op', val)
3113 self.blank()
3114 else:
3115 self.add_token('op-no-blanks', val)
3116 #@+node:ekr.20200107165250.33: *5* orange.line_end
3117 def line_end(self):
3118 """Add a line-end request to the code list."""
3119 # This should be called only by do_newline and do_nl.
3120 node, token = self.token.statement_node, self.token
3121 assert token.kind in ('newline', 'nl'), (token.kind, g.callers())
3122 # Create the 'line-end' output token.
3123 self.add_line_end()
3124 # Attempt to split the line.
3125 was_split = self.split_line(node, token)
3126 # Attempt to join the line only if it has not just been split.
3127 if not was_split and self.max_join_line_length > 0:
3128 self.join_lines(node, token)
3129 self.line_indent()
3130 # Add the indentation for all lines
3131 # until the next indent or unindent token.
3132 #@+node:ekr.20200107165250.40: *5* orange.line_indent
3133 def line_indent(self):
3134 """Add a line-indent token."""
3135 self.clean('line-indent')
3136 # Defensive. Should never happen.
3137 self.add_token('line-indent', self.lws)
3138 #@+node:ekr.20200107165250.41: *5* orange.lt & rt
3139 #@+node:ekr.20200107165250.42: *6* orange.lt
3140 def lt(self, val):
3141 """Generate code for a left paren or curly/square bracket."""
3142 assert val in '([{', repr(val)
3143 if val == '(':
3144 self.paren_level += 1
3145 elif val == '[':
3146 self.square_brackets_stack.append(False)
3147 else:
3148 self.curly_brackets_level += 1
3149 self.clean('blank')
3150 prev = self.code_list[-1]
3151 if prev.kind in ('op', 'word-op'):
3152 self.blank()
3153 self.add_token('lt', val)
3154 elif prev.kind == 'word':
3155 # Only suppress blanks before '(' or '[' for non-keywords.
3156 if val == '{' or prev.value in ('if', 'else', 'return', 'for'):
3157 self.blank()
3158 elif val == '(':
3159 self.in_arg_list += 1
3160 self.add_token('lt', val)
3161 else:
3162 self.clean('blank')
3163 self.add_token('op-no-blanks', val)
3164 #@+node:ekr.20200107165250.43: *6* orange.rt
3165 def rt(self, val):
3166 """Generate code for a right paren or curly/square bracket."""
3167 assert val in ')]}', repr(val)
3168 if val == ')':
3169 self.paren_level -= 1
3170 self.in_arg_list = max(0, self.in_arg_list - 1)
3171 elif val == ']':
3172 self.square_brackets_stack.pop()
3173 else:
3174 self.curly_brackets_level -= 1
3175 self.clean('blank')
3176 self.add_token('rt', val)
3177 #@+node:ekr.20200107165250.45: *5* orange.possible_unary_op & unary_op
3178 def possible_unary_op(self, s):
3179 """Add a unary or binary op to the token list."""
3180 node = self.token.node
3181 self.clean('blank')
3182 if isinstance(node, ast.UnaryOp):
3183 self.unary_op(s)
3184 else:
3185 self.blank()
3186 self.add_token('op', s)
3187 self.blank()
3189 def unary_op(self, s):
3190 """Add an operator request to the code list."""
3191 assert s and isinstance(s, str), repr(s)
3192 self.clean('blank')
3193 prev = self.code_list[-1]
3194 if prev.kind == 'lt':
3195 self.add_token('unary-op', s)
3196 else:
3197 self.blank()
3198 self.add_token('unary-op', s)
3199 #@+node:ekr.20200107165250.46: *5* orange.star_op
3200 def star_op(self):
3201 """Put a '*' op, with special cases for *args."""
3202 val = '*'
3203 self.clean('blank')
3204 if self.paren_level > 0:
3205 prev = self.code_list[-1]
3206 if prev.kind == 'lt' or (prev.kind, prev.value) == ('op', ','):
3207 self.blank()
3208 self.add_token('op', val)
3209 return
3210 self.blank()
3211 self.add_token('op', val)
3212 self.blank()
3213 #@+node:ekr.20200107165250.47: *5* orange.star_star_op
3214 def star_star_op(self):
3215 """Put a ** operator, with a special case for **kwargs."""
3216 val = '**'
3217 self.clean('blank')
3218 if self.paren_level > 0:
3219 prev = self.code_list[-1]
3220 if prev.kind == 'lt' or (prev.kind, prev.value) == ('op', ','):
3221 self.blank()
3222 self.add_token('op', val)
3223 return
3224 self.blank()
3225 self.add_token('op', val)
3226 self.blank()
3227 #@+node:ekr.20200107165250.48: *5* orange.word & word_op
3228 def word(self, s):
3229 """Add a word request to the code list."""
3230 assert s and isinstance(s, str), repr(s)
3231 if self.square_brackets_stack:
3232 # A previous 'op-no-blanks' token may cancel this blank.
3233 self.blank()
3234 self.add_token('word', s)
3235 elif self.in_arg_list > 0:
3236 self.add_token('word', s)
3237 self.blank()
3238 else:
3239 self.blank()
3240 self.add_token('word', s)
3241 self.blank()
3243 def word_op(self, s):
3244 """Add a word-op request to the code list."""
3245 assert s and isinstance(s, str), repr(s)
3246 self.blank()
3247 self.add_token('word-op', s)
3248 self.blank()
3249 #@+node:ekr.20200118120049.1: *4* orange: Split/join
3250 #@+node:ekr.20200107165250.34: *5* orange.split_line & helpers
3251 def split_line(self, node, token):
3252 """
3253 Split token's line, if possible and enabled.
3255 Return True if the line was broken into two or more lines.
3256 """
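# Editor's note: an illustrative sketch, not part of the original source.
# A line such as:
#     result = some_function(arg1, arg2, arg3, arg4)
# that exceeds max_split_line_length is re-emitted as:
#     result = some_function(
#         arg1, arg2, arg3, arg4)
# and append_tail splits the tail further at commas if it is still too long.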
3257 assert token.kind in ('newline', 'nl'), repr(token)
3258 # Return if splitting is disabled:
3259 if self.max_split_line_length <= 0: # pragma: no cover (user option)
3260 return False
3261 # Return if the node can't be split.
3262 if not is_long_statement(node):
3263 return False
3264 # Find the *output* tokens of the previous lines.
3265 line_tokens = self.find_prev_line()
3266 line_s = ''.join([z.to_string() for z in line_tokens])
3267 # Do nothing for short lines.
3268 if len(line_s) < self.max_split_line_length:
3269 return False
3270 # Return if the previous line has no opening delim: (, [ or {.
3271 if not any(z.kind == 'lt' for z in line_tokens): # pragma: no cover (defensive)
3272 return False
3273 prefix = self.find_line_prefix(line_tokens)
3274 # Calculate the tail before cleaning the prefix.
3275 tail = line_tokens[len(prefix) :]
3276 # Cut back the token list: subtract 1 for the trailing line-end.
3277 self.code_list = self.code_list[: len(self.code_list) - len(line_tokens) - 1]
3278 # Append the tail, splitting it further, as needed.
3279 self.append_tail(prefix, tail)
3280 # Add the line-end token deleted by find_line_prefix.
3281 self.add_token('line-end', '\n')
3282 return True
3283 #@+node:ekr.20200107165250.35: *6* orange.append_tail
3284 def append_tail(self, prefix, tail):
3285 """Append the tail tokens, splitting the line further as necessary."""
3286 tail_s = ''.join([z.to_string() for z in tail])
3287 if len(tail_s) < self.max_split_line_length:
3288 # Add the prefix.
3289 self.code_list.extend(prefix)
3290 # Start a new line and increase the indentation.
3291 self.add_token('line-end', '\n')
3292 self.add_token('line-indent', self.lws + ' ' * 4)
3293 self.code_list.extend(tail)
3294 return
3295 # Still too long. Split the line at commas.
3296 self.code_list.extend(prefix)
3297 # Start a new line and increase the indentation.
3298 self.add_token('line-end', '\n')
3299 self.add_token('line-indent', self.lws + ' ' * 4)
3300 open_delim = Token(kind='lt', value=prefix[-1].value)
3301 value = open_delim.value.replace('(', ')').replace('[', ']').replace('{', '}')
3302 close_delim = Token(kind='rt', value=value)
3303 delim_count = 1
3304 lws = self.lws + ' ' * 4
3305 for i, t in enumerate(tail):
3306 if t.kind == 'op' and t.value == ',':
3307 if delim_count == 1:
3308 # Start a new line.
3309 self.add_token('op-no-blanks', ',')
3310 self.add_token('line-end', '\n')
3311 self.add_token('line-indent', lws)
3312 # Kill a following blank.
3313 if i + 1 < len(tail):
3314 next_t = tail[i + 1]
3315 if next_t.kind == 'blank':
3316 next_t.kind = 'no-op'
3317 next_t.value = ''
3318 else:
3319 self.code_list.append(t)
3320 elif t.kind == close_delim.kind and t.value == close_delim.value:
3321 # Done if the delims match.
3322 delim_count -= 1
3323 if delim_count == 0:
3324 # Start a new line
3325 self.add_token('op-no-blanks', ',')
3326 self.add_token('line-end', '\n')
3327 self.add_token('line-indent', self.lws)
3328 self.code_list.extend(tail[i:])
3329 return
3330 lws = lws[:-4]
3331 self.code_list.append(t)
3332 elif t.kind == open_delim.kind and t.value == open_delim.value:
3333 delim_count += 1
3334 lws = lws + ' ' * 4
3335 self.code_list.append(t)
3336 else:
3337 self.code_list.append(t)
3338 g.trace('BAD DELIMS', delim_count) # pragma: no cover
3339 #@+node:ekr.20200107165250.36: *6* orange.find_prev_line
3340 def find_prev_line(self):
3341 """Return the previous line, as a list of tokens."""
3342 line = []
3343 for t in reversed(self.code_list[:-1]):
3344 if t.kind in ('hard-newline', 'line-end'):
3345 break
3346 line.append(t)
3347 return list(reversed(line))
3348 #@+node:ekr.20200107165250.37: *6* orange.find_line_prefix
3349 def find_line_prefix(self, token_list):
3350 """
3351 Return all tokens up to and including the first lt token.
3352 That is, the prefix ends at the first opening delimiter.
3353 """
3354 result = []
3355 for i, t in enumerate(token_list):
3356 result.append(t)
3357 if t.kind == 'lt':
3358 break
3359 return result
3360 #@+node:ekr.20200107165250.39: *5* orange.join_lines
3361 def join_lines(self, node, token):
3362 """
3363 Join preceding lines, if possible and enabled.
3364 token is a 'newline' or 'nl' token. node is the corresponding ast node.
3365 """
3366 if self.max_join_line_length <= 0: # pragma: no cover (user option)
3367 return
3368 assert token.kind in ('newline', 'nl'), repr(token)
3369 if token.kind == 'nl':
3370 return
3371 # Scan backward in the *code* list,
3372 # looking for 'line-end' tokens with tok.newline_kind == 'nl'
3373 nls = 0
3374 i = len(self.code_list) - 1
3375 t = self.code_list[i]
3376 assert t.kind == 'line-end', repr(t)
3377 # Not all tokens have a newline_kind ivar.
3378 assert t.newline_kind == 'newline' # type:ignore
3379 i -= 1
3380 while i >= 0:
3381 t = self.code_list[i]
3382 if t.kind == 'comment':
3383 # Can't join.
3384 return
3385 if t.kind == 'string' and not self.allow_joined_strings:
3386 # An EKR preference: don't join strings, no matter what black does.
3387 # This allows "short" f-strings to be aligned.
3388 return
3389 if t.kind == 'line-end':
3390 if getattr(t, 'newline_kind', None) == 'nl':
3391 nls += 1
3392 else:
3393 break # pragma: no cover
3394 i -= 1
3395 # Retain the file-start token.
3396 if i <= 0:
3397 i = 1
3398 if nls <= 0: # pragma: no cover (rare)
3399 return
3400 # Retain the line-end and any following line-indent.
3401 # Required, so that the regex below won't eat too much.
3402 while True:
3403 t = self.code_list[i]
3404 if t.kind == 'line-end':
3405 if getattr(t, 'newline_kind', None) == 'nl': # pragma: no cover (rare)
3406 nls -= 1
3407 i += 1
3408 elif self.code_list[i].kind == 'line-indent':
3409 i += 1
3410 else:
3411 break # pragma: no cover (defensive)
3412 if nls <= 0: # pragma: no cover (defensive)
3413 return
3414 # Calculate the joined line.
3415 tail = self.code_list[i:]
3416 tail_s = tokens_to_string(tail)
3417 tail_s = re.sub(r'\n\s*', ' ', tail_s)
3418 tail_s = tail_s.replace('( ', '(').replace(' )', ')')
3419 tail_s = tail_s.rstrip()
3420 # Don't join the lines if they would be too long.
3421 if len(tail_s) > self.max_join_line_length: # pragma: no cover (defensive)
3422 return
3423 # Cut back the code list.
3424 self.code_list = self.code_list[:i]
3425 # Add the new output tokens.
3426 self.add_token('string', tail_s)
3427 self.add_token('line-end', '\n')
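# A minimal sketch (not from the source) of the intended effect, assuming
# max_join_line_length is 88:
#
#   before: spam(
#               ham(1, 2), eggs(3, 4))
#   after:  spam(ham(1, 2), eggs(3, 4))
#
# The regex above collapses each newline and its following whitespace to a
# single space, and the replace calls then tighten '( ' and ' )'.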
3428 #@-others
3429#@+node:ekr.20200107170847.1: *3* class OrangeSettings
3430class OrangeSettings:
3432 pass
3433#@+node:ekr.20200107170126.1: *3* class ParseState
3434class ParseState:
3435 """
3436 A class representing items in the parse state stack.
3438 The present states:
3440 'file-start': Ensures the parse state stack is never empty.
3442 'decorator': The last '@' was a decorator.
3444 do_op(): push_state('decorator')
3445 do_name(): pops the stack if state.kind == 'decorator'.
3447 'indent': The indentation level for 'class' and 'def' names.
3449 do_name(): push_state('indent', self.level)
3450 do_dedent(): pops the stack once or twice if state.value == self.level.
3452 """
3454 def __init__(self, kind, value):
3455 self.kind = kind
3456 self.value = value
3458 def __repr__(self):
3459 return f"State: {self.kind} {self.value!r}" # pragma: no cover
3461 __str__ = __repr__
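# A minimal usage sketch (hypothetical stack and level variables; the real
# push_state/do_name/do_dedent helpers live elsewhere in this file):
#
#   stack.append(ParseState('indent', level))   # on a 'class' or 'def' name
#   ...
#   top = stack[-1]
#   if top.kind == 'indent' and top.value == level:
#       stack.pop()                              # on the matching dedent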
3462#@+node:ekr.20200122033203.1: ** TOT classes...
3463#@+node:ekr.20191222083453.1: *3* class Fstringify (TOT)
3464class Fstringify(TokenOrderTraverser):
3465 """A class to fstringify files."""
3467 silent = True # for pytest. Defined in all entries.
3468 line_number = 0
3469 line = ''
3471 #@+others
3472 #@+node:ekr.20191222083947.1: *4* fs.fstringify
3473 def fstringify(self, contents, filename, tokens, tree):
3474 """
3475 Fstringify.fstringify:
3477 f-stringify the sources given by (tokens, tree).
3479 Return the resulting string.
3480 """
3481 self.filename = filename
3482 self.tokens = tokens
3483 self.tree = tree
3484 # Prepass: reassign tokens.
3485 ReassignTokens().reassign(filename, tokens, tree)
3486 # Main pass.
3487 self.traverse(self.tree)
3488 results = tokens_to_string(self.tokens)
3489 return results
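# A minimal usage sketch, mirroring fstringify_file below (the file name is
# hypothetical):
#
#   tog = TokenOrderGenerator()
#   contents, encoding, tokens, tree = tog.init_from_file('example.py')
#   result = Fstringify().fstringify(contents, 'example.py', tokens, tree)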
3490 #@+node:ekr.20200103054101.1: *4* fs.fstringify_file (entry)
3491 def fstringify_file(self, filename): # pragma: no cover
3492 """
3493 Fstringify.fstringify_file.
3495 The entry point for the fstringify-file command.
3497 f-stringify the given external file with the Fstringify class.
3499 Return True if the file was changed.
3500 """
3501 tag = 'fstringify-file'
3502 self.filename = filename
3503 self.silent = False
3504 tog = TokenOrderGenerator()
3505 try:
3506 contents, encoding, tokens, tree = tog.init_from_file(filename)
3507 if not contents or not tokens or not tree:
3508 print(f"{tag}: Can not fstringify: {filename}")
3509 return False
3510 results = self.fstringify(contents, filename, tokens, tree)
3511 except Exception as e:
3512 print(e)
3513 return False
3514 # Something besides newlines must change.
3515 changed = regularize_nls(contents) != regularize_nls(results)
3516 status = 'Wrote' if changed else 'Unchanged'
3517 print(f"{tag}: {status:>9}: {filename}")
3518 if changed:
3519 write_file(filename, results, encoding=encoding)
3520 return changed
3521 #@+node:ekr.20200103065728.1: *4* fs.fstringify_file_diff (entry)
3522 def fstringify_file_diff(self, filename): # pragma: no cover
3523 """
3524 Fstringify.fstringify_file_diff.
3526 The entry point for the diff-fstringify-file command.
3528 Print the diffs that would result from the fstringify-file command.
3530 Return True if the file would be changed.
3531 """
3532 tag = 'diff-fstringify-file'
3533 self.filename = filename
3534 self.silent = False
3535 tog = TokenOrderGenerator()
3536 try:
3537 contents, encoding, tokens, tree = tog.init_from_file(filename)
3538 if not contents or not tokens or not tree:
3539 return False
3540 results = self.fstringify(contents, filename, tokens, tree)
3541 except Exception as e:
3542 print(e)
3543 return False
3544 # Something besides newlines must change.
3545 changed = regularize_nls(contents) != regularize_nls(results)
3546 if changed:
3547 show_diffs(contents, results, filename=filename)
3548 else:
3549 print(f"{tag}: Unchanged: {filename}")
3550 return changed
3551 #@+node:ekr.20200112060218.1: *4* fs.fstringify_file_silent (entry)
3552 def fstringify_file_silent(self, filename): # pragma: no cover
3553 """
3554 Fstringify.fstringify_file_silent.
3556 The entry point for the silent-fstringify-file command.
3558 fstringify the given file, suppressing all but serious error messages.
3560 Return True if the file would be changed.
3561 """
3562 self.filename = filename
3563 self.silent = True
3564 tog = TokenOrderGenerator()
3565 try:
3566 contents, encoding, tokens, tree = tog.init_from_file(filename)
3567 if not contents or not tokens or not tree:
3568 return False
3569 results = self.fstringify(contents, filename, tokens, tree)
3570 except Exception as e:
3571 print(e)
3572 return False
3573 # Something besides newlines must change.
3574 changed = regularize_nls(contents) != regularize_nls(results)
3575 status = 'Wrote' if changed else 'Unchanged'
3576 # Write the results.
3577 print(f"{status:>9}: {filename}")
3578 if changed:
3579 write_file(filename, results, encoding=encoding)
3580 return changed
3581 #@+node:ekr.20191222095754.1: *4* fs.make_fstring & helpers
3582 def make_fstring(self, node):
3583 """
3584 node is BinOp node representing an '%' operator.
3585 node.left is an ast.Str node.
3586 node.right represents the RHS of the '%' operator.
3588 Convert this tree to an f-string, if possible.
3589 Replace the node's entire tree with a new ast.Str node.
3590 Replace all the relevant tokens with a single new 'string' token.
3591 """
3592 trace = False
3593 assert isinstance(node.left, ast.Str), (repr(node.left), g.callers())
3594 # Careful: use the tokens, not Str.s. This preserves spelling.
3595 lt_token_list = get_node_token_list(node.left, self.tokens)
3596 if not lt_token_list: # pragma: no cover
3597 print('')
3598 g.trace('Error: no token list in Str')
3599 dump_tree(self.tokens, node)
3600 print('')
3601 return
3602 lt_s = tokens_to_string(lt_token_list)
3603 if trace:
3604 g.trace('lt_s:', lt_s) # pragma: no cover
3605 # Get the RHS values, a list of token lists.
3606 values = self.scan_rhs(node.right)
3607 if trace: # pragma: no cover
3608 for i, z in enumerate(values):
3609 dump_tokens(z, tag=f"RHS value {i}")
3610 # Compute rt_s, self.line and self.line_number for later messages.
3611 token0 = lt_token_list[0]
3612 self.line_number = token0.line_number
3613 self.line = token0.line.strip()
3614 rt_s = ''.join(tokens_to_string(z) for z in values)
3615 # Get the % specs in the LHS string.
3616 specs = self.scan_format_string(lt_s)
3617 if len(values) != len(specs): # pragma: no cover
3618 self.message(
3619 f"can't create f-string: {lt_s!r}\n"
3620 f":f-string mismatch: "
3621 f"{len(values)} value{g.plural(len(values))}, "
3622 f"{len(specs)} spec{g.plural(len(specs))}")
3623 return
3624 # Replace specs with values.
3625 results = self.substitute_values(lt_s, specs, values)
3626 result = self.compute_result(lt_s, results)
3627 if not result:
3628 return
3629 # Remove whitespace before ! and :.
3630 result = self.clean_ws(result)
3631 # Show the results
3632 if trace: # pragma: no cover
3633 before = (lt_s + ' % ' + rt_s).replace('\n', '<NL>')
3634 after = result.replace('\n', '<NL>')
3635 self.message(
3636 f"trace:\n"
3637 f":from: {before!s}\n"
3638 f": to: {after!s}")
3639 # Adjust the tree and the token list.
3640 self.replace(node, result, values)
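# An illustrative before/after sketch (hypothetical source text):
#
#   before: print('%s = %r' % (name, value))
#   after:  print(f'{name} = {value!r}')
#
# The '%r' spec becomes the '!r' conversion via munge_spec, and the matching
# tokens are collapsed into a single new 'string' token by fs.replace.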
3641 #@+node:ekr.20191222102831.3: *5* fs.clean_ws
3642 ws_pat = re.compile(r'(\s+)([:!][0-9]\})')
3644 def clean_ws(self, s):
3645 """Carefully remove whitespace before ! and : specifiers."""
3646 s = re.sub(self.ws_pat, r'\2', s)
3647 return s
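# An illustrative sketch: ws_pat removes whitespace immediately before a
# conversion or format spec, e.g. "f'{x :2}'" becomes "f'{x:2}'".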
3648 #@+node:ekr.20191222102831.4: *5* fs.compute_result & helpers
3649 def compute_result(self, lt_s, tokens):
3650 """
3651 Create the final result, with various kinds of munges.
3653 Return the result string, or None if there are errors.
3654 """
3655 # Fail if there is a backslash within { and }.
3656 if not self.check_back_slashes(lt_s, tokens):
3657 return None # pragma: no cover
3658 # Ensure consistent quotes.
3659 if not self.change_quotes(lt_s, tokens):
3660 return None # pragma: no cover
3661 return tokens_to_string(tokens)
3662 #@+node:ekr.20200215074309.1: *6* fs.check_back_slashes
3663 def check_back_slashes(self, lt_s, tokens):
3664 """
3665 Return False if any backslash appears within a {} expression.
3667 tokens is a list of the tokens on the RHS.
3668 """
3669 count = 0
3670 for z in tokens:
3671 if z.kind == 'op':
3672 if z.value == '{':
3673 count += 1
3674 elif z.value == '}':
3675 count -= 1
3676 if (count % 2) == 1 and '\\' in z.value:
3677 if not self.silent:
3678 self.message( # pragma: no cover (silent during unit tests)
3679 f"can't create f-string: {lt_s!r}\n"
3680 f":backslash in {{expr}}:")
3681 return False
3682 return True
3683 #@+node:ekr.20191222102831.7: *6* fs.change_quotes
3684 def change_quotes(self, lt_s, aList):
3685 """
3686 Carefully check quotes in all "inner" tokens as necessary.
3688 Return False if the f-string would contain backslashes.
3690 We expect the following "outer" tokens.
3692 aList[0]: ('string', 'f')
3693 aList[1]: ('string', a single or double quote).
3694 aList[-1]: ('string', a single or double quote matching aList[1])
3695 """
3696 # Sanity checks.
3697 if len(aList) < 4:
3698 return True # pragma: no cover (defensive)
3699 if not lt_s: # pragma: no cover (defensive)
3700 self.message("can't create f-string: no lt_s!")
3701 return False
3702 delim = lt_s[0]
3703 # Check tokens 0, 1 and -1.
3704 token0 = aList[0]
3705 token1 = aList[1]
3706 token_last = aList[-1]
3707 for token in token0, token1, token_last:
3708 # These are the only kinds of tokens we expect to generate.
3709 ok = (
3710 token.kind == 'string' or
3711 token.kind == 'op' and token.value in '{}')
3712 if not ok: # pragma: no cover (defensive)
3713 self.message(
3714 f"unexpected token: {token.kind} {token.value}\n"
3715 f": lt_s: {lt_s!r}")
3716 return False
3717 # These checks are important...
3718 if token0.value != 'f':
3719 return False # pragma: no cover (defensive)
3720 val1 = token1.value
3721 if delim != val1:
3722 return False # pragma: no cover (defensive)
3723 val_last = token_last.value
3724 if delim != val_last:
3725 return False # pragma: no cover (defensive)
3726 #
3727 # Check for conflicting delims, preferring f"..." to f'...'.
3728 for delim in ('"', "'"):
3729 aList[1] = aList[-1] = Token('string', delim)
3730 for z in aList[2:-1]:
3731 if delim in z.value:
3732 break
3733 else:
3734 return True
3735 if not self.silent: # pragma: no cover (silent unit test)
3736 self.message(
3737 f"can't create f-string: {lt_s!r}\n"
3738 f": conflicting delims:")
3739 return False
3740 #@+node:ekr.20191222102831.6: *5* fs.munge_spec
3741 def munge_spec(self, spec):
3742 """
3743 Return (head, tail).
3745 The format spec is !head:tail or :tail.
3747 Example specs: s2, r3
3748 """
3749 # To do: handle more specs.
3750 head, tail = [], []
3751 if spec.startswith('+'):
3752 pass # Leave it alone!
3753 elif spec.startswith('-'):
3754 tail.append('>')
3755 spec = spec[1:]
3756 if spec.endswith('s'):
3757 spec = spec[:-1]
3758 if spec.endswith('r'):
3759 head.append('r')
3760 spec = spec[:-1]
3761 tail_s = ''.join(tail) + spec
3762 head_s = ''.join(head)
3763 return head_s, tail_s
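# Illustrative return values, derived by tracing the logic above:
#
#   munge_spec('s')    ==> ('', '')
#   munge_spec('r')    ==> ('r', '')
#   munge_spec('-10s') ==> ('', '>10')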
3764 #@+node:ekr.20191222102831.9: *5* fs.scan_format_string
3765 # format_spec ::= [[fill]align][sign][#][0][width][,][.precision][type]
3766 # fill ::= <any character>
3767 # align ::= "<" | ">" | "=" | "^"
3768 # sign ::= "+" | "-" | " "
3769 # width ::= integer
3770 # precision ::= integer
3771 # type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
3773 format_pat = re.compile(r'%(([+-]?[0-9]*(\.)?[0-9]*)*[bcdeEfFgGnoxrsX]?)')
3775 def scan_format_string(self, s):
3776 """Scan the format string s, returning a list of match objects."""
3777 result = list(re.finditer(self.format_pat, s))
3778 return result
3779 #@+node:ekr.20191222104224.1: *5* fs.scan_rhs
3780 def scan_rhs(self, node):
3781 """
3782 Scan the right-hand side of a potential f-string.
3784 Return a list of the token lists for each element.
3785 """
3786 trace = False
3787 # First, try the most common cases.
3788 if isinstance(node, ast.Str):
3789 token_list = get_node_token_list(node, self.tokens)
3790 return [token_list]
3791 if isinstance(node, (list, tuple, ast.Tuple)):
3792 result = []
3793 elts = node.elts if isinstance(node, ast.Tuple) else node
3794 for i, elt in enumerate(elts):
3795 tokens = tokens_for_node(self.filename, elt, self.tokens)
3796 result.append(tokens)
3797 if trace: # pragma: no cover
3798 g.trace(f"item: {i}: {elt.__class__.__name__}")
3799 g.printObj(tokens, tag=f"Tokens for item {i}")
3800 return result
3801 # Now we expect only one result.
3802 tokens = tokens_for_node(self.filename, node, self.tokens)
3803 return [tokens]
3804 #@+node:ekr.20191226155316.1: *5* fs.substitute_values
3805 def substitute_values(self, lt_s, specs, values):
3806 """
3807 Replace specifiers with values in lt_s string.
3809 Double { and } as needed.
3810 """
3811 i, results = 0, [Token('string', 'f')]
3812 for spec_i, m in enumerate(specs):
3813 value = tokens_to_string(values[spec_i])
3814 start, end, spec = m.start(0), m.end(0), m.group(1)
3815 if start > i:
3816 val = lt_s[i:start].replace('{', '{{').replace('}', '}}')
3817 results.append(Token('string', val[0]))
3818 results.append(Token('string', val[1:]))
3819 head, tail = self.munge_spec(spec)
3820 results.append(Token('op', '{'))
3821 results.append(Token('string', value))
3822 if head:
3823 results.append(Token('string', '!'))
3824 results.append(Token('string', head))
3825 if tail:
3826 results.append(Token('string', ':'))
3827 results.append(Token('string', tail))
3828 results.append(Token('op', '}'))
3829 i = end
3830 # Add the tail.
3831 tail = lt_s[i:]
3832 if tail:
3833 tail = tail.replace('{', '{{').replace('}', '}}')
3834 results.append(Token('string', tail[:-1]))
3835 results.append(Token('string', tail[-1]))
3836 return results
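# An illustrative trace (hypothetical input): for lt_s == "'Hi %s'" and a
# single value whose tokens spell name, the returned tokens spell out:
#
#   f'Hi {name}'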
3837 #@+node:ekr.20200214142019.1: *4* fs.message
3838 def message(self, message): # pragma: no cover.
3839 """
3840 Print one or more message lines aligned on the first colon of the message.
3841 """
3842 # Print a leading blank line.
3843 print('')
3844 # Calculate the padding.
3845 lines = g.splitLines(message)
3846 pad = max(lines[0].find(':'), 30)
3847 # Print the first line.
3848 z = lines[0]
3849 i = z.find(':')
3850 if i == -1:
3851 print(z.rstrip())
3852 else:
3853 print(f"{z[:i+2].strip():>{pad+1}} {z[i+2:].strip()}")
3854 # Print the remaining message lines.
3855 for z in lines[1:]:
3856 if z.startswith('<'):
3857 # Print left aligned.
3858 print(z[1:].strip())
3859 elif z.startswith(':') and -1 < z[1:].find(':') <= pad:
3860 # Align with the first line.
3861 i = z[1:].find(':')
3862 print(f"{z[1:i+2].strip():>{pad+1}} {z[i+2:].strip()}")
3863 elif z.startswith('>'):
3864 # Align after the aligning colon.
3865 print(f"{' ':>{pad+2}}{z[1:].strip()}")
3866 else:
3867 # Default: Put the entire line after the aligning colon.
3868 print(f"{' ':>{pad+2}}{z.strip()}")
3869 # Print the standard message lines.
3870 file_s = f"{'file':>{pad}}"
3871 ln_n_s = f"{'line number':>{pad}}"
3872 line_s = f"{'line':>{pad}}"
3873 print(
3874 f"{file_s}: {self.filename}\n"
3875 f"{ln_n_s}: {self.line_number}\n"
3876 f"{line_s}: {self.line!r}")
3877 #@+node:ekr.20191225054848.1: *4* fs.replace
3878 def replace(self, node, s, values):
3879 """
3880 Replace node with an ast.Str node for s.
3881 Replace all tokens in the range of values with a single 'string' token.
3882 """
3883 # Replace the tokens...
3884 tokens = tokens_for_node(self.filename, node, self.tokens)
3885 i1 = i = tokens[0].index
3886 replace_token(self.tokens[i], 'string', s)
3887 j = 1
3888 while j < len(tokens):
3889 replace_token(self.tokens[i1 + j], 'killed', '')
3890 j += 1
3891 # Replace the node.
3892 new_node = ast.Str()
3893 new_node.s = s
3894 replace_node(new_node, node)
3895 # Update the token.
3896 token = self.tokens[i1]
3897 token.node = new_node # type:ignore
3898 # Update the token list.
3899 add_token_to_token_list(token, new_node)
3900 #@+node:ekr.20191231055008.1: *4* fs.visit
3901 def visit(self, node):
3902 """
3903 FStringify.visit. (Overrides TOT visit).
3905 Call fs.make_fstring if node is a BinOp that might be converted to an
3906 f-string.
3907 """
3908 if (
3909 isinstance(node, ast.BinOp)
3910 and op_name(node.op) == '%'
3911 and isinstance(node.left, ast.Str)
3912 ):
3913 self.make_fstring(node)
3914 #@-others
3915#@+node:ekr.20191231084514.1: *3* class ReassignTokens (TOT)
3916class ReassignTokens(TokenOrderTraverser):
3917 """A class that reassigns tokens to more appropriate ast nodes."""
3918 #@+others
3919 #@+node:ekr.20191231084640.1: *4* reassign.reassign
3920 def reassign(self, filename, tokens, tree):
3921 """The main entry point."""
3922 self.filename = filename
3923 self.tokens = tokens
3924 self.tree = tree
3925 self.traverse(tree)
3926 #@+node:ekr.20191231084853.1: *4* reassign.visit
3927 def visit(self, node):
3928 """ReassignTokens.visit"""
3929 # For now, just handle call nodes.
3930 if not isinstance(node, ast.Call):
3931 return
3932 tokens = tokens_for_node(self.filename, node, self.tokens)
3933 node0, node9 = tokens[0].node, tokens[-1].node
3934 nca = nearest_common_ancestor(node0, node9)
3935 if not nca:
3936 return
3937 # g.trace(f"{self.filename:20} nca: {nca.__class__.__name__}")
3938 # Associate () with the call node.
3939 i = tokens[-1].index
3940 j = find_paren_token(i + 1, self.tokens)
3941 if j is None:
3942 return # pragma: no cover
3943 k = find_paren_token(j + 1, self.tokens)
3944 if k is None:
3945 return # pragma: no cover
3946 self.tokens[j].node = nca # type:ignore
3947 self.tokens[k].node = nca # type:ignore
3948 add_token_to_token_list(self.tokens[j], nca)
3949 add_token_to_token_list(self.tokens[k], nca)
3950 #@-others
3951#@+node:ekr.20191227170803.1: ** Token classes
3952#@+node:ekr.20191110080535.1: *3* class Token
3953class Token:
3954 """
3955 A class representing a 5-tuple, plus additional data.
3957 The TokenOrderTraverser class creates a list of such tokens.
3958 """
3960 def __init__(self, kind, value):
3962 self.kind = kind
3963 self.value = value
3964 #
3965 # Injected by Tokenizer.add_token.
3966 self.five_tuple = None
3967 self.index = 0
3968 # The entire line containing the token.
3969 # Same as five_tuple.line.
3970 self.line = ''
3971 # The line number, for errors and dumps.
3972 # Same as five_tuple.start[0]
3973 self.line_number = 0
3974 #
3975 # Injected by Tokenizer.add_token.
3976 self.level = 0
3977 self.node = None
3979 def __repr__(self): # pragma: no cover
3980 nl_kind = getattr(self, 'newline_kind', '')
3981 s = f"{self.kind:}.{self.index:<3}"
3982 return f"{s:>18}:{nl_kind:7} {self.show_val(80)}"
3984 def __str__(self): # pragma: no cover
3985 nl_kind = getattr(self, 'newline_kind', '')
3986 return f"{self.kind}.{self.index:<3}{nl_kind:8} {self.show_val(80)}"
3988 def to_string(self):
3989 """Return the contribution of the token to the source file."""
3990 return self.value if isinstance(self.value, str) else ''
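# A minimal construction sketch: Token('op', '+') contributes '+' to the
# regenerated source via to_string(); index, line and line_number are filled
# in later by Tokenizer.add_token, and node by the TOG class.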
3991 #@+others
3992 #@+node:ekr.20191231114927.1: *4* token.brief_dump
3993 def brief_dump(self): # pragma: no cover
3994 """Dump a token."""
3995 return (
3996 f"{self.index:>3} line: {self.line_number:<2} "
3997 f"{self.kind:>11} {self.show_val(100)}")
3998 #@+node:ekr.20200223022950.11: *4* token.dump
3999 def dump(self): # pragma: no cover
4000 """Dump a token and related links."""
4001 # Let block.
4002 node_id = self.node.node_index if self.node else ''
4003 node_cn = self.node.__class__.__name__ if self.node else ''
4004 return (
4005 f"{self.line_number:4} "
4006 f"{node_id:5} {node_cn:16} "
4007 f"{self.index:>5} {self.kind:>11} "
4008 f"{self.show_val(100)}")
4009 #@+node:ekr.20200121081151.1: *4* token.dump_header
4010 def dump_header(self): # pragma: no cover
4011 """Print the header for token.dump"""
4012 print(
4013 f"\n"
4014 f" node {'':10} token token\n"
4015 f"line index class {'':10} index kind value\n"
4016 f"==== ===== ===== {'':10} ===== ==== =====\n")
4017 #@+node:ekr.20191116154328.1: *4* token.error_dump
4018 def error_dump(self): # pragma: no cover
4019 """Dump a token or result node for error message."""
4020 if self.node:
4021 node_id = obj_id(self.node)
4022 node_s = f"{node_id} {self.node.__class__.__name__}"
4023 else:
4024 node_s = "None"
4025 return (
4026 f"index: {self.index:<3} {self.kind:>12} {self.show_val(20):<20} "
4027 f"{node_s}")
4028 #@+node:ekr.20191113095507.1: *4* token.show_val
4029 def show_val(self, truncate_n): # pragma: no cover
4030 """Return the token.value field."""
4031 if self.kind in ('ws', 'indent'):
4032 val = len(self.value)
4033 elif self.kind == 'string':
4034 # Important: don't add a repr for 'string' tokens.
4035 # repr just adds another layer of confusion.
4036 val = g.truncate(self.value, truncate_n) # type:ignore
4037 else:
4038 val = g.truncate(repr(self.value), truncate_n) # type:ignore
4039 return val
4040 #@-others
4041#@+node:ekr.20191110165235.1: *3* class Tokenizer
4042class Tokenizer:
4044 """Create a list of Tokens from contents."""
4046 results: List[Token] = []
4048 #@+others
4049 #@+node:ekr.20191110165235.2: *4* tokenizer.add_token
4050 token_index = 0
4051 prev_line_token = None
4053 def add_token(self, kind, five_tuple, line, s_row, value):
4054 """
4055 Add a token to the results list.
4057 Subclasses could override this method to filter out specific tokens.
4058 """
4059 tok = Token(kind, value)
4060 tok.five_tuple = five_tuple
4061 tok.index = self.token_index
4062 # Bump the token index.
4063 self.token_index += 1
4064 tok.line = line
4065 tok.line_number = s_row
4066 self.results.append(tok)
4067 #@+node:ekr.20191110170551.1: *4* tokenizer.check_results
4068 def check_results(self, contents):
4070 # Split the results into lines.
4071 result = ''.join([z.to_string() for z in self.results])
4072 result_lines = g.splitLines(result)
4073 # Check.
4074 ok = result == contents and result_lines == self.lines
4075 assert ok, (
4076 f"\n"
4077 f" result: {result!r}\n"
4078 f" contents: {contents!r}\n"
4079 f"result_lines: {result_lines}\n"
4080 f" lines: {self.lines}"
4081 )
4082 #@+node:ekr.20191110165235.3: *4* tokenizer.create_input_tokens
4083 def create_input_tokens(self, contents, tokens):
4084 """
4085 Generate a list of Tokens from tokens, a list of 5-tuples.
4086 """
4087 # Create the physical lines.
4088 self.lines = contents.splitlines(True)
4089 # Create the list of character offsets of the start of each physical line.
4090 last_offset, self.offsets = 0, [0]
4091 for line in self.lines:
4092 last_offset += len(line)
4093 self.offsets.append(last_offset)
4094 # Handle each token, appending tokens and between-token whitespace to results.
4095 self.prev_offset, self.results = -1, []
4096 for token in tokens:
4097 self.do_token(contents, token)
4098 # Check that the results match the original contents.
4099 self.check_results(contents)
4100 # Return results, as a list.
4101 return self.results
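# A minimal usage sketch, assuming the 5-tuples come from python's tokenize
# module (the contents string is hypothetical):
#
#   import io, tokenize
#   contents = 'x = 1\n'
#   five_tuples = list(tokenize.generate_tokens(io.StringIO(contents).readline))
#   tokens = Tokenizer().create_input_tokens(contents, five_tuples)
#   assert ''.join(z.to_string() for z in tokens) == contents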
4102 #@+node:ekr.20191110165235.4: *4* tokenizer.do_token (the gem)
4103 header_has_been_shown = False
4105 def do_token(self, contents, five_tuple):
4106 """
4107 Handle the given token, optionally including between-token whitespace.
4109 This is part of the "gem".
4111 Links:
4113 - 11/13/19: ENB: A much better untokenizer
4114 https://groups.google.com/forum/#!msg/leo-editor/DpZ2cMS03WE/VPqtB9lTEAAJ
4116 - Untokenize does not round-trip ws before bs-nl
4117 https://bugs.python.org/issue38663
4118 """
4119 import token as token_module
4120 # Unpack..
4121 tok_type, val, start, end, line = five_tuple
4122 s_row, s_col = start # row/col offsets of start of token.
4123 e_row, e_col = end # row/col offsets of end of token.
4124 kind = token_module.tok_name[tok_type].lower()
4125 # Calculate the token's start/end offsets: character offsets into contents.
4126 s_offset = self.offsets[max(0, s_row - 1)] + s_col
4127 e_offset = self.offsets[max(0, e_row - 1)] + e_col
4128 # tok_s is the corresponding string in contents.
4129 tok_s = contents[s_offset:e_offset]
4130 # Add any preceding between-token whitespace.
4131 ws = contents[self.prev_offset:s_offset]
4132 if ws:
4133 # No need for a hook.
4134 self.add_token('ws', five_tuple, line, s_row, ws)
4135 # Always add token, even if it contributes no text!
4136 self.add_token(kind, five_tuple, line, s_row, tok_s)
4137 # Update the ending offset.
4138 self.prev_offset = e_offset
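# An illustrative sketch of the offset arithmetic above: for
# contents == 'a = 1\nb = 2\n', self.offsets == [0, 6, 12], so a token
# starting at (row=2, col=0) begins at character offset
# self.offsets[2 - 1] + 0 == 6, i.e. at the 'b'.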
4139 #@-others
4140#@-others
4141g = LeoGlobals()
4142if __name__ == '__main__':
4143 main() # pragma: no cover
4144#@@language python
4145#@@tabwidth -4
4146#@@pagewidth 70
4147#@-leo