# -*- coding: UTF-8 -*- 

# Copyright 2006-2016 Luc Saffre 

# License: BSD (see file COPYING for details) 

 

r""" A simple markup parser that expands "commands" found in an input 

string to produce a resulting output string.  Commands are in the form 

``[KEYWORD ARGS]``.  The caller defines all commands itself; there are

no predefined commands. 

 

..  This document is part of the Lino test suite. You can test only 

    this document with:: 

 

        $ python setup.py test -s tests.UtilsTests.test_memo 

 

 

Usage example 

------------- 

 

Instantiate a parser: 

 

>>> from lino.utils.memo import Parser 

>>> p = Parser() 

 

We declare a "command handler" function `url2html` and register it: 

 

>>> def url2html(parser, s): 

...     print("[DEBUG] url2html() got %r" % s) 

...     if not s: return "XXX" 

...     url, text = s.split(None,1) 

...     return '<a href="%s">%s</a>' % (url,text) 

>>> p.register_command('url', url2html) 

 

The intended usage of our example handler is ``[url URL TEXT]``, where 

URL is the URL to link to, and TEXT is the label of the link: 

 

>>> print(p.parse('This is a [url http://xyz.com test].')) 

[DEBUG] url2html() got 'http://xyz.com test' 

This is a <a href="http://xyz.com">test</a>. 

 

 

A command handler is called with two arguments: the parser and the

portion of text between the KEYWORD and the closing square bracket,

not including the whitespace after the keyword.  It must return the text

that is to replace the ``[KEYWORD ARGS]`` fragment.  It is

responsible for parsing the text that it receives as parameter.

 

If the command handler raises an exception, the exception message

is inserted into the result and the whole traceback is logged to

the lino logger.

 

To demonstrate this, our example implementation has a bug: it doesn't

support the case of having only a URL without TEXT:

 

>>> print(p.parse('This is a [url http://xyz.com].')) 

[DEBUG] url2html() got 'http://xyz.com' 

This is a [ERROR need more than 1 value to unpack in '[url http://xyz.com]' at position 10-30]. 
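
A handler can avoid this error by checking how many parts it received.
The following is a minimal sketch of a more tolerant handler; the name
``url2html_tolerant`` and the choice to reuse the URL as the label are
assumptions made for illustration, not part of this module:

```python
def url2html_tolerant(parser, s):
    # Hypothetical variant of url2html: when TEXT is missing,
    # the URL itself is used as the link label.
    parts = s.split(None, 1)
    if not parts:
        return "XXX"
    url = parts[0]
    text = parts[1] if len(parts) > 1 else url
    return '<a href="%s">%s</a>' % (url, text)


# The parser argument is not used by this handler, so None stands in for it.
print(url2html_tolerant(None, 'http://xyz.com test'))
print(url2html_tolerant(None, 'http://xyz.com'))
```

Such a handler would be registered like the original one, with
``p.register_command('url', url2html_tolerant)``.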

 

Newlines preceded by a backslash will be removed before the command 

handler is called: 

 

>>> print(p.parse('''This is [url http://xy\ 

... z.com another test].''')) 

[DEBUG] url2html() got 'http://xyz.com another test' 

This is <a href="http://xyz.com">another test</a>. 

 

The whitespace between the KEYWORD and ARGS can be any whitespace, 

including newlines: 

 

>>> print(p.parse('''This is a [url 

... http://xyz.com test].''')) 

[DEBUG] url2html() got 'http://xyz.com test' 

This is a <a href="http://xyz.com">test</a>. 

 

The ARGS part is optional (it's up to the command handler to react

accordingly; our handler function returns XXX in that case):

 

>>> print(p.parse('''This is a [url] test.''')) 

[DEBUG] url2html() got '' 

This is a XXX test. 

 

The ARGS part may contain pairs of square brackets: 

 

>>> print(p.parse('''This is a [url  

... http://xyz.com test with [more] brackets].''')) 

[DEBUG] url2html() got 'http://xyz.com test with [more] brackets' 

This is a <a href="http://xyz.com">test with [more] brackets</a>. 

 

Fragments of text between brackets that do not match any registered 

command will be left unchanged: 

 

>>> print(p.parse('''This is a [1] test.''')) 

This is a [1] test. 

 

>>> print(p.parse('''This is a [foo bar] test.''')) 

This is a [foo bar] test. 

 

>>> print(p.parse('''Text with only [opening square bracket.''')) 

Text with only [opening square bracket. 
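
The bracket handling shown above follows from the regular expression
used to find commands.  The sketch below copies the pattern of
``COMMAND_REGEX`` from this module so that it runs standalone; the
helper name ``split_command`` is an assumption made for illustration:

```python
import re

# Same pattern as COMMAND_REGEX in this module, copied so the sketch
# runs without importing lino.
COMMAND_REGEX = re.compile(r"\[(\w+)\s*((?:[^[\]]|\[.*?\])*?)\]")


def split_command(text):
    # Return (keyword, args) for the first command-like fragment, or None.
    mo = COMMAND_REGEX.search(text)
    return None if mo is None else (mo.group(1), mo.group(2))


# A pair of brackets inside ARGS stays part of ARGS.
print(split_command('a [url http://xyz.com test with [more] brackets].'))
# "[1]" matches the pattern too, with keyword "1" and empty ARGS.
print(split_command('a [1] test'))
# An opening bracket without a closing one does not match at all.
print(split_command('only [opening bracket'))
```

Note that ``[1]`` and ``[foo bar]`` do match the pattern; they are left
unchanged only because no command is registered under that keyword.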

 

 

Limits 

------ 

 

A single closing square bracket as part of ARGS will not produce the 

desired result: 

 

>>> print(p.parse('''This is a [url 

... http://xyz.com The character "\]"].''')) 

[DEBUG] url2html() got 'http://xyz.com The character "\\' 

This is a <a href="http://xyz.com">The character "\</a>"]. 
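
The reason is that a backslash has no special meaning for the pattern,
so ARGS ends at the first unpaired closing bracket.  The sketch below
copies the pattern of ``COMMAND_REGEX`` to show this.  It also shows one
possible workaround, which assumes that the result is rendered as HTML:
writing the character reference ``&#93;`` instead of the bracket.  The
helper name ``args_of`` is hypothetical:

```python
import re

# Same pattern as COMMAND_REGEX in this module, copied so the sketch
# runs without importing lino.
COMMAND_REGEX = re.compile(r"\[(\w+)\s*((?:[^[\]]|\[.*?\])*?)\]")


def args_of(text):
    # Return the ARGS part of the first command-like fragment in text.
    return COMMAND_REGEX.search(text).group(2)


# The backslash does not escape the bracket: ARGS stops right after it.
print(repr(args_of('[url http://xyz.com The character "\\]"].')))

# Assumed workaround: an HTML character reference contains no bracket.
print(repr(args_of('[url http://xyz.com The character "&#93;"].')))
```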

 

Execution flow statements like ``[if ...]`` and ``[endif ...]`` or ``[for

...]`` and ``[endfor ...]`` are not supported, but would be nice.

 

 

 

The ``[=expression]`` form 

-------------------------- 

 

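A fragment of the form ``[=EXPRESSION]`` is evaluated as a Python

expression.  Names used in the expression can be passed as keyword

arguments to ``parse()``, but a context is not required: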

 

>>> print(p.parse('''\ 

... The answer is [=a*a*5-a].''', a=3)) 

The answer is 42. 

 

>>> print(p.parse('''<ul>[="".join(['<li>%s</li>' % (i+1) for i in range(5)])]</ul>''')) 

<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ul> 
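
The following sketch illustrates the mechanism under simplifying
assumptions: it copies the pattern of ``EVAL_REGEX`` from this module and
evaluates each expression with ``eval()``, using the keyword arguments as
globals.  The function name ``expand`` is hypothetical, and error
handling is left out:

```python
import re

# Same pattern as EVAL_REGEX in this module, copied so the sketch
# runs without importing lino.
EVAL_REGEX = re.compile(r"\[=((?:[^[\]]|\[.*?\])*?)\]")


def expand(text, **context):
    # Replace each [=EXPR] fragment by str(eval(EXPR)).
    def repl(mo):
        # A copy is passed because eval() adds __builtins__ to the dict.
        return str(eval(mo.group(1), dict(context)))
    return EVAL_REGEX.sub(repl, text)


print(expand('The answer is [=a*a*5-a].', a=3))
```

Because ``eval()`` runs arbitrary Python code, this form should only be
used with trusted input.  In this module, setting ``safe_mode`` to True
on the parser disables it.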

 

""" 

from builtins import str 

from builtins import object 

 

import logging 

logger = logging.getLogger(__name__) 

 

 

import re 

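# Group 1 is the KEYWORD, group 2 the ARGS; ARGS may contain

# ``[...]`` pairs but no unpaired closing bracket.

COMMAND_REGEX = re.compile(r"\[(\w+)\s*((?:[^[\]]|\[.*?\])*?)\]")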

#                                       ===...... .......= 

 

EVAL_REGEX = re.compile(r"\[=((?:[^[\]]|\[.*?\])*?)\]") 

 

from lino.utils.xmlgen import etree 

 

 

class Parser(object): 

 

    # When True, parse() does not evaluate ``[=expression]`` fragments.

    safe_mode = False

 

    def __init__(self, **context): 

        self.commands = dict() 

        self.context = context 

 

    def register_command(self, cmd, func): 

        self.commands[cmd] = func 

 

    def eval_match(self, matchobj): 

        expr = matchobj.group(1) 

        try: 

            return self.format_value(eval(expr, self.context)) 

        except Exception as e: 

            logger.exception(e) 

            return self.handle_error(matchobj, e) 

 

    def format_value(self, v): 

        if etree.iselement(v): 

            return str(etree.tostring(v)) 

        return str(v) 

 

    def cmd_match(self, matchobj): 

        cmd = matchobj.group(1) 

        params = matchobj.group(2) 

        params = params.replace('\\\n', '') 

        cmdh = self.commands.get(cmd, None) 

        if cmdh is None: 

            return matchobj.group(0) 

        try: 

            return self.format_value(cmdh(self, params)) 

        except Exception as e: 

            logger.exception(e) 

            return self.handle_error(matchobj, e) 

 

    def handle_error(self, mo, e): 

        #~ return mo.group(0) 

        msg = "[ERROR %s in %r at position %d-%d]" % ( 

            e, mo.group(0), mo.start(), mo.end()) 

        logger.error(msg) 

        return msg 

 

    def parse(self, s, **context): 

        # Keyword arguments are merged into the parser's context and

        # remain available to later parse() calls.

        self.context.update(context) 

        s = COMMAND_REGEX.sub(self.cmd_match, s) 

        if not self.safe_mode: 

            s = EVAL_REGEX.sub(self.eval_match, s) 

        return s 

 

 

def _test(): 

    import doctest 

    doctest.testmod() 

 

if __name__ == "__main__": 

    _test()