Package cssutils :: Module tokenize :: Class Tokenizer
[hide private]
[frames] | no frames]

Class Tokenizer

source code

object --+
         |
        Tokenizer

generates a list of Token objects

Nested Classes [hide private]
  ttypes
constants for Tokenizer and Parser to use values are just identifiers!
Instance Methods [hide private]
 
__init__(self)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
source code
 
getttype(self, t)
check type of tokentype in t which may be string or list returns ttype
source code
 
addtoken(self, value, ttype=None)
adds a new Token to self.tokens
source code
 
getescape(self)
http://www.w3.org/TR/2004/CR-CSS21-20040225/syndata.html#q6
source code
 
dostrorcomment(self, t=[], end=None, ttype=None, _fullSheet=False)
strings: "..." or '...' comment: /.../
source code
 
tokenize(self, text, _fullSheet=False)
tokenizes text and returns tokens
source code

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Variables [hide private]
  WS = ' \t\r\n\x0c'
  _typelist = [(<function <lambda> at 0x00DD73F0>, u';'), (<func...
  _delimmap = {u'*': u'*', u'+': u'+', u'.': u'.', u'>': u'>', u...
  _attmap = {u'$=': u'$=', u'*=': u'*=', u'^=': u'^=', u'|=': u'...
  _atkeywordmap = {u'charset': u'@charset', u'import': u'@import...
Properties [hide private]

Inherited from object: __class__

Method Details [hide private]

__init__(self)
(Constructor)

source code 
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
Overrides: object.__init__
(inherited documentation)

getescape(self)

source code 

http://www.w3.org/TR/2004/CR-CSS21-20040225/syndata.html#q6

Third, backslash escapes allow authors to refer to characters they can't easily put in a document. In this case, the backslash is followed by at most six hexadecimal digits (0..9A..F), which stand for the ISO 10646 ([ISO10646]) character with that number, which must not be zero. If a character in the range [0-9a-fA-F] follows the hexadecimal number, the end of the number needs to be made clear. There are two ways to do that:

  1. with a space (or other whitespace character): " B"
("&B"). In this case, user agents should treat a "CR/LF" pair (U+000D/U+000A) as a single whitespace character.
  1. by providing exactly 6 hexadecimal digits: "026B"
("&B")

In fact, these two methods may be combined. Only one whitespace character is ignored after a hexadecimal escape. Note that this means that a "real" space after the escape sequence must itself either be escaped or doubled.

dostrorcomment(self, t=[], end=None, ttype=None, _fullSheet=False)

source code 
handles
strings: "..." or '...' comment: /.../
t
initial token to start result with
end
string at which to end
ttype
str description of token to be found
_fullSheet
if no more tokens complete found tokens

Class Variable Details [hide private]

_typelist

Value:
[(lambda t: t== u';', tokentype.SEMICOLON), (lambda t: t== u'{', token\
type.LBRACE), (lambda t: t== u'}', tokentype.RBRACE), (lambda t: t== u\
'[', tokentype.LBRACKET), (lambda t: t== u']', tokentype.RBRACKET), (l\
ambda t: t== u'(', tokentype.LPARANTHESIS), (lambda t: t== u')', token\
type.RPARANTHESIS), (lambda t: t== u',', tokentype.COMMA), (lambda t: \
t== u'.', tokentype.CLASS), (tokenregex.w, tokentype.S), (tokenregex.n\
um, tokentype.NUMBER), (tokenregex.atkeyword, tokentype.ATKEYWORD), (t\
okenregex.HASH, tokentype.HASH), (tokenregex.DIMENSION, tokentype.DIME\
...

_delimmap

Value:
{u'*': u'*', u'+': u'+', u'.': u'.', u'>': u'>', u'~': u'~'}

_attmap

Value:
{u'$=': u'$=', u'*=': u'*=', u'^=': u'^=', u'|=': u'|=', u'~=': u'~='}

_atkeywordmap

Value:
{u'charset': u'@charset',
 u'import': u'@import',
 u'media': u'@media',
 u'namespace': u'@namespace',
 u'page': u'@page'}