pytomo Package

as_matching Module

cdfplot_new Module

Module to plot a CDF from data or from a file. Can be called directly.

class pytomo.cdfplot_new.CdfFigure(xlabel='x', ylabel='P(X$\leq$x)', title='Empirical Distribution', fontsize='xx-large', legend_fontsize='large', legend_ncol=1, subplot_top=None)[source]

Bases: object

Hold the figure and its default properties

adjust_lines(dashes=True, leg_loc='best')[source]

Apply the correct styles to the axes lines. Should be called once all lines are plotted. Optimised for up to 8 lines in the plot.

adjust_plot(leg_loc='best')[source]

Adjust main plot properties (grid, ticks, legend)

adjust_ticks()[source]

Adjust tick sizes. To be called after a rescale (log...).

bar(*args, **kwargs)[source]

Plot in the axis: interface to plt.Axes.bar

ccdfplot(data_in, name='Data', finalize=False)[source]

Plot the complementary cdf (ccdf) of a data array. Wrapper to call the plot method of axes.

ccdfplotdata(list_data_name, **kwargs)[source]

Method to be able to append data to the figure

cdfplot(data_in, name='Data', finalize=False)[source]

Plot the cdf of a data array. Wrapper to call the plot method of axes.
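
The computation behind such a plot can be sketched as follows. This is a minimal illustration on plain Python lists, not the code of CdfFigure; the names empirical_cdf and empirical_ccdf are hypothetical and do not exist in pytomo.cdfplot_new.

```python
def empirical_cdf(data):
    """Return (xs, ps): the sorted values and P(X <= x) at each of them."""
    xs = sorted(data)
    n = len(xs)
    # The i-th smallest value (1-based) has i/n of the sample at or below it.
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps


def empirical_ccdf(data):
    """Return (xs, qs): the sorted values and P(X > x) at each of them."""
    xs, ps = empirical_cdf(data)
    return xs, [1.0 - p for p in ps]


xs, ps = empirical_cdf([3, 1, 2, 4])
_, qs = empirical_ccdf([3, 1, 2, 4])
```

The pairs (xs, ps) are what a step or line plot would then draw, with P(X<=x) on the y axis.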

cdfplotdata(list_data_name, **kwargs)[source]

Method to be able to append data to the figure

static generate_line_properties()[source]

Cycle through the lines properties

get_xlim(*args, **kwargs)[source]

Get the x-axis limits: interface to plt.Axes.get_xlim()

legend(loc='best')[source]

Plot legend with correct font size

plot(*args, **kwargs)[source]

Plot in the axis: interface to plt.Axes.plot

put_labels()[source]

Put labels for axes and title

savefig(*args, **kwargs)[source]

Saves the figure: interface to plt.Figure.savefig

set_xlim(*args, **kwargs)[source]

Set the x-axis limits: interface to plt.Axes.set_xlim()

set_ylim(*args, **kwargs)[source]

Set the y-axis limits: interface to plt.Axes.set_ylim()

setgraph_loglog()[source]

Set graph in loglog scale and adjust the plot (grid, ticks, legend)

setgraph_logx()[source]

Set graph in xlogscale and adjusts plot (grid, ticks, legend)

setgraph_logy()[source]

Set graph in ylogscale and adjust the plot (grid, ticks, legend)

show()[source]

Show the figure, and hold to do interactive drawing

pytomo.cdfplot_new.bin_plot(datas, title='Bin Plot', xlabel='X', ylabel='Y', logx=False, logy=False)[source]

Plot a bin plot from a dictionary

pytomo.cdfplot_new.cdfplot(in_file, col=0)[source]

Plot the cdf of a column in file

pytomo.cdfplot_new.cdfplotdata(list_data_name, figure=None, xlabel='x', loc='best', fs_legend='large', title='Empirical Distribution', logx=True, logy=False, cdf=True, dashes=True, legend_ncol=1)[source]

Plot the cdf of a list of names and data arrays

pytomo.cdfplot_new.scatter_plot(data, title='Scatterplot', xlabel='X', ylabel='Y', logx=False, logy=False)[source]

Plot a scatter plot of data

pytomo.cdfplot_new.scatter_plot_multi(datas, title='Scatterplot', xlabel='X', ylabel='Y', logx=False, logy=False)[source]

Plot a scatter plot from a dictionary

config_pytomo Module

The config file for the pytomo setup. Lines starting with # are comments.

exception pytomo.config_pytomo.BlackListException[source]

Bases: exceptions.Exception

Exception in case the crawler has been blacklisted

lib_dailymotion_api Module

Module to interact with the Dailymotion API:
  • function to get the most popular Dailymotion videos according to the time frame (adapted from lib_youtube_api);
  • function to retrieve the related videos from the Dailymotion API in a list of links.

Author: Ana Oprea Date: 04.09.2012 - modified 17.09.2012 for related links

Usage: To use the functions provided in this module independently, first place yourself just above the pytomo folder. Then:

>>> import pytomo.start_pytomo as start_pytomo
>>> TIMESTAMP = 'test_timestamp'
>>> start_pytomo.configure_log_file(TIMESTAMP)
>>> import pytomo.lib_dailymotion_api as lib_dailymotion_api
>>> url = 'http://www.dailymotion.com/video/xkqa0p'
>>> time_f = 'today' # choose from 'today' or 'month' or 'week' or 'all_time'
>>> max_results = 20
>>> lib_dailymotion_api.get_popular_links(time_f, max_results)
>>> max_per_page = 25
>>> max_per_url = 10
>>> lib_dailymotion_api.get_dailymotion_links(url, max_per_page)
>>> lib_dailymotion_api.get_related_urls(url, max_per_page, max_per_url)

Parse and return a list of the ids of the related videos from the Dailymotion api:

>>> get_all_related_ids('http://www.dailymotion.com/video/xv7ent', 20)
['xv8xoj', 'xvajbn', 'xvbhdi', 'xvam4y', 'xv8x1t', 'xv8sn2', 'xvbx1x',
 'xv7gkf', 'xv5cnr', 'xvajng', 'xv9ir4', 'xvakfr', 'xv9hjr', 'xvbwax',
 'xv8ttw', 'xv75ou', 'xv587j', 'xvakwj', 'xv8xqp', 'xv9ihm']

Return a set of only Dailymotion links from url

pytomo.lib_dailymotion_api.get_id(url, keep_all=True)[source]

Return the id of a Dailymotion url.
>>> url = 'http://www.dailymotion.com/video/xkqa0p'
>>> get_id(url)
'xkqa0p'
>>> url = 'http://www.dailymotion.com/video/xkqa0p_angry-birds-theme-covered-by-pomplamoose_music'
>>> get_id(url)
'xkqa0p'
>>> url = 'http://www.dailymotion.com/video/xkqa0p?background=493D27&foreground=E8D9AC&highlight=FFFFF0&autoPlay=1'
>>> get_id(url)
'xkqa0p'
>>> url = 'http://vid.ak.dmcdn.net/video/986/034/42430689_mp4_h264_aac.mp4?primaryToken=1343398942_d77027d09aac0c5d5de74d5428fb9e5b'
>>> get_id(url)
'42430689'
>>> url = 'http://www.dailymotion.com/video/xscdm4_le-losc-au-pays-basque_sport?no_track=1'
>>> get_id(url)
'xscdm4'
>>> url = 'http://vid.ec.dmcdn.net/cdn/H264-512x384/video/xmcyww.mp4?77838fedd64fa52abe6a11b3bdbb4e62f4387ebf7cbce2147ea4becc5eee5c418aaa6598bb98a61fc95a02997247e59bfb0dcd58cdf05c1601ded04f75ae357b225da725baad5e97ea6cce6d6a12e17d1c01'
>>> get_id(url)
'xmcyww'
>>> url = 'http://proxy-60.dailymotion.com/video/246/655/37556642_mp4_h264_aac.mp4?auth=1343399602-4098-bdkyfgul-eb00ad223e1964e40b327d75367b273b'
>>> get_id(url)
'37556642'
>>> url = 'http://docs.python.org/tutorial/inputoutput.html'
>>> get_id(url)
'inputoutput.html'
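
One way to obtain the ids listed in the examples for get_id is sketched below. It is an assumption about the approach, not the pytomo implementation: the helper name sketch_get_id is hypothetical, the keep_all argument is ignored, and the code targets Python 3 (urllib.parse) whereas pytomo itself is written for Python 2.

```python
from urllib.parse import urlsplit


def sketch_get_id(url):
    """Return the leading id of the last path component of url."""
    # Drop the query string, then keep only the last path component.
    last = urlsplit(url).path.rstrip('/').split('/')[-1]
    # Cache servers serve '<id>...mp4'; remove that extension only.
    if last.endswith('.mp4'):
        last = last[:-len('.mp4')]
    # Page urls look like '<id>_<slug>': keep what precedes the first '_'.
    return last.split('_')[0]


page_id = sketch_get_id(
    'http://www.dailymotion.com/video/xscdm4_le-losc-au-pays-basque_sport?no_track=1')
cache_id = sketch_get_id(
    'http://vid.ak.dmcdn.net/video/986/034/42430689_mp4_h264_aac.mp4?primaryToken=1')
```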

Returns the most popular dailymotion links for France. The country should be set as parameter in start_pytomo if user should specify it. The number of videos returned is given as Total_pages. (The results returned are in no particular order). A set of only dailymotion links from url

Return a set of max_links randomly chosen related urls

pytomo.lib_dailymotion_api.get_time_frame(input_time='week')[source]

Returns the time frame in the form accepted by youtube_api
>>> get_time_frame('today')
'popular-today'
>>> get_time_frame('week')
'popular-week'
>>> get_time_frame('month')
'popular-month'
>>> get_time_frame('all_time')
'popular'

pytomo.lib_dailymotion_api.get_time_frame_global(input_time='week')[source]

Returns the time frame in the form accepted by youtube_api
>>> get_time_frame('today')
'popular-today'
>>> get_time_frame('week')
'popular-week'
>>> get_time_frame('month')
'popular-month'
>>> get_time_frame('all_time')
'popular'

pytomo.lib_dailymotion_api.set_id(url_id)[source]

Return the complete link of a Dailymotion url.
>>> url_id = 'x1y0ap'
>>> set_id(url_id)
'http://www.dailymotion.com/video/x1y0ap'

lib_dailymotion_download Module

Adapted from lib_youtube_download.py for Dailymotion. Module to download a Dailymotion video for a limited amount of time and calculate the data downloaded within that time.

Usage:
This module provides two classes: FileDownloader class and the InfoExtractor class. This module is not meant to be called directly.
class pytomo.lib_dailymotion_download.DailymotionIE(downloader=None)[source]

Bases: pytomo.lib_general_download.InfoExtractor

Information Extractor for Dailymotion

IE_NAME = u'dailymotion'
get_media_url(video_id, webpage)[source]

Extract URL, uploader and title from webpage

get_video_info(url)[source]

Return the video url extracted by _real_extract

get_webpage(video_id, url)[source]

Retrieve video webpage to extract further information

report_download_webpage(video_id)[source]

Report webpage download.

report_extraction(video_id)[source]

Report information extraction.

static suitable(url)[source]

Returns True if URL is suitable to this IE else False
>>> die = DailymotionIE(InfoExtractor)
>>> die.suitable('http://www.dailymotion.com/video/xscdm4_le-losc-au-pays-basque_sport?no_track=1')
True
>>> die.suitable('http://www.dailymotion.com')
False
>>> die.suitable('http://vid.ec.dmcdn.net/cdn/H264-512x384/video/xscdm4.mp4?77838fedd64fa52abe6a11b3bdbb4e62f4387ebf7cbce2147ea4becc5fe6574d7c3ec5681aa355d923bdca173f151658eefcd8763fc08a9380a7e2f26cbe49b67e583118fb414738b9d3e9db8882d33200be&ec_prebuf=20&ec_rate=68')
True

pytomo.lib_dailymotion_download.get_cache_url(url, redirect=False)[source]

Return the cache url of the video (Wrote mock test). Cache url is returned as the first redirect from dailymotion.com or as the video url on dailymotion.

pytomo.lib_dailymotion_download.get_dailymotion_info_extractor(download_time=30.0)[source]

Return an info extractor for Dailymotion with correct mocks

lib_database Module

Module for the SQLite interface to the pytomo database.

Usage (to be run interactively above the pytomo directory):

    import pytomo.start_pytomo as start_pytomo
    start_pytomo.configure_log_file('doc_test')
    import pytomo.lib_database as lib_database
    import time
    import datetime
    timestamp = time.strftime("%Y-%m-%d.%H_%M_%S")
    # to make sure a new file is created for every run.
    db_name = 'doc_test' + str(timestamp) + '.db'
    doc_db = lib_database.PytomoDatabase(db_name)
    doc_db.create_pytomo_table('doc_test_table')
    doc_db.describe_tables()
    row = (datetime.datetime(2011, 5, 6, 15, 30, 50, 103775),
           'Youtube', 'http://www.youtube.com/watch?v=RcmKbTR--iA',
           'http://v15.lscache3.c.youtube.com',
           '173.194.20.56', 'default_10.193.225.12', None, None, None,
           8.9944229125976562, 'mp4', 225, 115012833.0, 511168.14666666667,
           9575411, 0, 1024, 100, 0.99954795837402344, 7.9875903129577637,
           40, 11.722306421319782, 1192528.8804511931,
           'http://www.youtube.com/fake_redirect')
    doc_db.insert_record(row)
    doc_db.fetch_all()
    doc_db.fetch_all_parameters(['DownloadTime', 'PingMin', 'PingMax'])

>>> import time
>>> timestamp = time.strftime("%Y-%m-%d.%H_%M_%S")
>>> # to make sure a new file is created for every run we use
>>> # timestamp.
>>> db_name = 'doc_test_lib_db' + str(timestamp) + '.db'
>>> # import pytomo.lib_database as lib_database
>>> doc_db = PytomoDatabase(db_name)
>>> doc_db.create_pytomo_table('doc_test_table')
>>> doc_db.describe_tables() 
(u'CREATE TABLE doc_test_table(ID TIMESTAMP,\n
    Service text,\n                       Url text,\n
    CacheUrl text,\n                       IP text,\n
    Resolver text,\n                       PingMin real,\n
    PingAvg real,\n                       PingMax real,\n
    DownloadTime real,\n                       VideoType text,\n
    VideoDuration real,\n                       VideoLength real,\n
    EncodingRate real,\n                       DownloadBytes int,\n
    DownloadInterruptions int,\n                       InitialData
    real,\n                       InitialRate real,\n
    InitialPlaybacKBuffer real,\n
    BufferingDuration real,\n                       PlaybackDuration
    real,\n                       BufferDurationAtEnd real,\n
    MaxInstantThp real,\n                       RedirectUrl text\n
    )',)
>>> import datetime
>>> record = (datetime.datetime(2011, 5, 6, 15, 30, 50, 103775),
... 'Youtube', 'http://www.youtube.com/watch?v=RcmKbTR--iA',
... 'http://v15.lscache3.c.youtube.com',
... '173.194.20.56','default_10.193.225.12', None, None, None,
... 8.9944229125976562, 'mp4', 225, 115012833.0, 511168.14666666667,
... 9575411, 0, 1024 ,100,  0.99954795837402344, 7.9875903129577637,
... 35, 11.722306421319782, 1192528.8804511931, None)
>>> doc_db.insert_record(record)
>>> record = (datetime.datetime(2011, 5, 6, 15, 40, 50, 103775),
... 'Youtube', 'http://www.youtube.com/watch?v=RcmKbTR--iA',
... 'http://v15.lscache3.c.youtube.com',
... '173.194.20.56','default_10.193.225.12', None, None, None,
... 8.9944229125976562, 'mp4', 225, 115012833.0, 511168.14666666667,
... 9575411, 0, 1024, 100, 0.99954795837402344, 7.9875903129577637,
... 40, 11.722306421319782, 1192528.8804511931,
... 'http://www.youtube.com/fake_redirect')
>>> doc_db.insert_record(record)
>>> doc_db.fetch_all() 
(u'2011-05-06 15:30:50.103775',
 u'Youtube',
 u'http://www.youtube.com/watch?v=RcmKbTR--iA',
 u'http://v15.lscache3.c.youtube.com',
 u'173.194.20.56',
 u'default_10.193.225.12',
 None,
 None,
 None,
 8.9944229125976562,
 u'mp4',
 225.0,
 115012833.0,
 511168.14666666667,
 9575411,
 0,
 1024.0,
 100.0,
 0.99954795837402344,
 7.9875903129577637,
 35.0,
 11.722306421319782,
 1192528.8804511931,
 None)
(u'2011-05-06 15:40:50.103775',
 u'Youtube',
 u'http://www.youtube.com/watch?v=RcmKbTR--iA',
 u'http://v15.lscache3.c.youtube.com',
 u'173.194.20.56',
 u'default_10.193.225.12',
 None,
 None,
 None,
 8.9944229125976562,
 u'mp4',
 225.0,
 115012833.0,
 511168.14666666667,
 9575411,
 0,
 1024.0,
 100.0,
 0.99954795837402344,
 7.9875903129577637,
 40.0,
 11.722306421319782,
 1192528.8804511931,
 u'http://www.youtube.com/fake_redirect')
>>> doc_db.fetch_single_parameter('DownloadTime')
[(u'2011-05-06 15:30:50.103775', 8.9944229125976562),
 (u'2011-05-06 15:40:50.103775', 8.9944229125976562)]
>>> doc_db.fetch_all_parameters(['DownloadTime', 'PingMin', 'PingMax'])
[(8.9944229125976562, None, None, u'2011-05-06 15:30:50.103775'),
 (8.9944229125976562, None, None, u'2011-05-06 15:40:50.103775')]
>>> doc_db.fetch_start_time()
1304688650
>>> from os import unlink
>>> unlink(db_name)
class pytomo.lib_database.PytomoDatabase(database_file=None)[source]

Pytomo database class. The columns of the file pytomo_table are as follows:

TID - A timestamped ID generated for each record entered
Service - The website on which the analysis is performed (example: Youtube, Dailymotion)
Url - The url of the webpage
CacheUrl - The url of the cache server hosting the video
CacheServerDelay - The delay to obtain the cache server url (from the initial web page)
IP - The IP address of the cache server from which the video is downloaded
Resolver - The DNS resolver used to obtain the IP address of the cache server, prefixed with the ISP given (if any) (example: Google DNS, Local DNS)
ResolveTime - The time to get an answer from DNS
AS - The AS as resolved by RIPE
PingMin - The minimum recorded ping time to the resolved IP address of the cache server
PingAvg - The average recorded ping time to the resolved IP address of the cache server
PingMax - The maximum recorded ping time to the resolved IP address of the cache server
DownloadTime - The time taken to download the video sample (we do not download the entire video but only for a limited download time)
VideoDuration - The actual duration of the complete video
VideoLength - The length (in bytes) of the complete video
EncodingRate - The encoding rate of the video: VideoLength/VideoDuration
DownloadBytes - The length of the video sample (in bytes)
DownloadInterruptions - Number of interruptions experienced during the download
InitialData - Number of bytes downloaded in the initial buffering period
InitialRate - The mean data rate (in kbps) during the initial buffering period
BufferingDuration - Accumulated time spent in buffering state
PlaybackDuration - Accumulated time spent in playing state
BufferDurationAtEnd - The buffer length at the end of download
TimeTogetFirstByte - Time to get first byte
MaxInstantThp - The max instantaneous throughput of the download
RedirectUrl - The redirection url in case of an HTTP redirect
StatusCode - HTTP return code

close_handle()[source]

Closes the connection to the database

count_rows()[source]

Function to return the number of rows in a table. If there are problems related to database integrity, -1 is returned.

create_pytomo_table(table=None)[source]

Function to create a table

created = None
describe_tables()[source]

Function to show the create command of a table

fetch_all()[source]

Function to print all the records of the table

fetch_all_parameters(parameters)[source]

Function to save (parameter_1, ..., parameter_n, timestamp) in a sorted list of tuples dependent on timestamp

fetch_single_parameter(parameter)[source]

Function to save (timestamp,parameter) in a sorted list of tuples
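
A query of this kind can be sketched with the standard sqlite3 module. This is an illustration under assumptions, not the PytomoDatabase code: the function name sketch_fetch_single_parameter, the table demo_table and its two columns are hypothetical, and the timestamp column is assumed to be called ID as in the CREATE TABLE output shown earlier.

```python
import sqlite3


def sketch_fetch_single_parameter(conn, table, parameter):
    """Return (timestamp, parameter) tuples sorted by timestamp."""
    # Identifiers cannot be bound as SQL parameters, so this sketch assumes
    # that table and parameter come from trusted code, never from user input.
    query = 'SELECT ID, %s FROM %s ORDER BY ID' % (parameter, table)
    return conn.execute(query).fetchall()


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo_table(ID TIMESTAMP, DownloadTime real)')
# Records are inserted out of order on purpose.
conn.executemany('INSERT INTO demo_table VALUES (?, ?)',
                 [('2011-05-06 15:40:50.103775', 9.5),
                  ('2011-05-06 15:30:50.103775', 8.9)])
rows = sketch_fetch_single_parameter(conn, 'demo_table', 'DownloadTime')
conn.close()
```

Sorting relies on the timestamps being stored as ISO-like strings, which order chronologically when compared as text.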

fetch_single_parameter_with_stats(parameter)[source]

Function to save (timestamp, parameter) in a sorted list of tuples only for records with stats

fetch_start_time()[source]

Function to return the first timestamp in the database as a Unix epoch time (in seconds)

insert_record(row)[source]

Function to insert a record

static logger_db()[source]

Initialize the logger

pytomo.lib_database.time_to_epoch(timestamp)[source]

Function to transform to seconds from epoch time represented by a string of the form '%Y-%m-%d %H:%M:%S.%f'
>>> time_to_epoch('2012-06-25 14:54:57.422007')
1340628897
>>> time_to_epoch(None)
Traceback (most recent call last):
    ...
TypeError: expected string or buffer
>>> time_to_epoch('2012-06-25 14:54:57')
1340628897
>>> time_to_epoch('2012-06-25 14:54:57') #doctest: +NORMALIZE_WHITESPACE
Traceback (most recent call last):
    ...
ValueError: time data '2012-06-25 14:54:57' does not match format '%Y-%m-%d %H:%M:%S.%f'
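
A conversion of this kind can be sketched with the standard library alone. The helper below is hypothetical (sketch_time_to_epoch is not a pytomo name) and assumes the string is expressed in local time, so the integer returned depends on the time zone of the machine.

```python
import time
from datetime import datetime

TIME_FORMAT = '%Y-%m-%d %H:%M:%S.%f'


def sketch_time_to_epoch(timestamp):
    """Return the whole seconds since the epoch for a local-time string."""
    # strptime raises ValueError when the microseconds part is missing
    # and TypeError when timestamp is not a string.
    parsed = datetime.strptime(timestamp, TIME_FORMAT)
    # mktime interprets the time tuple as local time; microseconds are dropped.
    return int(time.mktime(parsed.timetuple()))


epoch = sketch_time_to_epoch('2012-06-25 14:54:57.422007')
```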

lib_dns Module

Module to retrieve the IP address of a URL out of a set of nameservers

Usage: To use the functions provided in this module independently, first place yourself just above the pytomo folder. Then:

    import pytomo.start_pytomo as start_pytomo
    TIMESTAMP = 'test_timestamp'
    start_pytomo.configure_log_file(TIMESTAMP)

    import pytomo.lib_dns as lib_dns
    url = 'www.example.com'
    lib_dns.get_ip_addresses(url)

    lib_dns.get_default_name_servers()

pytomo.lib_dns.get_default_name_servers()[source]

Return a list of IP addresses of default name servers
>>> get_default_name_servers()
... # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
'...............'
>>> # Check for string of the format 'x.x.x.x'

pytomo.lib_dns.get_ip_addresses(url)[source]

Return a list of tuples with the IP address and the resolver used

pytomo.lib_dns.reduce_addresses(data)[source]

Return a reduced list of IP addresses and resolvers
>>> reduce_addresses([('1.1.1.1', 'default'), ('2.2.2.2', 'open'),
... ('1.1.1.1', 'goo')])
[('1.1.1.1', 'default_goo'), ('2.2.2.2', 'open')]
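
The reduction shown in the example can be sketched as a grouping by IP address. The function name sketch_reduce_addresses is hypothetical, and the sketch assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def sketch_reduce_addresses(data):
    """Merge the resolvers that returned the same IP address."""
    merged = {}
    for ip_address, resolver in data:
        # Collect every resolver name seen for this address, in order.
        merged.setdefault(ip_address, []).append(resolver)
    return [(ip_address, '_'.join(resolvers))
            for ip_address, resolvers in merged.items()]


reduced = sketch_reduce_addresses(
    [('1.1.1.1', 'default'), ('2.2.2.2', 'open'), ('1.1.1.1', 'goo')])
```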

lib_io Module

Module for I/O operations on filesystem. Used to write the index.html to display the graphical interface.

Version: 0.1 Author: Ana Oprea Date: 20.07.2012

Usage:
class pytomo.lib_io.PytomoFPDF(pdf_name)[source]

Bases: pytomo.fpdf.fpdf.FPDF, pytomo.fpdf.html.HTMLMixin

Class to create pdf from html

close_pdf()[source]
pytomo.lib_io.average(values, known_values)[source]

Computes the arithmetic mean of a list of numbers.
>>> average([20, 30, 70], 3)
40.0
>>> average([], 0)
nan
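
A function with this behaviour can be sketched as below. The name sketch_average is hypothetical, and the sketch assumes that known_values is simply the number of usable entries in values; how pytomo treats missing entries is not stated here.

```python
import math


def sketch_average(values, known_values):
    """Return sum(values) / known_values, or nan when nothing is known."""
    if not known_values:
        # Avoid a ZeroDivisionError and signal "no data" with nan instead.
        return float('nan')
    return float(sum(values)) / known_values


mean = sketch_average([20, 30, 70], 3)
empty_mean = sketch_average([], 0)
```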

pytomo.lib_io.check_templates_exist(timestamp)[source]

Verify that all html templates and their plots have been created.

pytomo.lib_io.check_templates_exist_obsolete(timestamp)[source]

Verify that all html templates and their plots have been created.

pytomo.lib_io.compute_average_values(data)[source]

Function to return a tuple (start_crawl_time, end_crawl_time, nr_videos, average_ping, average_download_time, average_download_interruptions)

pytomo.lib_io.create_pdf(pdf_name, timestamp, average_values, *parameters)[source]
pytomo.lib_io.get_file_by_param_timestamp(path, parameter, timestamp)[source]

Function to return from the path directory the files for a specific parameter timestamped or None.

The filenames are relative to the parent directory.
>>> import os.path
>>> from tempfile import NamedTemporaryFile
>>> from time import time
>>> PARAM = 'DownloadTime'
>>> TIMESTAMP = str(int(time()))
>>> RRD_PLOT_DIR = 'images'
>>> f1 = NamedTemporaryFile(suffix=PARAM, dir=RRD_PLOT_DIR, delete=False)
>>> f2 = NamedTemporaryFile(suffix=TIMESTAMP, dir=RRD_PLOT_DIR,
... delete=False)
>>> f3 = NamedTemporaryFile(suffix=(PARAM + '_' + TIMESTAMP),
... dir=RRD_PLOT_DIR, delete=False)
>>> os.path.basename(f3.name) == os.path.basename(
... get_file_by_param_timestamp(RRD_PLOT_DIR, PARAM, TIMESTAMP))
True
>>> os.path.basename(f2.name) == os.path.basename(
... get_file_by_param_timestamp(RRD_PLOT_DIR, PARAM, TIMESTAMP))
False
>>> os.path.basename(f1.name) == os.path.basename(
... get_file_by_param_timestamp(RRD_PLOT_DIR, PARAM, TIMESTAMP))
False
>>> f1.close()
>>> f2.close()
>>> f3.close()
>>> os.unlink(f1.name)
>>> os.unlink(f2.name)
>>> os.unlink(f3.name)
pytomo.lib_io.get_latest_file(path)[source]

Function to return the newest file in a path
>>> import os.path
>>> from tempfile import NamedTemporaryFile
>>> f = NamedTemporaryFile(delete=False)
>>> f.name == get_latest_file(os.path.dirname(f.name))
True
>>> f.close()
>>> os.unlink(f.name)

pytomo.lib_io.get_latest_specific_file(path, include)[source]

Function to return the newest file in a path
>>> import os.path
>>> from tempfile import NamedTemporaryFile
>>> INCLUDE = 'test'
>>> f = NamedTemporaryFile(suffix=INCLUDE, delete=False)
>>> f.name == get_latest_specific_file(os.path.dirname(f.name), INCLUDE)
True
>>> f.close()
>>> os.unlink(f.name)

pytomo.lib_io.get_specific_files(path, include)[source]

Function to return all the files in path that contain include string in their name
>>> import os.path
>>> from tempfile import NamedTemporaryFile
>>> INCLUDE = 'test'
>>> f1 = NamedTemporaryFile(suffix=INCLUDE, delete=False)
>>> f2 = NamedTemporaryFile(suffix=INCLUDE, delete=False)
>>> f3 = NamedTemporaryFile(delete=False)
>>> set([f1.name, f2.name]) == set(
... get_specific_files(os.path.dirname(f1.name), INCLUDE))
True
>>> set([f1.name, f2.name, f3.name]) == set(
... get_specific_files(os.path.dirname(f1.name), INCLUDE))
False
>>> f1.close()
>>> f2.close()
>>> f3.close()
>>> os.unlink(f1.name)
>>> os.unlink(f2.name)
>>> os.unlink(f3.name)
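
The file lookups described above (files whose name contains a string, newest file in a directory) can be sketched with os.listdir and modification times. Both helper names are hypothetical, and using the modification time as the meaning of "newest" is an assumption.

```python
import os
import tempfile


def sketch_get_specific_files(path, include):
    """Return the full paths of the files in path whose name contains include."""
    return [os.path.join(path, name) for name in os.listdir(path)
            if include in name and os.path.isfile(os.path.join(path, name))]


def sketch_get_latest_file(path, include=''):
    """Return the most recently modified matching file, or None."""
    candidates = sketch_get_specific_files(path, include)
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


# Demo in a throwaway directory; explicit mtimes make "latest" unambiguous.
demo_dir = tempfile.mkdtemp()
demo_paths = {}
for order, name in enumerate(['a_test', 'b_test', 'c_other'], start=1):
    demo_paths[name] = os.path.join(demo_dir, name)
    open(demo_paths[name], 'w').close()
    os.utime(demo_paths[name], (order * 1000, order * 1000))
```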

pytomo.lib_io.html_graphs_for_pdf(timestamp, average_values, *parameters)[source]

Function to return the html containing graphs and their explanation for the *parameters

pytomo.lib_io.index_filename(param, timestamp)[source]

Return the file name of the index (try to create it if it does not exist). Will have the pattern: <TEMPLATES_DIR>/<hostname>.<timestamp>.<param_TEMPLATE_FILE>

pytomo.lib_io.logger_io()[source]

Initialize the logger

pytomo.lib_io.main(argv=None)[source]

Program wrapper

pytomo.lib_io.pdf_filename(param, timestamp)[source]

Return the file name of the pdf (try to create it if it does not exist). Will have the pattern: <PDF_DIR>/<hostname>.<timestamp>.<param_PDF_FILE>

pytomo.lib_io.plot_filename(param, timestamp)[source]

Return the file name of the plot (try to create it if it does not exist). Will have the pattern: <RRD_PLOT_DIR>/<hostname>.<timestamp>.<param_IMAGE_FILE>

pytomo.lib_io.plot_path_to_write_in_html(param, timestamp)[source]

Return the path to the plot relative to the TEMPLATES_DIR Will have the pattern: ../<RRD_PLOT_DIR>/<plot_name>

pytomo.lib_io.plot_path_to_write_in_pdf(param, timestamp)[source]

Return the path to the plot relative to the TEMPLATES_DIR Will have the pattern: <RRD_PLOT_DIR>/<plot_name>

pytomo.lib_io.rrd_filename(timestamp)[source]

Return the file name of the rrd (try to create it if it does not exist). Will have the pattern: <RRD_DIR>/<hostname>.<timestamp>.<RRD_FILE>

pytomo.lib_io.write_database_archive(f_index, db_dir)[source]

Write the list of databases from db_dir in the html template.

pytomo.lib_io.write_end_div_refresh(f_index, database)[source]
pytomo.lib_io.write_index(timestamp, database, db_dir='databases')[source]

Function to create the parameter_timestamp_index.html from template files and include the images that also contain a specific timestamp.

pytomo.lib_io.write_left_column(f_index, database)[source]

Function to write the header and contents of the left column - links

pytomo.lib_io.write_middle_column(f_index, timestamp, links, db_dir, *parameters)[source]

Function to write the header and contents of the middle column - plots for the *parameters or the table with the links to the videos downloaded

pytomo.lib_io.write_right_column(f_index, average_values)[source]

Function to write the header and contents of the right column - tables containing the average values determined by the crawl and the list of existent databases.

lib_ping Module

Module to generate the RTT times of a ping

This module provides two functions that enable us to get the ping statistics of an IP address on any system (Linux, Windows, Mac).

Usage:

    import pytomo.lib_ping as lib_ping
    import pytomo.config_pytomo as config_pytomo
    # Get the system name to configure the Ping RE
    import platform
    config_pytomo.SYSTEM = platform.system()
    # Set the Regular Expression for current system
    nb_packets = 5
    ip_address = '127.0.0.1'
    lib_ping.configure_ping_options(nb_packets)
    lib_ping.ping_ip(ip_address, nb_packets)
pytomo.lib_ping.configure_ping_options(ping_packets=10)[source]

Store in config_pytomo module the config for RTT matching

pytomo.lib_ping.ping_ip(ip_address, ping_packets=10)[source]

Return a list of the min, avg, max and mdev ping values
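
Extracting these four values from the text printed by ping can be sketched with a regular expression. This is only an illustration: the pattern covers the summary line of the Linux ping command, the sample output is made up for the example, and the name sketch_parse_rtt is hypothetical. The expressions that pytomo stores in config_pytomo for each system are not reproduced here.

```python
import re

# Summary line of the Linux ping command; other systems print other layouts.
RTT_PATTERN = re.compile(
    r'rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms')


def sketch_parse_rtt(ping_output):
    """Return [min, avg, max, mdev] as floats, or None if no summary is found."""
    match = RTT_PATTERN.search(ping_output)
    if match is None:
        return None
    return [float(value) for value in match.groups()]


# Illustrative output, not a real measurement.
sample = ('5 packets transmitted, 5 received, 0% packet loss, time 4004ms\n'
          'rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms\n')
stats = sketch_parse_rtt(sample)
```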

lib_plot Module

Module to plot the data and generate the PNG/PDF image file

pytomo.lib_plot.create_fig(db_name)[source]

Return the figure

pytomo.lib_plot.create_options(parser)[source]

Add the different options to parser

pytomo.lib_plot.main(argv=None)[source]

Program wrapper

pytomo.lib_plot.plot_data(column_names, image_file, db_file=None, cdf=False)[source]

Function to plot the data in the database. Creates sub plots for the column names.

pytomo.lib_plot.plot_function(to_plot, db_file, image_file, cdf_data=None)[source]

Function to plot data

lib_rrdtools Module

Module for RRDtool interface to the pytomo data.

Necessary module: rrdtool (python-rrdtool,
http://oss.oetiker.ch/rrdtool/download.en.html)

Version: 0.1 Author: Ana Oprea Date: 11.07.2012

Usage:

    # first create a database - follow steps in lib_database
    import pytomo.start_pytomo as start_pytomo
    start_pytomo.configure_log_file('doc_test')
    import pytomo.lib_database as lib_database
    from pytomo.lib_plot import UNITS
    import pytomo.lib_rrdtools as lib_rrdtools

    pytomo_rrd = lib_rrdtools.PytomoRRD(db_name)
    pytomo_rrd.update_pytomo_rrd()
    pytomo_rrd.plot_pytomo_rrd()

>>> import time
>>> timestamp = time.strftime("%Y-%m-%d.%H_%M_%S")
>>> # to make sure a new file is created for every run we use
>>> # timestamp.
>>> db_name = 'doc_test_lib_db' + str(timestamp) + '.db'
>>> # import pytomo.lib_database as lib_database
>>> doc_db = lib_database.PytomoDatabase(db_name)
>>> doc_db.create_pytomo_table('doc_test_table')
>>> import datetime
>>> record = (datetime.datetime(2011, 5, 6, 15, 30, 50, 103775),
... 'Youtube', 'http://www.youtube.com/watch?v=RcmKbTR--iA',
... 'http://v15.lscache3.c.youtube.com',
... '173.194.20.56','default_10.193.225.12', None, None, None,
... 8.9944229125976562, 'mp4', 225, 115012833.0, 511168.14666666667,
... 9575411, 0, 1024 ,100,  0.99954795837402344, 7.9875903129577637,
... 35, 11.722306421319782, 1192528.8804511931, None)
>>> doc_db.insert_record(record)
>>> record = (datetime.datetime(2011, 5, 6, 15, 31, 10, 103775),
... 'Youtube', 'http://www.youtube.com/watch?v=RcmKbTR--iA',
... 'http://v15.lscache3.c.youtube.com',
... '173.194.20.56','default_10.193.225.12', None, None, None,
... 8.9944229125976562, 'mp4', 225, 115012833.0, 511168.14666666667,
... 9575411, 0, 1024, 100, 0.99954795837402344, 7.9875903129577637,
... 40, 11.722306421319782, 1192528.8804511931,
... 'http://www.youtube.com/fake_redirect')
>>> doc_db.insert_record(record)
>>> pytomo_rrd = PytomoRRD(db_name)
>>> pytomo_rrd.update_pytomo_rrd()
>>> pytomo_rrd.plot_pytomo_rrd()
>>> from os import unlink
>>> unlink(db_name)
class pytomo.lib_rrdtools.PytomoRRD(db_file)[source]

Pytomo class to interact with rrdtools

fetch_pytomo_rrd()[source]

Fetch data from the rrd.

has_values = None
static logger_rrd()[source]

Initialize the logger

plot_pytomo_rrd()[source]

Plot the time series parameters (at least 3 points must exist in the database for the graphs to exist).

update_pytomo_rrd()[source]

Insert data from the list of tuples (timestamp, parameter1, ...) to the rrd.

pytomo.lib_rrdtools.create_DS_types(parameters, heartbeat)[source]

Function to return a list of elements 'DS:ds-name:GAUGE:heartbeat:U:U'
>>> HEARTBEAT = 100
>>> create_DS_types(['BufferDurationAtEnd', 'PingMin',
... 'InitialData'], HEARTBEAT) #doctest: +NORMALIZE_WHITESPACE
['DS:BufferDurationAtEnd:GAUGE:100:U:U', 'DS:PingMin:GAUGE:100:U:U',
 'DS:InitialData:GAUGE:100:U:U']
>>> create_DS_types([], HEARTBEAT)
[]
>>> create_DS_types(None, HEARTBEAT)
Traceback (most recent call last):
    ...
TypeError: 'NoneType' object is not iterable
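
Building such data source definitions can be sketched with a list comprehension. The name sketch_create_ds_types is hypothetical; the string layout follows the 'DS:ds-name:GAUGE:heartbeat:U:U' form given above.

```python
def sketch_create_ds_types(parameters, heartbeat):
    """Return one GAUGE data source definition per parameter name."""
    # 'U:U' leaves the minimum and maximum of the data source unbounded.
    return ['DS:%s:GAUGE:%d:U:U' % (name, heartbeat) for name in parameters]


ds_types = sketch_create_ds_types(['BufferDurationAtEnd', 'PingMin'], 100)
```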

pytomo.lib_rrdtools.create_options(parser)[source]

Add the different options to the parser

pytomo.lib_rrdtools.format_null_values(*args)[source]

Function to return a list where None arguments are transformed to U
>>> format_null_values(('2012-06-25 14:54:57.422007', 0.0, None, 130048.0,
... None, 4643.9046215020562)) #doctest: +NORMALIZE_WHITESPACE
[('2012-06-25 14:54:57.422007', 0.0, None, 130048.0, None,
    4643.9046215020562)]
>>> format_null_values(*('2012-06-25 14:54:57.422007', 0.0, None, 130048.0,
... None, 4643.9046215020562)) 
['2012-06-25 14:54:57.422007', 0.0, 'U', 130048.0, 'U', 4643.9046215020562]
>>> format_null_values(None)
['U']
>>> format_null_values('2012-06-25 14:54:57.422007',*(0.0, None, 130048.0,
... None, 4643.9046215020562)) 
['2012-06-25 14:54:57.422007', 0.0, 'U', 130048.0, 'U', 4643.9046215020562]
>>> format_null_values('2012-06-25 14:54:57.422007',(0.0, None, 130048.0,
... None, 4643.9046215020562)) 
['2012-06-25 14:54:57.422007', (0.0, None, 130048.0, None,
    4643.9046215020562)]
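
The behaviour in these examples, where only top-level None arguments become 'U' and a tuple passed as a single argument is left untouched, can be sketched as follows. The name sketch_format_null_values is hypothetical.

```python
def sketch_format_null_values(*args):
    """Return the arguments as a list, with each None replaced by 'U'."""
    # Only the arguments themselves are inspected, not the content of
    # any tuple or list passed as one argument.
    return ['U' if value is None else value for value in args]


flat = sketch_format_null_values('2012-06-25 14:54:57.422007', 0.0, None, 130048.0)
nested = sketch_format_null_values((0.0, None))
```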
pytomo.lib_rrdtools.generate_plot_names(parameters, timestamp)[source]

Function to create a list of filenames for plots like: RRD_PLOT_DIR/parameter_to_plot_timestamp.extension
TODO: redo doctest
>>> from time import time
>>> TIMESTAMP = '2012-07-20.11_44_27'
>>> generate_plot_names(['DownloadTime',
... 'PingMin'], TIMESTAMP) #doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
['/home/capture/co/pytomo/trunk/Pytomo/images/s-spo-hti.2012-07-20.11_44_27.DownloadTime_pytomo_image.png',
 '/home/capture/co/pytomo/trunk/Pytomo/images/s-spo-hti.2012-07-20.11_44_27.PingMin_pytomo_image.png']
>>> generate_plot_names([], TIMESTAMP)
[]
>>> generate_plot_names(None, TIMESTAMP)
Traceback (most recent call last):
    ...
TypeError: 'NoneType' object is not iterable
pytomo.lib_rrdtools.main(argv=None)[source]

Program wrapper

pytomo.lib_rrdtools.rrd_filename_escape_colon(rrd_file)[source]

Escape the : in the filename of a rrd because this is not accepted in the rrd_graph when defining a function (problem appears generally on Windows)
>>> rrd_filename_escape_colon('/home/capture/co/pytomo/trunk/Pytomo/rrds/s-spo-hti.1350291171.pytomo.rrd')
'/home/capture/co/pytomo/trunk/Pytomo/rrds/s-spo-hti.1350291171.pytomo.rrd'
>>> rrd_filename_escape_colon('C:\Pytomo\rrds\s-spo-hti.1350291171.pytomo.rrd')
'C\:\Pytomo\rrds\s-spo-hti.1350291171.pytomo.rrd'

pytomo.lib_rrdtools.update_data_types(parameters)[source]

Function to return a string '%i:%s:...:%s' dependent on the number of DS
>>> update_data_types(['BufferDurationAtEnd', 'PingMin', 'InitialData'])
'%i:%s:%s:%s'
>>> update_data_types([])
'%i'
>>> update_data_types(None)
Traceback (most recent call last):
    ...
TypeError: object of type 'NoneType' has no len()
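
The format string can be sketched as one '%i' for the timestamp followed by one '%s' per data source. The name sketch_update_data_types is hypothetical, and the last line only illustrates how such a template might be filled.

```python
def sketch_update_data_types(parameters):
    """Return '%i' followed by one ':%s' for each parameter."""
    # len(None) raises TypeError, matching the documented behaviour.
    return ':'.join(['%i'] + ['%s'] * len(parameters))


template = sketch_update_data_types(['BufferDurationAtEnd', 'PingMin'])
# Filling the template with a timestamp and two values, 'U' meaning unknown.
update_line = template % (1340628897, 35.0, 'U')
```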

lib_youtube_api Module

Function to get the most popular Youtube videos according to the time frame.
Arguments:
time = 'today' or 'month' or 'week' or 'all_time'
max_results : In multiples of 25

Returns: A list containing the list of videos.

Usage: To use the functions provided in this module independently, first place yourself just above the pytomo folder. Then:

    import pytomo.start_pytomo as start_pytomo
    TIMESTAMP = 'test_timestamp'
    start_pytomo.configure_log_file(TIMESTAMP)

    import pytomo.lib_youtube_api as lib_youtube_api
    time = 'today' # choose from 'today' or 'month' or 'week' or 'all_time'
    max_results = 25
    time_frame = lib_youtube_api.get_time_frame(time)
    lib_youtube_api.get_popular_links(time_frame, max_results)
    url = 'http://www.youtube.com/watch?v=cv5bF2FJQBc'
    max_per_page = 25
    max_per_url = 10
    lib_youtube_api.get_youtube_links(url)
    lib_youtube_api.get_related_urls(url, max_per_page, max_per_url)

Returns the most popular youtube links (world-wide). The number of videos returned is given as Total_pages. (The results returned are in no particular order.)

A set of only Youtube links from url

Return a set of max_links randomly chosen related urls

pytomo.lib_youtube_api.get_time_frame(input_time='week')[source]

Returns the time frame in the form accepted by youtube_api

>>> from . import start_pytomo
>>> start_pytomo.configure_log_file('doc_test') #doctest: +ELLIPSIS
Configuring log file
Logs are there: ...
...
>>> get_time_frame('today')
't'
>>> get_time_frame('week')
'w'
>>> get_time_frame('month')
'm'
>>> get_time_frame('all_time')
'a'
>>> get_time_frame('other')
'a'
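The mapping documented by this doctest can be sketched as a dictionary lookup with a fallback. The names below are hypothetical and the mapping is only inferred from the documented outputs; the real function may be written differently.

```python
# Hypothetical mapping, inferred from the documented doctest results.
_TIME_FRAMES = {'today': 't', 'week': 'w', 'month': 'm', 'all_time': 'a'}


def time_frame_sketch(input_time='week'):
    """Return the one-letter time frame, falling back to 'a' (all time)."""
    return _TIME_FRAMES.get(input_time, 'a')


if __name__ == '__main__':
    for name in ('today', 'week', 'month', 'all_time', 'other'):
        print(name, time_frame_sketch(name))
```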

Return a set of only Youtube links from url

pytomo.lib_youtube_api.trunk_url(url)[source]

Return the interesting part of a Youtube url

>>> url = 'http://www.youtube.com/watch?v=hE0207sxaPg&feature=hp_SLN&list=SL'
>>> trunk_url(url) #doctest: +NORMALIZE_WHITESPACE
'http://www.youtube.com/watch?v=hE0207sxaPg'
>>> url = 'http://www.youtube.com/watch?v=y2kEx5BLoC4&
... feature=list_related&playnext=1&list=MLGxdCwVVULXfxx-61LMYHbwpcwAvZd-rI'
>>> trunk_url(url) #doctest: +NORMALIZE_WHITESPACE

>>> url = 'http://www.youtube.com/watch?v=UC-RFFIMXlA'
>>> trunk_url(url)  
'http://www.youtube.com/watch?v=UC-RFFIMXlA'
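One way to obtain this behaviour is to keep only the v query parameter of the URL. The sketch below uses Python 3's urllib.parse, whereas pytomo itself targets Python 2; the function name trunk_url_sketch is hypothetical, and the real trunk_url may be implemented differently.

```python
from urllib.parse import parse_qs, urlsplit


def trunk_url_sketch(url):
    """Keep only the scheme, host, path and 'v' parameter of a watch URL."""
    parts = urlsplit(url)
    video_ids = parse_qs(parts.query).get('v')
    if not video_ids:
        # No video id found: return the URL unchanged (an assumption).
        return url
    return '%s://%s%s?v=%s' % (parts.scheme, parts.netloc, parts.path,
                               video_ids[0])


if __name__ == '__main__':
    print(trunk_url_sketch(
        'http://www.youtube.com/watch?v=hE0207sxaPg&feature=hp_SLN&list=SL'))
```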

lib_youtube_download Module

Module to download youtube video for a limited amount of time and calculate the data downloaded within that time

Usage:
This module provides two classes: FileDownloader class and the InfoExtractor class. This module is not meant to be called directly.
class pytomo.lib_youtube_download.YoutubeIE(downloader=None)[source]

Bases: pytomo.lib_general_download.InfoExtractor

Information extractor for youtube.com.

static get_swf(video_webpage, mobj)[source]

Attempt to extract SWF player URL

get_video_info(video_id)[source]

Get video info. Return the video

get_video_url_list(video_id, video_token, video_info, req_format=None)[source]

Decide which formats to download with req_format (default is best quality). Return video url list

report_infopage_download(video_id)[source]

Report attempt to download video info webpage.

report_information_extraction(video_id)[source]

Report attempt to extract video information.

report_lang()[source]

Report attempt to set language.

report_video_webpage_download(video_id)[source]

Report attempt to download video webpage.

static suitable(url)[source]

Returns True if URL is suitable to this IE else False

>>> yie = YoutubeIE(InfoExtractor)
>>> yie.suitable('http://www.youtube.com/watch?v=rERIxeYOYhI')
True
>>> yie.suitable('http://www.youtube.com')
False
>>> yie.suitable('http://www.youtube.com/watch?v=-VB2dHVNyds&amp')
True
>>> yie.suitable('http://www.youtube.com/watch?')
False
>>> yie.suitable('http://youtu.be/3VdOTTfSKyM')
True
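A check of this kind is often written as a regular expression match. The pattern below is an illustrative assumption that reproduces the five documented results; it is not the pattern used by YoutubeIE, and the names are hypothetical.

```python
import re

# Hypothetical pattern: a watch URL carrying a 'v' parameter, or a
# youtu.be short link, each followed by a video id.
_VALID_URL_SKETCH = re.compile(
    r'^https?://(?:www\.)?'
    r'(?:youtube\.com/watch\?(?:.*&)?v=|youtu\.be/)'
    r'([\w-]+)')


def suitable_sketch(url):
    """Return True if url looks like a YouTube video URL, else False."""
    return _VALID_URL_SKETCH.match(url) is not None


if __name__ == '__main__':
    for candidate in ('http://www.youtube.com/watch?v=rERIxeYOYhI',
                      'http://www.youtube.com',
                      'http://youtu.be/3VdOTTfSKyM'):
        print(candidate, suitable_sketch(candidate))
```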

pytomo.lib_youtube_download.get_cache_url(url, redirect=False)[source]

Return the cache url of the video (Wrote mock test)

pytomo.lib_youtube_download.get_youtube_info_extractor(download_time=30.0)[source]

Return an info extractor for YouTube with correct mocks

start_pytomo Module

Module to launch a crawl. This module supplies the following functions that can be used independently:

  1. compute_stats: To calculate the download statistics of a URL.
Usage:
To use the functions provided in this module independently, first place yourself just above the pytomo folder. Then:

    import pytomo.start_pytomo as start_pytomo
    import pytomo.config_pytomo as config_pytomo
    config_pytomo.LOG_FILE = '-'
    import time
    timestamp = time.strftime('%Y-%m-%d.%H_%M_%S')
    log_file = start_pytomo.configure_log_file(timestamp)
    import platform
    config_pytomo.SYSTEM = platform.system()
    url = 'http://youtu.be/3VdOTTfSKyM'
    start_pytomo.compute_stats(url)
    # test Dailymotion
    url = 'http://www.dailymotion.com/video/xscdm4_le-losc-au-pays-basque_sport?no_track=1'

    # video delivered by akamai CDN
    url = 'http://www.dailymotion.com/video/xp9fq9_test-video-akamai_tech'
    start_pytomo.compute_stats(url)
    # redirect url: does not work
    url = 'http://vid.ak.dmcdn.net/video/986/034/42430689_mp4_h264_aac.mp4?primaryToken=1343398942_d77027d09aac0c5d5de74d5428fb9e5b'
    start_pytomo.compute_stats(url, redirect=True)

    # video delivered by edgecast CDN
    url = 'http://www.dailymotion.com/video/xmcyww_test-video-cell-edgecast_tech'
    start_pytomo.compute_stats(url)
    url = 'http://vid.ec.dmcdn.net/cdn/H264-512x384/video/xmcyww.mp4?77838fedd64fa52abe6a11b3bdbb4e62f4387ebf7cbce2147ea4becc5eee5c418aaa6598bb98a61fc95a02997247e59bfb0dcd58cdf05c1601ded04f75ae357b225da725baad5e97ea6cce6d6a12e17d1c01'
    start_pytomo.compute_stats(url, redirect=True)

    # video delivered by dailymotion servers
    url = 'http://www.dailymotion.com/video/xmcyw2_test-video-cell-core_tech'
    start_pytomo.compute_stats(url)
    url = 'http://proxy-60.dailymotion.com/video/246/655/37556642_mp4_h264_aac.mp4?auth=1343399602-4098-bdkyfgul-eb00ad223e1964e40b327d75367b273b'
    start_pytomo.compute_stats(url, redirect=True)

exception pytomo.start_pytomo.MaxUrlException[source]

Bases: exceptions.Exception

Class to stop crawling when the maximum number of urls has been reached

exception pytomo.start_pytomo.MyTimeoutException[source]

Bases: exceptions.Exception

Class to generate timeout exceptions

pytomo.start_pytomo.add_stats(stats, cache_server_delay, url, result_stream=None, data_base=None)[source]

Insert the stats in the db and update the crawled urls

pytomo.start_pytomo.check_full_download(len_crawled_urls)[source]

Check if the urls should be fully downloaded

pytomo.start_pytomo.check_options(parser, options)[source]

Check incompatible options

pytomo.start_pytomo.check_out_files(file_pattern, directory, timestamp)[source]

Return a full path of the file used for the output. Test if the path exists, create it if possible, or create it in the default user directory

>>> file_pattern = None
>>> directory = 'logs'
>>> timestamp = 'doc_test'
>>> check_out_files(file_pattern, directory, timestamp) 
>>> file_pattern = 'pytomo.log'
>>> check_out_files(file_pattern, directory, timestamp) 
'...doc_test.pytomo.log'
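The path logic described for check_out_files can be sketched as follows, assuming the output name is the timestamp joined to the pattern with a dot and that the fallback is the user's home directory. The function name, the fallback_dir parameter and both assumptions are illustrative only.

```python
import os
import tempfile


def check_out_files_sketch(file_pattern, directory, timestamp,
                           fallback_dir=None):
    """Return '<dir>/<timestamp>.<file_pattern>', creating <dir> if needed."""
    if file_pattern is None:
        # No pattern: nothing to write, as in the first documented call.
        return None
    file_name = '.'.join((timestamp, file_pattern))
    # Try the requested directory first, then the assumed fallback.
    for candidate in (directory, fallback_dir or os.path.expanduser('~')):
        try:
            os.makedirs(candidate, exist_ok=True)
        except OSError:
            continue
        return os.path.join(candidate, file_name)
    return None


if __name__ == '__main__':
    base = tempfile.mkdtemp()
    print(check_out_files_sketch('pytomo.log', os.path.join(base, 'logs'),
                                 'doc_test'))
```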
pytomo.start_pytomo.compute_download_stats(resolver, ip_address, cache_uri, current_stats, do_full_crawl=False)[source]

Return a list of the download statistics related to the cache_uri

pytomo.start_pytomo.compute_stats(url, cache_uri, do_download_stats, redirect_url=None, do_full_crawl=None)[source]

Return a list of the statistics related to the url

pytomo.start_pytomo.configure_alarm(timeout)[source]

Set timeout if the OS supports it. Return a bool indicating whether signal is supported

pytomo.start_pytomo.configure_log_file(timestamp, kaa_metadata=True)[source]

Configure log file and indicate success or failure

pytomo.start_pytomo.convert_debug_level(_, __, value, parser)[source]

Convert the string passed to a logging level
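The four-argument signature matches the callback interface of the standard optparse module (option, opt_str, value, parser). A minimal sketch of such a callback follows; the option name --log-level, the destination log_level and the function names are hypothetical, not taken from pytomo.

```python
import logging
from optparse import OptionParser, OptionValueError


def convert_level_sketch(option, opt_str, value, parser):
    """optparse callback: store the logging level named by value."""
    level = getattr(logging, value.upper(), None)
    if not isinstance(level, int):
        raise OptionValueError('unknown logging level: %r' % value)
    setattr(parser.values, option.dest, level)


def parse_level(argv):
    """Parse argv and return the numeric logging level."""
    parser = OptionParser()
    # Hypothetical option; type='string' makes optparse pass the argument
    # to the callback as value.
    parser.add_option('-L', '--log-level', dest='log_level', type='string',
                      action='callback', callback=convert_level_sketch,
                      default=logging.INFO)
    options, _ = parser.parse_args(argv)
    return options.log_level


if __name__ == '__main__':
    print(parse_level(['--log-level', 'debug']))
```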

Crawl the link and return the next urls

Wrapper to crawl each input link

pytomo.start_pytomo.create_options(parser)[source]

Add the different options to the parser

pytomo.start_pytomo.do_crawl(result_stream=None, db_file=None, timestamp=None, image_file=None, loop=False, related=True)[source]

Crawls the urls given by the url_file until max_rounds rounds have been performed or max_visited_urls urls have been visited

pytomo.start_pytomo.do_rounds(input_links, result_stream, data_base, db_file, image_file, related=True, loop=False)[source]

Perform the rounds of crawl

pytomo.start_pytomo.format_stats(stats, cache_server_delay, service='YouTube')[source]

Return the stats as a list of tuple to insert into database

>>> stats = ('http://www.youtube.com/watch?v=RcmKbTR--iA',
...          'http://v15.lscache3.c.youtube.com',
...          {'173.194.20.56': [datetime.datetime(
...                                 2011, 5, 6, 15, 30, 50, 103775),
...                             None,
...                             [8.9944229125976562, 'mp4',
...                              225,
...                              115012833.0,
...                              511168.14666666667,
...                              9575411,
...                              0,
...                              0.99954795837402344,
...                              7.9875903129577637,
...                              11.722306421319782,
...                              1192528.8804511931, 15169],
...                             None, 'default_10.193.225.12']})
>>> format_stats(stats) 
    [(datetime.datetime(2011, 5, 6, 15, 30, 50, 103775),
      'Youtube', 'http://www.youtube.com/watch?v=RcmKbTR--iA',
  'http://v15.lscache3.c.youtube.com', '173.194.20.56',
   'default_10.193.225.12', 15169, None, None, None, 8.9944229125976562,
  'mp4', 225, 115012833.0, 511168.14666666667, 9575411, 0,
 0.99954795837402344, 7.9875903129577637, 11.722306421319782,
  1192528.8804511931, None)]
>>> stats = ('http://www.youtube.com/watch?v=OdF-oiaICZI',
...  'http://v7.lscache8.c.youtube.com',
...                 {'74.125.105.226': [datetime.datetime(
...                                       2011, 5, 6, 15, 30, 50, 103775),
...                                     [26.0, 196.0, 82.0],
...                                     [30.311000108718872, 'mp4',
...                                      287.487, 16840065.0,
...                                      58576.78781997099,
...                                      1967199, 0,
...                                      1.316999912261963,
...                                      28.986000061035156,
...                                      5.542251416248594,
...                                      1109.4598961624772, 15169],
...                                    'http://www.youtube.com/fake_redirect',
...                       'google_public_dns_8.8.8.8_open_dns_208.67.220.220'],
...                  '173.194.8.226': [datetime.datetime(2011, 5, 6, 15,
...                                                       30, 51, 103775),
...                                    [103.0, 108.0, 105.0],
...                                    [30.287999868392944, 'mp4',
...                                     287.487, 16840065.0,
...                                     58576.78781997099,
...                                     2307716,
...                                     0,
...                                     1.3849999904632568,
...                                     28.89300012588501,
...                                     11.47842453761781,
...                                     32770.37517215069, 15169],
...                                    None, 'default_212.234.161.118']})
>>> format_stats(stats) 
[(datetime.datetime(2011, 5, 6, 15, 30, 50, 103775),
   'Youtube', 'http://www.youtube.com/watch?v=OdF-oiaICZI',
  'http://v7.lscache8.c.youtube.com', '74.125.105.226',
        'google_public_dns_8.8.8.8_open_dns_208.67.220.220', 15169, 26.0, 196.0, 82.0,
  30.311000108718872, 'mp4', 287.48700000000002, 16840065.0,
  58576.787819970988, 1967199, 0, 1.3169999122619629,
  28.986000061035156, 5.5422514162485941, 1109.4598961624772,
  'http://www.youtube.com/fake_redirect'),
 (datetime.datetime(2011, 5, 6, 15, 30, 51, 103775),
  'Youtube', 'http://www.youtube.com/watch?v=OdF-oiaICZI',
  'http://v7.lscache8.c.youtube.com', '173.194.8.226',
  'default_212.234.161.118', 103.0, 108.0, 105.0, 30.287999868392944,
  'mp4', 287.48700000000002, 16840065.0, 58576.787819970988, 2307716,
  0, 1.3849999904632568, 28.89300012588501, 11.47842453761781,
  32770.375172150692, None)]
pytomo.start_pytomo.get_next_round_urls(lib_api, input_links, max_per_page=10, max_per_url=10, max_round_duration=600)[source]

Return a tuple of the set of input urls and a set of related url of videos.

Arguments:

  • input_links: list of the urls
  • max_per_url and max_per_page options
  • out_file_name: if provided, list is dump in it
pytomo.start_pytomo.log_ip_address()[source]

Log the remote IP addresses

pytomo.start_pytomo.log_md5_results(result_file, db_file)[source]

Computes and stores the md5 hash of result and database files

pytomo.start_pytomo.log_provider(timeout=10)[source]

Get and log the provider from the user, or skip after timeout seconds

pytomo.start_pytomo.main(version=None, argv=None)[source]

Program wrapper. Setup of log part

pytomo.start_pytomo.md5sum(input_file)[source]

Return the standard md5 of the file
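A standard MD5 digest of a file can be computed with hashlib by reading the file in chunks, so that large result or database files do not have to fit in memory. This is a sketch under that assumption; the name md5sum_sketch and the chunk size are illustrative, and pytomo's own md5sum may differ.

```python
import hashlib
import os
import tempfile


def md5sum_sketch(input_file, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at input_file."""
    digest = hashlib.md5()
    with open(input_file, 'rb') as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == '__main__':
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b'hello')
    print(md5sum_sketch(tmp.name))
    os.remove(tmp.name)
```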

pytomo.start_pytomo.prompt_max_crawls(support_signal, timeout)[source]

Function to prompt the user to enter max_urls

pytomo.start_pytomo.prompt_provider(support_signal, timeout)[source]

Function to prompt for provider

pytomo.start_pytomo.prompt_proxies(support_signal, timeout)[source]

Function to prompt the user to enter the proxies it uses to connect to the internet

pytomo.start_pytomo.prompt_start_crawl()[source]

Function to prompt the user to accept the crawling

pytomo.start_pytomo.retrieve_cache_urls(url, lib_download)[source]

Return the list of cache url servers for a given video. The last element is the server from which the actual video is downloaded.

pytomo.start_pytomo.select_libraries(url)[source]

Return the libraries to use for downloading and retrieving specific links

pytomo.start_pytomo.set_max_crawls(timeout=10, prompt=True, nb_max_crawls=10000)[source]

Sets the max number of videos to be crawled

pytomo.start_pytomo.set_proxies(_, __, value, parser)[source]

Convert the proxy passed to a dict to be handled by urllib2

pytomo.start_pytomo.set_proxies_cli(timeout=10)[source]

Sets the proxies needed to connect to the internet

pytomo.start_pytomo.write_options_to_config(options)[source]

Write read options to config_pytomo

webpage Module

Simple python server to display an index page and static objects.

Note

External module included: webpy (http://webpy.org/)

Usage:
>>> # call the class from top level
>>> start_server.py
class pytomo.webpage.Doc[source]

Class that serves the documentation pages.

GET(filename)[source]

Retrieves the documentation files.

class pytomo.webpage.Index[source]

Class that serves the main page. Will search for a .html file under the folder set in render below.

GET(parameter)[source]

Retrieves the main page from the parameter_timestamp_index.html template, based on the selected database as parameter.

class pytomo.webpage.Pdf[source]

Class that serves the PDF reports. Will search for elements under the directories mentioned in urls related to this class.

GET(media, filename)[source]

Retrieves the static objects located in the main page.

class pytomo.webpage.Static[source]

Class that serves the static objects of the main page. Will search for elements under the directories mentioned in urls related to this class.

GET(media, filename)[source]

Retrieves the static objects located in the main page.

pytomo.webpage.configure_logger_web()[source]

Configure log file and indicate success or failure

pytomo.webpage.main(argv=None)[source]

Program wrapper

Subpackages