Metrics Service Implementation
Generic Linux daemon base class for Python 3.x.
class d1_metrics.daemon.Daemon(pidfile)
A generic daemon class.
Usage: subclass the daemon class and override the run() method.
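The subclass-and-override pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the d1_metrics implementation: PidfileDaemon and Heartbeat are hypothetical names, the pidfile is assumed to guard against starting a second instance, and the sketch deliberately does not detach from the terminal (a real daemon typically does, for example with a double fork), so run() executes in the foreground.

```python
import os


class PidfileDaemon:
    """Hypothetical stand-in for d1_metrics.daemon.Daemon; it does NOT fork."""

    def __init__(self, pidfile):
        self.pidfile = pidfile

    def read_pid(self):
        """Return the PID stored in the pidfile, or None if absent or invalid."""
        try:
            with open(self.pidfile) as f:
                return int(f.read().strip())
        except (OSError, ValueError):
            return None

    def is_running(self):
        """True if the pidfile names a process that still exists."""
        pid = self.read_pid()
        if pid is None:
            return False
        try:
            os.kill(pid, 0)  # signal 0 only checks that the process exists
        except ProcessLookupError:
            return False
        except PermissionError:
            return True  # the process exists but belongs to another user
        return True

    def start(self):
        if self.is_running():
            raise RuntimeError("already running, see " + self.pidfile)
        with open(self.pidfile, "w") as f:
            f.write("%d\n" % os.getpid())
        try:
            self.run()
        finally:
            os.remove(self.pidfile)  # clean up even if run() raises

    def run(self):
        """Override in a subclass with the daemon's main loop."""
        raise NotImplementedError


class Heartbeat(PidfileDaemon):
    """Example subclass: only run() is overridden."""

    def run(self):
        self.beats = 3  # placeholder for real work
```

Only run() changes between subclasses; pidfile handling stays in the base class.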
class d1_metrics.solrclient.SolrClient(base_url, core_name, select='/')
getFieldValues(name, q='*:*', fq=None, maxvalues=-1, sort=True, **query_args)
Retrieve the unique values for a field, along with their usage counts.
Parameters:
- name (string) – Name of field for which to retrieve values
- q (string) – Query identifying the records from which values will be retrieved
- fq (string) – Filter query restricting operation of query
- maxvalues – Maximum number of values to retrieve. Default is -1, which causes retrieval of all values.
- sort – Sort the result
Returns: dict of {fieldname: [[value, count], … ], }
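If getFieldValues is built on Solr field faceting (an assumption; the docstring does not say how the counts are obtained), the request parameters and the reshaping of Solr's flat facet list into the documented [[value, count], …] form might look like this. build_facet_params and parse_facet_response are hypothetical helpers, and the HTTP call itself is omitted so the sketch runs offline.

```python
import json
from urllib.parse import urlencode


def build_facet_params(name, q="*:*", fq=None, maxvalues=-1, sort=True):
    """Hypothetical: Solr query string that counts the distinct values of one field."""
    params = [
        ("q", q),
        ("rows", 0),                 # only facet counts are needed, no documents
        ("facet", "true"),
        ("facet.field", name),
        ("facet.limit", maxvalues),  # -1 means "no limit" in Solr
        ("facet.mincount", 1),       # skip values that never occur
        # Assumption: sort=True means "most frequent first"; Solr's other
        # option, "index", orders values lexically instead.
        ("facet.sort", "count" if sort else "index"),
        ("wt", "json"),
    ]
    if fq is not None:
        params.append(("fq", fq))
    return urlencode(params)


def parse_facet_response(body, name):
    """Turn Solr's flat [v1, c1, v2, c2, ...] list into {name: [[v, c], ...]}."""
    flat = json.loads(body)["facet_counts"]["facet_fields"][name]
    return {name: [[flat[i], flat[i + 1]] for i in range(0, len(flat), 2)]}
```

By default Solr's JSON response lists facet counts as alternating values and counts, such as ["eml", 5, "csv", 2], which is why the parser steps through the list two items at a time.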
class d1_metrics.solrclient.SolrSearchResponseIterator(base_url, core_name, q, select='select', fq=None, fields='*', page_size=10000, max_records=None, sort=None, **query_args)
Performs a search against a Solr index and acts as an iterator to retrieve all the values.
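A minimal sketch of iterating over every matching document, assuming simple start/rows offset paging (the source does not state which paging scheme the class uses). iterate_solr_docs and make_http_fetcher are hypothetical; the page fetcher is passed in so the loop can be exercised without a running Solr instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def make_http_fetcher(select_url, q, fields="*"):
    """Hypothetical fetch_page backed by Solr's JSON response format."""
    def fetch_page(start, rows):
        qs = urlencode({"q": q, "fl": fields, "start": start, "rows": rows, "wt": "json"})
        with urlopen(select_url + "?" + qs) as resp:
            body = json.load(resp)["response"]
        return body["docs"], body["numFound"]
    return fetch_page


def iterate_solr_docs(fetch_page, page_size=10000, max_records=None):
    """Yield documents page by page until numFound or max_records is reached."""
    start = 0
    while True:
        docs, num_found = fetch_page(start, page_size)
        for doc in docs:
            if max_records is not None and start >= max_records:
                return
            yield doc
            start += 1
        if not docs or start >= num_found:
            return
```

Offset paging gets slower as start grows and can skip or repeat documents if the index changes during iteration, so a stable sort (for example on the unique key) is advisable.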
Solr Event Processor
Echo DataONE aggregated logs to disk
Requires Python 3.
This script reads records from the aggregated logs Solr index and writes each record to a log file on disk, one record per line. Each line is formatted as:
JSON_DATA
where:
JSON_DATA = JSON representation of the record as retrieved from Solr
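A sketch of the one-record-per-line format, assuming records are plain JSON-serializable dicts; format_record_line and parse_record_line are hypothetical names. json.dumps escapes control characters such as newlines inside string values, so each record always occupies exactly one line.

```python
import json


def format_record_line(record):
    """Serialize one Solr record as a single JSON line (no trailing newline)."""
    # Newlines inside string values are emitted as the two characters "\n",
    # so the result never spans more than one physical line.
    return json.dumps(record, separators=(",", ":"))


def parse_record_line(line):
    """Inverse of format_record_line, e.g. when reading the log back on restart."""
    return json.loads(line)
```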
Output log files are rotated by size, with a new file started once the current one reaches 1 GB. At most 150 log files are kept, so the log directory should not exceed roughly 150 GB.
JSON loading benchmarks: http://artem.krylysov.com/blog/2015/09/29/benchmark-python-json-libraries/ Note that the performance differences are much smaller under Python 3.
One particular challenge is that the dateLogged time in the log records has only one-second precision. This makes restarting the harvest difficult, since multiple records may share the same second.
The strategy used here is to read the last n records (about 100) already written to the log, query again from the last logged second, and ignore any retrieved records that are already in that set.
Each log record is on the order of 500-600 bytes; assuming a conservative 1000 bytes per record, the last 100 or so records fit within the last 100 kB of the log file.
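The restart strategy described above can be sketched as a filter over the re-queried records. skip_already_written and record_key are hypothetical, and comparing records by their canonical JSON form (sorted keys) is an assumption; a set is used instead of scanning a list so that each lookup is constant time on average.

```python
import json


def record_key(record):
    """Hashable identity for a record: its JSON form with sorted keys (assumption)."""
    return json.dumps(record, sort_keys=True)


def skip_already_written(new_records, last_written):
    """Yield records from new_records that are not among last_written.

    last_written is the tail of the existing log (e.g. the last ~100 records);
    new_records is the result of querying again from the last logged second.
    """
    seen = {record_key(r) for r in last_written}
    for record in new_records:
        if record_key(record) not in seen:
            yield record
```

This relies on fewer than n records sharing the final logged second; if more do, some already-written records fall outside the window and would be written twice.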
class d1_logagg.eventprocessor.LogFormatter(fmt=None, datefmt=None, style='%')
converter()
timestamp[, tz] -> tz’s local time from POSIX timestamp.
formatTime(record, datefmt=None)
Return the creation time of the specified LogRecord as formatted text.
This method should be called from format() by a formatter which wants to make use of a formatted time. This method can be overridden in formatters to provide for any specific requirement, but the basic behaviour is as follows: if datefmt (a string) is specified, it is used with time.strftime() to format the creation time of the record. Otherwise, an ISO8601-like (or RFC 3339-like) format is used. The resulting string is returned. This function uses a user-configurable function to convert the creation time to a tuple. By default, time.localtime() is used; to change this for a particular formatter instance, set the ‘converter’ attribute to a function with the same signature as time.localtime() or time.gmtime(). To change it for all formatters, for example if you want all logging times to be shown in GMT, set the ‘converter’ attribute in the Formatter class.
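The converter docstring shown for this class matches that of datetime.datetime.fromtimestamp, which suggests LogFormatter perhaps swaps the default time.localtime converter for a datetime-based one to allow sub-second timestamps; the %f (microseconds) directive is documented for datetime.strftime() but not for time.strftime(). The sketch below shows that technique; MicrosecondFormatter is a hypothetical name, not the module's class.

```python
import datetime
import logging


class MicrosecondFormatter(logging.Formatter):
    """Hypothetical formatter whose converter returns a datetime, not a struct_time."""

    converter = datetime.datetime.fromtimestamp  # local time, like time.localtime

    def formatTime(self, record, datefmt=None):
        ct = self.converter(record.created)
        if datefmt:
            return ct.strftime(datefmt)  # datetime.strftime understands %f
        return ct.isoformat(timespec="milliseconds")
```

To log in UTC instead, the converter could be replaced with a function that calls datetime.datetime.fromtimestamp(ts, datetime.timezone.utc).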
class d1_logagg.eventprocessor.OutputLogFormatter(fmt=None, datefmt=None, style='%')
converter()
timestamp[, tz] -> tz’s local time from POSIX timestamp.
formatTime(record, datefmt=None)
Return the creation time of the specified LogRecord as formatted text.
See LogFormatter.formatTime(); the description is identical.
class d1_logagg.eventprocessor.SolrSearchResponseIterator(select_url, q, fq=None, fields='*', page_size=10000, max_records=None, sort=None, **query_args)
Performs a search against a Solr index and acts as an iterator to retrieve all the values.
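Solr also supports cursor-based deep paging: the first request sends cursorMark=*, each response carries a nextCursorMark to send with the next request, and iteration ends when the returned mark equals the one sent. The sort must include the uniqueKey field, and start cannot be combined with cursorMark. Whether this class pages this way is not stated; the sketch below assumes it, with a hypothetical iterate_with_cursor and an injected fetch_page function in place of real HTTP requests.

```python
def iterate_with_cursor(fetch_page):
    """Yield documents using Solr cursorMark paging (hypothetical helper).

    fetch_page(cursor_mark) must return (docs, next_cursor_mark), i.e. the
    "docs" list and the "nextCursorMark" value from one Solr response.
    """
    cursor = "*"  # Solr's initial cursorMark value
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        if next_cursor == cursor:  # an unchanged mark means no more results
            return
        cursor = next_cursor
```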
d1_logagg.eventprocessor.getLastLinesFromFile(fname, seek_back=100000, pattern='^{', lines_to_return=100)
Returns the last lines matching pattern from the file fname.
Args:
  fname: name of file to examine
  seek_back: number of bytes to look backwards in file
  pattern: pattern that lines must match to be returned
  lines_to_return: maximum number of lines to return
Returns:
  last n log entries that match pattern
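The seek-then-filter approach the docstring describes can be sketched as below. last_matching_lines is a hypothetical reimplementation, not the module's code; it assumes UTF-8 log files with newline-terminated records and drops the first line read after seeking, since that line may begin mid-record.

```python
import os
import re


def last_matching_lines(fname, seek_back=100000, pattern="^{", lines_to_return=100):
    """Return up to lines_to_return lines near the end of fname that match pattern."""
    if lines_to_return <= 0:
        return []
    regex = re.compile(pattern)
    with open(fname, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Jump to seek_back bytes before the end instead of reading the whole file.
        f.seek(max(0, size - seek_back))
        tail = f.read().decode("utf-8", errors="replace")
    lines = tail.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the text after the final newline is empty
    if size > seek_back and lines:
        lines = lines[1:]  # the first line may start mid-record, so drop it
    matching = [ln for ln in lines if regex.match(ln)]
    return matching[-lines_to_return:]
```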
d1_logagg.eventprocessor.getOutputLogger(log_file, log_level=20)
Logger used for emitting the Solr records as JSON blobs, one record per line.
The logging module is used here only to take advantage of its file rotation capability.
Parameters:
- log_file – path of the output log file
- log_level – logging level (default 20, i.e. logging.INFO)
Returns: the output logger
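A sketch of an output logger that uses RotatingFileHandler for size-based rotation. get_output_logger is hypothetical; treating "1GB" as 1024**3 bytes and using backupCount=150 are assumptions taken from the module description. RotatingFileHandler keeps the active file plus backupCount numbered backups (.1, .2, …), so backupCount=150 allows up to 151 files.

```python
import logging
import logging.handlers


def get_output_logger(log_file, log_level=logging.INFO,
                      max_bytes=1024 ** 3, backup_count=150):
    """Hypothetical equivalent of getOutputLogger: a size-rotated JSON-line writer."""
    logger = logging.getLogger("output." + log_file)  # one logger per output file
    logger.setLevel(log_level)
    logger.propagate = False  # keep records out of the root logger's handlers
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.handlers.RotatingFileHandler(
            log_file, maxBytes=max_bytes, backupCount=backup_count)
        # Bare message only, so each line is exactly one JSON record.
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger
```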
d1_logagg.eventprocessor.getQuery(src_file='d1logagg.log', tstamp=None)
Returns a query that would retrieve the last entry in the log file.
Args:
  src_file: name of the log file to examine
  tstamp: timestamp of the last point for the query; defaults to the value of utcnow if not set
Returns:
  A Solr query string that returns at least the last record from the index, along with the record data that was retrieved from the log
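A hypothetical sketch of the restart query: take the dateLogged value of the last record in the log and query from that second up to tstamp. build_restart_query is not the module's function, and the inclusive Solr range syntax field:[start TO end] is assumed to be what is used. datetime.now(timezone.utc) stands in for utcnow(), which is deprecated since Python 3.12.

```python
import datetime
import json


def build_restart_query(last_log_line, tstamp=None):
    """Hypothetical: (Solr range query, last record) for resuming the harvest."""
    record = json.loads(last_log_line)
    start = record["dateLogged"]  # assumed to be an ISO-8601 UTC string
    if tstamp is None:
        tstamp = datetime.datetime.now(datetime.timezone.utc)
    end = tstamp.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Inclusive lower bound: records from the same second as the last written
    # one are fetched again and must be filtered out afterwards.
    return "dateLogged:[%s TO %s]" % (start, end), record
```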
d1_logagg.eventprocessor.getRecords(log_file_name, core_name, base_url='http://localhost:8983/solr', test_only=False)
Main method. Retrieve records from Solr and save them to disk.
Args:
  log_file_name: name of the destination log file
  core_name: name of the Solr core to query
  base_url: base URL of the Solr service
Returns:
  Nothing
d1_logagg.eventprocessor.logRecordInList(record_list, record)
Returns True if record is in record_list.
Args:
  record_list: list of records to search
  record: record to look for
Returns:
  Boolean