When Geraldo was at 0.3.9 and I was planning the next version, I wondered what would be good enough to support the many requirements we had, such as charts, faster and more scalable performance, look-and-feel, bar codes, an events system, etc.
Before I started coding, Ari Caldeira contacted me with a patch containing several improvements - some fixing bugs, others fixing logic and others enhancing things - and all of them matched part of the features I was planning for 0.4.
Four months have gone by and here we are, releasing 0.4-final, with lots of improvements and new features. So, enjoy the trip :)
Expressions
ObjectValue
On ObjectValue, I always disliked the triad attribute_name + action + get_value, but it was the best solution we had at the time.
To fix that, we needed something that could cover all of them at once, which means just an "expression".
We still need attribute_name and action for some rare cases, and get_value will always be useful for all kinds of things, but now we can do most things using only expressions.
You used to write code like this:
ObjectValue(attribute_name="salary", action=geraldo.utils.FIELD_ACTION_SUM)
Now you just do this:
ObjectValue(expression="sum(salary)")
That is not an impressive example, because you are only saving characters, but the following one is more complex and shows the advantages of expressions much better:
ObjectValue(expression="sum(quantity * price) ** 2 + sum(discount)")
That means "quantity multiplied by price, summed over every item, squared, plus the sum of the discounts of those items". This was simply impossible with the old paradigm.
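To make the arithmetic concrete, here is a plain-Python sketch of the same calculation. The items list is invented for illustration, and this is not how Geraldo evaluates expressions internally; it only mirrors what the expression works out to.

```python
# Hypothetical items, invented for this illustration only.
items = [
    {"quantity": 2, "price": 10.0, "discount": 1.0},
    {"quantity": 1, "price": 5.0, "discount": 0.5},
    {"quantity": 3, "price": 4.0, "discount": 0.0},
]

# sum(quantity * price): multiply per item, then add up over every item.
total = sum(item["quantity"] * item["price"] for item in items)

# sum(discount): add up the discount of every item.
discounts = sum(item["discount"] for item in items)

# The whole expression: sum(quantity * price) ** 2 + sum(discount)
result = total ** 2 + discounts
print(result)
```

Note that the squaring applies to the summed total, not to each item, because ** binds to the result of the first sum().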
In the future we want to extend expressions to support much more (like slicing or basic functions), but for now this is very good compared to what we had.
SystemField
SystemField was also improved.
It now supports the new macros %(first_page_number)s and %(last_page_number)s which, as you can guess, show the first and last page numbers, as in the following:
SystemField(expression="%(first_page_number)s to %(last_page_number)s")
This is because we now support starting page numbering from any number we want, but this will be discussed later.
SystemField also has another new feature: variables.
Yes, this means you can set a dictionary of variables and use them in SystemField as you want, as in the example below:
SystemField(expression="%(var:my_store)s")
...
report.generate_by(PDFGenerator, variables={"my_store": "Apple Store"})
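The macros use the same %(name)s notation as Python's dictionary-based string formatting, so a rough way to picture the resulting text is the sketch below. The values and the way the dictionary is built are assumptions made for illustration; Geraldo supplies the real values during generation, and its internal lookup may differ.

```python
# Illustrative values only; Geraldo supplies the real ones while generating.
context = {
    "first_page_number": 8,
    "last_page_number": 12,
    # Assumption: variables are looked up under a "var:" prefix, as in the macro.
    "var:my_store": "Apple Store",
}

# Each %(key)s placeholder is replaced by the value stored under that key.
pages = "%(first_page_number)s to %(last_page_number)s" % context
store = "%(var:my_store)s" % context

print(pages)
print(store)
```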
Charts
Geraldo now supports several kinds of charts.
Because we already depend on ReportLab, we chose its charts engine to be our charts machine.
The types of charts Geraldo supports are listed below.
Line (and multiple-line) charts
The chart above is made from the following code:
...
elements=[
    LineChart(top=7*cm, left=1*cm, height=3*cm, width=5*cm,
        cols_attribute='state', rows_attribute='government',
        cell_attribute='population', action='sum', legend_labels=True),
]
...
Bar charts (vertical and horizontal)
Bar charts can be vertical or horizontal, 3D or flat, with multiple levels or not. Here is an example of a 3D vertical chart.
The code for the bar chart above is:
...
elements=[
    BarChart(top=8.3*cm, left=12*cm, height=3*cm, width=5*cm,
        data='holidays', rows_attribute='type', cols_attribute='month',
        cell_attribute='day', action='count', is3d=True, axis_labels=True,
        axis_labels_angle=80, summarize_by=CROSS_ROWS),
]
...
And the following code is for the horizontal one above:
...
elements=[
    BarChart(top=6.3*cm, left=0.5*cm, height=3*cm, width=8*cm,
        data='holidays', rows_attribute='type', cols_attribute='month',
        cell_attribute='day', action='count', horizontal=True,
        axis_labels=True, axis_labels_angle=-20, summarize_by=CROSS_ROWS),
]
...
Spider charts
An example of spider charts...
And the code:
...
elements=[
    SpiderChart(top=7*cm, left=12*cm, height=3*cm, width=5*cm,
        rows_attribute='capital', cols_attribute='state',
        cell_attribute='area', action='sum'),
]
...
Pie charts
And the code...
...
elements=[
    PieChart(top=1.3*cm, left=1*cm, height=3*cm, width=5*cm,
        cols_attribute='state', rows_attribute='government',
        cell_attribute='population', action='sum', legend_labels=True,
        slice_popout=2),
]
...
Doughnut charts
... the code:
...
elements=[
    DoughnutChart(top=1.3*cm, left=13*cm, height=3*cm, width=5*cm,
        cols_attribute='state', rows_attribute='government',
        cell_attribute='population', action='sum', summarize_by=CROSS_COLS,
        slice_popout=True),
]
...
The examples above are just a small sample of what our charts support, but this is a long topic with dozens of possibilities. So, take a look at the documentation to learn more.
Cross Reference
Most charts need aggregated values, like a summary grouped by some field or attribute. This is also important for making cross-reference tables (those tables with X, Y, Z values, where X is a row, Y is a column and Z is the aggregated value where X and Y cross).
Look at the following table:
The table above is made by the following code:
cities = [
    {'city':'New York City','state':'NY','capital':False,'population':8363710,'area':468.9,'government':'Mayor'},
    {'city':'Albany','state':'NY','capital':True,'population':95658,'area':21.8,'government':'Mayor'},
    {'city':'Austin','state':'TX','capital':True,'population':757688,'area':296.2,'government':'Council-manager'},
    {'city':'Dallas','state':'TX','capital':False,'population':1279910,'area':385.0,'government':'Council-manager'},
    {'city':'Houston','state':'TX','capital':False,'population':2242193,'area':601.7,'government':'Mayor-council'},
    {'city':'San Francisco','state':'CA','capital':False,'population':808976,'area':231.92,'government':'Mayor-council'},
    {'city':'Los Angeles','state':'CA','capital':False,'population':3833995,'area':498.3,'government':'Mayor-council'},
    {'city':'Sacramento','state':'CA','capital':True,'population':463794,'area':99.2,'government':'Mayor-council'},
    {'city':'Seattle','state':'WA','capital':False,'population':602000,'area':142.5,'government':'Mayor-council'},
]
cross = CrossReferenceMatrix(
    objects_list=cities,
    rows_attribute='capital',
    cols_attribute='state',
)
...
elements=[
    Label(text='Totals', width=2.5*cm),
    ManyElements(
        element_class=ObjectValue,
        count=CROSS_COLS,
        start_left=4*cm,
        width=2*cm,
        attribute_name=CROSS_COLS,
        get_value=lambda self, inst: inst.matrix.sum('population', col=self.attribute_name),
    ),
    ObjectValue(
        attribute_name='row',
        left=17*cm,
        width=3.5*cm,
        get_value=lambda inst: inst.matrix.avg('population'),
    ),
]
...
Look at the examples and documentation to learn more.
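To picture what "crossed aggregated value" means, here is a small plain-Python sketch that sums population for every (capital, state) pair, using a subset of the cities list above. The cross_sum function is a hypothetical helper written for this illustration; it is not how CrossReferenceMatrix is implemented.

```python
from collections import defaultdict

# A subset of the cities list above, keeping only the fields used here.
cities = [
    {"city": "Albany", "state": "NY", "capital": True, "population": 95658},
    {"city": "New York City", "state": "NY", "capital": False, "population": 8363710},
    {"city": "Austin", "state": "TX", "capital": True, "population": 757688},
    {"city": "Dallas", "state": "TX", "capital": False, "population": 1279910},
    {"city": "Houston", "state": "TX", "capital": False, "population": 2242193},
]


def cross_sum(objects, rows_attribute, cols_attribute, cell_attribute):
    """Sum cell_attribute for every (row, column) pair found in objects."""
    table = defaultdict(int)
    for obj in objects:
        key = (obj[rows_attribute], obj[cols_attribute])
        table[key] += obj[cell_attribute]
    return dict(table)


# X = capital (row), Y = state (column), Z = summed population (cell).
table = cross_sum(cities, "capital", "state", "population")
for (row, col), value in sorted(table.items()):
    print(row, col, value)
```

Each printed line is one cell of the table: the row value, the column value and the aggregated result for that crossing.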
Bar Codes
Geraldo also uses the ReportLab bar codes engine to support bar codes. The following types are supported:
- Codabar
- Code11
- Code128
- EAN13
- EAN8
- Extended39
- Extended93
- FIM
- I2of5
- MSI
- POSTNET
- Standard39
- Standard93
- USPS_4State
Here are some of them:
And the code to show them:
...
elements = [
    BarCode(type='Codabar', attribute_name='code', top=1.2*cm, height=1.5*cm),
    BarCode(type='Code11', attribute_name='code', top=1.2*cm, left=6*cm, height=1.5*cm),
    BarCode(type='Code128', attribute_name='code', top=1.2*cm, left=12*cm, height=1.5*cm),
    BarCode(type='EAN13', attribute_name='code', top=3.8*cm, height=1.5*cm),
    BarCode(type='EAN8', attribute_name='code', top=3.8*cm, left=6*cm, height=1.5*cm),
    BarCode(type='Extended39', attribute_name='code', top=3.8*cm, left=12*cm, height=1.5*cm),
]
...
Caching system
One of the pillars of this version is being faster, more reliable and more scalable - in other words: optimization.
This was motivated by the need to render hundreds (or even thousands) of pages at once as PDF. Many of these reports are generated again and again... so a good way to improve this is to cache the reports or, at least, their data or instances, to save processing time.
Currently Geraldo supports caching in the file system, but it could (and will, in the future) support caching in memory, in a database or in any other kind of storage. You can choose between caching only the queryset and caching the rendered structure, leaving only the final generation to be done when you want it. That way you can cache the report and generate it in Text or PDF format without wasting processor time.
To start caching your reports, you just have to set this somewhere in your code:
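from geraldo import cache
# Set the value on the module itself: rebinding a name imported with
# "from geraldo.cache import ..." would only change the local name.
cache.DEFAULT_CACHE_STATUS = cache.CACHE_BY_RENDER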
You can take a look at the docs to learn more.
Events system
The events system had been in our plans since 0.1, but we kept pushing it into the future. Then Ari Caldeira gave us the patch to make it work. So now you can attach functions as event handlers, to be called when certain events happen during report generation.
The available events are:
- Report
  - before_print
  - before_generate
  - after_print
  - on_new_page
- ReportBand
  - before_print
  - after_print
- Element (widgets, graphics, etc.)
  - before_print
  - after_print
You can set them in two different ways:
1. Attaching a lambda or a simple function to the event handler, like below:
def report_before_print(report, generator):
    report.title = '%s - %s objects' % (report.title, len(objects_list))
my_report.before_print = report_before_print
2. Coding a method on the class:
class MyReport(Report):
    ...
    def do_before_print(self, generator):
        super(MyReport, self).do_before_print(generator)
        self.title = '%s - %s objects' % (self.title, len(objects_list))
Methods and attached callback functions (or lambdas) are different things because in Geraldo we think of reports as objects, so we must be able to attach functions to report objects, not only to report classes.
Events are useful for showing a progress bar, customizing, checking things while printing, and many other needs. Use your needs and creativity.
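The difference between the two ways can be pictured with a small stand-alone sketch. SketchReport is a hypothetical stand-in written for this illustration (it is not Geraldo's Report class and does not claim to mirror its internals): the method belongs to the class, while the attached function belongs to a single object.

```python
class SketchReport(object):
    """Hypothetical stand-in for a report class; not Geraldo's Report."""

    title = "Cities"
    before_print = None  # optional handler attached to one report object

    def do_before_print(self, generator):
        # The method is shared by every object of the class...
        if self.before_print is not None:
            # ...and calls the handler attached to this object, if any.
            self.before_print(self, generator)


def add_suffix(report, generator):
    # Same (report, generator) signature as the handler shown earlier.
    report.title = "%s - printed by %s" % (report.title, generator)


first = SketchReport()
second = SketchReport()
first.before_print = add_suffix  # attached to one object only

first.do_before_print("PDF")
second.do_before_print("PDF")
print(first.title)
print(second.title)
```

Only the first report has its title changed, because the handler was attached to that object and not to the class.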
Look and Feel
Additional Fonts
Sometimes we need to use fonts in our reports that aren't among the default fonts available in PDF. This was impossible in previous versions of Geraldo, but now it is with us :P
Look at this report, which uses two additional fonts (one of them available in both normal and bold variants):
The code behind it is here:
class SimpleListReport(Report):
    title = 'Report with additional fonts'
    additional_fonts = {
        'Footlight MT Light': os.path.join(cur_dir, 'ftltlt.ttf'), # full path to font file
        'HandGotDLig': (
            {'file': os.path.join(cur_dir, 'handgotl.ttf'), 'name': 'HandGotDLig'},
            {'file': os.path.join(cur_dir, 'handgotb.ttf'), 'name': 'HandGotDBold', 'bold': True},
        ),
    }
    default_style = {'fontName': 'Footlight MT Light'}
...
Thanks to Allan Bomfim and Ari Caldeira for their help on this.
First page number
Sometimes we need to start page numbering from a number other than 1, sometimes just because we want to, other times because we are merging the report into another one. No matter what the reason is, we just need or want it.
It wasn't possible before, but now we can do it like the code below:
report_numbers.generate_by(
    PDFGenerator,
    filename=os.path.join(cur_dir, 'output/page-numbers-case-2.pdf'),
    first_page_number=8,
)
The code above will start page numbering from 8.
Improvements on performance and scalability
Improved performance
Many little things have been refactored/enhanced to improve performance... and we keep doing it whenever we notice something we can improve.
Memoize
One of the improvements is the decorator @geraldo.utils.memoize, which keeps a function's result in memory for the given arguments to save processing time. You can also use it outside Geraldo, to improve your own code.
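For readers new to the technique, the general idea of memoization can be sketched in a few lines of plain Python. This is a generic illustration written for this post, not the code of geraldo.utils.memoize, and it assumes the function is called with hashable positional arguments only.

```python
import functools


def memoize(func):
    """Cache func's results by positional arguments (hashable ones only)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Compute only on the first call with these arguments.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


calls = []


@memoize
def slow_square(number):
    calls.append(number)  # record every real computation
    return number * number


print(slow_square(4))
print(slow_square(4))  # served from the cache, no new computation
print(len(calls))
```

The trade-off is memory for time: every distinct set of arguments keeps its result alive in the cache.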
Multi-processed generation
Some parts of Geraldo's code were improved to work with multiprocessing. One of these improvements is the decorator @geraldo.utils.run_under_process, which makes a function run as a separate process.
To generate a report with multiprocessing, you can code this:
report.generate_under_process_by(PDFGenerator, filename='report.pdf')
You can disable multiprocessing by setting geraldo.utils.DISABLE_MULTIPROCESSING to True. This lets you switch between the enabled and disabled states without having to modify other parts of your code (as long as you are using the method "generate_under_process_by").
Circular references removed
The biggest villain behind some of our performance issues was a circular reference present in report rendering. This was fixed, and it saved us tons of megabytes of memory.
Internal cache
ObjectValue elements now store their values in an internal cache, to save processing time.
Improved get_value and get_text
ObjectValue now has better ways to set the get_value callback, and has a new argument, get_text, which is applied after the value is rendered. Thanks again to Ari Caldeira.
Registering reports on a store
Geraldo now has an internal metaclass that registers all report classes with an internal ID, which makes it possible to look up report classes as in a report store. In the future we will have good news related to this, coming from another project related to Geraldo and Django... keep an eye out over the next months.
But for now you can use it like this:
>>> from geraldo.base import Report, get_report_class_by_registered_id
>>> class MyReport(Report):
...     pass
>>> MyReport._registered_id
'__builtin__.MyReport'
>>> get_report_class_by_registered_id('__builtin__.MyReport')
<class 'MyReport'>
Outside the code...
Better documentation
The documentation keeps being improved day by day, and the doctests keep evolving too. Of course we'd like to improve it a lot more, but for now it is better and improving every day.
Cheat-Sheet
Now we have a cheat-sheet available; you can get it by clicking below:
New website
The website was entirely rewritten and is now hosted on Google App Engine. This will give us more power and flexibility to help our users in the future.
Special Thanks
This version had direct help from Allan Bomfim, who sponsored some great enhancements.
Ari Caldeira, from Tauga Software, was also a great contributor, helping with little fixes and big enhancements.
Miltinho Brandão helped all the time, as the main tester and bug hunter.
Thanks to those guys and to the others who helped a lot with everything.
from future import this
Finally, for the near future, you can expect generators for ODS, XML and HTML, a templates system, SubReports refactoring, graphics refactoring, elastic expressions, a reports server and, maybe a little later, drill-down reports.
Marinho Brandão
Geraldo's author and maintainer