XSL-FO and FOP
We do the job as a two-step operation:
- We write an XSLT transformation, xml-to-fo.xslt, which produces a file marked up as XSL-FO (formatting objects), olympic.fo.
- We use a standard program, FOP, which among many other things can produce PDF from XSL-FO documents.
The XSLT transformation, xml-to-fo.xslt, looks like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml" version="1.0" encoding="ISO-8859-1" indent="yes"/>

  <xsl:template match="/">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <!-- layout for the first page -->
        <fo:simple-page-master master-name="coverpage"
            page-height="29.7cm" page-width="21cm"
            margin-top="7cm" margin-bottom="2cm"
            margin-left="2.5cm" margin-right="2.5cm">
          <fo:region-body margin-top="3cm" margin-bottom="1.5cm"/>
          <fo:region-before extent="2cm"/>
          <fo:region-after extent="1.5cm"/>
        </fo:simple-page-master>
        <!-- layout for all other pages -->
        <fo:simple-page-master master-name="pages"
            page-height="29.7cm" page-width="21cm"
            margin-top="1cm" margin-bottom="2cm"
            margin-left="2.5cm" margin-right="2.5cm">
          <fo:region-body margin-top="1cm" margin-bottom="1cm"/>
          <fo:region-before extent="2cm"/>
          <fo:region-after extent="1.5cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <!-- filling the front page -->
      <fo:page-sequence master-reference="coverpage">
        <fo:flow flow-name="xsl-region-body">
          <fo:block font-weight="bold" font-size="28pt" line-height="38pt" font-family="Times">
            Olympiske resultater
          </fo:block>
          <fo:block font-weight="normal" font-size="13pt" line-height="15pt" font-family="Times">
            Sprintøvelsene i de siste olympiadene
          </fo:block>
        </fo:flow>
      </fo:page-sequence>
      <!-- doing all olympics in turn -->
      <xsl:apply-templates select="/IOC/OlympicGame">
        <xsl:sort select="@year"/>
      </xsl:apply-templates>
    </fo:root>
  </xsl:template>

  <!-- one page sequence per olympic game, headed by the picture of the place -->
  <xsl:template match="//OlympicGame">
    <fo:page-sequence master-reference="pages">
      <fo:flow flow-name="xsl-region-body">
        <fo:block font-weight="bold" font-size="18pt" line-height="28pt" font-family="Times"
            padding-top="0cm" border-bottom-color="black" border-bottom-style="solid">
          <xsl:element name="fo:external-graphic">
            <xsl:attribute name="src"><xsl:value-of select="@place"/>.gif</xsl:attribute>
          </xsl:element>
        </fo:block>
        <xsl:apply-templates select="event">
          <xsl:sort select="@dist"/>
        </xsl:apply-templates>
      </fo:flow>
    </fo:page-sequence>
  </xsl:template>

  <!-- one heading per event, athletes sorted numerically by result -->
  <xsl:template match="//event">
    <fo:block font-weight="bold" font-size="12pt" line-height="14pt" font-family="Times"
        padding-top="1cm">
      <xsl:value-of select="@dist"/>
    </fo:block>
    <xsl:apply-templates select="athlet">
      <xsl:sort data-type="number" select="result"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- one line per athlete: name, nation : result -->
  <xsl:template match="//athlet">
    <fo:block font-size="10pt" line-height="14pt" font-family="Times">
      <xsl:value-of select="name"/>, <xsl:value-of select="nation"/> : <xsl:value-of select="result"/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
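The stylesheet sorts the athletes with data-type="number", while the other sorts use the default text comparison. The difference can be illustrated with a small Python sketch. The result values below are made up for illustration and are not taken from the source file, and Python's string ordering only approximates the text ordering an XSLT processor applies.

```python
# Hypothetical sprint results as they would appear as text in the XML.
results = ["9.84", "10.02", "9.96", "10.14"]

# Text comparison works character by character, so "10.02" sorts before "9.84".
as_text = sorted(results)

# Numeric comparison converts each value to a number first,
# which is roughly what data-type="number" asks the XSLT processor to do.
as_number = sorted(results, key=float)

print(as_text)
print(as_number)
```

Text sorting is harmless for the four-digit years, but for result times it would put every time of 10 seconds or more ahead of the faster ones.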
We assume that FOP is installed in the directory fop and can be called from the command line like this:
c:\fop\fop.bat olympic.fo olympic.pdf
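The same call can be scripted. The following is a minimal Python sketch, not part of the original material: the function names fop_command and render_pdf are hypothetical, and the sketch uses FOP's explicit -fo and -pdf options instead of the positional form shown above.

```python
import subprocess


def fop_command(fo_file, pdf_file, fop_executable="fop"):
    """Build the argument list for rendering an XSL-FO file to PDF with FOP."""
    # -fo names the XSL-FO input file, -pdf names the PDF output file
    return [fop_executable, "-fo", fo_file, "-pdf", pdf_file]


def render_pdf(fo_file, pdf_file, fop_executable="fop"):
    """Run FOP and return its exit code (0 normally means success)."""
    return subprocess.call(fop_command(fo_file, pdf_file, fop_executable))


if __name__ == "__main__":
    # The install path is the one assumed in the text.
    print(fop_command("olympic.fo", "olympic.pdf", r"c:\fop\fop.bat"))
```

Keeping the command construction separate from the call makes it possible to inspect the arguments without FOP being installed.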
PRINCE
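We do the following: an XSLT transformation produces an HTML file, prepared.html, and Prince then turns that file into a PDF file, prepared.pdf, guided by a CSS stylesheet.
The transformation that produces the HTML is essentially the same as the one used in the module XML2HTML: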
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="ISO-8859-1" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Olympics</title>
      </head>
      <body>
        <h1>Resultater sprint</h1>
        <p>fra de siste olympiske leker</p>
        <hr/>
        <xsl:apply-templates select="IOC/OlympicGame">
          <xsl:sort select="@year" order="ascending"/>
        </xsl:apply-templates>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="OlympicGame">
    <table cellpadding="10">
      <tr>
        <td>
          <xsl:element name="img">
            <xsl:attribute name="src"><xsl:value-of select="@place"/>.gif</xsl:attribute>
            <xsl:attribute name="alt"><xsl:value-of select="@place"/></xsl:attribute>
          </xsl:element>
        </td>
        <td>
          <h1>
            <xsl:value-of select="@place"/>
            <br/>
            <xsl:value-of select="@year"/>
          </h1>
        </td>
      </tr>
    </table>
    <table cellpadding="10" border="0" cellspacing="0">
      <tr>
        <xsl:apply-templates select="event"/>
      </tr>
    </table>
  </xsl:template>

  <xsl:template match="//event">
    <td valign="top">
      <h2><xsl:value-of select="@dist"/></h2>
      <xsl:apply-templates select="athlet">
        <xsl:sort data-type="number" select="result"/>
      </xsl:apply-templates>
    </td>
  </xsl:template>

  <xsl:template match="athlet">
    <p>
      <xsl:value-of select="name"/><br/>
      <xsl:value-of select="nation"/><br/>
      <xsl:value-of select="result"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
The result, prepared.html, looks like this:
The CSS file that describes the layout of the PDF file is very simple (printpages.css):
@page {
  size: A4;
  margin: 100pt 40pt 40pt 90pt;
  @top-left {
    content: "demo";
  }
  @top-right {
    content: "Markup og Web";
    font-size: 24px;
  }
  @bottom-right {
    content: counter(page);
    font-style: italic;
    font-size: 11px;
    border-top-style: solid;
    border-top-width: thin;
  }
  @bottom-left {
    content: "B. Stenseth";
    font-style: italic;
    font-size: 11px;
    border-top-style: solid;
    border-top-width: thin;
  }
}
h1 { page-break-before: always }
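The four values of the margin shorthand are read as top, right, bottom and left. As a small worked example, this Python sketch computes the area that remains for the page content on an A4 sheet with those margins. The helper name content_box is hypothetical, and the sketch ignores anything Prince may add beyond the plain margin arithmetic.

```python
# A4 is 210 mm x 297 mm; one point is 1/72 inch and one inch is 25.4 mm.
MM_TO_PT = 72 / 25.4
A4_WIDTH_PT = 210 * MM_TO_PT
A4_HEIGHT_PT = 297 * MM_TO_PT


def content_box(page_width, page_height, top, right, bottom, left):
    """Return (width, height) of the page area left inside the margins."""
    return page_width - left - right, page_height - top - bottom


# margin: 100pt 40pt 40pt 90pt  ->  top, right, bottom, left
width, height = content_box(A4_WIDTH_PT, A4_HEIGHT_PT, 100, 40, 40, 90)
print(f"{width:.1f}pt x {height:.1f}pt")
```

The wide top margin of 100pt is where the @top-left and @top-right boxes are placed, and the 40pt bottom margin holds the page number and the author name.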
The result, prepared.pdf, looks like this:
Python, lxml and Prince
We will carry out a composite transformation: from XML to HTML with lxml, and from HTML to PDF with Prince.
We begin by writing a simple transformation, tohtml.xsl, which produces an HTML file. What mainly distinguishes this transformation from the one in the previous section is that this time we generate a table of contents.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" omit-xml-declaration="yes"
      encoding="UTF-8" indent="no"/>

  <xsl:template match="/">
    <!-- the doctype must be written with escaped angle brackets to keep the stylesheet well-formed -->
    <xsl:text disable-output-escaping="yes">&lt;!DOCTYPE html&gt;&#10;</xsl:text>
    <html>
      <head>
        <title>Olympiade</title>
        <link href="olscreen.css" rel="stylesheet" />
        <link href="olprint.css" rel="stylesheet" />
        <link href="olprojection.css" rel="stylesheet" />
      </head>
      <body>
        <div id="heading">Olympiske sprintresultater</div>
        <xsl:call-template name="toc"/>
        <xsl:apply-templates select="IOC/OlympicGame">
          <xsl:sort select="@year" order="ascending"/>
        </xsl:apply-templates>
      </body>
    </html>
  </xsl:template>

  <!-- each game gets a named anchor that the table of contents links to -->
  <xsl:template match="OlympicGame">
    <xsl:element name="a">
      <xsl:attribute name="name"><xsl:value-of select="@place"/></xsl:attribute>
      <h1><xsl:value-of select="@place"/> - <xsl:value-of select="@year"/></h1>
    </xsl:element>
    <div>
      <xsl:element name="img">
        <xsl:attribute name="src"><xsl:value-of select="@place"/>.gif</xsl:attribute>
        <xsl:attribute name="alt"><xsl:value-of select="@place"/></xsl:attribute>
      </xsl:element>
    </div>
    <xsl:apply-templates select="event"/>
  </xsl:template>

  <xsl:template match="event">
    <h2><xsl:value-of select="@dist"/></h2>
    <xsl:apply-templates select="athlet">
      <xsl:sort data-type="number" select="result"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="athlet">
    <div class="athlet">
      <p><xsl:value-of select="name"/></p>
      <p><xsl:value-of select="nation"/></p>
      <p><xsl:value-of select="result"/></p>
    </div>
  </xsl:template>

  <!-- table of contents: one link per game, in the same order as the body -->
  <xsl:template name="toc">
    <xsl:element name="div">
      <xsl:attribute name="id">maintoc</xsl:attribute>
      <div class="tocheader">Innhold</div>
      <xsl:for-each select="//OlympicGame">
        <xsl:sort select="@year" order="ascending"/>
        <div class="toclevel1">
          <xsl:element name="a">
            <xsl:attribute name="href">#<xsl:value-of select="@place"/></xsl:attribute>
            <xsl:value-of select="@place"/>
          </xsl:element>
        </div>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
The result of the transformation, bok.html, looks like this:
Then we write some Python modules that carry out the combined operation:
- transformation XML -> HTML
- a call to Prince to produce the PDF
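The second step, the call to Prince, comes down to handing an argument list to subprocess. Each list element reaches Prince as one argument, so an option and its value are normally safest as separate elements. The following is a minimal sketch under that assumption. The helper name prince_command is hypothetical, and the file names in the usage line are only examples.

```python
import subprocess


def prince_command(prince, infile, outfile, stylesheets=(), logfile=None, verbose=False):
    """Build the argument list for converting one HTML file to PDF with Prince."""
    # -o takes the output file as a separate argument
    args = [prince, infile, "-o", outfile]
    if logfile:
        args.append("--log=" + logfile)
    if verbose:
        args.append("-v")
    for sheet in stylesheets:
        # one -s option per stylesheet, value as its own list element
        args.extend(["-s", sheet])
    return args


def make_pdf(prince, infile, outfile, **options):
    """Run Prince and return its exit code."""
    return subprocess.call(prince_command(prince, infile, outfile, **options))


if __name__ == "__main__":
    print(prince_command("prince", "bok.html", "bok.pdf", stylesheets=["olsheet1.css"]))
```

Building the list in a separate function makes it easy to print or log the exact command before running it.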
The transformation itself, transform.py, is done like this:
"""Transforming XML to HTML using lxml."""
from lxml import etree


def produce(xmlfile, xsltfile):
    """Apply the stylesheet in xsltfile to xmlfile and return the result as a string."""
    xmlTree = etree.parse(xmlfile)
    xsltTree = etree.parse(xsltfile)
    transform = etree.XSLT(xsltTree)
    resultTree = transform(xmlTree)
    return str(resultTree)
In the module makesingle.py below, it is the function doSinglePageJob that invokes Prince.
"""
Make a PDF file from an HTML file, one to one.
The converter engine is PrinceXML.

When run from the command line, the module transforms all_results.xml
to bok.html and converts that file to bok.pdf.
"""
import subprocess

import utils
import transform

# --------------------
# fixed paths and logging

# working directory
cat = 'c:\\web\\dw\\olymp\\'
# path to the Prince executable
princepath = 'c:\\fixed\\prince\\engine\\bin\\prince.exe'
# log file
logfile = cat + 'ol2pdfprince2\\princelog.txt'
# print log after job
printlog = False
# full report
verbose = False
# all stylesheets
stylesheets = [cat + 'ol2pdfprince2\\olsheet1.css']


def eraseLog():
    """Erase the log file."""
    utils.storeTextFile(logfile, '')


def doSinglePageJob(infile, outfile):
    """Convert one HTML file to one PDF file."""
    print(infile + ' -> ' + outfile)
    # each option and its value is passed as a separate list element
    params = [princepath, infile, '-o', outfile, '--log=' + logfile]
    if verbose:
        params.append('-v')
    for style in stylesheets:
        params.extend(['-s', style])
    print('making: ' + outfile)
    subprocess.call(params)


# --------------------------------
if __name__ == "__main__":
    T = transform.produce(cat + 'all_results.xml',
                          cat + 'ol2pdfprince2\\tohtml.xsl')
    utils.storeTextFile(cat + 'ol2pdfprince2\\bok.html', T)
    doSinglePageJob(cat + 'ol2pdfprince2\\bok.html',
                    cat + 'ol2pdfprince2\\bok.pdf')
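The module utils contains just two functions for file access:
"""Two helper functions for reading and writing text files."""

# UTF-8 is assumed here, matching the encoding declared in tohtml.xsl.


def getTextFile(filename):
    """Load a text file and return its content, or '' on failure."""
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            return file.read()
    except OSError:
        print('Error reading file', filename)
        return ''


def storeTextFile(filename, txt):
    """Store txt in a text file."""
    try:
        with open(filename, 'w', encoding='utf-8') as file:
            file.write(txt)
    except OSError:
        print('Trouble writing to: ' + filename)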
The stylesheet, olsheet1.css, used for the PDF production looks like this:
@page {
  size: A4;
  margin: 100pt 40pt 40pt 90pt;
  @top-left {
    content: url(http://www.ia.hiof.no/~borres/common/gfx/printlogo.gif);
  }
  @top-right {
    content: string(doctitle);
    font-size: 18px;
  }
  @bottom-right {
    content: counter(page);
    font-style: italic;
    font-size: 11px;
    border-top-style: solid;
    border-top-width: thin;
  }
  @bottom-left {
    content: "B. Stenseth";
    font-style: italic;
    font-size: 11px;
    border-top-style: solid;
    border-top-width: thin;
  }
}
@page:first {
  @top-left {
    content: url(http://www.ia.hiof.no/~borres/common/gfx/printlogo_txt.gif);
  }
  @top-right {
    content: "";
    font-size: 24px;
  }
}
#heading { margin-top: 50px; margin-bottom: 50px; font-weight: bold; font-size: 36px }
h1 { string-set: doctitle content() }
h1, h2 { page-break-before: always }
#maintoc a::after {
  content: leader(".") target-counter(attr(href), page);
}
.tocheader { margin-top: 50px; margin-bottom: 50px; font-weight: bold; font-size: 20px }
.toclevel1 { margin-left: 20px; line-height: 150% }
.athlet p { line-height: 70% }
.athlet :first-child { font-weight: bold }
/* linking NB: sequence is important */
a:link { color: black; text-decoration: none }
a:visited { color: black; text-decoration: none }
a:hover { color: black; text-decoration: none }
a:active { color: black; text-decoration: none }
The stylesheets below are used for screen, print and projection (F11 in Opera):
@media screen {
  h1 { color: red }
}

@media print {
  h1 { color: blue; page-break-before: always }
}

@media projection {
  #maintoc, img { display: none }
  h1, h2 { page-break-before: always }
  .athlet { line-height: 60%; margin-left: 150px; margin-top: 40px; }
  .athlet :first-child { font-weight: bold; font-size: 20px }
  #heading { margin-left: 150px; margin-top: 150px; font-size: 46px }
}
The result, bok.pdf, looks like this: