Saturday, December 20, 2014

code_to_text

Well, I'm currently building X.org.

I'm using qemu, and though it's really gorgeous, it's sooooo slow. Or it seems to be.
I've been installing X.org for 4 hours (with all the required libs and additional packages). I got bored and made up my mind to practice Python (as Python is one of the required skills for my server-side xcb task).

I needed an app to convert the numerous source code files from my OS course project into a single pdf, so I can present it and see the professor's comments. Yes, it's a pretty useless app; nobody needs to print source code as pdf nowadays, except me. Anyway, it didn't take much time, I practiced Python instead of just staring at the black installation screen, and I won't have to copy and paste all the source files and then convert them to pdf.

#!/usr/bin/env python

import os
import re
from xml.sax.saxutils import escape

from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.enums import TA_JUSTIFY
from reportlab.lib.pagesizes import letter


def code_to_text(regpattern='py$',
                 spath=os.getcwd(),
                 dpath=os.path.join(os.getcwd(), "code.pdf"),
                 fontsize=10,
                 fontface="Ubuntu Mono",
                 left=5,
                 right=5,
                 top=5,
                 bottom=5,
                 cmnt='#'):
    """Converts numerous source files into one pdf file to print and present.

    fontsize and fontface are accepted but not applied yet.
    """
    story = []
    styles = getSampleStyleSheet()

    for path, dirs, files in os.walk(spath):
        for f in files:
            print(f)
            if re.search(regpattern, f) is not None:
                # os.walk yields bare file names, so join them with their directory
                with open(os.path.join(path, f)) as fd:
                    head = cmnt + ' ' + f + ' ' + cmnt
                    story.append(Paragraph(head, styles["Normal"]))
                    # Paragraph parses its text as markup, so escape '<', '>' and '&'
                    story.append(Paragraph(escape(fd.read()), styles["Normal"]))
                story.append(Spacer(1, 12))

    doc = SimpleDocTemplate(dpath,
                            leftMargin=left,
                            rightMargin=right,
                            topMargin=top,
                            bottomMargin=bottom)

    doc.build(story)

I used the reportlab lib (its platypus module) to generate the pdf and os.walk to walk through the files.
The user can filter the files to process with a regexp: the default is 'py$' for Python source files, 'cpp$' picks C++ files, and so on. I used re to make this happen.
I'm about to implement font face and size specification.
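To make the walking and filtering part easier to see on its own, here is a minimal sketch of just that step. The function name find_sources is made up for this illustration and is not part of the script; it only assumes the same os.walk plus re.search idea described above.

```python
import os
import re


def find_sources(spath, regpattern=r'py$'):
    """Return full paths of files under spath whose names match regpattern.

    find_sources is a hypothetical helper, not part of the original script.
    """
    pattern = re.compile(regpattern)
    matches = []
    for path, dirs, files in os.walk(spath):
        for name in files:
            if pattern.search(name):
                # os.walk yields bare names; join with the directory so the
                # file can be opened later from any working directory
                matches.append(os.path.join(path, name))
    return sorted(matches)
```

Note that 'py$' matches any name that ends in "py", extension or not, so r'\.py$' is the stricter choice if that matters.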

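Specify the margins with the left, right, top and bottom arguments; they are passed on to reportlab as leftMargin, rightMargin, topMargin and bottomMargin, measured in points. (As for me, I specify 5 for each one, so I will pay less money when printing the pdf.)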
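For a feeling of how small a 5-point margin is, here is a rough back-of-the-envelope sketch. It assumes a US letter page of 612 x 792 points (the size reportlab.lib.pagesizes.letter stands for) and uses the fact that a point is 1/72 of an inch; the helper names are made up for this example.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4
LETTER = (612.0, 792.0)  # US letter, 8.5 x 11 inches, expressed in points


def points_to_mm(points):
    """Convert a length in points to millimetres."""
    return points * MM_PER_INCH / POINTS_PER_INCH


def printable_area(pagesize, left, right, top, bottom):
    """Return (width, height) in points left over after the four margins."""
    width, height = pagesize
    return (width - left - right, height - top - bottom)
```

With these, printable_area(LETTER, 5, 5, 5, 5) leaves 602 x 782 points for the code, and points_to_mm(5) is a bit under 2 mm, which many printers cannot actually reach.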

About the comment signs. I planned to work out the comment sign right from the specified file extension, e.g. '#' for 'py$' files or ';' for masm32 files, but then I realized it's too dumb a decision: there are so many programming languages that I won't be able to build a db mapping all the extensions to comment signs. So I let the user just specify it as a regular symbol. Not a reliable decision, but better than building a db for every single programming language. It's just a 4-hour script, and I will not spend another hour working on it after implementing the font face and size specs. The thought of wasting even more time on it drove me crazy, so I just let it be a sign.
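A possible middle ground, sketched here only as an illustration: keep a tiny table for the few extensions I actually use and fall back to the user-supplied sign for everything else. The table contents and the name make_head are assumptions made for this example, not something the script does.

```python
import os

# Hypothetical, deliberately tiny table; anything missing falls back to cmnt.
COMMENT_SIGNS = {
    '.py': '#',
    '.sh': '#',
    '.c': '//',
    '.h': '//',
    '.cpp': '//',
    '.asm': ';',
}


def make_head(filename, cmnt='#'):
    """Build the header line for a file, e.g. '# main.py #'."""
    ext = os.path.splitext(filename)[1].lower()
    sign = COMMENT_SIGNS.get(ext, cmnt)
    return sign + ' ' + filename + ' ' + sign
```

This keeps the cmnt argument as the escape hatch, so an unknown language costs nothing more than passing the sign by hand as before.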

Tomorrow I'll try to build xserver again; I hope this time I'll manage it and go further with my task.

Learn python!

P. S. Yes, it handles new lines horribly; to be frank, it doesn't handle them at all! But I'll fix this.
P. P. S. How much easier coding is than deploying, dear god
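On the new-line problem from the P. S.: one way it could be tackled is to turn the source text into Paragraph markup before handing it over. This is a sketch, not what the script does. It assumes, as far as I understand reportlab's Paragraph markup, that `<br/>` forces a line break and `&nbsp;` is a space that does not get collapsed; the function name is made up.

```python
from xml.sax.saxutils import escape


def to_paragraph_markup(source):
    """Turn raw source text into markup that keeps line breaks and indents.

    Hypothetical helper: tabs become 4 spaces, leading spaces become &nbsp;,
    the rest of each line is XML-escaped, and lines are joined with <br/>.
    """
    lines = []
    for line in source.expandtabs(4).splitlines():
        stripped = line.lstrip(' ')
        indent = len(line) - len(stripped)
        lines.append('&nbsp;' * indent + escape(stripped))
    return '<br/>'.join(lines)
```

The result would be passed to Paragraph instead of the raw fd.read(). reportlab.platypus also has a Preformatted flowable meant for fixed-layout text, which may well be the less kludgy route.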

Part II

I improved the script to a normal state, features featuring:
  • It walks ok using dfs and doesn't get into .git subfolders
  • The user is able to specify the font and the font size
  • It manages indents ok (kludge detected!)
You can also enjoy my solution's sweet N^3 complexity. Argh, I just don't know how to make it faster, so I'll just let it be.
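For the first feature in the list, skipping .git does not need a hand-written dfs: os.walk in its default top-down mode already visits directories depth-first and lets you prune them. The sketch below shows that trick; it is not the code from the repository, and walk_sources is a name invented for this example.

```python
import os


def walk_sources(spath, skip=('.git',)):
    """Yield full paths of all files under spath, never entering skip dirs."""
    for path, dirs, files in os.walk(spath):
        # Assigning to dirs[:] edits the list in place, which tells os.walk
        # (top-down mode) not to descend into the removed directories
        dirs[:] = sorted(d for d in dirs if d not in skip)
        for name in sorted(files):
            yield os.path.join(path, name)
```

Each file is visited once, so this part stays linear in the number of files.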
I also shared it on bitbucket. Feel free to clone: https://bitbucket.org/AsalleKim/ctt.git
And after spending an hour designing the regex to find all the \.c$ | \.h$ files in the breezy dir, I came to the conclusion that using regexes was not such a good idea...
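For the record, a sketch of two ways the .c / .h match could look. One trap: if the pattern is typed literally with spaces around the bar, as written above, re treats those spaces as characters to match (unless re.VERBOSE is used), so nothing is found. The function names here are made up for the illustration.

```python
import re

# A character class covers both extensions without any alternation
C_SOURCE = re.compile(r'\.[ch]$')


def is_c_source(name):
    """Regex version: True for names ending in .c or .h."""
    return C_SOURCE.search(name) is not None


def has_extension(name, extensions=('.c', '.h')):
    """Regex-free version: str.endswith accepts a tuple of suffixes."""
    return name.endswith(tuple(extensions))
```

The second version is probably what I would reach for when the filter is only ever a list of extensions.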
Don't write kludges!
