August 22, 2011

Parsing XML with Python

I'm transitioning my classes from the Desire2Learn (D2L) CMS to the ANGEL CMS.  Unfortunately, I didn't export my classes before I lost access to D2L.  Fortunately, I know a great person who was able to export the D2L files and send them to me.

Unfortunately, ANGEL doesn't recognize D2L files.  Bummer.  

All of my "Reading Quiz" questions are stored in a question bank, which is an XML file.  If I knew how to parse XML, I could probably cut and paste the questions into ANGEL.

I don't know how to parse XML, but I do know how to use Google.  I found a bunch of tutorials, but ultimately settled on the Dive Into Python tutorial.

My question bank had 2345 entries in it.  The entries were made up of three things: a) the actual questions, b) multiple choice answers and distractors, and c) empty entries between questions.   All the entries were identified with the 'mattext' tag, although there didn't seem to be an easy way to separate the various types of entries.
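Since the empty entries sit between questions, they can double as separators. Here is a minimal sketch of that idea, not the script from this post. It assumes each question's text comes first in its group and the choices follow it; the sample XML and the group_entries name are made up for illustration.

```python
from xml.dom import minidom

# Made-up sample: two questions, each followed by its choices,
# with an empty <mattext/> entry between them.
SAMPLE = """<bank>
  <mattext>What is 2 + 2?</mattext>
  <mattext>3</mattext>
  <mattext>4</mattext>
  <mattext/>
  <mattext>Which planet is closest to the Sun?</mattext>
  <mattext>Mercury</mattext>
  <mattext>Venus</mattext>
</bank>"""


def group_entries(xml_text):
    """Split the flat list of mattext entries into one group per question."""
    doc = minidom.parseString(xml_text)
    groups, current = [], []
    for node in doc.getElementsByTagName('mattext'):
        if node.firstChild is None:
            # Empty entry: close the current group if it holds anything.
            if current:
                groups.append(current)
                current = []
        else:
            current.append(node.firstChild.data)
    if current:
        groups.append(current)
    return groups


groups = group_entries(SAMPLE)
for question, *choices in groups:
    # Assumption: the first entry in each group is the question text.
    print(question, choices)
```

A real export may nest the entries more deeply or put several empty entries in a row; the grouping only relies on document order, so both cases should still work, but that is untested against an actual D2L file.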

Here is some code I wrote.  I'm putting it here so I don't lose it.

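#!/usr/bin/env python

from xml.dom import minidom

xmldoc = minidom.parse('./questiondb.xml')
mattextlist = xmldoc.getElementsByTagName('mattext')

# Iterate over the list directly instead of hard-coding its length.
for entry in mattextlist:
    try:
        # Reconstructed line: an empty entry has no child node, so
        # firstChild is None and .data raises AttributeError.
        print(entry.firstChild.data)
        print("\n")
    except AttributeError:
        # Empty entry: print blank lines to separate the questions.
        print("\n")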

An AttributeError was triggered by every one of the empty entries.  Printing the blank lines every time an empty entry was reached made the output of the script easier to interpret visually.
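A likely reason for the AttributeError, assuming the text is read with the usual minidom idiom firstChild.data: an empty element has no child nodes, so firstChild is None, and None has no data attribute. The sketch below shows this on a made-up two-entry document. The entry_text helper is hypothetical and simply sidesteps the exception.

```python
from xml.dom import minidom

# Made-up two-entry document: one entry with text, one empty.
doc = minidom.parseString('<bank><mattext>Hello</mattext><mattext/></bank>')
full, empty = doc.getElementsByTagName('mattext')

print(full.firstChild.data)   # the text node's contents
print(empty.firstChild)       # an empty element has no first child

# Reading .data from the missing child is what raises the error.
try:
    empty.firstChild.data
    caught = None
except AttributeError as err:
    caught = str(err)
print(caught)


def entry_text(node):
    """Return the entry's text, or '' for an empty entry (hypothetical helper)."""
    child = node.firstChild
    return child.data if child is not None else ''


texts = [entry_text(n) for n in (full, empty)]
print(texts)
```

Checking for None explicitly does the same job as the try/except, and it will not hide an AttributeError that comes from some other mistake.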

This was a fun little bit of Python.  I'm glad I was able to salvage stuff from the D2L class files.

1 comment:

Russ Lankenau said...

Sounds like you know _exactly_ how to parse XML.

Google is my first step in most programming projects.