miércoles, 24 de febrero de 2010

A DSL for a data importing tool

Continuing with our OpenTaps examples, we want to generate some seed data for our application, the good thing is that the OfBiz framework offers some tooling to import data nice and easy, the not so nice part of it, is that it must be done using xml files, like the example below:


<entity-engine-xml>

<Product productId="SUPPLY-001" productTypeId="SUPPLIES" productName="Pencil" internalName="Pencil" description="Office pencil (demo supply)."
taxable="N" chargeShipping="N" autoCreateKeywords="N" isVirtual="N" isVariant="N" createdDate="2007-01-01 00:00:00.000" createdByUserLogin="admin"/>

<ProductPrice productId="service1" productPriceTypeId="DEFAULT_PRICE" productPricePurposeId="PURCHASE" currencyUomId="USD" productStoreGroupId="_NA_" fromDate="2008-01-01 00:00:00" price="10.00"/>
</entity-engine-xml>
This will insert a new Product entity (or update an existing one, if the primary key already exists), modifying the fields described in the attributes of the product tag (productId, productTypeId, productName, etc.), and will also modify or create a ProductPrice entity. It sounds pretty simple (and it really is), but to be honest, what would happen if we want to have a huge amount of data, and also provided by people that isn't exactly technical, it would be really hard to write all that xml, so lets use the power of groovy and a few more tools, to create a DSL that will help us parse an spreadsheet file (in this case OpenOffice calc) and automatically generate the data xml we need.

First of all, lets organize how we are going to import the data:
  • Data will be located in an OpenOffice Calc file, each sheet will represent one or more entities
  • Each column will correspond to one field (or more) of our entity
  • Each row will be a complete object
  • Our DSL will help us determine how our sheet will be parsed to generate an OO representation
  • Then we will export that OO representation into xml files
So, for this task, we will need the following libs and frameworks:
  • Groovy, I really used Grails because I wanted to make an small app with a nice frontend for the conversion
  • JOpenDocument so we can parse OpenOffice Calc files in a nice way
  • OpenTaps, so we can test our data files
Now, we need to define a domain model that will be used by our DSL to store the configuration of how to parse the Calc file, so we will create two java classes, the first one we will name it ODSNode and it will store the generic information about the sheet we are parsing, it will be like this:


public class ODSNode {
//Constants we will use later
public static final String SHEET_NAME_ATTR="name";
public static final String OFBIZ_NAME_ATTR="entity";
public static final String START_ROW_ATTR="startRow";


private int startRow=1;
private int index=0;

private String sheetName;
private String ofbizEntity;

private List<ODSElement> elements = new ArrayList<ODSElement>();
//getters and setters here...
}


The fields will have the following meaning:
  • startRow -> This is the row at which we will start parsing to get the data from the sheet
  • index -> The order in which the results will be put, so maybe some data in other sheets will have precedence than this one
  • sheetName -> The name of the sheet we will be parsing
  • ofbizEntity -> The name of the entity we will be populating from the data parsed
The node will be composed of many elements that correspond to the fields of the entity, so we will create the file ODSElement:


public class ODSElement {
//Some constants we will use later
public static final String COLUMN_ATTR="name";
public static final String INDEX_ATTR="index";
public static final String OFBIZ_ATTR="attribute";
public static final String CONSTANT="constant";

private String columnName;
private int columnIndex;
private Object constant;
private String ofbizColumn;
}


The fields have the following meaning:
  • columnName -> I'm not using it right now, just as a placeholder to be aware of which column I'm modifying
  • columnIndex -> The corresponding location of the column inside the sheet
  • constant -> In some cases we won't parse information from the spreadsheet and we will insert a constant value for all the rows in our entity
  • ofbizColumn -> This is the real name of the column or entity attribute for the Ofbiz Framework
Now, taking advantage of the great class BuilderSupport provided by groovy, we will create our own Builder, called ODSOfbizBuilder (I'll only show the methods I actually use):


public class ODSOfbizBuilder extends BuilderSupport{

private List<ODSNode> nodes = new ArrayList<ODSNnode>();

private static final String SHEET_NODE ="sheet";
private static final String COLUMN_NODE ="column";

//Return the list of nodes generated by this builder
public List<ODSNode> getNodes() {
return nodes;
}

//Create the corresponding node or element
@SuppressWarnings("unchecked")
@Override
protected Object createNode(Object name, Map attributes) {
System.out.println(name+" "+attributes);
Object obj=null;
if(name.equals(SHEET_NODE)){
ODSNode node = new ODSNode();
node.setSheetName(attributes.get(ODSNode.SHEET_NAME_ATTR).toString());
node.setOfbizEntity(attributes.get(ODSNode.OFBIZ_NAME_ATTR).toString());
if(attributes.containsKey(ODSNode.START_ROW_ATTR)){
node.setStartRow(Integer.parseInt(attributes.get(ODSNode.START_ROW_ATTR).toString()));
}
obj = node;
nodes.add(node);
}
else if(name.equals(COLUMN_NODE)){
ODSElement element = new ODSElement();
element.setColumnIndex(Integer.parseInt(attributes.get(ODSElement.INDEX_ATTR).toString()));
element.setColumnName(attributes.get(ODSElement.COLUMN_ATTR).toString());
element.setOfbizColumn(attributes.get(ODSElement.OFBIZ_ATTR).toString());
if(attributes.containsKey(ODSElement.CONSTANT)){
element.setConstant(attributes.get(ODSElement.CONSTANT));
}
obj =element;
}
else{
throw new IllegalArgumentException("Nodes must be named "+SHEET_NODE+" or "+COLUMN_NODE);
}
return obj;
}

//Add the elements to the corresponding node
@Override
protected void setParent(Object parent, Object node) {
if(parent instanceof ODSNode && node instanceof ODSElement){
((ODSNode) parent).addElement((ODSElement) node);
}
}

}


This class will allow us to parse our DSL and actually create a list of nodes that will help us parse a Calc file and get all data about them, so a Groovy code using that DSL would look like this:


def builder = new ODSOfbizBuilder()
builder.base{
sheet(name:'CatalogoUnidades',entity:"Uom",index:0){
column(name:"Constante",index:-1,attribute:"uomTypeId",constant:"WEIGHT_MEASURE")
column(name:"Unidad",index:0,attribute:"description")
column(name:"Simbolo",index:1,attribute:"abbreviation")
column(name:"ID",index:2,attribute:"uomId")
}
}


So now that we have the template ready, lets create now the domain model to store in memory the data parsed, first we will create an OfbizEntity class that will store the generic data about our entity:


public class OfbizEntity implements Comparable<OfbizEntity>{

private String entity;
private int index;
private Set<OfbizData> data = new TreeSet<OfbizData>();
//Getters and setters here
}


The class will store the name of the entity, and the order in which it should be stored in the xml file, and of course a set of OfbizData objects which will have the data we will store, and will have the following code in it:


public class OfbizData implements Comparable<OfbizData>{

private String attribute;
private String value;
//getters and setters here
}


So now we will create a class named ODSReader that will read our spreadsheet and create our model in memory:


public class ODSReader {

public List<OfbizEntity> parseFile(File spreadsheetFile,ODSOfbizBuilder builder) throws IOException{
//Here we will store all our data
List<OfbizEntity> data = new ArrayList<OfbizEntity>();
//Let's open the Calc file
SpreadSheet spreadSheet = SpreadSheet.createFromFile(spreadsheetFile);
//Let's get the nodes we parsed from our builder earlier
List<ODSNode> nodes = builder.getNodes();

for(ODSNode node : nodes){
//We need to get the sheet referenced by its name
Sheet sheet = spreadSheet.getSheet(node.getSheetName());
if(sheet == null){
throw new IllegalArgumentException("Sheet with name "+node.getSheetName()+" not found");
}
//lets get the last row with data
int lastRow = sheet.getRowCount();
int startRow = node.getStartRow();
int valueCount = 0;
//for each of the parsed sheets we need the collection of entities
Set<OfbizEntity> entities = new TreeSet<OfbizEntity>();
for(int row = startRow;row<lastrow;row++){ valuecount="0;" ofbizentity="" entity="new" odselement="" element="" string="" if="" column="" is="" a="" we="" just="" put="" the="" constant="" value="" in="" there=""><0||element.getconstant()!=null){ value =" element.getConstant().toString();" cell =" sheet.getCellAt(element.getColumnIndex()," value="cell.getValue().toString();" tuple =" new">0&&valueCount>0){
entities.add(entity);

}
}
//We used a set to ensure there are no duplicates and now we add all its elements to our list
data.addAll(entities);
}
return data;
}
}


So now we can load the data in our calc file, to memory, now it is time to persist it in an Xml file, for that we will help ourselves with the excellent groovy StreamingMarkupBuilder, so lets create our OfbizEntityGenerator groovy class:


lass OfbizEntityGenerator {

String output="outputDemoData.xml"

//We receive the list of entities and parse them into an xml file
public StreamingMarkupBuilder generateOfBizDataModel(List<OfbizEntity> entities){
StreamingMarkupBuilder builder = new StreamingMarkupBuilder()
println entities.size()
def writable = builder.bind{
"entity-engine-xml"{
entities.each{
ent ->
//Here we write a comment for our entity
mkp.comment("Data for entity ${ent.entity}")
//We write the entity with its attributes
"${ent.entity}"(asMap(ent))
}
}
}

def outputString = indentXml(writable)

//And finally we write it to a physical file
def file = new FileWriter(output)

file.write(outputString)
file.flush()
file.close()
}
//We just convert our list of objects to a map, not very elegant to be honest
Map asMap(OfbizEntity entity){
def map = [:]
entity.data.each{
node ->
map[node.attribute] = node.value
}
return map
}
//We use this helper method to have a nice indentation of our xml file
String indentXml(xml) {
def factory = TransformerFactory.newInstance()
factory.setAttribute("indent-number", 2);

Transformer transformer = factory.newTransformer()
transformer.setOutputProperty(OutputKeys.INDENT, 'yes')
StreamResult result = new StreamResult(new StringWriter())
transformer.transform(new StreamSource(new ByteArrayInputStream(xml.toString().bytes)), result)
return result.writer.toString()
}
}


So the complete usage example of our DSL and utils would be like this:


def reader = new ODSReader()
File file = new File("/citsa/opentaps/myfile.ods");
def builder = new ODSOfbizBuilder()
builder.base{
sheet(name:'CatalogoUnidades',entity:"Uom",index:0){
column(name:"Constante",index:-1,attribute:"uomTypeId",constant:"WEIGHT_MEASURE")
column(name:"Unidad",index:0,attribute:"description")
column(name:"Simbolo",index:1,attribute:"abbreviation")
column(name:"ID",index:2,attribute:"uomId")
}
}
def result = reader.parseFile(file, builder)
OfbizEntityGenerator generator = new OfbizEntityGenerator()
generator.output = "products.xml"
generator.generateOfBizDataModel(result)


And having the following ods file:



It will generate the following xml file:


<?xml version="1.0" encoding="UTF-8"?>
<entity-engine-xml>
<Uom abbreviation="g" description="gramo" uomId="1.0" uomTypeId="WEIGHT_MEASURE"/>
<Uom abbreviation="lb" description="libra" uomId="2.0" uomTypeId="WEIGHT_MEASURE"/>
<Uom abbreviation="kg" description="kilogramo" uomId="3.0" uomTypeId="WEIGHT_MEASURE"/>
<Uom abbreviation="M" description="miles de semilas" uomId="4.0" uomTypeId="WEIGHT_MEASURE"/>
<Uom abbreviation="ml" description="mililitro" uomId="5.0" uomTypeId="WEIGHT_MEASURE"/>
<Uom abbreviation="l" description="litro" uomId="6.0" uomTypeId="WEIGHT_MEASURE"/>
<Uom abbreviation="docena" description="docena" uomId="7.0" uomTypeId="WEIGHT_MEASURE"/>
</entity-engine-xml>


Cool isn't it? now try to do it with a spreadsheet with a couple thousand rows, and you will see the real benefit of this.

Now tell OpenTaps that you have a new file, copy the xml (for our example it will be named products.xml) and paste it into your hot-deploy/citsaProduct/data folder and edit your ofbiz-component.xml to add the following line


<entity-resource type="data" reader-name="ext" loader="main" location="data/products.xml"/>


Finally, to tell OpenTaps to load it, just run the following script in your opentaps home:

java -Xmx384M -jar ofbiz.jar -install -readers=ext

And voilá, you can load a ton of seed data without much effort

More references:

0 comentarios:

Publicar un comentario