Hi all,
As part of my current project requirement, i have to read an excel file and store that one into a Database.
Lets see how i did it and that may save you some time.
First thing that comes into picture is how to read Excel without using any Third party tools. Licensing is a big problem. So we have to achieve it using native JDK support. We can achieve this by considering the EXCEL document as a database file. To connect to this DB, we need to use the Microsoft provided Excel Database Drivers. (we can found them in Control panel --> Administrative Tools --> ODBC Data Source Administrator. Click Add and find the Microsoft Excel Driver (*xls))
Use the appropriate DNS name (say ExcelDNS) and select the workbook (say ExcelFile).
Remaining everything is similar to connecting to Other DBs.
Here is the Sample Code for it
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package japp;
import java.io.*;
import java.sql.*;
/**
*
* @author TBalakrishna
*/
public class ExcelToJava
{
public static void main(String[] args)
{
Connection connection = null;
try{
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
Connection con = DriverManager.getConnection( "jdbc:odbc:ExcelDNS" );
Statement st = con.createStatement();
ResultSet rs = st.executeQuery( "Select * from [Sheet1$]" );
ResultSetMetaData rsmd = rs.getMetaData();
int numberOfColumns = rsmd.getColumnCount();
//Remember First column will be treated as a Heading. So it can't be considered for printing
while (rs.next())
{
for (int i = 1; i <= numberOfColumns; i++) { if (i > 1) System.out.print(", ");
String columnValue = rs.getString(i);
System.out.print(columnValue);
}
System.out.println("");
}
st.close();
con.close();
}
catch(Exception ex)
{
System.err.print("Exception: ");
System.err.println(ex.getMessage());
}
}
}
----------------------------------------------------------------------------------------------
Upto now everything seems fine. But it has its own downsides... Like it can be implemented only on Windows Servers (i need Microsoft Excel Drivers .. right ;-)
So i thought of some other api's that i can use with out MS-Win dependency. Here are some those APIs
The structure of this File System is
The following is the top-level structure of an Excel Workbook:
Example.xls {
OLE2CDF headers
"Workbook" stream {
Workbook {
Static String Table Record..
Sheet names... and pointers
}
Sheet {
ROW
ROW
...
NUMBER RECORD (cell)
LABELSST Record (cell)
...
}
Sheet
}
}
... images, macros, etc.
Document Summary
Summary
Java Excel API which is developed by Sourceforge.net (my favorite open source resource) looks to be an option. You can find about it @ http://jexcelapi.sourceforge.net/
download jxl.jar to you classpath. Here is the code to read an Excel file named ExcelSheet.xls
import java.io.File;
import java.io.IOException;
import jxl.*;
import jxl.read.biff.BiffException;
import jxl.write.*;
public class ReadExcel
{
private String inputFile;
public void setInputFile(String inputFile)
{
this.inputFile = inputFile;
}
public void read() throws IOException, WriteException
{
File inputWorkbook = new File(inputFile);
Workbook w;
try {
w = Workbook.getWorkbook(inputWorkbook);
// To get First sheet use 0 else what ever the number you want.
Sheet sheet = w.getSheet(0);
// Column wise reading for all rows
for (int j = 0; j <>
for (int i = 0; i <>
Cell cell = sheet.getCell(j, i);
CellType type = cell.getType();
//If its a LABEL
if (cell.getType() == CellType.LABEL)
{
System.out.println("I got a label "+ cell.getContents());
}
//If its a NUMBER
if (cell.getType() == CellType.NUMBER)
{
System.out.println("I got a number "+ cell.getContents());
}
//Like there there are a lot of Types. Better you use an switch here
}
}
} catch (BiffException e)
{
e.printStackTrace();
}
}
public static void main(String[] args) throws WriteException, IOException
{
ReadExcel test = new ReadExcel();
test.setInputFile("c:/temp/ExcelSheet.xls");
test.read();
}
}
------------------------------------------------------------------------------------------------
Next one is to use APACHE POI
Apache's POI (Poor Obfuscation Implementation, the name that seemed to describe the format best) is a high-quality application that can read and write Excel and other MS-format files right from inside of your Java application
POI consists of various parts that fit together to deliver the data in a MS file format to the Java application. At the lowest level is the POIFS (POI FileSystem) API that contains the basic logic to process any OLE2CDF file. Above that sit the various components to process the application data. HSSF (Horrible SpreadSheet Format) understands the Excel structures, while HDF (Horrible Document Format) understands the Microsoft Word structures. In this article, we will look at how to use POIFS to read or write a OLE2CDF file. In future articles of this series, we shall see how to use HSSF and HDF, as well as HPSF (Horrible Property Sheet Format, used to read -- and eventually write -- document property information available through File->Property) and using the HSSF Cocoon Serializer to serialize XML to an Excel file.
Microsoft's OLE 2 Compound Document format once prevented Java programmers from reading and writing Office and MFC-generated file formats from pure Java. Java programmers often had to resort to native bridges which limited them to Microsoft Operating Systems. The Jakarta POI Project opens up new worlds to Java developers by allowing them to write to OLE2CDF-based file formats with pure Java -- even on UNIX. This article explained how to work with the underlying OLE 2 Compound Document Format. In the next article, we'll explain how to read, write, and Modify Excel files with HSSF. The final article will cover the HSSFSerializer for Cocoon, as well as HPSF and HDF.
HSSF has two APIs for reading: usermodel and eventusermodel. The former is most familiar, and the latter is more cryptic but far more efficient. The usermodel consists primarily of the classes in the org.apache.poi.hssf.usermodel package, as well as org.apache.poi.hssf.eventusermodel. (In earlier versions of HSSF, this was in the eventmodel package.) The usermodel package maps the file into familiar structures like Workbook, Sheet, Row, and Cell. It stores the entire structure in memory as a set of objects. The eventusermodel package requires you to become more familiar with the actual low-level structures of the file format. It operates in a manner similar to XML's SAX APIs or the AWT event model (the origin of the name)--and can be trickier to use. It is also read-only, so you cannot modify files using the eventusermodel
Reading Excel sheet using Apache's POI
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFRow;
/**
* A simple POI example of opening an Excel spreadsheet
* and writing its contents to the command line.
* @author Tony Sintes
*/
public class POIExample {
public static void main( String [] args ) {
try {
InputStream input = POIExample.class.getResourceAsStream( "qa.xls" );
POIFSFileSystem fs = new POIFSFileSystem( input );
HSSFWorkbook wb = new HSSFWorkbook(fs);
HSSFSheet sheet = wb.getSheetAt(0);
// Iterate over each row in the sheet
Iterator rows = sheet.rowIterator();
while( rows.hasNext() ) {
HSSFRow row = (HSSFRow) rows.next();
System.out.println( "Row #" + row.getRowNum() );
// Iterate over each cell in the row and print out the cell's content
Iterator cells = row.cellIterator();
while( cells.hasNext() ) {
HSSFCell cell = (HSSFCell) cells.next();
System.out.println( "Cell #" + cell.getCellNum() );
switch ( cell.getCellType() ) {
case HSSFCell.CELL_TYPE_NUMERIC:
System.out.println( cell.getNumericCellValue() );
break;
case HSSFCell.CELL_TYPE_STRING:
System.out.println( cell.getStringCellValue() );
break;
default:
System.out.println( "unsuported sell type" );
break;
}
}
}
} catch ( IOException ex ) {
ex.printStackTrace();
}
}
}
------------------------------------------------------------------------------------------------
Remember
getPhysicalNumberOfRows() returns the physical number of rows which may be more than the actual (logical) number of rows. The same goes for getPhysicalNumberOfCells().
You should check for nulls when fetching the HSSFRow and HSSFCell objects as shown.
Remember that Excel tables are often sparsely populated. So choose your data structures accordingly. POI accesses the data by sheet. In JExcelAPI you can directly access the data in any row and column
-------------------------------------------------------------------------------------------------
Conclusion
Comparison of JExcelAPI with Jakarta-POI (HSSF)
1. JExcelAPI is clearly not suitable for important data. It fails to read several files. Even when it reads it fails on cells for unknown reasons. In short JExcelAPI isn't suitable for enterprise use.
2. HSSF is the POI Project's pure Java implementation of the Excel '97(-2002) file format. It is a mature product and was able to correctly and effortlessly read excel data generated from various sources, including non-MS Excel products like Open Office, and for various versions of Excel. It is very robust and well featured. Highly recommended
As part of my current project requirement, i have to read an excel file and store that one into a Database.
Lets see how i did it and that may save you some time.
First thing that comes into picture is how to read Excel without using any Third party tools. Licensing is a big problem. So we have to achieve it using native JDK support. We can achieve this by considering the EXCEL document as a database file. To connect to this DB, we need to use the Microsoft provided Excel Database Drivers. (we can found them in Control panel --> Administrative Tools --> ODBC Data Source Administrator. Click Add and find the Microsoft Excel Driver (*xls))
Use the appropriate DNS name (say ExcelDNS) and select the workbook (say ExcelFile).
Remaining everything is similar to connecting to Other DBs.
Here is the Sample Code for it
/*
* To change this template, choose Tools | Templates
* and open the template in the editor.
*/
package japp;
import java.io.*;
import java.sql.*;
/**
*
* @author TBalakrishna
*/
public class ExcelToJava
{
public static void main(String[] args)
{
Connection connection = null;
try{
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
Connection con = DriverManager.getConnection( "jdbc:odbc:ExcelDNS" );
Statement st = con.createStatement();
ResultSet rs = st.executeQuery( "Select * from [Sheet1$]" );
ResultSetMetaData rsmd = rs.getMetaData();
int numberOfColumns = rsmd.getColumnCount();
//Remember First column will be treated as a Heading. So it can't be considered for printing
while (rs.next())
{
for (int i = 1; i <= numberOfColumns; i++) { if (i > 1) System.out.print(", ");
String columnValue = rs.getString(i);
System.out.print(columnValue);
}
System.out.println("");
}
st.close();
con.close();
}
catch(Exception ex)
{
System.err.print("Exception: ");
System.err.println(ex.getMessage());
}
}
}
----------------------------------------------------------------------------------------------
Upto now everything seems fine. But it has its own downsides... Like it can be implemented only on Windows Servers (i need Microsoft Excel Drivers .. right ;-)
So i thought of some other api's that i can use with out MS-Win dependency. Here are some those APIs
Library / package | License | Description |
---|---|---|
Actuate Spreadsheet Engine | Commercial, 30-day trial version available | Write Excel (XLS) files. |
ExcelAccessor | ? | Read and write Excel (XLS) files. |
ExcelReader | ? | JavaWorld article on how to read Microsoft Excel (XLS) files, including code. Requires an installed Microsoft ODBC driver for Excel files, and Sun's ODBC-JDBC driver. |
ExtenXLS | Commercial, 30-day trial version available | Read and write Microsoft Excel (XLS) files. |
JACOB project | LGPL | Java COM bridge with examples to interface Excel. |
Java Excel API | LGPL | Read Excel (XLS) 97, 98 and 2000 files. |
Java to Excel conversion | ? | Write SYLK files, which Excel can import and export. |
JExcel | Commercial | Swing component to integrate Excel via JNI. |
jXLS | LGPL | Create Excel files using XLS templates. |
POI | Apache Software License 1.1 | Read and write Microsoft OLE 2 compound document format files. This includes MS Office files (DOC, XLS, PPT) written with Office versions that were released after 1997. |
Snowbound | Commercial | Read Excel files. |
SoftArtisans OfficeWriter | Commercial | Write Word and Excel files. |
Vista JDBC | Commercial, 15-day trial version available | JDBC driver to access MS Excel (XLS) files. |
xlSQL | GPL | JDBC driver to access MS Excel (XLS) and CSV files. |
Most of these APIs are built on the fact that Microsoft Documents are saved on to hard using the Microsoft OLE 2 compound document file format
Starting with Office 95, all MS Office applications store their documents in an archive called the OLE2 Compound Document Format (OLE2CDF). It's a bit like the old FAT filesystem: it promotes fragmentation, doesn't support compression, and isn't linear (which would make streaming easier). Microsoft Foundation Classes allow applications to serialize to this format, so if you need interoperability with legacy Windows proprietary file formats or Office documents, you have to deal with OLE2CDF
Each OLE2CDF file on disk contains an entire filesystem, laid out using nested Directory Entries, which contain Entries. We are interested in Entry elements of the Document Entries type. A Document Entry contains application-specific (e.g. Excel) data structures
The Microsoft Excel 97 file format is also known as "BIFF8." Recent versions of Excel have changed very little about this file format, and writing out the new intricacies would serve no purpose other than to make everyone upgrade. So when we say Excel 97 format, we mean Excel 97-to-XP format.
The structure of this File System is
The following is the top-level structure of an Excel Workbook:
Example.xls {
OLE2CDF headers
"Workbook" stream {
Workbook {
Static String Table Record..
Sheet names... and pointers
}
Sheet {
ROW
ROW
...
NUMBER RECORD (cell)
LABELSST Record (cell)
...
}
Sheet
}
}
... images, macros, etc.
Document Summary
Summary
Java Excel API which is developed by Sourceforge.net (my favorite open source resource) looks to be an option. You can find about it @ http://jexcelapi.sourceforge.net/
download jxl.jar to you classpath. Here is the code to read an Excel file named ExcelSheet.xls
import java.io.File;
import java.io.IOException;
import jxl.*;
import jxl.read.biff.BiffException;
import jxl.write.*;
public class ReadExcel
{
private String inputFile;
public void setInputFile(String inputFile)
{
this.inputFile = inputFile;
}
public void read() throws IOException, WriteException
{
File inputWorkbook = new File(inputFile);
Workbook w;
try {
w = Workbook.getWorkbook(inputWorkbook);
// To get First sheet use 0 else what ever the number you want.
Sheet sheet = w.getSheet(0);
// Column wise reading for all rows
for (int j = 0; j <>
for (int i = 0; i <>
Cell cell = sheet.getCell(j, i);
CellType type = cell.getType();
//If its a LABEL
if (cell.getType() == CellType.LABEL)
{
System.out.println("I got a label "+ cell.getContents());
}
//If its a NUMBER
if (cell.getType() == CellType.NUMBER)
{
System.out.println("I got a number "+ cell.getContents());
}
//Like there there are a lot of Types. Better you use an switch here
}
}
} catch (BiffException e)
{
e.printStackTrace();
}
}
public static void main(String[] args) throws WriteException, IOException
{
ReadExcel test = new ReadExcel();
test.setInputFile("c:/temp/ExcelSheet.xls");
test.read();
}
}
------------------------------------------------------------------------------------------------
Next one is to use APACHE POI
Apache's POI (Poor Obfuscation Implementation, the name that seemed to describe the format best) is a high-quality application that can read and write Excel and other MS-format files right from inside of your Java application
POI consists of various parts that fit together to deliver the data in a MS file format to the Java application. At the lowest level is the POIFS (POI FileSystem) API that contains the basic logic to process any OLE2CDF file. Above that sit the various components to process the application data. HSSF (Horrible SpreadSheet Format) understands the Excel structures, while HDF (Horrible Document Format) understands the Microsoft Word structures. In this article, we will look at how to use POIFS to read or write a OLE2CDF file. In future articles of this series, we shall see how to use HSSF and HDF, as well as HPSF (Horrible Property Sheet Format, used to read -- and eventually write -- document property information available through File->Property) and using the HSSF Cocoon Serializer to serialize XML to an Excel file.
Microsoft's OLE 2 Compound Document format once prevented Java programmers from reading and writing Office and MFC-generated file formats from pure Java. Java programmers often had to resort to native bridges which limited them to Microsoft Operating Systems. The Jakarta POI Project opens up new worlds to Java developers by allowing them to write to OLE2CDF-based file formats with pure Java -- even on UNIX. This article explained how to work with the underlying OLE 2 Compound Document Format. In the next article, we'll explain how to read, write, and Modify Excel files with HSSF. The final article will cover the HSSFSerializer for Cocoon, as well as HPSF and HDF.
HSSF has two APIs for reading: usermodel and eventusermodel. The former is most familiar, and the latter is more cryptic but far more efficient. The usermodel consists primarily of the classes in the org.apache.poi.hssf.usermodel package, as well as org.apache.poi.hssf.eventusermodel. (In earlier versions of HSSF, this was in the eventmodel package.) The usermodel package maps the file into familiar structures like Workbook, Sheet, Row, and Cell. It stores the entire structure in memory as a set of objects. The eventusermodel package requires you to become more familiar with the actual low-level structures of the file format. It operates in a manner similar to XML's SAX APIs or the AWT event model (the origin of the name)--and can be trickier to use. It is also read-only, so you cannot modify files using the eventusermodel
Reading Excel sheet using Apache's POI
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFRow;
/**
* A simple POI example of opening an Excel spreadsheet
* and writing its contents to the command line.
* @author Tony Sintes
*/
public class POIExample {
public static void main( String [] args ) {
try {
InputStream input = POIExample.class.getResourceAsStream( "qa.xls" );
POIFSFileSystem fs = new POIFSFileSystem( input );
HSSFWorkbook wb = new HSSFWorkbook(fs);
HSSFSheet sheet = wb.getSheetAt(0);
// Iterate over each row in the sheet
Iterator rows = sheet.rowIterator();
while( rows.hasNext() ) {
HSSFRow row = (HSSFRow) rows.next();
System.out.println( "Row #" + row.getRowNum() );
// Iterate over each cell in the row and print out the cell's content
Iterator cells = row.cellIterator();
while( cells.hasNext() ) {
HSSFCell cell = (HSSFCell) cells.next();
System.out.println( "Cell #" + cell.getCellNum() );
switch ( cell.getCellType() ) {
case HSSFCell.CELL_TYPE_NUMERIC:
System.out.println( cell.getNumericCellValue() );
break;
case HSSFCell.CELL_TYPE_STRING:
System.out.println( cell.getStringCellValue() );
break;
default:
System.out.println( "unsuported sell type" );
break;
}
}
}
} catch ( IOException ex ) {
ex.printStackTrace();
}
}
}
------------------------------------------------------------------------------------------------
Remember
getPhysicalNumberOfRows() returns the physical number of rows which may be more than the actual (logical) number of rows. The same goes for getPhysicalNumberOfCells().
You should check for nulls when fetching the HSSFRow and HSSFCell objects as shown.
Remember that Excel tables are often sparsely populated. So choose your data structures accordingly. POI accesses the data by sheet. In JExcelAPI you can directly access the data in any row and column
-------------------------------------------------------------------------------------------------
Conclusion
Comparison of JExcelAPI with Jakarta-POI (HSSF)
1. JExcelAPI is clearly not suitable for important data. It fails to read several files. Even when it reads it fails on cells for unknown reasons. In short JExcelAPI isn't suitable for enterprise use.
2. HSSF is the POI Project's pure Java implementation of the Excel '97(-2002) file format. It is a mature product and was able to correctly and effortlessly read excel data generated from various sources, including non-MS Excel products like Open Office, and for various versions of Excel. It is very robust and well featured. Highly recommended
Comments