Migrating Database Applications from JDBC to ADO.NET
Taking the pain out of the process

Digg This!

Today, more than ever, enterprises are faced with the necessity of migrating applications from one platform to another. Here, we try to take some of the pain out of migrating a database application written in Java to ADO.NET.

In this article we compare the various database API calls in JDBC (Java DataBase Connectivity) and ADO.NET (ActiveX Data Objects) side by side to enable an application programmer to easily migrate a database application written in Java to ADO.NET. We also discuss which API calls are better suited to various situations in order to help application programmers understand when to use which API.

We provide snippets of code throughout the article to enable a better understanding of the richness of the database API, what type of information is returned from the various API calls, and how this information can be used.

A transformation from the use of JDBC to the use of ADO.NET to access DB2 V8.1 is used as an example. In addition, we cover the pitfalls associated with some of the ways a database application can be coded.

JDBC and ADO.NET Architectures
Figures 1 and 2 depict the ADO.NET and JDBC architectures.

 
Figure 1

 
Figure 2

JDBC

  • The JDBC architecture uses four different driver types, based upon how the database vendor implements communication with the database.
  • JDBC runs on any platform where Java is implemented.
  • The database metadata is accessible through the JDBC API.
  • Thin and thick clients can be built using JDBC.
  • JDBC cursors are able to provide similar functionality compared with the DataSet object in ADO.NET, although it does not function in a disconnected mode. ADO.NET
  • The ADO.NET architecture integrates XML to provide non-Windows applications access to databases.
  • All database commands are executed via the Command object or the DataAdapter object associated with a given provider.
  • Disconnected and connected modes are employed by the ADO.NET architecture. The disconnected mode is used with the DataSet object, which provides an in-memory map of at least one table that has been read into it. It functions as a table that can be used later to apply changes back to the database.
  • Thin and thick clients can be built using ADO.NET.
  • The database metadata is accessible using ADO.NET.
Data Providers
ADO.NET data providers play a similar role in the .NET environment to that of JDBC drivers in Java. A data provider is a set of interfaces and classes that together provide access to an underlying data source, much like a JDBC driver.

 

Choosing a .NET Framework Data Provider
The .NET Framework Data Provider for SQL Server uses its own protocol to communicate with SQL Server. It is lightweight and performs well because it is optimized to access an SQL Server directly without adding an OLE DB or Open Database Connectivity (ODBC) layer. The .NET Framework Data Provider for OLE DB communicates with an OLE DB data source through both the OLE DB Service component, which provides connection pooling and transaction services, and the OLE DB Provider for the data source.

Options for Connecting to DB2 from .NET

  • OleDB .NET Managed Provider: Both Microsoft and IBM recommend utilizing an OleDB data provider for accessing most DB2 data when DB2 V8.1 or lower is used or when a COM object is the caller. Within the .NET Framework, OleDB providers are exposed via COM InterOp and essentially use the same drivers available for ADO development (see Table 1).

     

  • ODBC .NET Managed Provider: This provides access to native ODBC drivers the same way the OleDB .NET Data Provider provides access to native OleDB providers. The ODBC .NET Data Provider is an add-on component to the .NET Framework. The ODBC .NET Data Provider is intended to work with all compliant ODBC drivers. The DB2 .NET Data Provider is an add-in component to the Visual Studio .NET Framework. This provider accesses IBM DB2 database servers running on a variety of hardware and operating system platforms. At the time of this writing, the provider is still in beta but is expected to be in production soon. The DB2 .NET Data Provider provides connectivity to the DB2 family of servers from applications written using ADO.NET.
  • .NET Framework Data Provider for SQL Server: Recommended for applications using Microsoft SQL Server 7.0 or later.
  • .NET Framework Data Provider for OLE DB: Recommended for applications using DB2, Microsoft SQL Server 6.5 or earlier, or any OLE DB provider that supports the OLE DB interfaces listed in the OLE DB Interfaces Used by the .NET Framework Data Provider for OLE DB section in the Microsoft Visual Studio .NET API Reference. (OLE DB 2.5 interfaces are not required.) For Microsoft SQL Server 7.0 or later, the .NET Framework Data Provider for SQL Server is recommended.
  • .NET Framework Data Provider for ODBC: Recommended for applications using ODBC data sources.
  • .NET Framework Data Provider for Oracle: Recommended for applications using Oracle data sources.
The .NET Framework ships with five different data providers, each of which is contained within a specific namespace. Three of the data providers are database specific: the Oracle Data Provider (in the System.Data.OracleClient namespace); the SQL Server Data Provider, for SQL Server 7.0 and higher (the core of which is contained in the System.Data. SqlClient namespace, but additional classes are also contained in the System.Data.SqlTypes namespace); and the SQL Server CE Data Provider (in the System.Data.SqlServerCE namespace).

 

The other two data providers are not database specific; they use two of Microsoft's older data access tools to interact with a data source: the ODBC Data Provider (in the System.Data.Odbc namespace) and the OLE DB Data Provider (in the System.Data.OleDb namespace). Provided that you have an ODBC or OLE DB driver for your database, you can use the corresponding database-generic data provider to perform your ADO.NET operations. However, because these two providers are not database specific, there are more layers of software to go through than when using a native data provider, adding memory and performance overhead. Also, database-specific functionality is lost due to abstraction into generic method calls, so it is impossible to fully leverage your database management system.

Commands
The ADO.NET command objects provide similar functionality to the JDBC statement objects (see Table 2). Both are responsible for the submission of SQL and retrieval of results from the underlying DBMS. However, unlike JDBC, which has the Statement class for dynamic SQL statements; the PreparedStatement class for static, reusable SQL statements; and the CallableStatement class for calling stored procedures, all SQL statements and stored procedure calls are executed using instances of the command class.

 

Three properties in a command object are used to specify the interaction with the database:

  1. CommandText: When calling stored procedures, it contains the name of the stored procedure to call. When executing SQL text, it contains the SQL statements to be executed.
  2. CommandType: Specifies the type of statement being executed, such as a stored procedure or SQL text.
  3. Parameters: Specifies the parameters used when calling the stored procedure or SQL text. ADO.NET allows you to use parameters with SQL text commands instead of using string concatenation, to avoid the security threat posed by string concatenation.
Transforming a JDBC Query to an ADO.NET Query
Querying the Database Using JDBC

1.  Load the JDBC driver: An application must first load a driver that enables JDBC classes to communicate with a data source. In our included sample program we load the generic IBM DB2 database driver. The fully qualified driver name is required.

Class.forName("COM.ibm.db2.jdbc.app.DB2Driver");

2.  Connect to a data source: Once the data source driver has been loaded a connection must be established. The driver supplies methods to make a connection, although a specific format for the JDBC protocol connection URL is required. The format is jdbc:<subprotocol>:<subname>. Our sample program uses "jdbc:db2:sample" as the JDBC URL.

One obvious point that is often taken for granted is that the use of URLs means that JDBC applications are automatically network and Internet enabled.

Using the DriverManager class, an application requests a connection using the passed URL. The DriverManager selects the appropriate driver; here, only the DB2 driver, and returns the connection. The standard form for the connection request is as follows:

Connection con = DriverManager.getConnection(URL,username,Password );

3.  Create a statement: While the Connection class has a number of capabilities, a Statement object must be created in order to use DDL, Data Manipulation Language (DML), or SQL statements. We use the Connection object previously created to request a Statement object.

Statement stmt = con.createStatement();

4.  Retrieve data: To retrieve information from a database, you must send SQL SELECT statements to the database via the Statement.executeQuery method, which returns the requested information as rows of data in a ResultSet object. A default ResultSet is examined row by row using the ResultSet.next() method, which moves the position of the cursor to the next row. The ResultSet.getXXX() methods are useful in obtaining individual column data.

The following SQL statement retrieves all of the employee data from the EMPLOYEE table in the SAMPLE database.

SELECT * from EMPLOYEE;

Note that the JDBC URL determines which database (SAMPLE, in this case) we connect to and, subsequently, query from.

In order to execute our SQL statement, we must invoke the executeQuery() method of the Statement object as follows:

ResultSet result = stmt.executeQuery("SELECT * from EMPLOYEE");

5.  Data navigation: ResultSet.next() returns a Boolean: True if there is a next row, and false if not (meaning the end of the data/set has been reached). Conceptually, a pointer or cursor is positioned just before the first row when the ResultSet is obtained. Invoking next() on the ResultSet moves the cursor to the first row, followed by subsequent rows for each execution of the ResultSet next() method. In anticipation of only one row being returned, an application might choose to use an if statement. Otherwise, a while loop is the norm.

6.  Data extraction: Once the cursor is positioned at a row, the application may retrieve data on a column-by-column basis using the appropriate ResultSet.getXXX method. The following example retrieves the FIRSTNAME from the EMPLOYEE table in the SAMPLE database.

String fname = result.getString("FIRSTNAME");

As with any other socket program, the DBMS engine is free to decide which port it will use. While TCP/IP is generally the norm, other communication protocols may be used. DB2, for example, can also use APPC (Advanced Program to Program Communication) on several platforms.

A connection is automatically closed when it is garbage-collected, but cautious programmers always close connections explicitly to directly determine that and when this occurs and to conserve resources. Note that while the API specifically says that closing a connection "releases... database and JDBC resources immediately," the JDBC recommendation is to explicitly close connections and statements.

Database Query Sample Using JDBC
The default ResultSet object, returned by the executeQuery() method of the Statement object, has a cursor that moves forward only by use of the next() method, as shown in Listing 1 (all of the code referenced in this article can be downloaded from www.sys-con.com/dotnet/sourcec.cfm). It should be noted that executeQuery() always returns a non-null ResultSet. A common mistake is to compare the ResultSet to null in order to determine whether the ResultSet returned rows. Short of a driver error, the ResultSet is always non-null.

Transforming JDBC
Querying the Database Using ADO.NET
The simplest pattern used to query a database in ADO.NET is quite similar to the pattern used in JDBC. You create a connection object and a command object (the ADO.NET equivalent of a statement object). The connection to the database is maintained while the query is executed and while each row from the query is retrieved one by one. Once the method has completed processing all rows, the connection is closed.

A typical SQL select program flow for an ADO.NET application consists of the following:

1.  Define the connection string:

string connectionString = "Provider=IBMDADB2.1;User ID = spchang;
Password=sp2501c; Data Source=SAMPLE";

2.  Connect to a data source: Obtain an SqlConnection object, e.g., conn, for connecting to SQL Server or an OleDBConnection object, e.g., conn, for connecting to an OLE DB data source. There is also a connection object for ODBC. Use the open method of the conn object to open the connection. If the conn object is instantiated in a using structure, then when the application has completed, the conn object will be disposed of. A better way to code is to use the close or dispose method of the conn object to close the conn object when it is no longer needed. The rest of the comparison will use the close method of the conn object instead. In this case everything was in a try/catch block.

OleDbConnection con = new OleDbConnection();
con.ConnectionString = connectionString;

3.  Use a try/catch finally block to determine if Step 2 was successful.

4.  If Step 2 is successful, then create an SqlCommand object, e.g., cmd; or an OleDBCommand object, e.g., cmd, with a select statement that is to be executed and that takes the conn object as a parameter.

OleDbCommand cmd = new OleDbCommand();
cmd.Connection = con;
cmd.CommandText = "Select * FROM EMPLOYEE";
con.Open();

5.  To execute a select statement, use the ExecuteReader method of the cmd object and either an SqlDataReader object (e.g., rdr will be returned) or an OleDbDataReader object (e.g., rdr will be returned), depending on the data source.

OleDbDataReader reader = cmd.ExecuteReader();

6.  Retrieve data and data extraction. Iterate over the rdr object using a while loop and the Read method of the rdr object to obtain the next row. Using the Getxxx methods of the rdr object the values of different columns for a row can be displayed. The Getxxx methods use the number of the column as the input parameter. Before displaying any values determine if the value is null.

while(reader.Read())
{
string empno = reader.GetString(0);
string firstname = reader.GetString(1);
Console.Out.WriteLine(empno + "--->" + firstname);
}

When accessing column data, use the typed accessors like GetString, GetInt32, and so on. This saves you the processing required to cast the object returned from GetValue as a particular type.

7.  When there are no more rows in the result set the while loop is exited and then the finally block can be executed to close the rdr and conn objects.

finally
{
try{
if (con != null)
{
con.Close();
}
}catch(Exception ex)
{
Console.Out.WriteLine(ex.Message);
}
}

Take the following steps to display the results of a query:

  • Connection: Establishes a connection with the actual database. This interface provides functionality similar to the java.sql.Connection interface.
  • Command: Executes commands over the database connection. This interface provides similar functionality to the java.sql.Statement interface and its descendant interfaces, java.sql.PreparedStatement and java.sql.CallableStatement.
  • DataReader: Iterates over the results of an SQL query. Requires an open data connection. The functionally of this interface is similar to the java.sql.ResultSet interface.
  • DataAdapter: Used in disconnected scenarios, it acts as the middleman between the database and an application. It does not require an open data connection. There is no JDBC equivalent to this interface.
Release Connection Resources
Always explicitly close your Connection or DataReader objects when you are finished using them. While garbage collection eventually cleans up objects and therefore releases connections and other managed resources, garbage collection only occurs when it is needed. Therefore, it is still your responsibility to make sure any expensive resources are explicitly released. Also, connections that are not explicitly closed might not be returned to the pool. For example, a connection that has gone out of scope but that has not been explicitly closed will be returned to the connection pool only if the maximum pool size has been reached and the connection is still valid.

Coding Alternative for Better Resource Cleanup
For C# programmers, a convenient way to ensure that you always close your Connection and DataReader objects is to use the "using" statement. The using statement automatically calls Dispose on the object being "used" when leaving the scope of the using statement (see Listing 3).

Usage Patterns
Irrespective of the .NET data provider you use, you must always:

  • Open a database connection as late as possible.
  • Use the connection for as short a period as possible.
  • Close the connection as soon as possible. The connection is not returned to the pool until it is closed through either the Close or Dispose method. You should also close a connection even if you detect that it has entered the broken state. This ensures that it is returned to the pool and marked as invalid. The object pooler periodically scans the pool, looking for objects that have been marked as invalid.
The Elegance of ADO.NET
Querying the Database with the Disconnected Model
ADO.NET also allows you to interact with the database without maintaining an open connection throughout the entire duration of your interaction. The pattern for the disconnected data model is similar to the pattern for the connected model except that you use different objects and delegate some of the operations to these objects (see Listing 4).

At the end of the Fill() method, the DataSet object contains an in-memory snapshot of the result of the query in one of its tables. When executing a single SELECT statement, as in Listing 3, the results of the query are in the first table in the Tables collection, which is why we access the results using ds.Tables[0]. In this manner, the database connection is maintained for the least amount of time possible, drastically improving the reusability of the connection object. Note, however, that the disconnected data model may not be appropriate for memory-constrained situations.

All the database activity occurs within the da.Fill() method. The Fill() method has the following responsibilities:

  1. Open the database connection (passed in the constructor).
  2. Submit the SQL command (also passed in the constructor) to the database.
  3. Return all rows resulting from the query back to the application. In particular, persist all rows inside a table in the DataSet object.
  4. Close the database connection.
XML Support in ADO.NET
Database vendors have recognized the importance of XML in the future of their industry. Native support for XML has been built into a variety of proprietary database products and the use of XML will continue to grow with the emergence of protocols such as those used for Web services. With that in mind, Microsoft has integrated XML into ADO.NET. DataSet dynamically builds an XML schema to store the data. Relational data and XML data can be handled in a similar way. ADO.NET works offline once the data is fetched. For example, it is possible to exchange between XML and a DataSet with a single method call. This means that it is a simple and straightforward process to represent a database query as XML. Also, a DataSet object is able to model and enforce column constraints and table relationships in addition to storing raw data. These constraints can also be turned into XML by way of XML Schemas. Listing 5 uses the XML functionality of ADO.NET to perform a simple query on the database and save the contents of the query to an XML file.

Because of its inherent design, Recordsets and cursors that access and modify data can impact performance on the server side by tying up valuable resources. Providing a comprehensive disconnected data-access model, tight integration with XML, and seamless integration with the .NET Framework, ADO.NET leverages XML to provide optimized data access for the .NET Framework. ADO.NET uses XML extensively, while JDBC has a more relational flavor. JDBC is connection oriented; ADO.NET always works in a nonconnected, or offline, mode. .NET DataSets are a kind of in-memory-database cache.

Exception Handling
JDBC Exceptions
JDBC lets you see the warnings and exceptions generated by your DBMS and by the Java compiler. To see exceptions, you can have a catch block print them out. For example, the two catch blocks from the sample code print out a message explaining the exception (see Listing 6).

If you were to run the sample twice, you would get an error message similar to this:

SQLException: There is already an object named 'EMPLOYEES' in the database.
Severity 16, State 1, Line 1

This example illustrates printing out the message component of an SQLException object, which is sufficient for most situations.

There are actually three components, however, and for the sake of completeness, you can print them all out. Listing 7 shows a catch block that is complete in two ways. First, it prints out all three parts of an SQLException object: the message (a string that describes the error), the SQL state (a string identifying the error according to the X/Open SQLState conventions), and the vendor error code (a number that is the driver vendor's error code number). The SQLException object ex is caught, and its three components are accessed with the methods getMessage, getSQLState, and getErrorCode.

The second way the catch block shown in Listing 7 is complete is that it catches all of the exceptions that might have been thrown. If there is a second exception, it will be chained to ex, so ex.get NextException is called to see if there is another exception. If there is, the while loop continues and prints out the next exception's message, SQLState, and vendor error code. This continues until there are no more exceptions.

The vendor error code is specific to each driver, so you need to check your driver documentation for a list of error codes and what they mean.

Retrieving Warnings
SQLWarning objects are a subclass of SQLException that deals with database access warnings. Warnings do not stop the execution of an application, as exceptions do; they simply alert the user that something did not happen as planned. For example, a warning might let you know that a privilege you attempted to revoke was not revoked. Or a warning might tell you that an error occurred during a requested disconnection.

A warning can be reported on a Connection object, a Statement object (including PreparedStatement and CallableStatement objects), or a ResultSet object. Each of these classes has a getWarnings method, which you must invoke in order to see the first warning reported on the calling object. If getWarnings returns a warning, you can call the SQLWarning method getNextWarning on it to get any additional warnings. Executing a statement automatically clears the warnings from a previous statement, so they do not build up. This means, however, that if you want to retrieve warnings reported on a statement, you must do so before you execute another statement.

ADO.NET Exceptions
The .NET data providers translate database-specific error conditions into standard exception types, which you should handle in your data access code. The database-specific error details are made available to you through properties of the relevant exception object.

All .NET exception types ultimately are derived from the base Exception class in the System namespace. The .NET data providers throw provider-specific exception types. For example, the SQL Server .NET Data Provider throws SqlException objects whenever SQL Server returns an error condition. Similarly, the OLE DB .NET Data Provider throws exceptions of type OleDbException that contain details exposed by the underlying OLE DB provider.

The OleDbException class is derived from ExternalException, the base class for all COM Interop exceptions.

This SqlException class exposes properties that contain details of the exception condition. These include:

  • A Message property that contains text describing the error.
  • A Number property that contains the error number, which uniquely identifies the type of error.
  • A State property that contains additional information about the invocation state of the error. This is usually used to indicate a particular occurrence of a specific error condition.
For example, if a method can generate the same error from more than one line, the state should be used to identify the specific occurrence. The Errors collection contains detailed error information about the errors that SQL Server generates. The Errors collection will always contain at least one object of type SqlError.

Conclusion
ADO.NET is optimized for performance, throughput, and scalability. The choice between using a DataReader or a DataSet in ADO.NET depends on whether extensive processing of data is performed without requiring an open connection to the data source, which frees the connection to be used by other clients. If this is the case, the DataSet disconnect model is used; otherwise, the connection-oriented approach of the DataReader is used. C# has many data access features and APIs that have counterparts in Java, so with a small amount of effort the Java development community can seamlessly migrate their code to C#.

Resources

  • Gibbons, P. (2002) .NET Development for Java Programmers. Apress.
  • Microsoft Visual Studio .NET API Reference: http://msdn.microsoft.com/library/default.asp
  • About Andrew J. Bradfield
    Andrew J. Bradfield is an advisory software engineer at IBM's Software Division located at the IBM T.J. Watson Research Center in Hawthorne, NY. His projects include J2EE development of content management and data management solutions. Andrew graduated with a degree in computer science from Hamilton College in Clinton, NY.

    About Naga Ayachitula
    Naga (Arun) Ayachitula is a senior software engineer at IBM's Software Division located at the IBM TJ Watson Research Center in Hawathorne, NY. Arun holds a master's degree in computer science and is involved in the development of enterprise content management solutions for the media.

    About Michael S. Schwartz
    Michael S. Schwartz is a senior software engineer at the IBM TJ Watson Research Center in Hawthorne, NY. Michael holds a PhD in mathematics. His projects have included database systems, decision support systems, multimedia systems, and financial systems.