C# 3.0: The LINQ Revolution Begins
by Herbert Schildt
Future generations of programmers will look back on version 3.0 as a pivotal event in the evolution of C# because it fundamentally and irrevocably reshapes the core of the language. The reason for this dramatic impact can be stated in a single acronym: LINQ. LINQ adds to C# an entirely new syntactic element, several new keywords, and a powerful new capability. The inclusion of LINQ significantly increases the scope of the language, expanding the range of tasks to which C# can be applied. Moreover, LINQ has charted the future direction of computer-language development because it offers a new way to think about and solve some of the most common, yet most challenging, problems that face today’s programmers. Simply put, the integration of LINQ into C# 3.0 sets a new standard that will affect the course of language design well into the future. Because of its fundamental role in C# 3.0, this article presents a brief introduction to this important feature.

What Is LINQ?

LINQ stands for Language-Integrated Query. It encompasses a set of features that let you retrieve information from a data source. As you may know, the retrieval of data constitutes an important part of many programs. For example, a program might obtain information from a customer list, look up product information in a catalog, or access an employee’s record. In many cases, such data is stored in a database that is separate from the application. A product catalog, for instance, might be stored in a relational database. In the past, interacting with such a database involved generating queries in Structured Query Language (SQL), and other sources of data required yet another approach. Prior to C# 3.0, none of this query support was built into C#.

LINQ changes this. LINQ adds to C# the ability to generate queries for any LINQ-compatible data source. Furthermore, the syntax used for the query is the same, no matter what data source is used.
This means, for example, that the syntax used to query data in a relational database is the same as that used to query data stored in an array. It is no longer necessary to use SQL or any other non-C# mechanism; the query capability is fully integrated into the C# language. Having a uniform way to access data is a powerful, innovative concept. LINQ is not only changing the way that data is accessed, but it also offers a new way to think about and approach old problems. In the future, many programming solutions will be crafted in terms of LINQ, and its effects will not be limited to database access.

LINQ Fundamentals

At LINQ’s core is the query. A query specifies what data will be obtained from a data source. For example, a query on a customer mailing list might request the addresses of all customers that reside in a specific city, such as Chicago or Tokyo. A query on an inventory database might request a list of out-of-stock items. A query on a log of Internet usage could ask for a list of the websites with the highest hit counts. Although these queries differ in their specifics, all can be expressed using the same LINQ syntactic elements.

After a query has been created, it can be executed. One way to do this is to use the query in a foreach loop. Executing a query causes its results to be obtained. Thus, using a query involves two key steps. First, the form of the query is created. Second, the query is executed. The query defines what to retrieve from a data source; executing the query actually obtains the results.

In order for a source of data to be used by LINQ, it must implement the IEnumerable interface. There are two forms of this interface: one generic, one not. In general, it is easier if the data source implements the generic version, IEnumerable<T>, where T specifies the type of data being enumerated. This interface is declared in System.Collections.Generic.
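The following is a minimal sketch, not taken from the article, that ties together the two ideas just described: a data source must implement IEnumerable<T>, and a query is defined in one step and executed in another. The class names Countdown and CountdownDemo and the Execute helper are hypothetical. Countdown implements IEnumerable<int>, so it can appear in a from clause; because the query does its work only when it is enumerated, changing the source between executions changes the result.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data source: yields Start, Start - 1, ..., 1.
// Implementing IEnumerable<int> is what lets it appear in a LINQ query.
public class Countdown : IEnumerable<int>
{
    public int Start { get; set; }

    public Countdown(int start) { Start = start; }

    // Generic enumerator, written as an iterator. Its body runs only when
    // the object is actually enumerated (for example, by a foreach loop).
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = Start; i >= 1; i--)
            yield return i;
    }

    // IEnumerable<T> extends the non-generic IEnumerable, so this is required too.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class CountdownDemo
{
    // Executes a query by enumerating it, collecting the results in a list.
    public static List<int> Execute(IEnumerable<int> query)
    {
        var results = new List<int>();
        foreach (int v in query)
            results.Add(v);
        return results;
    }

    public static void Main()
    {
        var source = new Countdown(6);

        // Step 1: define the query. Nothing is enumerated yet.
        var odds = from n in source
                   where n % 2 == 1
                   select n;

        // Step 2: execute it.
        Console.WriteLine(string.Join(" ", Execute(odds)));

        // Change the data source and execute the same query again.
        source.Start = 9;
        Console.WriteLine(string.Join(" ", Execute(odds)));
    }
}
```

Running Main should display 5 3 1 on the first line and 9 7 5 3 1 on the second, because the second execution reads the updated Start value.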
A class that implements IEnumerable<T> supports enumeration, which means that its contents can be obtained one at a time, in sequence. All single-dimensional C# arrays implement IEnumerable<T>, so arrays are often used to demonstrate the central concepts of LINQ. Understand, however, that LINQ is not limited to arrays.

A Simple Query

To better understand the power of LINQ, it is helpful to work through a simple example. The program discussed in this section uses a query to obtain the positive values contained in an array of integers. It produces the following output:

The positive values in nums: 1 3 5

As you can see, only the positive values in the nums array are displayed. Although quite simple, this program demonstrates the key features of LINQ. Let’s examine it closely.

The first thing to notice in the program is the using directive:

    using System.Linq;

To use the LINQ features, you must include the System.Linq namespace. Next, an array of int called nums is declared. Because a single-dimensional array such as nums is implicitly convertible to IEnumerable<T>, it can be used directly as a LINQ data source. Next, a query is declared that retrieves those elements in nums that are positive. It is shown here:

    var posNums = from n in nums
                  where n > 0
                  select n;

The variable posNums is called the query variable. It refers to the set of rules defined by the query. Notice that it uses var to declare posNums implicitly. This makes posNums an implicitly-typed variable. By using var, you let the compiler infer the type of the variable rather than specifying it explicitly. In queries, it is often convenient to use implicitly-typed variables, although you can also declare the type explicitly (it must be some form of IEnumerable<T>) if you choose. The variable posNums is then assigned the query expression.

All queries begin with from. This clause specifies two items. The first is the range variable, which will receive elements obtained from the data source. In this case, the range variable is n.
The second item is the data source, which in this case is the nums array. The type of the range variable is inferred from the data source; here, the type of n is int. Generalizing, here is the syntax of the from clause:

    from range-variable in data-source
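As a sketch of how the data source in a from clause determines the range variable's type (the names RangeVariableDemo, LengthsOfLongWords, and PositiveValues are illustrative assumptions, not from the article), the first method below queries a List<string>, so the range variable is inferred to be string. The second method queries an ArrayList, which implements only the non-generic IEnumerable. In that case C# lets you write the range variable's type explicitly in the from clause, and each element is cast to that type. This is one reason the generic form of the interface is easier to work with.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class RangeVariableDemo
{
    // List<string> implements IEnumerable<string>, so the range variable s
    // is inferred to be string, and s.Length compiles.
    public static List<int> LengthsOfLongWords(List<string> words)
    {
        var query = from s in words
                    where s.Length > 3
                    select s.Length;
        return query.ToList();
    }

    // ArrayList implements only the non-generic IEnumerable, so no element
    // type can be inferred. Writing the type in the from clause
    // (from int n in ...) tells the compiler to cast each element to int.
    public static List<int> PositiveValues(ArrayList items)
    {
        var query = from int n in items
                    where n > 0
                    select n;
        return query.ToList();
    }

    public static void Main()
    {
        var words = new List<string> { "from", "in", "where", "select" };
        Console.WriteLine(string.Join(" ", LengthsOfLongWords(words)));   // 4 5 6

        var items = new ArrayList { 1, -2, 3, 0, -4, 5 };
        Console.WriteLine(string.Join(" ", PositiveValues(items)));       // 1 3 5
    }
}
```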
The next clause in the query is where. It specifies a condition that an element in the data source must meet in order to be obtained by the query. Its general form is shown here:

    where boolean-expression
The boolean-expression must produce a bool result. (This expression is also called a predicate.) There can be more than one where clause in a query. In the program, this where clause is used:

    where n > 0

It will be true only for an element whose value is greater than zero. This expression will be evaluated for every n in nums when the query executes. Only those values that satisfy this condition will be obtained. In other words, a where clause acts as a filter on the data source, allowing only certain items through.

The query ends with a select clause. It specifies precisely what is obtained by the query. For simple queries, such as the one in this example, the range variable itself is selected. Therefore, the query returns those integers from nums that satisfy the where clause. Notice that the select clause ends with a semicolon. Because select ends the query, it also ends the statement, which requires a semicolon. The other clauses in the query, however, do not end with a semicolon.

At this point, a query variable called posNums has been created, but no results have been obtained. It is important to understand that a query simply defines a set of rules; results are not obtained until the query is executed. Furthermore, the same query can be executed two or more times, with the possibility of differing results if the underlying data source changes between executions. Therefore, simply declaring the query posNums does not mean that it contains the results of the query.

To execute the query, the program uses the foreach loop shown here:

    foreach(int i in posNums) Console.Write(i + " ");

Notice that posNums is specified as the collection being iterated over. When the foreach executes, the rules defined by the query specified by posNums are executed. With each pass through the loop, the next element returned by the query is obtained. The process ends when there are no more elements to retrieve.
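The article's complete program listing did not survive, so here is a reconstruction assembled from the fragments discussed above: the using directive, the nums array, the posNums query, and the foreach loop. The class name SimpQuery and the particular values in nums are assumptions; they were chosen only so that the program produces the stated output.

```csharp
// Create a simple LINQ query.
using System;
using System.Linq;

public class SimpQuery
{
    public static void Main()
    {
        // Assumed contents: any array whose positive elements are 1, 3, and 5,
        // in that order, matches the output shown in the article.
        int[] nums = { 1, -2, 3, 0, -4, 5 };

        // Create a query that obtains only the positive numbers.
        var posNums = from n in nums
                      where n > 0
                      select n;

        Console.Write("The positive values in nums: ");

        // Execute the query and display the results.
        foreach (int i in posNums) Console.Write(i + " ");
        Console.WriteLine();
    }
}
```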
In this case, the type of the iteration variable i is explicitly specified as int because this is the type of the elements retrieved by the query. Explicitly specifying the type of the iteration variable is fine in this situation, since it is easy to know the type of the value selected by the query. However, in more complicated situations, it will be easier (or in some cases, necessary) to specify the type of the iteration variable implicitly by using var.

Before leaving this example, one more point needs to be made. The select clause in the example is the simplest possible because it just returns the range variable, which contains an element from the sequence. Frequently, more sophisticated select clauses are used. For example, assuming the preceding example, consider this select clause:

    select Math.Sqrt(n);

It returns the square root of n. Thus, the resulting sequence will contain the square roots of the positive values in nums. Here is another example:

    select n * 10;

It returns the value of n multiplied by 10. The key point is that the select clause lets you finely tune what is selected. For example, when querying a mailing list, you might return just the last name of each recipient, rather than the entire address. Or, you might return an object that combines elements from two different sequences, such as a mailing list and an accounts payable list. The flexibility of the select clause is one of the reasons that LINQ is so powerful.

LINQ's Key Features

LINQ is supported by a set of interrelated features, including the query syntax and keywords added to the C# language, lambda expressions, anonymous types, implicitly-typed variables, and extension methods. Each is briefly described here. The keywords added by LINQ are shown here:

    ascending    by         descending    equals    from
    group        in         into          join      let
    on           orderby    select        where
As you can see, LINQ adds a considerable number of new keywords, which significantly expand C#. LINQ is, in effect, a language within a language.

A lambda expression is a new syntactic feature that provides a streamlined, yet powerful way to define what is, essentially, a unit of executable code. As such, it offers a superior means of creating an anonymous method. Moreover, lambda expressions add an element of functional programming to C#. Thus, lambda expressions enhance the C# language in a way that goes beyond their use with LINQ.

An anonymous type is a class that has no name your program can use. Its primary use is to create an object returned by a query. Often, the outcome of a query is a sequence of objects that are either a composite of two (or more) data sources or a subset of the members of one data source. Frequently such a type relates only to the query and is not used as a stand-alone type elsewhere in the program. In such a case, using an anonymous type eliminates the need to declare a class that will be used simply to hold the outcome of the query.

When working with anonymous types, you will often need to use implicitly-typed variables. As described earlier, an implicitly-typed variable is created by use of the new var keyword. This tells the compiler to infer the type of the variable, rather than requiring you to specify it explicitly.

An extension method lets you extend the functionality of a class without the use of inheritance. This is accomplished through a new use of the this keyword. Here is how the process works. An extension method is a static method that is contained within a static, non-generic class. The type of its first parameter determines the type of objects on which the extension method can be called, and the first parameter must be modified by this. The object on which the method is invoked is passed automatically to the first parameter; it is not explicitly passed in the argument list.
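To make these four features concrete, here is a small sketch that uses all of them together. IntExtensions, IsBetween, and FeaturesDemo are hypothetical names invented for this illustration. IsBetween is an extension method (note the this modifier on its first parameter), square is a lambda expression assigned to a Func<int, int> delegate, the select clause creates objects of an anonymous type, and var is needed for pairs because that element type has no name. The last statement calls the standard Where and Average extension methods on the array directly, passing a lambda as the predicate.

```csharp
using System;
using System.Linq;

// Hypothetical extension methods for int. An extension method must be a
// static method inside a static, non-generic class.
public static class IntExtensions
{
    // The 'this' modifier on the first parameter lets IsBetween be called
    // as if it were an instance method: value.IsBetween(low, high).
    public static bool IsBetween(this int value, int low, int high)
    {
        return value >= low && value <= high;
    }
}

public static class FeaturesDemo
{
    public static void Main()
    {
        int[] nums = { 1, -2, 3, 0, -4, 5 };

        // A lambda expression assigned to a delegate.
        Func<int, int> square = x => x * x;

        // where calls the extension method; select builds an anonymous type.
        // The anonymous type has no name, so pairs must be declared with var.
        var pairs = from n in nums
                    where n.IsBetween(0, 4)
                    select new { Value = n, Square = square(n) };

        foreach (var p in pairs)
            Console.WriteLine(p.Value + " squared is " + p.Square);

        // Where and Average are standard extension methods on IEnumerable<T>,
        // called here directly with a lambda as the predicate.
        double avg = nums.Where(n => n > 0).Average();
        Console.WriteLine("Average of the positive values: " + avg);
    }
}
```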
A key point is that even though an extension method is declared static, it can still be called on an object, just as if it were an instance method. Thus, by creating an extension method for a class, you are defining new functionality that can be used by any object of that class. For example, C# 3.0 uses extension methods to greatly expand the functionality of IEnumerable<T>. These methods are frequently used within queries.

Is LINQ the Future of Programming?

The more you know about LINQ, the more you come to understand that it can be used for many tasks other than database queries. For example, you can use a LINQ-based solution to compute the average of the values in a collection, to search an array, or to permute a sequence. Thus, even if your application doesn't use a database, you may still find yourself using LINQ as part of your solution. Furthermore, lambda expressions are expected to play an increasing role in programming, and they are finding growing support in the programming community at large. For example, support for lambda expressions is currently being added to C++ by the C++ standardization committee.

In the final analysis, LINQ is a powerful subsystem that gives you the ability to craft portable database queries. It is also reshaping the way we programmers think about and meet many other types of programming challenges. Simply put, LINQ will be a part of every C# programmer's future. It is something that no programmer can afford to ignore.