Hooray For Functions -- The Value of Returning a Value

by Geoff Canyon Version 1.0 February 12, 2005
See original post by the author on Geoff Canyon's Appeal to Authority blog. Reposted here by permission of the author.

Introduction

Note: this article was written based on Transcript, the language used in Revolution, but is equally applicable in other languages. Users of other xTalk environments should feel right at home with the examples. For an introduction to functions in Transcript, there is an introductory article by Jacqueline Landman Gay.

There are two ways to segment code in Revolution, or (generally) in any programming language: handlers (subroutines) and functions. The practical difference between them is small: functions return a value, handlers don't. Even that distinction isn't firm -- a handler can use the return command, and the calling routine can then use the result to retrieve the value. The philosophical differences between handlers and functions, however, are much greater.

Side Effects

In general, a handler does something, while a function evaluates something and returns a result. In functional parlance, a handler has side effects -- it does something outside of itself. If it didn't, there would be no purpose in calling it. In contrast, a function's purpose is to perform a calculation and return the result. The result of a function is its purpose.

A function can do something outside itself -- have side effects -- just as easily as a handler can. This is generally a bad idea. A function written to have side effects loses many of its advantages over handlers, and in fact might as well be a handler.

Advantages

Because of these differences between functions and handlers, functions offer several dramatic advantages over handlers:

- greater abstraction
- greater stability
- greater reusability
- greater extensibility
- greater testability
- greater flexibility
- greater clarity
- greater documentability

Example

As an example, let's consider the case where you need to set the location of an object relative to another object.
We'll first look at how to perform the task with a handler, and then with a function.

Handler

A typical handler designed to perform this task might take references to the two objects and the offsets, and then perform the necessary calculations and set the location of the target object:

    on setRelativeLoc pObjSource,pObjTarget,xOffset,yOffset
      -- start from the source object's location, then apply the offsets
      put the location of pObjSource into tLoc
      add xOffset to item 1 of tLoc
      add yOffset to item 2 of tLoc
      set the location of pObjTarget to tLoc
    end setRelativeLoc

Or, more succinctly:

    on setRelativeLoc pObjSource,pObjTarget,xOffset,yOffset
      -- "get" places the source location in the variable "it"
      get the location of pObjSource
      set the location of pObjTarget to (xOffset + item 1 of it),(yOffset + item 2 of it)
    end setRelativeLoc

A handler like this can be called in a single line, like so:

    setRelativeLoc (the long id of button "source"),(the long id of button "target"),20,30

This will set the location of button "target" to be 20 pixels to the right and 30 pixels below button "source."

Bad Function

Now consider how to perform this same task using a function. First, determine what is to be done. The handler is performing two actions: first, it is adding the offsets to the location of the first object; second, it is setting the location of the target object. Setting the location of the target object isn't suitable to a function -- it's an action, not an evaluation. So the task that can be converted to a function is the offset addition. The function might look like this:

    function addOffsets pLoc,x,y
      add x to item 1 of pLoc
      add y to item 2 of pLoc
      return pLoc
    end addOffsets

This function is completely specific to the task at hand, which is bad. It has few, if any, advantages over the handler version above.

Good Function

We want a more general function: code that can do the same job, but which can also do any job like it.
So instead, we'll write the function like this:

    function addLists p1,p2
      -- add each item of p2 to the corresponding item of p1
      repeat with i = 1 to the number of items in p1
        add item i of p2 to item i of p1
      end repeat
      return p1
    end addLists

This function is better than the bad function because it doesn't just solve the problem at hand, but any problem like it. It can take two lists as arguments, and no matter how many items are in the lists it will return the correct value.

Revolution has a faster way to get the same task done:

    function addLists p1,p2
      -- trim p2 to the same number of items as p1
      put item 1 to (the number of items of p1) of p2 into p2
      -- turn both lists into arrays, add them, and turn the sum back into a list
      split p1 using comma
      split p2 using comma
      add p2 to p1
      combine p1 using comma
      return p1
    end addLists

It's a few more lines of code, but its performance should scale better than the first solution.

The call to the function is a little different from the call to the handler. It's still one line:

    set the location of button "target" to addLists(the location of button "source",(20,30))

Because the function simply performs a calculation and returns a result, the line calling the function actually does the work of setting the button's location. The line calling the handler, on the other hand, does nothing but call the handler, since the handler does the work.

Relative Merits

Now consider the merits of the handler vs. the function. We'll take each point in turn.

Abstraction

Which routine hides more complexity and is less specific?

Loosely, abstraction means two things:

- hiding complexity
- lack of specificity

For simple tasks such as we are discussing, there is very little complexity to hide. However, it's important to note that the handler includes both what is being done and how it is done. The function does not include what is being done, only how. That means that what you need to know is maintained in the calling procedure, an important distinction. For small tasks, the advantage lies in keeping the logic grouped in the calling procedure.
For large tasks with many steps, grouping functionality in the called procedure can hide complexity, suggesting the use of a handler.

On the basis of lack of specificity, the function clearly offers more abstraction than the handler. The handler is designed for the exact task being performed. The function simply takes two lists, performs math on them, and returns the resulting list.

Stability

Which routine is less likely to need modification in the future?

A well-defined function is like a screwdriver: it will likely never need to be changed, because its purpose never changes. Of course if the screwdriver is a flathead and you need a Phillips head you'll need another screwdriver. But both tools will be well-defined and unchanging.

In the example given, the addLists function will only change if a bug is found in it, or if a more efficient way to perform the task is found. A handler, by comparison, might change whenever what it is supposed to do changes.

Reusability

Which routine will come in handy in the future?

Reusability, stability, and abstraction (in the "lack of specificity" sense) go hand in hand; the function wins big here. Unless there is another object that needs its location set, the handler is useless. The function can be reused any time you need to perform the same list operation.

For example, consider the case where, rather than the location, the rectangle of an object needs to be set relative to another object's rectangle. The existing handler would need to be completely rewritten (adding significant complexity) to accomplish this. The alternative is to write a new handler specifically for this task -- a handler that is again suited only to the exact task at hand.

By comparison, the function doesn't need to be modified at all. This call will do the trick:

    set the rect of button "target" to addLists((the rect of button "source"),(30,30,60,40))

The task doesn't need to involve resizing objects at all.
It might be any task that requires adding one list of values to another. Because the function has no side effects (such as setting the location of an object), it can be used anywhere.

Extensibility

Which routine will serve as a useful building block?

Consider how to handle the similar task of setting the location of an object relative to another, but limiting the location of the target object to multiples of 10. With the handler, one way to handle that would be to pass in another argument like this:

    on setRelativeLoc pObjSource,pObjTarget,xOffset,yOffset,pRoundValue
      put the location of pObjSource into tLoc
      add xOffset to item 1 of tLoc
      add yOffset to item 2 of tLoc
      -- round each coordinate down to a multiple of pRoundValue, if one was passed
      if pRoundValue is not empty then
        put pRoundValue * (item 1 of tLoc div pRoundValue) into item 1 of tLoc
        put pRoundValue * (item 2 of tLoc div pRoundValue) into item 2 of tLoc
      end if
      set the location of pObjTarget to tLoc
    end setRelativeLoc

Note how the handler has gotten significantly more complex because of one simple change. This is a situation that will only get worse. What if the rounding values are different for x and y? What if there are limits on the appropriate values for the location? With a handler, the complexity of the code grows out of proportion with the complexity of the problem to be solved.

Previous calls to the modified handler will still work since pRoundValue will be empty, but this requires planning on the part of the developer to ensure that previous calls to the handler continue to work.

Setting the location of an object to multiples of 10 would look like this:

    setRelativeLoc (the long id of button "source"),(the long id of button "target"),20,30,10

Note how calling the handler has grown more complex. Without additional documentation, who knows what the "10" means, or for that matter the 20 or the 30?

By contrast, implementing this change in the function is straightforward. In fact, the addLists function doesn't change at all.
Because the function simply takes a set of values and returns a result, behavior can be modified either before or after the call to the function without modifying the function in any way. Rounding the result is a different task, so an additional function meets the need:

    function roundList pList,pRoundValue
      put empty into tReturn
      -- round each item down to a multiple of pRoundValue
      repeat for each item i in pList
        put pRoundValue * (i div pRoundValue) & comma after tReturn
      end repeat
      -- drop the trailing comma
      return char 1 to -2 of tReturn
    end roundList

Note how this function, like the original addLists function, is simple, stable, testable, reusable, etc.

Using the new function to perform the task would look like this:

    set the location of button "target" to roundList(addLists(the location of button "source",(20,30)),10)

The most complex thing about the function calls is that the second argument to the roundList function is hidden at the end, after the bulky call to addLists.

Testability

Which routine can more easily support automated testing?

Automated testing for the handler is hard. The only way to test the handler is to call it and observe the results in the environment. This can be done in Revolution, but not without the necessary objects to support the command's requirements. Practically, this means that a test routine would have to create the objects to support the test, and then remove them. This is risky in an automated setup.

Automated testing for the function is easy. Simply write a routine to supply it with a range of values and check the results. There is no need to actually work with objects -- the function is more generic than that, and yet it does the same core task as the handler. This test can easily be incorporated into a set of unit tests to validate significant portions of a project's code.

The stability of a function plays a role here as well. A stable routine can have an automated test.
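A test routine of the kind described here might look like the following sketch in Transcript. The name addListsTestFailures and the test values are assumptions made for illustration, not part of the original article, and the sketch assumes the addLists function shown earlier is available in the message path. In keeping with the article's argument, the test is itself a function: it returns a list of failures, and an empty result means every check passed.

```livecode
-- Hypothetical test function (name and test values are illustrative).
-- Feeds addLists known inputs and compares each result with the
-- expected list. No objects are created or modified.
function addListsTestFailures
  put empty into tFailures
  if addLists("1,2","20,30") is not "21,32" then put "two items" & cr after tFailures
  if addLists("0,0,0,0","30,30,60,40") is not "30,30,60,40" then put "four items" & cr after tFailures
  if addLists("-5,5","5,-5") is not "0,0" then put "negative values" & cr after tFailures
  -- empty means every check passed
  return tFailures
end addListsTestFailures
```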
A changing routine does not support automated testing as easily because the automated test has to be updated each time the requirements change for the handler it tests.

Flexibility

Which routine can be used in more circumstances?

This is an area where the handler comes up dramatically short. The handler actually does the work of setting the object's location. Therefore, it is difficult to use the handler in any other situation, or modify what it does. The function simply takes two lists and works on them. Therefore, the function can be used any time the list-processing task it performs is needed.

Part of designing good functions is looking for the underlying problem. To do this, you subdivide the result you want as much as possible. In this case you start with the goal, "I have two objects and I want to set the location of one of them to a position relative to the other." After consideration you subdivide that problem into two parts:

- Given the location of an object, find a position that is offset a certain amount from it.
- Set the location of another object to that position.

If you find the underlying problem, you've probably found a task that you'll be performing again and again. As you gain experience in creating functions it will become easier to find the underlying problems.

Of course, sometimes there is no underlying structure to the task at hand. In that case you'll end up creating functions you're never likely to need again. This happens less often than you might think.

A Brief Intermission: A Handler in Function's Clothing

In several of these sections, the superiority of functions results from the fact that they simply return a value, as opposed to actually doing the work of setting the object's location. This raises the question, why not do the same with the handler? In Transcript a handler can return a value, and the code that called the handler can use "the result" to access the returned value.
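As a rough sketch of what that would look like (the handler name addListsCommand is hypothetical, invented here for illustration), the handler returns its value and the caller fetches it from the result on the following line:

```livecode
-- Hypothetical handler that returns a value instead of acting on an object.
on addListsCommand p1,p2
  repeat with i = 1 to the number of items in p1
    add item i of p2 to item i of p1
  end repeat
  return p1
end addListsCommand

-- The caller needs two lines: one to call the handler,
-- and one to fetch the returned value from the result.
on mouseUp
  addListsCommand (the location of button "source"),"20,30"
  set the location of button "target" to the result
end mouseUp
```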
But this is unnecessary, as functions already behave this way. There is (usually) no reason to make a handler pretend to be a function.

Clarity

How easy is the code to understand?

There are two aspects of clarity to consider:

- How clear is the code itself?
- How clear is a call to the code?

Code Clarity

Looking at the code in the examples above, it appears to be a tie. The functions are crystal clear in their purpose. Well-named functions are virtually self-documenting. A well-named handler will also be clear, and because of its specificity to the task can take a very specific and descriptive name.

The situation changes as more functionality is added beyond the simple examples given above. Well-designed functions tend to retain their one task/one function simplicity. A rule of thumb I use is that if I can't document the purpose of a function in a one-line comment, I look for further sub-tasks to break it down into. New functionality generally means additional functions, not modifications to existing functions.

But as was seen in the section above on extensibility, a handler tends to grow as its purpose expands. Note how easy it is for the purpose of the handler to become obscured by complexity. In the example above, when the rounding functionality is added, suddenly the handler is no longer the simple creature it once was. As a handler grows, code clarity suffers.

It should be noted that function names and organization of functions into related groups are especially important. Functions require more careful naming and organization because they tend to result in more separate code chunks.

Calling Clarity

Because of the inherent simplicity of a function's purpose, calling one is rarely complex. But most importantly, functions encourage the maintenance of the actual task to be performed in the calling code.
In the example given, calling the handler looks like this:

    setRelativeLoc (the long id of button "source"),(the long id of button "target"),20,30

while using the function looks like this:

    set the location of button "target" to addLists(the location of button "source",(20,30))

This is especially helpful because of object references. Note that the handler requires the use of the long id. With the function, the object reference is in the calling code, avoiding the issue of incorrect object references.

The clarity of the call to the handler depends on the clarity of the name given to the handler. Of course, this is true for both handlers and functions. But because the handler actually does the work, this issue is more severe. If the setRelativeLoc handler were called "x34kjsw" the above call would mean nothing to the reader. If the function were named "y09234" it would still be clear that the location of button "target" was being set; it just wouldn't be clear to what.

In short, as has been demonstrated repeatedly, the handler comes up short because it actually does the work, obscuring what will be done. The function simply takes values and returns a result, leaving the actual work to the calling code.

Documentability

How easy is the code to document?

The function is more likely to perform a single task that is easily documented. The handler is more likely to perform many variations on a task, or even many tasks, making the work of documenting it that much harder.

Obviously it's possible to write large complex functions with multiple purposes, just as it's possible to write small, easily-documented handlers. The natural tendency, however, is the opposite. You should take it as a warning sign if you find that your function has side effects, or needs to return more than a single value/list.

The counter-argument is that there are likely to be more functions, complicating the task of documenting the relationships between them. Clear function names limit this issue.
Organizing functions into libraries of related code can help as well.

So When is a Handler Appropriate?

Based on the above arguments, you might think that all handlers should be replaced by functions. There are languages where this is the case; Transcript, by definition, isn't one of them. For starters, the engine delivers messages, so you must write handlers to receive those messages. on mouseUp isn't going away! There are other cases as well where a handler is appropriate:

- When the same action needs to be performed many times, a handler may be appropriate. For example, if the text attributes of a number of fields all need to be changed, it would clearly be beneficial to have a handler that took a list of objects and a list of text settings (textFont, textSize, etc.) and set those properties for each of the objects. In cases such as this, it's good to follow the same principles as when creating a function: strive for as generic a routine as possible, as simple a routine as possible. In this particular example, consider whether grouping the objects and setting the attributes of the group is an appropriate solution.

- When there are large blocks of related functionality that can be logically segmented (and are obviously not functions!), a handler is suitable. For example, if you are creating a spy-tracking application for James Bond, there might be a handler called updateEnemyAgentLocations. But consider: if you find yourself creating a handler called checkForEnemyAgentsInEngland, you should consider creating a function called enemyAgentsInCountryList instead.

- Any time a built-in handler might need to be called from somewhere else. For example, suppose you have a button that resets several interface elements to their original values. You could simply put the code into the mouseUp handler, but you're likely to need to reset the interface from numerous locations. You'll have a corresponding menu item at least. So create a handler called resetInterface in the card or stack script, and call that from the button.

- Any time you have a switch statement. Switch statements have a tendency to grow out of control, until they are hundreds of lines long, with each case being a handler of its own. From the very beginning it's better to break out each case as its own handler, unless it's clear that the case will never grow to more than a line or two.

- Other times. Handlers have their place, and coding in Transcript without them would be a pain. Don't shun them entirely.

Summary

The case is clear. Functions offer greater abstraction and stability, which leads to greater reusability. Because functions simply take values and return results, they offer greater extensibility, testability, and flexibility. Finally, functions provide greater clarity and greater documentability. In the words of your mother, "Use functions, they're good for you!"

Postscript -- Examples

Retrieving Data

When you are retrieving data to display -- querying a database for a user's search results, searching a text file for the appropriate passage from a book, or retrieving a web page -- write a function that takes the query as an argument and returns the data it gathers. Then write code that uses the function to get the data to display.

Formatting Data

When you need to format data a certain way -- bold the first words, change the case, or correct misspellings -- write a function that takes the source data and returns the formatted result. Then write code that uses the function to format the data.

Quoting Text

This classic example is used by many Transcript coders. I think I picked it up from Ken Ray. Instead of writing quote & "text" & quote, write a function "q" that takes a string as its argument and returns the string with quotes around it.

List Processing

Many functional languages have strong list-processing commands built in.
Adding similar functions to Transcript is a fairly easy way to add significant functional capabilities.
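
As a minimal sketch of the last two postscript examples in Transcript: the q function follows the description under Quoting Text, while sumList is a hypothetical helper invented here to stand in for the kind of list-processing function meant above. Neither is a built-in, and the article does not give this code.

```livecode
-- Return pText wrapped in double quotes, as described under "Quoting Text".
function q pText
  return quote & pText & quote
end q

-- Hypothetical list-processing helper: return the sum of a
-- comma-delimited list of numbers.
function sumList pList
  put 0 into tTotal
  repeat for each item tItem in pList
    add tItem to tTotal
  end repeat
  return tTotal
end sumList
```

Like addLists, both functions only take values and return a result, so the calling code decides what to do with it, for example put q("text") into field 1 or put sumList("1,2,3") into tTotal.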