One issue that came up in Friday’s discussion was the problem of isolating the environments of user-defined methods. For example, suppose that the user defined a method called foo and called it within main. Any user-level symbol bindings inside main are visible inside foo and any bindings made inside foo leak out to main. Clearly, this is a bad thing.
We discussed two possible solutions. The first requires users to define each method with our own keyword (something like define_strategy), passing the code as a block. Each user-defined method is then, in fact, a Proc, and we can enclose it within another Proc that does the “right thing” to push and pop environments. The problem with this approach is that users must use special syntax for defining methods and, more importantly, we force each user-level method to be a closure, which might have performance issues.
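To make the first option concrete, here is a minimal sketch of how a define_strategy keyword could wrap a block so that environments are pushed and popped around each call. Everything in it (the EnvDemo class, the bind and lookup helpers, the stack of hashes) is a hypothetical stand-in for illustration, not RubyWrite's actual implementation.

```ruby
# Hypothetical sketch: EnvDemo, bind, lookup and the hash stack are
# illustrative names, not part of RubyWrite.
class EnvDemo
  def initialize
    @env_stack = [{}]            # one table of symbol bindings per active call
  end

  def bind(name, value)
    @env_stack.last[name] = value
  end

  def lookup(name)
    @env_stack.last[name]
  end

  # Define a user-level method whose body always runs in a fresh environment.
  def self.define_strategy(name, &body)
    define_method(name) do |*args|
      @env_stack.push({})        # enter: hide the caller's bindings
      begin
        instance_exec(*args, &body)
      ensure
        @env_stack.pop           # leave: discard the callee's bindings
      end
    end
  end

  define_strategy(:foo) do |x|
    bind(:tmp, x)
    lookup(:outer)               # the binding made in main is not visible here
  end

  def main
    bind(:outer, 1)
    seen_in_foo = foo(42)
    [seen_in_foo, lookup(:tmp), lookup(:outer)]
  end
end

p EnvDemo.new.main   # => [nil, nil, 1]
```

The bindings no longer leak in either direction, but the cost mentioned above is visible too: foo exists only as a block captured by define_strategy.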
A second “solution” is to use the Kernel#caller method and prefix each symbol with a string representing the call chain. However, this “solution” still has the original problem of an eternally growing table of symbol bindings. Moreover, I realized later, it is not a solution at all! Consider what happens when foo is called successively by main. The two invocations of foo will have identical-looking call chains and their symbols will interfere. This is related to the eternally-growing-table problem, because we never remove any symbols from the table when a method ends.
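The interference is easy to reproduce. The sketch below is an assumption about how call-chain prefixing might be written (SYMBOL_TABLE and chain_prefix are made-up names); it calls foo twice from the same place in main and checks whether the second call finds the first call's binding still in the table.

```ruby
# Hypothetical sketch of call-chain prefixing; SYMBOL_TABLE and
# chain_prefix are illustrative names, not RubyWrite code.
SYMBOL_TABLE = {}

def chain_prefix
  # Kernel#caller returns the stack above this method as an array of strings.
  caller.join("|")
end

def foo(value)
  key = "#{chain_prefix}:tmp"
  already_bound = SYMBOL_TABLE.key?(key)   # true if an earlier call left a binding
  SYMBOL_TABLE[key] = value
  already_bound
end

def main
  # Both calls to foo come from the same line, so their call chains match.
  [1, 2].map { |v| foo(v) }
end

RESULT = main
p RESULT              # => [false, true]
p SYMBOL_TABLE.size   # => 1
```

The second invocation sees the stale binding from the first, and the table entry is never removed once foo returns.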
Here’s a possible solution. Require that all methods be called with a special keyword, say apply. Users will define their methods as normal Ruby methods, but must invoke them with the apply keyword. Of course, now they must pass the name of the method as a symbol.
apply :foo['arg1', 'arg2']
Can we do better? Of course!
+:foo['arg1', 'arg2']
We can define the unary plus on the Node type (which does work, unlike unary plus on Symbol, which does not). We will now require that each user-defined method invocation use square-brackets even if there is no argument, just like C requires parentheses for function calls. A nice thing with this syntax is that all invocations of user-defined methods are uniform, except that immediate invocations use + and deferred invocations use -. (Yes, unfortunately, the users will need to remember where to use deferred execution.) It also serves to perpetuate the illusion that we have certain keywords within RubyWrite, e.g., match and build, which are distinct from methods.
There is one small wrinkle in implementing the Node unary plus. The method foo must be invoked in the context of the ReWriter instance, which is not known to Node. There was a similar situation with unary minus, but there the final invocation of the returned Proc object was within the RubyWrite internal methods, which can arrange to send the ReWriter instance as an additional argument. Here, we do not want to encumber the user with an additional argument each time their method is to be invoked.
The solution to this wrinkle involves using Ruby’s class instance variables. Recall that each class in Ruby is really an object. So, even though it sounds like an oxymoron, class instance variables do make sense. Here is the code for Node unary plus.
class Node
  def +@
    ReWriter::reWriterInstance.send value, *@children
  end
end
reWriterInstance is set within the ReWriter initializer, using the attribute setter defined on the ReWriter class itself.
class ReWriter
  class << self
    attr_accessor :reWriterInstance
  end
  def initialize
    ...
    ReWriter.reWriterInstance = self
  end
end
As long as the users stick to using the run class method (and do not, for example, instantiate their class directly), the magic of class instance variables makes this work even when there are multiple user classes.
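Putting the two fragments together, a self-contained sketch might look like the following. The Node constructor, the value reader, the body of run, and the MyRewriter class are assumptions made so the example executes on its own; only the +@ method and the reWriterInstance accessor follow the code above.

```ruby
class ReWriter
  class << self
    attr_accessor :reWriterInstance   # class instance variable on ReWriter

    # Hypothetical stand-in for the run class method: build one instance
    # and hand control to its main method.
    def run(*args)
      new.main(*args)
    end
  end

  def initialize
    ReWriter.reWriterInstance = self
  end
end

class Node
  attr_reader :value, :children

  # Assumed constructor: a method name followed by its arguments.
  def initialize(value, *children)
    @value = value
    @children = children
  end

  # Immediate invocation: send the method name to the active rewriter.
  def +@
    ReWriter.reWriterInstance.send(value, *@children)
  end
end

# Hypothetical user class.
class MyRewriter < ReWriter
  def foo(a, b)
    "#{a}-#{b}"
  end

  def main
    node = Node.new(:foo, 'arg1', 'arg2')
    +node                             # calls foo('arg1', 'arg2') on this instance
  end
end

p MyRewriter.run   # => "arg1-arg2"
```

The user never passes the rewriter explicitly; Node finds it through the accessor on the ReWriter class.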
I think this is a good idea… though I’m not sure that the class instance variable is going to give you quite the behaviour you want.
Specifically, I don’t think this will work in cases where you have more than one instance of the ReWriter class running around in the same Ruby program, since there is only one instance variable defined for the ReWriter class.
For instance, we can test this out in IRB:
> class ReWriter
> class << self
> attr_accessor :reWriterInstance
> end
> def initialize
> ReWriter.reWriterInstance = self
> end
> end
=> nil
> ReWriter::reWriterInstance
=> nil
> r1 = ReWriter.new
=> #
> ReWriter::reWriterInstance
=> #
> r2 = ReWriter.new
=> #
> ReWriter::reWriterInstance
=> #
> r1 == ReWriter::reWriterInstance
=> false
> r2 == ReWriter::reWriterInstance
=> true
It may still be a limitation worth living with though, at least until we can come up with another clever solution. We might want to add something to ReWriter to ensure that it only has one instance at a time, and throw an error if it has more than one.
Of course, if you have more than one instance of any derived class you run into problems. However, a class instance variable does protect you against the situation where you might have two different derived classes active at the same time.
Actually, after reading the posting on class instance variables, I take back what I said earlier. My example was actually a bad one, because what we really have is:
class Transformer1 < ReWriter
# . . .
end
class Transformer2 < ReWriter
# . . .
end
Unfortunately, this still has the same problem in its simplest form, because the instance variable referred to in the initializer is the instance variable from ReWriter, not from the individual transformers.
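For comparison, here is a sketch of one variation in which the initializer writes through self.class instead of naming ReWriter, so each transformer class gets its own slot. This is an assumption for illustration, not what the posted code does, and it leaves open how Node would know which class to ask.

```ruby
class ReWriter
  class << self
    attr_accessor :reWriterInstance   # accessor is inherited; the storage is per class
  end

  def initialize
    # Assumed variation: store on the receiver's own class, not on ReWriter.
    self.class.reWriterInstance = self
  end
end

class Transformer1 < ReWriter
end

class Transformer2 < ReWriter
end

t1 = Transformer1.new
t2 = Transformer2.new

p Transformer1.reWriterInstance.equal?(t1)   # => true
p Transformer2.reWriterInstance.equal?(t2)   # => true
p ReWriter.reWriterInstance                  # => nil
```

With ReWriter.reWriterInstance = self in the initializer, as posted, all three lookups would instead go through the single variable on ReWriter and t2 would overwrite t1.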
Andy,
It doesn’t matter, really. There can be exactly one instance of the class running at any given time if you use the “run” class method (so, maybe, using a class instance variable is not even necessary). There is no way to get back to an earlier instance. Like I said in my original post, all bets are off if the users bypass the “run” method.