I now write Java code professionally and have shipped code to even more than that number of users. (In fact, in the almost ancient past, I worked on Blogger itself but I've gotten way better at Java since then.)
However, Rust is proving kind of difficult for me to learn. See my previous post, where I wrote a pretty simple Rust program that prints only the unique lines from stdin. This "unique" program is my "hello world" because it is a useful real-world program.
Since I have lots of experience writing compilers (back in my C days and at "university"), I thought it would be interesting to create an interpreted language in Rust. The problem itself is very well known to me, so writing an interpreter would give me a real Rust program where I could concentrate on learning the language rather than a new problem domain.
So, I've spent maybe 3-4 hours writing a "tokenizer" and now I'm trying to write a parser (a recursive descent parser) in Rust. These additional 2 hours have not gotten me very far - basically nothing to show for it except questions!
Whereas the tokenizer went ok (mostly because I found this: https://github.com/servo/rust-cssparser/blob/master/src/tokenizer.rs), the parser is proving much more difficult!
What I'm going to do now is some rubber duck debugging and let you guys see how this works out. (See https://en.wikipedia.org/wiki/Rubber_duck_debugging ).
Question 1
Why do I need to use Token::WhiteSpace instead of just WhiteSpace in a match statement (for an enum) like the code I was looking at?
I've now answered this question thanks to the rubber duck: I was simply missing this critical construct:
use self::Token::*;
What's kind of funny is that in Java, when you switch on an enum, you are forced to use the raw enum name and can't fully qualify it even if you wanted to, and here I was expecting the language with better "inference-ing" to do this on my behalf (because an example or two looked like it did). This isn't a major criticism, though certainly a minor one. (Rust seems to use scope in lots of ways to make things less ambiguous, and this probably fits well with how the language designers thought about it. It is just unusual coming from Java.)
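For illustration, a minimal sketch of both spellings side by side. The enum is cut down to three variants and the kind function is a made-up helper, not part of the real tokenizer:

```rust
// Bring every variant of Token into scope so patterns can name them directly.
use self::Token::*;

#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    WhiteSpace(Box<String>),
    Comment(Box<String>),
    Word(Box<String>),
}

/// Hypothetical helper: names the kind of a token.
pub fn kind(token: &Token) -> &'static str {
    match *token {
        // Unqualified names resolve only because of the glob import above.
        WhiteSpace(_) => "whitespace",
        // The fully qualified form keeps working either way.
        Token::Comment(_) => "comment",
        Word(_) => "word",
    }
}
```

Without the use line, the WhiteSpace and Word arms no longer resolve, while the Token::Comment arm is unaffected.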
By the way, that use statement at first was a disaster! I got a whole bunch of errors I didn't have before when using it.
Hmm, why:
Before:
#[derive(PartialEq, Debug, Clone)]
pub enum Token {
WhiteSpace(Box<String>),
Comment(Box<String>),
Delimiter(Box<String>),
Symbol(Box<String>),
String(Box<String>),
Word(Box<String>),
Error(Box<String>),
}
After:
use self::Token::*;
#[derive(PartialEq, Debug, Clone)]
pub enum Token {
WhiteSpace(Box<String>),
Comment(Box<String>),
Delimiter(Box<String>),
Symbol(Box<String>),
String(Box<String>), // Doh! Look at the name of the token!
Word(Box<String>),
Error(Box<String>),
}
Obviously I should rename the enum value String to, say, StringLiteral so that the standard String type doesn't get confused with my enum value.
It seems like "technically" I didn't need to make this change.
Fixed!
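One possible reading of the "technically" remark, offered as an assumption rather than what was actually meant: the clash only bites where the bare name String is used as a type, so spelling the standard type with its full path would also have worked. A sketch with a cut-down enum and a made-up text helper:

```rust
// The glob import makes the variant `String` shadow the prelude's `String`.
use self::Token::*;

#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    // The full path still reaches the standard library type.
    String(Box<std::string::String>),
    Word(Box<std::string::String>),
}

/// Hypothetical helper: borrows the text carried by a token.
pub fn text(token: &Token) -> &str {
    match *token {
        // In pattern position, `String` is the enum variant.
        String(ref s) | Word(ref s) => s.as_str(),
    }
}
```

Renaming the variant is still the friendlier fix, since every later use of the standard type would otherwise need the long path.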
Question 2
I was actually hoping that fixing that would make a world of difference, but it really didn't.
My current compiler error is:
main.rs:19:15: 19:20 error: cannot move out of borrowed content
main.rs:19 match *self {
^~~~~
main.rs:26:25: 26:26 note: attempting to move value to here
main.rs:26 Token::Word(x) => *x == keyword
^
main.rs:26:25: 26:26 help: to prevent the move, use `ref x` or `ref mut x` to capture value by reference
error: aborting due to previous error
Could not compile `xyz`.
I'm sort of wondering if this isn't part of the issue:
https://doc.rust-lang.org/book/box-syntax-and-patterns.html
Specifically, maybe using #![feature(box_syntax, box_patterns)] would help?
Here's what I am trying to get working:
impl Token {
pub fn is_keyword(&self, keyword: String) -> bool {
match *self {
WhiteSpace(_) => false,
Comment(_) => false,
Delimiter(_) => false ,
Symbol(_) => false,
StringLiteral(_) => false,
Error(_) => false,
Word(x) => *x == keyword
}
}
}
Let me put this in "high level terms" -- I want a convenient way to see if a particular token is a word that happens to match the passed in String.
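A minimal sketch of what the compiler's `ref x` hint points at, on a cut-down enum. Taking the keyword as &str instead of String is an assumption made here for convenience, not something the original signature had:

```rust
#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    WhiteSpace(Box<String>),
    StringLiteral(Box<String>),
    Word(Box<String>),
}

impl Token {
    /// Borrows the token and the keyword, so nothing is moved or allocated.
    pub fn is_keyword(&self, keyword: &str) -> bool {
        match *self {
            // `ref x` borrows the boxed string instead of moving it out of `*self`.
            Token::Word(ref x) => x.as_str() == keyword,
            // No other kind of token can be a keyword.
            _ => false,
        }
    }
}
```

Because the method only borrows, the token can still be used after the check, for example printed with {:?} or compared again.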
Question 2 Interlude
BTW, there is another more radical option: I could actually add lots of enum values to Token - one for each keyword I expect to see:
#[derive(PartialEq, Debug, Clone)]
pub enum Token {
WhiteSpace(Box<String>),
Comment(Box<String>),
Delimiter(Box<String>),
Symbol(Box<String>),
String(Box<String>),
KeywordIf(), // At tokenization time, I could recognize
KeywordElse(), // each keyword and add its value here
KeywordSet(), //
Word(Box<String>),
Error(Box<String>),
}
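A rough sketch of how that recognition step might look. The classify_word function and the keyword spellings "if", "else" and "set" are assumptions read off the variant names, and the keyword variants are written as plain unit variants to keep the example short:

```rust
#[derive(PartialEq, Debug, Clone)]
pub enum Token {
    KeywordIf,
    KeywordElse,
    KeywordSet,
    Word(Box<String>),
}

/// Hypothetical tokenizer helper: turns a scanned word into either
/// a dedicated keyword token or a plain Word token.
pub fn classify_word(word: &str) -> Token {
    match word {
        "if" => Token::KeywordIf,
        "else" => Token::KeywordElse,
        "set" => Token::KeywordSet,
        // Anything else stays an ordinary word.
        other => Token::Word(Box::new(other.to_string())),
    }
}
```

With this shape the parser can match on Token::KeywordIf directly, and string comparison happens only once, in the tokenizer.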
Question 2 continued...
In Java, what I really want is something like this:
if ((x instanceof Word) && ((Word) x).getString().equals("if")) {
// blah blah blah
}
This is obviously "runtime" Java since I'm using "instanceof".
In Rust, I'm expecting the enum Token to act something like a "tagged union".
Oh, looks like this compiles:
impl Token {
pub fn is_keyword(self, keyword: String) -> bool {
match self {
WhiteSpace(_) => false,
Comment(_) => false,
Delimiter(_) => false ,
Symbol(_) => false,
StringLiteral(_) => false,
Error(_) => false,
Word(x) => x == Box::new(keyword)
}
}
}
In this case, maybe I'm forcing the keyword string onto the heap even though that is kind of the opposite of what I would expect to have to do...
BTW, is == even the right thing to use here? I don't know. Let's find out.
I basically did this:
for token in tokens {
println!("Token {:?}", token);
println!("Function is export {:?}", token.is_keyword(
"export".to_string()));
}
}
So this works great "functionally" - my program compiles and gives the correct result. I'm not really all that concerned about the performance impact just yet, I'm more worried about how badly I know Rust right now.
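On the == question: for Box<T>, == compares the boxed values, not the pointer addresses, so it does compare the strings. A small sketch of the difference (all three helper names are made up for illustration), which also shows that dereferencing the box avoids having to box the keyword:

```rust
/// What `==` does for boxes: compares the strings stored inside them.
pub fn equal_contents(a: &Box<String>, b: &Box<String>) -> bool {
    a == b
}

/// Compares the heap addresses instead, for contrast.
pub fn same_allocation(a: &Box<String>, b: &Box<String>) -> bool {
    std::ptr::eq(&**a, &**b)
}

/// Compares an owned boxed word with a keyword without boxing the keyword.
pub fn word_matches(word: Box<String>, keyword: String) -> bool {
    // `*word` is only borrowed by `==`, so nothing is moved out of the box.
    *word == keyword
}
```

Two separately allocated boxes holding "export" are equal under equal_contents but not under same_allocation.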
Where should I go next?
(Rubber ducky), where should I go next?
- keep with it! - let's create some parse trees and learn more lessons along the way. The possible performance penalty of the extra boxing in the solu-kludgetion I finally found probably won't matter for what I'm after (actually using my interpreted language) and is therefore second order.
- switch to another language - I can re-evaluate the reasons I was using Rust in the first place - like stand-alone executables - and choose another language.
- Swift is open source and available on Linux now... It's suddenly a new choice.
- C was rejected because I don't understand its Unicode story (and other stories), but I like C a lot in many ways
- C++ was rejected because it was too complex, even if I could get a better Unicode story by using, say, an external library
- Go and other garbage-collected languages were rejected because I don't understand their "fork" story, though maybe with modern POSIX I won't need to actually fork
- Go doesn't have sexy enums like Rust (nor traits, nor generic collections) which maybe I don't really need anyways
- Most other languages were rejected because they don't have good stories about stand-alone executables and UTF-8. There are some Scheme systems that will compile to C but they don't come with great UTF-8 stories (yet?)
- BTW, I would probably convert to JavaScript if there were a stand-alone executable generator from the Node JS guys that supported all the node process stuff, even if the default executable were a bit on the heavy side.
So, I guess let's keep up the spirit of pushing me to learn Rust more. I'll just pretend I'm playing an advanced version of Sudoku. At the very least, this will help me better understand the good and bad parts of Rust and help others make similar decisions.
BTW, I think I need to learn how to write unit tests in Rust that don't compile into the main binary - I see "annotations" on functions that say they are tests, so this is already thought out - probably something "cargo" can do once I use the Google or Bing search engines some more.
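A minimal sketch of the usual layout, assuming a Cargo project: tests go in a module marked #[cfg(test)], which is compiled only when running cargo test and is left out of a normal cargo build. The free function is_keyword here is a hypothetical stand-in, not the method from earlier in the post:

```rust
/// Hypothetical stand-in for the code under test.
pub fn is_keyword(word: &str, keyword: &str) -> bool {
    word == keyword
}

// Compiled only for `cargo test`, so none of this ends up in the normal binary.
#[cfg(test)]
mod tests {
    // Pull in the items from the enclosing module.
    use super::*;

    #[test]
    fn matches_the_same_word() {
        assert!(is_keyword("export", "export"));
    }

    #[test]
    fn rejects_a_different_word() {
        assert!(!is_keyword("export", "import"));
    }
}
```

Running cargo test builds the crate with the tests module included and runs each function marked #[test].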