More DigitalMars D - finding a string in a stream
Published 2006-03-15 10:14:53
The three good ways to learn a language:
- hack on some existing code
- write a simple program from scratch
- port some code from another language to the one you want to learn.
Rather than attacking the core IMAP bit, I decided to start with the MimeDocument decoding part: something relatively self-contained, and conceptually quite simple. Most of the porting involved bringing together classes that had methods defined in multiple files (as seems common with C++), and merging them into nice classes in D.
While most of it will probably end up untested until it's all ported, one single method stood out as a good simple test of working with D: searching for a string (or delimiter) in a stream.
Obviously, one of the things an IMAP server has to do is scan an email message and work out what makes up the email (e.g. attachments, different MIME types and how they are nested). A brute force approach would be to load the whole message into memory and just scan through it looking for the sections. However, since email messages can frequently be over 5 MB, that is horribly inefficient. So the existing code used a simple C++ method to search for a delimiter.
This is what the original function looked like in C++:
    static bool skipUntilBoundary(const string &delimiter,
                                  unsigned int *nlines, bool *eof)
    {
        int endpos = delimiter.length();
        char *delimiterqueue = 0;
        int delimiterpos = 0;
        const char *delimiterStr = delimiter.c_str();
        if (delimiter != "") {
            delimiterqueue = new char[endpos];
            memset(delimiterqueue, 0, endpos);
        }
        // first, skip to the first delimiter string. Anything between the
        // header and the first delimiter string is simply ignored (it's
        // usually a text message intended for non-mime clients)
        char c;
        bool foundBoundary = false;
        for (;;) {
            if (!mimeSource->getChar(&c)) {
                *eof = true;
                break;
            }
            if (c == '\n')
                ++*nlines;
            // if there is no delimiter, we just read until the end of the
            // file.
            if (!delimiterqueue)
                continue;
            delimiterqueue[delimiterpos++ % endpos] = c;
            if (compareStringToQueue(delimiterStr, delimiterqueue,
                                     delimiterpos, endpos)) {
                foundBoundary = true;
                break;
            }
        }
        delete [] delimiterqueue;
        delimiterqueue = 0;
        return foundBoundary;
    }

It is quite nicely written, although it does a rather obtuse test to see whether the string stored in the current buffer matches the one being looked for. (I didn't have a look at compareStringToQueue, but my guess is that it just goes through the string in the buffer, starting at the delimiter position, and checks whether it matches what is being looked for.)
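Since compareStringToQueue is not shown, here is a rough guess at what a helper like it could do, written in D (current D2 syntax). The name matchesRing and its parameters are hypothetical and not taken from the original C++ source; the only thing carried over is the circular buffer indexing (position modulo the delimiter length) seen in the listing.

```d
/// Returns true when the last `needle.length` characters written into the
/// circular buffer `ring` spell out `needle`.
/// `written` is the total number of characters stored so far, each one
/// having been written at index (count % ring.length).
bool matchesRing(const(char)[] needle, const(char)[] ring, size_t written)
{
    immutable n = needle.length;
    if (n == 0 || ring.length != n || written < n)
        return false; // nothing to match, or not enough data buffered yet

    // The oldest buffered character sits at index written % n, so walking
    // forward from there (wrapping around) replays the last n characters.
    foreach (i; 0 .. n)
    {
        if (ring[(written + i) % n] != needle[i])
            return false;
    }
    return true;
}
```

Note that a check like this compares up to the full delimiter length for every character read, which is part of what the D version tries to avoid.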
In looking at rewriting this in D, I started to consider a few things:
- I can pretty much ignore all the input until it matches the first character of the delimiter (and so I don't need to copy the data into the delimiterqueue).
- Once we have started matching the delimiter, I can just test each incoming character against the expected one in the delimiter.
- When we get to a character that does not match the delimiter, we should tidy up the test string (and avoid reallocating it): just copy the part that is left and still matches to the beginning of the string.
    bool skipUntilBoundary(MimeStream mimeSource, char[] delimiter,
            inout uint nlines, inout bool eof)
    {

Our function signature is slightly different here:
- inout is used rather than the *
- uint is used rather than unsigned int.
- char[] is used rather than string.
- The stream being read is an argument, rather than using a global.
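A note for anyone trying this on a current D compiler: the listings here use the D syntax of 2006. In today's D, by-reference parameters are written with ref (inout now means something else), and string literals are immutable, so they no longer convert to char[] without a .dup. A small sketch of the by-reference part; countNewlines is a made-up helper used only for illustration.

```d
// Hypothetical helper, only to show a by-reference parameter in current D.
void countNewlines(const(char)[] text, ref uint nlines)
{
    foreach (c; text)
    {
        if (c == '\n')
            ++nlines; // updates the caller's variable, like `++*nlines` in C++
    }
}
```

Taking the text as const(char)[] lets the caller pass either a literal or a mutable char[] without copying.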
        char[] teststring = "";
        teststring.length = delimiter.length * 2;

Next up is creating a test string to hold our buffer to test against. The second line ensures that its size is fixed at twice that of the original delimiter (which should be more than enough).
        char c;
        int endpos = delimiter.length;
        bool foundBoundary = false;
        int lookup_offset = 0;
        int teststring_offset = 0;

Next we set up our variables:
- teststring_offset is the position we write to in the teststring.
- lookup_offset is where we are trying to match against in the delimiter.
- endpos is just a shortcut to the length of the delimiter.
- c is the character being read.
- foundBoundary is our result.
        while (true) {
            if (!mimeSource.getChar(c)) {
                eof = true;
                break;
            }

Now we start reading the incoming stream, checking to see if we have reached the end of the stream.
            if (c == '\n') {
                nlines++;
            }

We keep an eye on how many lines we have read.
            if ((teststring_offset == 0) && (delimiter[0] != c)) {
                writefln("first character does not match: %s != %s",
                    delimiter[0], c);
                continue;
            }

If we are looking for the first character and it doesn't match, just keep reading!
            teststring[teststring_offset] = c;
            teststring_offset++;

We now add the character to our test string (even if it doesn't match).
            if (delimiter[lookup_offset] == c) {
                writefln("got a matching character (%d/%d): %s == %s",
                    lookup_offset, endpos, delimiter[lookup_offset], c);
                if ((lookup_offset + 1) == endpos) {
                    writefln("GOT FULL MATCHING STRING ");
                    foundBoundary = true;
                    break;
                }
                lookup_offset++;
                continue; // go and find next character..
            }

Now we test whether the character we got matches the expected one. If we have reached the end of the delimiter, we stop processing; otherwise we make sure the lookup offset is increased.
We only arrive at this point if we matched the first character but the remaining data does not match, so we need to adjust the test string.
            int trim_offset = 1;
            while (true) {
                writefln("testing teststring_offset=%d teststring[%d] (%s) against first character %s",
                    teststring_offset, trim_offset,
                    teststring[trim_offset], delimiter[0]);
                if (trim_offset >= teststring_offset) { // reached the end..
                    writefln("Gone to end of string");
                    teststring_offset = 0;
                    lookup_offset = 0;
                    break;
                }

We start going through the test string, beginning at the second character. First off, we check whether we have already checked all of the test string, and if so we just reset the offsets (i.e. nothing in this bit matches).
                if (teststring[trim_offset] == delimiter[0]) {
                    // found a possible start, check if the rest matches too
                    int test_len = teststring_offset - trim_offset;
                    writefln("MATCH testing available remaining string [%d..%d]%s == [%d]%s",
                        trim_offset, teststring_offset,
                        teststring[trim_offset..teststring_offset],
                        test_len, delimiter[0..test_len]);
                    if (teststring[trim_offset..teststring_offset] == delimiter[0..test_len]) {
                        // slide the matching tail to the front, one character
                        // at a time because source and destination may overlap
                        for (int i = 0; i < test_len; i++) {
                            teststring[i] = teststring[trim_offset + i];
                        }
                        // the next character read is written and compared
                        // straight after the part we kept
                        teststring_offset = test_len;
                        lookup_offset = test_len;
                        break;
                    }
                }
                trim_offset++;
            }
        }

Now we compare the remaining section of the test string against the start of the delimiter. If they match, we rearrange the test string by copying that section to the beginning and resetting our offsets to point just after it.
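The realignment step can also be looked at in isolation: given the characters buffered so far, find the longest tail that is also the start of the delimiter. A minimal sketch in current D syntax; the function name longestTailPrefix is mine and does not appear in the ported code.

```d
/// Length of the longest proper suffix of `buffered` that is also a
/// prefix of `delimiter` (0 if there is none).
size_t longestTailPrefix(const(char)[] buffered, const(char)[] delimiter)
{
    // Start at 1: the buffer as a whole is already known not to match.
    // Earlier start positions give longer tails, so the first hit wins.
    foreach (start; 1 .. buffered.length)
    {
        auto tail = buffered[start .. $];
        if (tail.length <= delimiter.length
                && tail == delimiter[0 .. tail.length])
            return tail.length;
    }
    return 0;
}
```

For example, with the delimiter "- hello world -" and a buffer holding "- hello -", only the final "-" can be kept, so matching carries on from the second character of the delimiter.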
Finally, once the loops have broken out, we return the foundBoundary variable.

        return foundBoundary;
    }

To make this little test work, we need to create a simple stream reader.
    class MimeStream
    {
        char[] thestring = "";
        int pos = 0;
        this(char[] string) {
            this.thestring = string;
        }
        bool getChar(inout char c)
        {
            if (pos >= thestring.length) {
                return false;
            }
            c = this.thestring[pos];
            pos++;
            return true;
        }
    }

Then create a main() function so we can test it.
    import std.stdio;
    void main () {
        MimeStream x = new MimeStream("This is a test - hello with XXX - hello world - in the middle".dup);
        uint lines = 0;
        bool eof = false;
        bool ret = skipUntilBoundary(x, "- hello world -".dup, lines, eof);
        if (ret) {
            writefln("GOT STRING!");
        } else {
            writefln("NO MATCH");
        }
    }

And with a simple line, build a binary to test:
    #/dmd/bin/dmd test_string.d
    #./test_string

And out comes our result: GOT STRING! (with a few more debugging messages preceding it).
While not as clever as the C or C++ version, I think the resulting code is slightly more readable, and it is probably just as memory efficient and just as fast.
Got any ideas to improve it?
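One possible idea, offered as a sketch rather than a tested replacement: precompute a Knuth-Morris-Pratt failure table for the delimiter, so that a mismatch is resolved by a table lookup and no test string has to be kept or rearranged. The code below uses current D syntax (ref instead of inout); StringSource, buildFailureTable and skipUntilBoundaryKmp are names I made up, with StringSource standing in for the MimeStream class above.

```d
/// Hypothetical stand-in for MimeStream: hands out one character at a
/// time from an in-memory string.
class StringSource
{
    private const(char)[] data;
    private size_t pos;

    this(const(char)[] data) { this.data = data; }

    bool getChar(ref char c)
    {
        if (pos >= data.length)
            return false;
        c = data[pos++];
        return true;
    }
}

/// fail[i] is the length of the longest proper prefix of
/// pattern[0 .. i + 1] that is also a suffix of it.
size_t[] buildFailureTable(const(char)[] pattern)
{
    auto fail = new size_t[pattern.length];
    size_t k = 0;
    foreach (i; 1 .. pattern.length)
    {
        while (k > 0 && pattern[i] != pattern[k])
            k = fail[k - 1];
        if (pattern[i] == pattern[k])
            ++k;
        fail[i] = k;
    }
    return fail;
}

/// Reads from `source` until `delimiter` has been seen or input runs out.
bool skipUntilBoundaryKmp(StringSource source, const(char)[] delimiter,
        ref uint nlines, ref bool eof)
{
    auto fail = buildFailureTable(delimiter);
    size_t matched = 0; // delimiter characters matched so far
    char c;
    while (source.getChar(c))
    {
        if (c == '\n')
            ++nlines;
        if (delimiter.length == 0)
            continue; // no delimiter: just read to the end of the input
        // On a mismatch, fall back to the longest prefix that still fits.
        while (matched > 0 && c != delimiter[matched])
            matched = fail[matched - 1];
        if (c == delimiter[matched])
            ++matched;
        if (matched == delimiter.length)
            return true;
    }
    eof = true;
    return false;
}
```

The table costs one size_t per delimiter character and is built once per call. Whether this is actually faster than the version above for typical MIME boundaries is something I have not measured.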