The next step for the surface mapping data is exporting it to a KML file. KML has turned out to be an easy markup language to understand. It’s structured similarly to HTML, but has its own tags for identifying various data formats and visual styles. I’ve posted links to the resources I’ve used at the bottom of the page. I went through them in the order listed: the first page to learn what KML is capable of, then the second for an overview of the available tags. After those two I found Google’s interactive sampler, which lets you write your own KML and see it update in a live environment. I’ve been working with the sampler to find the best way to format the data, since it seems fully capable of rendering KML and it’s very convenient to have live control of your code.
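For readers who haven't seen KML yet, here is a minimal Python sketch of the kind of document the sampler renders: one Placemark per data point, wrapped in a Document. The `rows_to_kml` name and the (label, lat, lon) row shape are assumptions for illustration, not the exporter described in this post. One detail that is easy to trip over: KML writes coordinates as longitude,latitude[,altitude], the reverse of the usual "lat/lon" order.

```python
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"


def rows_to_kml(rows, doc_name="SMS track"):
    """Build a KML document with one Placemark per (label, lat, lon) row.

    Hypothetical helper. KML expects "lon,lat,alt" inside <coordinates>,
    so the order is swapped relative to the input rows.
    """
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<kml xmlns="{KML_NS}">',
        "  <Document>",
        f"    <name>{escape(doc_name)}</name>",
    ]
    for label, lat, lon in rows:
        parts += [
            "    <Placemark>",
            f"      <name>{escape(str(label))}</name>",
            f"      <Point><coordinates>{lon},{lat},0</coordinates></Point>",
            "    </Placemark>",
        ]
    parts += ["  </Document>", "</kml>"]
    return "\n".join(parts)


if __name__ == "__main__":
    print(rows_to_kml([("7/1/2010 12:00", 38.7765, -75.131)]))
```

Pasting the printed output into the interactive sampler is a quick way to check that the points land where expected.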
*Original Post for 7/1/2010*
The next modification to the SMS data is to remove any lines without a latitude and longitude, since data from an unknown point isn’t useful. These lines can be identified by a value of -99 or NA in either the lat or lon column. The simplest solution seems to be to skip each line with a -99 in lat or lon as it’s read in, but that has been a little difficult to implement, mainly due to the code density. For flexibility, the idea is to identify which fields hold lat and lon from the header data types, as I’ve done with the coordinate suffixes. As the lines of data are read in, they would be matched up to the header values. The lines with no position can then be removed simply by ignoring them, since this happens at an early point in the program.
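A rough Python sketch of that idea: find the lat and lon columns by name in the header, then skip any line whose lat or lon is a -99 or NA marker as it streams past. The column names `Lat`/`Lon`, the tab separator, and treating blank or unparseable values as missing are all assumptions; the real SMS header may differ.

```python
def is_missing(value):
    """True for the "no position" markers: -99 (in any numeric form) or NA.

    Assumption: blank or unparseable values are treated as missing too.
    """
    text = value.strip()
    if text == "" or text.upper() == "NA":
        return True
    try:
        return float(text) == -99.0
    except ValueError:
        return True


def filter_positions(lines, lat_name="Lat", lon_name="Lon", sep="\t"):
    """Yield the header, then only data lines that have a usable lat and lon.

    Column positions come from the header, so the check still works if the
    columns move around between files. lat_name/lon_name are hypothetical.
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return  # empty input: nothing to yield
    names = header.rstrip("\r\n").split(sep)
    lat_i, lon_i = names.index(lat_name), names.index(lon_name)
    yield header
    for line in it:
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) <= max(lat_i, lon_i):
            continue  # truncated line: no position to check
        if is_missing(fields[lat_i]) or is_missing(fields[lon_i]):
            continue  # unknown point: ignore it as it's read in
        yield line
```

Because it is a generator, it can sit directly on top of the file reader, so bad lines are dropped before any other processing touches them.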
I’ve found an easier way to remove bad Surface Mapping System (SMS) data than the character mask I mentioned in my last post. The data will end up in several environments that need to treat it as purely numeric values. Each line I read in is split into a text array for the date and time, and a numeric array for the values. I added a line to the list of fields in the header so the program can determine whether each field is text or numeric. When a value is sent into the numeric array, it is standardized if it converts successfully, and deleted if it can’t be converted. Once the numeric array is filled, math operations can be applied; so far, standardizing the lat/lon data is the only one I’m doing. After the data has gone into the arrays, it gets pulled back out and added to a tab-delimited text file. After some troubleshooting (namely, spending a couple of hours searching for a problem caused by a missing quotation mark in the header), almost all of the program works. So far the only problem I’ve found is that the code is very condensed, since a single loop scans the input array, compares it to the header, and separates the data into arrays according to the assignments made in the header.
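A Python sketch of the header-driven split, under several assumptions that are not from the post: the header supplies a names row plus a types row marking each field "text" or "num", the lat/lon columns are called `Lat` and `Lon`, and "standardizing" means converting degrees and decimal minutes (DDMM.mmmm) to decimal degrees. One deliberate difference from the description above: a value that fails to convert becomes an empty cell rather than being deleted, which keeps the tab-delimited columns aligned.

```python
LATLON = {"Lat", "Lon"}  # hypothetical column names


def ddmm_to_degrees(value):
    """Hypothetical standardization: DDMM.mmmm to signed decimal degrees."""
    sign = -1.0 if value < 0 else 1.0
    value = abs(value)
    degrees = int(value // 100)
    minutes = value - degrees * 100
    return sign * (degrees + minutes / 60.0)


def convert_row(line, names, types, sep=","):
    """Split one row by the header's type row and return a tab-delimited line.

    Text fields are kept as strings; numeric fields go through float().
    Output order is text fields first, then numeric fields.
    """
    fields = line.rstrip("\r\n").split(sep)
    text, numbers = [], []
    for name, kind, raw in zip(names, types, fields):
        if kind == "text":
            text.append(raw.strip())
            continue
        try:
            value = float(raw)
        except ValueError:
            numbers.append(None)  # could not convert: leave the cell empty
            continue
        if name in LATLON:
            value = ddmm_to_degrees(value)  # the one math step applied so far
        numbers.append(value)
    cells = text + ["" if v is None else f"{v:.6f}" for v in numbers]
    return "\t".join(cells)
```

Keeping the conversion in `convert_row` and the standardization in its own small function is one way to loosen up the single dense loop: each piece can be tested on its own.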
Still coding the reader for the Surface Mapping System (SMS) data. Last week I got most functions working: the files are read in, converted to tab-delimited format, then stored in a main file. I’ve found that multiple file operations are actually much more efficient than manipulating a large file in memory. Originally, all of the text (140,000 lines) was held in memory and written in a single step, which took about 5 minutes to complete. Using multiple file I/O operations, the program runs in about 4 seconds.
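A minimal Python sketch of the streaming approach, assuming comma-separated input files; `merge_to_tab` and the file names are hypothetical. Each row is converted and appended to the main file as soon as it is read, so only one row is held in memory at a time no matter how many lines the files contain.

```python
import csv
import os
import tempfile


def merge_to_tab(input_paths, main_path):
    """Append every row of every CSV file to main_path as tab-delimited text.

    Returns the number of rows written. csv.reader also handles quoted
    fields, so a value like "4,5" stays a single field.
    """
    rows = 0
    with open(main_path, "a", encoding="utf-8", newline="") as out:
        for path in input_paths:
            # newline="" is what the csv module expects for files it reads;
            # errors="replace" keeps stray bytes from aborting the run
            with open(path, encoding="utf-8", errors="replace", newline="") as src:
                for row in csv.reader(src):
                    out.write("\t".join(row) + "\n")
                    rows += 1
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        part = os.path.join(folder, "part1.csv")
        with open(part, "w", encoding="utf-8") as f:
            f.write("Date,Lat,Lon\n7/1/2010,38.5,-75.1\n")
        main = os.path.join(folder, "main.txt")
        print(merge_to_tab([part], main), "rows written")
```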
The next problem to solve is allowing only alphanumeric characters and signs in the text. I’ve tried Regex and several variants of compares and replaces, but still haven’t found one that works 100% of the time. Roughly 1 out of every 1,000 to 2,000 lines contains random symbols that don’t follow any consistent pattern. Ideally, the function that cleans the data should remove any characters that aren’t on a list of accepted characters, or at least remove the whole line. My current idea is to replace all the accepted characters with either whitespace or null, then compare the result to the original. In theory, any characters that show up in both versions could then be removed from the original, leaving only the characters on the list.
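A minimal whitelist filter in Python, sketching the accepted-characters approach. The exact `ALLOWED` set is an assumption (letters, digits, sign and decimal characters, plus the separators the data might use) and would need to match the real files. `bad_characters` is close in spirit to the idea above: take away the accepted characters and whatever is left is what needs removing.

```python
import string

# Assumption: adjust this set to whatever the SMS files legitimately contain.
ALLOWED = frozenset(string.ascii_letters + string.digits + "+-.,:/ \t")


def strip_disallowed(line):
    """Keep only characters that are on the accepted list."""
    return "".join(ch for ch in line if ch in ALLOWED)


def bad_characters(line):
    """Return the set of characters in line that are not on the list."""
    return set(line) - ALLOWED


def clean_lines(lines, drop_bad_lines=False):
    """Yield cleaned lines: strip bad characters, or drop the whole line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if bad_characters(line):
            if drop_bad_lines:
                continue  # the "at least remove the line" option
            line = strip_disallowed(line)
        yield line
```

For the same `ALLOWED` set, `re.sub(r"[^A-Za-z0-9+\-.,:/ \t]", "", line)` removes the same characters; the set-based version just makes the accepted list easier to read and edit.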
I started coding a program to parse the .csv-formatted data from the Sharp into tab-delimited format. I made the following observations while coding:
- Even though most C# programmers I’ve learned from declare variables inline, like ‘string text = “textdata”’ at line 20 in main, it’s better for declarations to stay in the first lines of the function.
- I started moving my code into subroutines, and it seems certain kinds of code have advantages and disadvantages when placed in a subroutine. File I/O functions seem to be an extreme case: a subroutine that does its own file I/O is unlikely to lose much data if problems occur, but it adds a lot of overhead when it’s called so frequently. Any thoughts on weighing the pros and cons of this one?
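One way to frame that trade-off, sketched in Python with hypothetical helpers: `append_line` opens, writes, and closes on every call (safe but slow), while `PeriodicWriter` keeps one handle open and flushes every N lines, so the cost of opening the file is paid once and the amount of data at risk is bounded by N. The flush interval is a tuning knob, not a recommended value.

```python
import os
import tempfile


def append_line(path, text):
    """Open, append and close on every call.

    The line is handed to the OS as soon as the call returns, so little is
    lost if the program dies, but every call pays for an open and a close.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(text + "\n")


class PeriodicWriter:
    """Hypothetical helper: one open handle, flushed every flush_every lines.

    If the process dies abruptly, only lines written since the last flush
    can be lost. flush() hands data to the OS; it does not force it onto
    the disk (os.fsync would do that).
    """

    def __init__(self, path, flush_every=500):
        self._file = open(path, "a", encoding="utf-8")
        self._flush_every = flush_every
        self._pending = 0

    def write_line(self, text):
        self._file.write(text + "\n")
        self._pending += 1
        if self._pending >= self._flush_every:
            self._file.flush()
            self._pending = 0

    def close(self):
        self._file.close()  # close() also flushes anything still buffered

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "out.txt")
        with PeriodicWriter(path, flush_every=2) as writer:
            for value in ("1", "2", "3"):
                writer.write_line(value)
        with open(path, encoding="utf-8") as f:
            print(f.read(), end="")
```

In C#, `StreamWriter` exposes the same choice through its `Flush` method and `AutoFlush` property.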
After losing important parts of the Day 6 videos, I went online to learn how to get a SQL application running. I checked several places, including source code websites like planet-source-code.com, and few of them had anything that even worked as intended. The best resource I found was a 10-minute video that shows every step. It’s not the best-produced video ever, but it was free and did a good job.
I turned on my laptop when I got in yesterday morning to see “Operating System not found”, the computer’s way of saying my boot record had just been added to the list of corrupted data. Since this was the third catastrophe this year, I got tired of fighting my Intel-Sony hardware combination. It’s a Sony Vaio AR-770, and since it’s a Sony, drivers are only available for the OS it shipped with (Vista). And since it’s a Vaio, most drivers direct from the component manufacturers won’t install. In particular, I seemed to be having problems with my disk controller: it’s an Intel ICH8 RAID controller, and RAID mode cannot be disabled for a non-RAID setup. I decided to try something different, since I’ve had good experiences with the laptop under Linux. I installed Ubuntu 10.04 with VirtualBox running Windows 7 in seamless mode. Everything worked, even on first boot. It’s kind of weird to program .NET Framework 4 applications from what seems to be directly within a Linux desktop.
While I was waiting for my blog account to be activated, I got through the following:
Chapters 1-6 in Visual C# Step by Step
I had experience with C and programming from some classes I’d taken, so I only needed a refresher on C syntax. Most functions I was familiar with are replaced by functions specific to .NET, but the change was easy enough.
Classes were new to me and were difficult to understand from the book, but they are better explained on this website, which builds the lesson around an example and is much easier to follow overall: http://www.kirupa.com/net/oop_intro_classes_pg1.htm
Chapters 7-11 in Visual C# Step by Step
I had already seen the basics of this material in my previous classes, so I only looked over it quickly and wrote some example programs before moving on to the video lessons. Most of what I knew still applied, except for some slight syntax changes when working with 2D arrays.
After chapter 11, I started on the video set “C# for Beginners”. Most of the first couple of days was familiar material, but it’s worth watching them anyway to get used to how the lessons are explained. The lessons are easy to follow. I made it through Day 5; Day 6 was corrupt.
I’m Mike Bender, a new intern at UD. I’m working in C# and ASP.Net. I have some programming experience in C, among other languages. Some of the tools I’ve found useful in learning C# are the book Visual C# Step by Step, a video set called C# for Beginners, and various other websites that I will reference as I post about each step. I will keep this updated with my progress on projects and will add the steps I took in learning C# soon.
Welcome to the OCEANIC Interns blog site.
This site will serve as a repository of the progress that our interns make as they learn new technologies in support of OCEANIC research projects. Their resources, trials, projects and comments on the various technologies that they are exploring and learning will hopefully assist others as they learn what works and what doesn’t.