On the fly VMs: Viable security model for downloaded apps?

I’ve been thinking… always quite dangerous I know…

I woke up early this morning and couldn’t get back to sleep, and for some unknown reason I started thinking about downloaded applications and how to prevent trojans getting a hold. Then it came to me: why let the application have real access to the system, especially the filesystem?

I started wondering how feasible it would be to modify the operating system to create on the fly a virtual machine which is a clone of itself within which an untrusted application is run. This VM would not have any real write access to the filesystem but instead would have a copy-on-write shadow copy of the real one. For performance reasons it would have to have pretty transparent access to the graphics sub-system but this shouldn’t be too high a security risk. Once the application had terminated the filesystem write operations could then be vetted and a risk assessment and “reputation” for the application could be determined before actually making the changes to the real data on the disk.

Later on the application could either be manually unrestricted or, if its “reputation” was above a certain threshold, unrestricted automatically.
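To make the shadow-copy idea a bit more concrete, here is a toy sketch in Python. It is only an illustration under big assumptions: the “filesystem” is a plain dictionary of path to contents, and the class and method names (ShadowFS, pending_changes, commit) are invented for the example and are not any real OS facility. Writes and deletes land in the shadow, the real data stays untouched, and the list of pending changes is what would be vetted after the application exits.

```python
class ShadowFS:
    """Copy-on-write view over a base 'filesystem' (a dict of path -> bytes)."""

    def __init__(self, base):
        self.base = base      # the real data; untouched until commit()
        self.written = {}     # path -> contents written by the untrusted app
        self.deleted = set()  # paths from the base that the app removed

    def read(self, path):
        if path in self.deleted:
            raise FileNotFoundError(path)
        if path in self.written:
            return self.written[path]
        if path in self.base:
            return self.base[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        self.deleted.discard(path)
        self.written[path] = data

    def delete(self, path):
        self.read(path)  # raises if the path does not exist in this view
        self.written.pop(path, None)
        if path in self.base:
            self.deleted.add(path)

    def pending_changes(self):
        """Changes to vet once the application has terminated."""
        changes = [("delete", p) for p in sorted(self.deleted)]
        changes += [("modify" if p in self.base else "create", p)
                    for p in sorted(self.written)]
        return changes

    def commit(self):
        """Apply the vetted changes to the real data."""
        for p in self.deleted:
            self.base.pop(p, None)
        self.base.update(self.written)
        self.written, self.deleted = {}, set()


if __name__ == "__main__":
    real = {"/home/me/notes.txt": b"hello"}
    vm = ShadowFS(real)
    vm.write("/home/me/notes.txt", b"overwritten")
    vm.write("/tmp/dropper.bin", b"\x00")
    print(vm.pending_changes())  # what the risk assessment would look at
    print(real)                  # still the original contents
```

A real implementation would sit at the block or filesystem layer, but the vet-then-commit shape would be the same.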

Anyway, it was just a thought.

[Edit] More thoughts added as a comment.

Google+: Cooking with the curate’s egg?

About a week ago I managed to get hold of an invitation to Google+, the new, not quite publicly available, in development, nascent social site Google are toying with. It’s got quite a “buzz” campaign running about it at the moment and all the Technorati are flocking to use it. But is it any good? Or, more importantly, could it become good enough to win mainstream users from Facebook?

Well, it does have a lot going for it. For a start the interface is clean and the management of the social groups is light years ahead of Facebook’s. There are issues with some of the privacy decisions made in the design, such as limited circulation posts becoming visible to those outside the initial distribution if one of the people within the circle posts a comment with public distribution. However, these are teething problems and the site is still very much under development.

There is currently no API for external applications to be built, such as games. For some people this is a major problem, for others it’s a blessing. It has been stated that a development system is being developed so I don’t see this as a road block in future.

The feel of the site has one major down side for a social site currently. The whole experience seems quite solitary. This isn’t because of the lack of people to be “friends” with but more that you have no idea if any of your friends are currently on-line. You may not want to interact with them there and then but it’s nice to know that they’re about.

The other problem I see currently is that Google+ seems to be mostly gluing other Google services together. The image uploading and sharing is done using Picasa, which isn’t ideal for the posting of quick images on the go from a smart phone. The messaging service is a poorly integrated link to Google Chat.

One of the most interesting new facilities which could actually make people prefer Google+ over other systems could be the “Hangout” audio/video conferencing and chat sub-system. However, this is crippled by two problems currently. The first one is related to the fact that you don’t know who’s on-line at the moment, i.e. you can’t just invite those you know who are around for a chat, you have to invite blindly. The second one is that you have to download and install a plug-in for your browser for it to work.

So, do I think that it could rival Facebook in the end? Hmm… at the moment I’m not sure. There are currently too many things which make it less immediate and interactive when it comes to dealing with your friends. Also, the current reliance on glued-on functionality from other Google services which don’t quite match a social sharing system could well be a long-term problem.

So there you have it, at the moment it’s a curate’s egg, good in parts. I don’t want to damn it so early in its development but I am a little worried that the early reputation may stick. Let’s hope it does come to rival Facebook as that needs competition, especially as its developers seem to be getting into the Firefox and Gnome developers’ mindset of changing things for change’s sake and seeing themselves as the only arbiters of good design.

Enthusing teen minds: Why today’s computers won’t create tomorrow’s programmers.

The recent 30th anniversary of the launch of the Sinclair ZX81 and the subsequent post on his blog by Jim Finnis brought back to me a recurring thought that today’s computer technology is the antithesis of that required to enthuse a teenager to want to discover and play.

The computers of the early 80s were a blank canvas. You plugged them in, switched them on and (hopefully) the input cursor blinked at you. There was no decoration, no clutter and it was something waiting for YOU to do something to it.

Not only this but with the manual which came with it a 13 year old could within 5 minutes print their name on the screen. Within 10 minutes, at least with the second generation, they could make a funny noise. And within half an hour he or she could have his or her name scrolling up the screen in different colours whilst making unmusical noises and annoying their parents… they were hooked!

Now, let’s look at today’s technology…

The desktop or laptop computer takes an age to start up (i.e. more than 5 seconds) and totally insulates the user from what it is.

Smartphones are usually on all the time so don’t have this problem. Similarly tablets.

They’re immediately brimming full of functionality all vying for your attention, but they’re also incredibly locked down. You can do absolutely anything… ANYTHING as long as it’s what the visionary who steered the programming teams thinks that you should want to do. Woe betide you if you want to do anything different. It’ll either ignore you or give you an unhelpful suggestion in a dialog box. You can be creative, but only in the ways you’re told you can be.

So, what about the art of programming?

Well, on tablets and smartphones forget any native fun. Apparently this is too subversive. On the desktop it’s only slightly better (and I’m not singling out any desktop OS here). What are your options?

Well, on MacOS and Linux you can open a shell window and all sorts of interpreters and compilers are available and all sorts of graphics libraries to use with them too. You would think that this would be the ideal playing ground. Sorry to burst that bubble. It’s a great playing ground if you’re already a programming expert. It’s like taking a 5 year old into an engineering workshop, sitting him down and then complaining when he doesn’t build a car as he had all the tools available to him to do it and hence it must be his fault.

No, these environments are hopeless for teaching and enthusing. The energy barrier is so large that it’s too daunting to even try. Also, how many lines of code in one of these modern development environments would it take to do the equivalent of the following?

10 FOR x=1 TO 100
20 FOR y=0 TO 7
30 INK y : PAPER 7 - y
40 BEEP 1,y
50 PRINT "Noisy coloured text"
60 NEXT y
70 NEXT x

I bet you’ll find that it’s quite a large number of lines of code using all sorts of weird and wonderful libraries, possibly some non-standard ones to do the sound and a whole lot of code to manage the framework to create a window with the correct attributes and define the font etc. Hopeless!
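For a rough point of comparison, here is a terminal-only approximation in Python. It is a sketch under stated assumptions and not a like-for-like port: it assumes a terminal that honours ANSI escape codes, the colour table and function names are made up for the example, and the sound is reduced to the terminal bell, so BEEP’s pitch and duration have no equivalent here.

```python
import sys

# ZX Spectrum colour numbers (0-7) mapped onto ANSI colour numbers (0-7).
# Spectrum order: black, blue, red, magenta, green, cyan, yellow, white.
SPECTRUM_TO_ANSI = [0, 4, 1, 5, 2, 6, 3, 7]


def coloured_line(ink, paper, text):
    """Wrap text in ANSI escapes for the given Spectrum INK and PAPER."""
    fg = 30 + SPECTRUM_TO_ANSI[ink]
    bg = 40 + SPECTRUM_TO_ANSI[paper]
    return f"\x1b[{fg};{bg}m{text}\x1b[0m"


def main(repeats=100):
    for _ in range(repeats):
        for y in range(8):
            sys.stdout.write("\a")  # terminal bell only: no pitch control
            print(coloured_line(y, 7 - y, "Noisy coloured text"))


if __name__ == "__main__":
    main(repeats=1)  # one pass of the inner loop is enough to see the effect
```

Even in this cut-down form the beginner has to know about escape codes, a colour mapping and module imports before anything appears on the screen.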

Oh, and when it comes to drawing lines and circles etc. Oh dear.

Of course a great many people think that a computer with similar functionality to the old BBC Micro or ZX Spectrum would never be able to compete in the mind of a teen when they have all that touch-screen goodness and Angry Birds to play with. I beg to differ. It was most delightfully illustrated that this is profoundly not the case in the second episode of the BBC’s “Electric Dreams” series (unfortunately not available to watch on-line) where the family was given a BBC Micro to play with. The teenage son brought his best friend home from school to play with it and they thought it was awesome. They liked that it was a blank sheet that they could make do what they wanted and not be told what they should want to do by the device. And, of course, what they wanted it to do was make silly noises and write their names on the screen in different colours. It sparked enthusiasm!

So, what can be done?

First of all we need to ignore the idealists who think everyone should start their programming life learning something worthy and object orientated. Once the kids are hooked they can learn that later. Also, that’s not how people’s minds work. You don’t see object orientated recipe books for a reason. Also, however annoying to the seasoned programmer, line numbers help beginners understand the sequential way that programs work. In other words, the early 80s micro BASICs got it mostly right. BASIC does stand for “Beginner’s All-purpose Symbolic Instruction Code” after all.

Firstly, any system which is going to enthuse also HAS to have as its core functionality the “5, 10, 30 minute” teen grabbing fun element outlined near the beginning of this post. Without it the whole thing’s lost. Any system would also have to allow growth. Just as BBC BASIC allowed the nascent programmer to grow into using procedures so should any new project, and possibly more, such as variable typing, scoping etc. Line numbers could be made optional in an advanced mode.

Secondly, the freedom of the code itself is far less important than the freedom to discover, so any project should not use a viral license such as the GNU General Public License (GPL) but instead use something such as the BSD license.

Thirdly, and helped by the above, the core should be written in a platform neutral way with the platform specific interface on top. In this case, probably the best platform to use would be the GNU compilers and specifically that implementation of Objective C with the Qt libraries to interface with most operating systems (except, notably, Apple systems, especially the iPhone/iPod/iPad).

The biggest fly in the ointment with this whole pipe dream is that I just don’t have the skills to develop such a system. (Another would be getting people such as Apple to allow the system to be made available via their App Store type portals.)

So, anyone interested in starting a project? 😉

The horror! Scientific code and how not to read your arguments…

Over the years I have seen many, many examples of poor programming practice, usually kludges and quick fixes, but today I saw the most horrible code ever for reading in command-line arguments in a C program. I just had to share the horror…

   if ( (argc-1) < 5 ) {
      [ Usage error response code removed ]
   }

   /* read in command-line arguments */
   numFiles = (argc-1) - 6;
   sscanf( argv[ numFiles+1 ], "%s", insFileName );
   sscanf( argv[ numFiles+2 ], "%s", outFileName );
   sscanf( argv[ numFiles+3 ], "%d", &outType );
   sscanf( argv[ numFiles+4 ], "%hd", &windowStartTimeCodeword0 );
   sscanf( argv[ numFiles+5 ], "%d", &newStartLine );
   sscanf( argv[ numFiles+6 ], "%d", &newEndLine);

Now, where can I start with this? Erm, I’m a bit dumbfounded actually.

Not only does the test for the incorrect number of arguments check for the wrong number (it lets five through when the code below reads six) but it then uses an index counted back from the last argument to reference the other values! Of course, this means that if the wrong number of arguments is given then the values are put into the wrong variables. Worse, values could be read from memory the process doesn’t own.

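And there’s more… it blindly sscanf()s them into variables, never checking the return values and with nothing to stop a long argument overflowing the %s buffers.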

Now, you may have seen that if one argument is left off the command line the input file now becomes the executable itself and the output file is actually the input data file. This is how this came to my attention. When the program was being debugged for a student it turned out that it wasn’t reading the data correctly… and that the data file was mysteriously emptied of its hundreds of megabytes of data each time the program was run. Oops!
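For contrast, here is a minimal sketch of the checks the snippet skips, written in Python for brevity and not as a patch to the original C. The assumptions are mine: the option names are guessed from the variable names above, the six fixed arguments are taken to follow any number of data files, and Options and parse_args are hypothetical names. The idea is to validate the count first, slice from known positions, check every numeric conversion, and refuse to write over an input.

```python
from dataclasses import dataclass

FIXED_ARGS = 6  # ins file, out file, out type, start codeword, start line, end line


@dataclass
class Options:
    data_files: list
    ins_file: str
    out_file: str
    out_type: int
    window_start: int
    start_line: int
    end_line: int


def parse_args(argv):
    """Parse [data files...] followed by six fixed arguments, or raise ValueError."""
    args = argv[1:]  # drop the program name
    if len(args) < FIXED_ARGS:
        raise ValueError(f"expected at least {FIXED_ARGS} arguments, got {len(args)}")
    num_files = len(args) - FIXED_ARGS  # cannot be negative after the check
    data_files, fixed = args[:num_files], args[num_files:]
    ins_file, out_file = fixed[0], fixed[1]
    if out_file == ins_file or out_file in data_files:
        raise ValueError("output file would overwrite an input file")
    try:
        out_type, window_start, start_line, end_line = (int(v) for v in fixed[2:])
    except ValueError:
        raise ValueError(f"expected four integers, got {fixed[2:]}") from None
    return Options(data_files, ins_file, out_file,
                   out_type, window_start, start_line, end_line)


if __name__ == "__main__":
    print(parse_args(["prog", "a.dat", "ins.txt", "out.txt", "1", "2", "3", "4"]))
    try:
        parse_args(["prog", "ins.txt", "out.txt", "1", "2", "3"])  # one short
    except ValueError as err:
        print("rejected:", err)
```

With one argument missing the call fails with a usage-style error before any file name is touched, which is the behaviour the student’s data file needed.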

So, dear readers, have any of you ever seen a worse command line parsing code segment?

IPv4 addresses almost gone, IPv6 not finished yet. Oops!

As has been noted very widely, the last couple of large blocks of Internet Protocol version 4 addresses have been assigned to the local distributors, and rightly there have been a large number of people stating that we need to get ready for the transition to IP version 6.

However, there are a few niggly little problems, due partly to IPv6’s design and partly to tardy implementation, neither of which impacts upon the general public and their edge networks but both of which will impact upon the security and management of more corporate networks.

So, what are these two problems? Well, they’re both to do with network address assignment, one of which is a foolish design decision in the protocol itself which has a whole host of unintended consequences related to it.

The feature I’m talking about here is the stateless address assignment where a client machine will self-assign its address and self-discover the route out to the wider Internet. On the face of it, it seems like a brilliant idea which will liberate the normal user from worrying about setting up IP addresses and all that tedious and confusing networking stuff; it all “just works”. Brilliant! And, in a perfect world, where everyone is smiley, helpful and trustworthy it would be. It’s a pity that the real world isn’t like that. Having said that, this doesn’t really affect personal networking within people’s homes but it does greatly affect the security and policing of corporate networks.
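To show what “self-assign” means in practice, here is a small Python sketch of one classic way a host can build its own address: take the /64 prefix announced by the local router and append an interface identifier derived from the network card’s MAC address (the modified EUI-64 scheme from the IPv6 addressing RFCs). This is an illustration only. Many systems use randomised identifiers instead, the function name is invented, 2001:db8::/64 is the reserved documentation prefix and the MAC address is made up.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with a modified EUI-64 interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC to make 64 bits.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return net[int.from_bytes(interface_id, "big")]


if __name__ == "__main__":
    print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
```

Note that nothing in this calculation involves asking a central server for permission, which is exactly the management problem.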

At this point it’s probably best to describe how security and policy are implemented with regard to network addresses and packet routing in IPv4 networks, so as to allow you to contrast the differences and the problems inherent in the self-assigned address world of IPv6. Currently a computer can either be manually assigned an address and network route, which then has to be configured directly on the computer in question, or it can be assigned one automatically from a centrally managed Dynamic Host Configuration Protocol (DHCP) server. In the latter case it’s not only the network address and route information which can be given to the computer but other information such as its host name and various other items which it can use to interact correctly with the rest of the network. The centrally managed DHCP server can also tell any computer it doesn’t know (or that the administrators don’t want to have network access) to bog off and hence not get network access. Using this very useful system administrators can assign different outgoing network routes for different sets of client machines, which can help with load balancing and various other advantageous policies that only humans with an overview of the whole network can see.
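The allow-or-refuse decision described above boils down to a lookup keyed on the client’s hardware address. The Python below is a toy model of that policy and not a DHCP server: the table contents, the host names and the offer_lease function are all invented, and the addresses come from the 192.0.2.0/24 documentation range.

```python
# Hypothetical policy table: hardware address -> settings handed to the client.
KNOWN_HOSTS = {
    "00:11:22:33:44:55": {"address": "192.0.2.10", "router": "192.0.2.1",
                          "hostname": "lab-pc-1"},
    "00:11:22:33:44:66": {"address": "192.0.2.11", "router": "192.0.2.2",
                          "hostname": "lab-pc-2"},
}

# Machines the administrators never want on the network.
BLOCKED = {"de:ad:be:ef:00:01"}


def offer_lease(mac):
    """Return the settings for a known client, or None to refuse access."""
    mac = mac.lower()
    if mac in BLOCKED:
        return None
    return KNOWN_HOSTS.get(mac)  # unknown machines also get nothing


if __name__ == "__main__":
    print(offer_lease("00:11:22:33:44:55"))  # known: address, route, host name
    print(offer_lease("aa:bb:cc:dd:ee:ff"))  # unknown: refused
```

Giving the two known machines different routers is the per-client routing policy in miniature.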

As you can see, IPv6’s self-assignment of addresses and self-discovery of network routes by-passes all this control. If you add to this certain client operating systems being “helpful” and offering network tunnels out of the current network for IPv6 clients to the outside world and offering their services as routers it becomes a security nightmare as local outgoing firewall policies and protections are subverted.

Now, this problem has been foreseen, if belatedly, by a group who have, against the uproar of the IPv6 purists, defined an IPv6 version of DHCP. (Note: the purists hate it because it breaks their ideological tenet that all network peers should be equal and free to do as they wish.)

So, surely this means that IPv6 is ready? Erm, no. You see, DHCPv6 is currently only a paper exercise. The technical details have been hammered out and the specification documents (RFCs) have been posted but there are no implementations out there. Oops!

So, what does this mean for the whole IPv4 to IPv6 transition? Well, it means that internal corporate networks will not be able to change to the new protocol and will be forced to live behind an IPv4 to IPv6 network address translation (NAT) gateway. (Note 2: IPv6 purists cringe even more about this technology, they see NAT as the spawn of the devil as it stops all peers being equal and being able to talk directly with every other.)

I can foresee the transition from IPv4 to IPv6 being a long one. To start with, only the core Internet and those machines which live in the no-man’s-land where external services live will change over to IPv6, with everything else sitting behind huge NAT gateways. Internet Service Providers (ISPs), whose customers don’t generally have fixed network addresses anyway, will sit all their customers in IPv4 bubbles and this state of affairs will ossify. All web sites will be forced to use IPv4 compatible addresses.

Eventually, after many years, all the tools and security issues with IPv6 will be sorted out and slowly, very slowly, the corporate world will change their networks one by one, but there will always be “legacy” IPv4 networks in there, well at least for 20 years or so. For ISPs the transition will be quicker. They’ll probably have to begin with a separate product for IPv6 users or merely provide IPv6 gateway routers to new customers (quite probably to begin with using an IPv4 NAT bubble for the home network as quite a bit of embedded A/V equipment will not be IPv6 capable). I can foresee that even this transition will take a good decade. During this time all web servers will have to be on IPv4 mappable addresses.
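If “IPv4 mappable” is read as the standard IPv4-mapped form of an IPv6 address (::ffff:a.b.c.d), which is my assumption about what is meant here, then the mapping is just the 32-bit IPv4 address placed behind a fixed prefix. A short Python sketch using the standard ipaddress module; the two helper names are invented and 192.0.2.1 is a documentation address.

```python
import ipaddress


def to_ipv4_mapped(v4):
    """Embed an IPv4 address in the ::ffff:0:0/96 IPv4-mapped range."""
    v4_addr = ipaddress.IPv4Address(v4)
    return ipaddress.IPv6Address((0xFFFF << 32) | int(v4_addr))


def from_ipv4_mapped(v6):
    """Return the embedded IPv4 address, or None if v6 is not IPv4-mapped."""
    return ipaddress.IPv6Address(v6).ipv4_mapped


if __name__ == "__main__":
    mapped = to_ipv4_mapped("192.0.2.1")
    print(mapped, "->", from_ipv4_mapped(str(mapped)))
    print(from_ipv4_mapped("2001:db8::1"))  # a native IPv6 address has no IPv4 form
```

The second call is the awkward case: a server with only a native IPv6 address has nothing an IPv4-only client can be pointed at.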

It’s going to be a very long haul, so expect things to break horribly.