
Writing about C++, Programming, FOST.3™, Mahlee™, the web, Thailand and anything else that catches my attention—with some photos thrown in
This is my first Django middleware and is written as much for me to work out how to do it as because I need it. I suspect that there are already several that do the same thing.
The middleware is intended for use where you want to force all page requests on the site to use HTTPS rather than HTTP. We use it for an intranet and extranet site that sits on the same web server as an ASP.NET application running the public web site. This means that we're also using PyISAPIe to serve the pages and there is a specific workaround because of that. First though the code.
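    from django.http import HttpResponsePermanentRedirect

    class ForceHTTPS:
        def process_request(self, request):
            # Already on HTTPS (port 443), or running under the development server
            if request.META['SERVER_PORT'] == "443" or request.META['HTTP_HOST'] == "localhost:8000":
                return None
            # Redirect to the same path over HTTPS, dropping any port from the host
            return HttpResponsePermanentRedirect("https://%s%s" % (
                request.META['HTTP_HOST'].split(':')[0],
                request.get_full_path(),
            ))

There are two implementation specific things that it's worth noting: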
- Under PyISAPIe, is_secure() always returns false no matter what. This is why it checks to see if the port number is 443, the standard for HTTPS.
- manage.py runserver: as this defaults to localhost on port 8000, I check for that instead¹.

¹ I think putting the port number on the host header is wrong, but still. It isn't something I've ever noticed before and I know that my HTTP client doesn't do it.

And another note:
We use this as the first middleware so that no effort is wasted trying anything else before the redirect happens.
These are some notes, primarily for myself, but that might be useful to others. I'm using VMware Server 1.0.5.
    #!/bin/bash
    cd /usr/lib/vmware/lib/
    mv libpng12.so.0/libpng12.so.0 libpng12.so.0/libpng12.so.0.disabled
    ln -sf /usr/lib/libpng12.so.0 libpng12.so.0/libpng12.so.0
    mv libgcc_s.so.1/libgcc_s.so.1 libgcc_s.so.1/libgcc_s.so.1.disabled
    ln -sf /lib/libgcc_s.so.1 libgcc_s.so.1/libgcc_s.so.1
    ln -s /usr/lib32 /usr/l32
    sed -i -e 's/usr\/lib/usr\/l32/g' /usr/lib32/gtk-2.0/2.10.0/loader-files.d/libgtk2.0-0.loaders
    sed -i -e 's/usr\/lib/usr\/l32/g' /usr/lib32/libgdk_pixbuf-2.0.so.0.1200.9
Using this I can run virtual machines normally, but for some reason I don't get most of the icons showing up in the console. I guess this is due to a problem with the PNG patches made by the script above.
This month we made the biggest change to our internal systems that I've ever done. We finally switched from being primarily Windows based to being Linux based. That doesn't mean that we've switched from doing Windows development work to Linux work — what it means is that we've done most of the transition from running Windows with some Linux virtual machines to running Linux with some Windows virtual machines.
Part of the motivation is of course wanting to work on a Linux port of FOST.3™, an effort that will now get a bit more impetus behind it. It does however mean some massive changes to how we work and therein lies the good and bad.
We ended up choosing Ubuntu as our distribution and we're now using Hardy Heron (which came out last week) on everything but our main server (which will get it when we do a re-install to clean up from the experimental build it's currently running).
The things I really like about it are:
The things I hate:
The experimental server installation we started with has certainly taught us a few things. I guess I've been doing all sorts of things wrong, but neither Samba nor NFS seems especially reliable or useful for networking. I find this pretty surprising, but I guess it explains why everybody uses rsync instead.
Clearly some of these are problems of expectation and lack of knowledge of Linux compared to Windows, but there are some clear philosophical differences. Windows computers seem to come pre-packaged to work as a team player (I'm thinking especially of domain networking here), whereas on Linux each machine prefers to be its own master. Windows also provides a consistent if unexciting desktop experience. Linux provides a much more exciting desktop which is much better when it's better, but is unfortunately much more frustrating and difficult when it's bad.
As for us, we can live with things as they are for now. The new network configuration isn't ideal and many things work much worse than they did under Windows, but I expect that will improve over time as we learn more about the platform which in any case is a pre-requisite for building reliable software on it.
Via the Boost development list I came across Thomas Jensen's TinyJSON parser. As I've also been spending time writing a JSON parser using the Boost tools, I figure we might be able to learn something from each other's approaches.
Firstly though I think our goals are slightly different. I'm writing a JSON parser to fit in with the requirements of using JSON within FOST.3™ whereas his is a more general header only library. It would be hard to take my JSON parser without also taking a lot of the FOST.3™ foundation classes — there are good reasons for that which I'll get to in a moment.
In terms of what comes out of the parser the biggest difference is that I produce a JSON object based on Boost.Variant and he produces one based on Boost.Any. I think he's right that using Boost.Variant will introduce some extra complexity. To my mind the simpler access to the final structure and the better type safety are both well worth it, but I'm not sure that it is compatible with his aims.
I split the JSON object itself into two parts. The first is a variant structure which is able to handle the simple values and is based on this Boost.Variant¹:

    boost::variant< t_null, bool, int64_t, double, wstring >

¹ t_null is simply a type representing the empty value called Null in FOST.3™.

The complete class looks like this (I've cut some members for brevity):
    class F3UTIL_DECLSPEC Variant {
        boost::variant< t_null, bool, int64_t, double, wstring > m_v;
    public:
        Variant() : m_v( Null ) {}
        explicit Variant( bool b ) : m_v( b ) {}
        explicit Variant( char c ) : m_v( int64_t( c ) ) {}
        explicit Variant( int i ) : m_v( int64_t( i ) ) {}
        explicit Variant( unsigned int i ) : m_v( int64_t( i ) ) {}
        explicit Variant( long l ) : m_v( int64_t( l ) ) {}
        explicit Variant( unsigned long l ) : m_v( int64_t( l ) ) {}
        explicit Variant( int64_t i ) : m_v( i ) {}
        explicit Variant( float f ) : m_v( double( f ) ) {}
        explicit Variant( double d ) : m_v( d ) {}
        explicit Variant( const char *s ) : m_v( widen( s ) ) {}
        explicit Variant( const wchar_t *s ) : m_v( wstring( s ) ) {}
        explicit Variant( const wstring &s ) : m_v( s ) {}

        bool isnull() const;

        template< typename T >
        Nullable< T > get() const {
            const T *p = boost::get< T >( &m_v );
            if ( p )
                return *p;
            else
                return Null;
        }

        bool operator ==( const Variant &v ) const;
        bool operator !=( const Variant &v ) const { return !( *this == v ); }

        template< typename T > Variant &operator =( T t ) { m_v = Variant( t ); return *this; }

        template< typename T >
        typename T::result_type apply_visitor( T &t ) const {
            return boost::apply_visitor( t, m_v );
        }
    };

This includes a number of type promoting constructors and forwarders for Boost's get (the use of Nullable is a standard FOST.3™ idiom) and the static visitor.
The actual JSON object is created from this base (again I've cut some members):
    class F3UTIL_DECLSPEC Json {
    public:
        typedef FSLib::Variant atom_t;
        typedef std::vector< boost::shared_ptr< Json > > array_t;
        typedef FSLib::wstring key_t;
        typedef std::map< key_t, boost::shared_ptr< Json > > object_t;
        typedef boost::variant< atom_t, array_t, object_t > element_t;
        BOOST_STATIC_ASSERT( sizeof( array_t::size_type ) == sizeof( object_t::size_type ) );

        Json();
        template< typename T > explicit
        Json( const T &t ) : m_element( atom_t( t ) ) {
        }
        explicit Json( const atom_t &a ) : m_element( a ) {
        }
        Json( const array_t &a ) : m_element( a ) {
        }
        Json( const object_t &o ) : m_element( o ) {
        }
        explicit Json( const element_t &e ) : m_element( e ) {
        }

        template< typename T >
        Nullable< T > get() const {
            const atom_t *p = boost::get< atom_t >( &m_element );
            if ( p )
                return ( *p ).get< T >();
            else
                return Null;
        }

        template< typename T >
        Json &operator =( const T &t ) { m_element = atom_t( t ); return *this; }
        Json &operator =( const array_t &a ) { m_element = a; return *this; }
        Json &operator =( const object_t &o ) { m_element = o; return *this; }

        bool operator ==( const Json &r ) const;
        bool operator !=( const Json &r ) const { return !( *this == r ); }

        template< typename T >
        typename T::result_type apply_visitor( T &t ) const {
            return boost::apply_visitor( t, m_element );
        }
    private:
        element_t m_element;
    };

I wouldn't be at all surprised if all of this machinery was far too much for Thomas. The problem here is that it moves TinyJSON away from just a JSON parser to being a full blown JSON API, which is not quite so tiny any more.
Neither can I see a way of making this more lightweight by avoiding the wrapper class because you can't do this:
    typedef boost::variant< t_null, int, double, std::vector< Json* >, std::map< string, Json* > > Json;

This sort of recursion is only possible with a full blown struct or class which means a load of constructors and forwarders and realistically a whole load of other machinery.
Even harder is correct Unicode support. The first thing to realise about Unicode in JSON is that its \uXXXX escapes are UTF-16 code units, so the parser effectively has to deal in UTF-16. If you're on Windows this isn't such a big deal, but for various other platforms this is likely to cause some difficulties :/
Here is the string parser that I use:
    struct string_closure : boost::spirit::closure< string_closure, FSLib::wstring, std::vector< utf16 >, utf16 > {
        member1 text;
        member2 buffer;
        member3 character;
    };
    const struct json_string_parser : public grammar< json_string_parser, string_closure::context_t > {
        template< typename scanner_t >
        struct definition {
            definition( json_string_parser const& self ) {
                top = string[ self.text = arg1 ];
                string =
                    chlit< wchar_t >( L'"' )
                    >> *(
                        ( chlit< wchar_t >( L'\\' ) >> L'\"' )[ push_back( string.buffer, L'"' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'\\' )[ push_back( string.buffer, L'\\' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'/' )[ push_back( string.buffer, L'/' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'b' )[ push_back( string.buffer, utf16( 0x08 ) ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'f' )[ push_back( string.buffer, utf16( 0x0c ) ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'n' )[ push_back( string.buffer, L'\n' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'r' )[ push_back( string.buffer, L'\r' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L't' )[ push_back( string.buffer, L'\t' ) ]
                        | ( chlit< wchar_t >( L'\\' ) >> L'u' >> uint_parser< utf16, 16, 4, 4 >()[ push_back( string.buffer, arg1 ) ] )
                        | ( anychar_p[ string.character = arg1 ]
                            - ( chlit< wchar_t >( L'"' ) | chlit< wchar_t >( L'\\' ) )
                        )[ push_back( string.buffer, string.character ) ]
                    ) >> chlit< wchar_t >( L'"' )[ string.text = string.buffer /* this is hard */ ];
            }
            rule< scanner_t, string_closure::context_t > string;
            rule< scanner_t > top;
            rule< scanner_t > const &start() const { return top; }
        };
    } json_string_p;

This parser uses Boost.Phoenix and closures which I think makes it a little easier to follow, but of course I would say that :)
There are a couple of things to notice:
- The buffer is a std::vector< utf16 >.

Because JSON is UTF-16 the second point becomes even harder to deal with if you try to mix the buffer character type with a different final string character type. I'm lucky because I have access to all of FOST.3™'s Unicode support and the FSLib::wstring is a std::wstring like class which has explicit Unicode support and can be constructed and assigned to directly from a UTF-16 buffer.
Consider the following JSON strings:
"\u2014" "\u5b6b\u5b50" "\ud834\udd1e"
Here they are decoded:
"—" "孫子" "𝄞"
The first one is just an em dash, the second is Sun Tzu's name in Chinese, but the last is hard. If you're not using a good browser you probably won't even see it. It's a treble clef and is a single Unicode code point (U+1D11E) which has to be represented as two UTF-16 code units, a surrogate pair. This needs to be converted to four bytes in UTF-8 (F0 9D 84 9E), not six. The UTF-16 to UTF-8 converter has to go via UTF-32 to get this right.
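The conversion described above can be sketched roughly as follows. This is only an illustration and not the FOST.3™ implementation: it assumes a C++11 compiler for char16_t and char32_t, and the names decode_utf16, append_utf8 and utf16_to_utf8 are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read one code point (UTF-32) from a UTF-16 buffer,
// starting at index i and advancing i past the code units consumed.
inline char32_t decode_utf16(const std::vector<char16_t> &in, std::size_t &i) {
    assert(i < in.size());
    const char32_t lead = in[i++];
    if (lead >= 0xD800 && lead <= 0xDBFF) {
        // A high surrogate must be followed by a low surrogate.
        if (i >= in.size() || in[i] < 0xDC00 || in[i] > 0xDFFF)
            throw std::runtime_error("unpaired high surrogate");
        const char32_t trail = in[i++];
        return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
    }
    if (lead >= 0xDC00 && lead <= 0xDFFF)
        throw std::runtime_error("unpaired low surrogate");
    return lead;
}

// Hypothetical helper: append one code point to a UTF-8 string.
inline void append_utf8(std::string &out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-16 -> UTF-32 -> UTF-8, one code point at a time.
inline std::string utf16_to_utf8(const std::vector<char16_t> &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();)
        append_utf8(out, decode_utf16(in, i));
    return out;
}
```

Encoding each surrogate separately would give three bytes apiece, which is where the incorrect six byte form comes from. Going through the code point first is what collapses the pair into a single four byte sequence.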
Whether Thomas wants to deal with this, or how it should be dealt with in a lightweight library is really an open question.
What I think Thomas can do is to use my string parser above and parameterise it on a conversion function that can be used to convert from the UTF-16 buffer to the required string type. He can keep the library light by providing a fairly simple implementation that throws an exception on anything non-ASCII, but also allow for better Unicode handling when users supply a more capable (and heavier weight) implementation.
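To give an idea of what that parameterisation might look like, here is a minimal sketch. None of these names come from TinyJSON or FOST.3™: ascii_only and finish_string are invented for the example, and utf16 is assumed to be a plain 16 bit code unit type (char16_t here, which needs C++11).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

typedef char16_t utf16; // assumption: a 16 bit code unit type

// Hypothetical lightweight converter: accepts ASCII only and throws on
// anything else, so the library needs no Unicode tables of its own.
struct ascii_only {
    typedef std::string result_type;
    result_type operator()(const std::vector<utf16> &buffer) const {
        std::string out;
        out.reserve(buffer.size());
        for (std::size_t i = 0; i != buffer.size(); ++i) {
            if (buffer[i] > 0x7F)
                throw std::range_error("non-ASCII character in JSON string");
            out += static_cast<char>(buffer[i]);
        }
        return out;
    }
};

// Hypothetical hook for the point where the parser has finished collecting
// the UTF-16 buffer. The caller picks the converter, and with it the final
// string type.
template <typename Converter>
typename Converter::result_type finish_string(const std::vector<utf16> &buffer,
                                              const Converter &convert) {
    return convert(buffer);
}
```

A user who needs full Unicode support would pass their own functor with the same shape, for example one that returns a UTF-8 std::string or a platform specific wide string, without the parser itself having to change.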
It's been a very busy month here behind the scenes as I've been working on a fair bit of core technology and maybe of more interest, been doing a lot of preparatory work for starting to shift FOST.3™ to Linux. There's still a long way to go, but the work is paying dividends in forcing a spring clean on a lot of old code.
There are parts of the O/RM layer that haven't been touched for nearly 10 years. Clearly this is very stable code, but ten years is a long time in software development and the new requirements are making us look anew at the key abstractions that are used.
The ADO based database interfaces have always used the COM VARIANT data type to manage data interchange. We now have a stable feature branch that switches this to JSON. This may not sound like a big change, but one thought leads to another and we realised that having a very lightweight JSON database would be a useful addition to FOST.3™, not least because it greatly simplifies testing of the individual components that make up the systems. This thought led to yet another and we realised that we can do even better if we abstract out the in-database navigation to also use JSON and something akin to JSONPath to find rows and other result sets.
If this sounds familiar it's probably because it is. CouchDB is an interesting alternative to RDBMSes and has a number of nice properties. By changing our internal abstraction mechanisms we suddenly open up the possibility of using databases like this and Amazon's SimpleDB as an alternative to any flavour of SQL database and it looks like we may be able to do it with minimal impact on the application layer.
This is something that we're pretty excited about. We are however going to need a new name for the O/RM layer as the relational part will become a misnomer :)