kirit.com

Created 26th April, 2005 13:12 (UTC), last edited 30th December, 2013 06:38 (UTC)

Writing about C++, Programming, Fost 4, the web, Thailand and anything else that catches my attention—with some photos thrown in

Comma Separated JSON

Created 18th June, 2016 07:32 (UTC), last edited 20th June, 2016 02:19 (UTC)

Comma Separated JSON (CSJ) is a CSV-like file format designed for stream processing where each cell is valid JSON. This makes it very similar to CSV, but without the problems that CSV has.

The problem with JSON is that to produce it you need to build an in-memory structure of everything you want to dump out, and to parse it you have to build everything back into memory in one go. This is fine for small JSON blobs, but isn't really ideal when the data consists of many megabytes, or more.

XML solves this by having event based parsers that allow you to read sub-sections of the structure as they stream past. Kind of great, but who really wants to go back to XML?

CSV solves this in a different way. By having each line of data pretty much independent we can both generate and parse it one line at a time. This makes streaming it out and streaming it into things pretty painless. But the problem is that CSV isn't really a well defined file format with a well defined syntax.

  • What is the correct way to embed a new line? Do you expect \n or quoted text where the cell data actually goes into the next line?
  • What about embedded double quotes? Are they doubled up "" or slash escaped \"?
  • Are you going to parse the phone number +6686555123456 as 6.6865551e12?
  • What about dates and times? Actually, let's not even think about that.
  • If you have a single column of data, is the blank line at the end an empty cell and part of the data or not? What about blank lines in the middle?

All of these cases are solved in CSJ by using a JSON base to produce something that looks almost the same as CSV but without the parsing difficulties.

Below is a tiny CSJ file:

"name", "age", "job"
"Kirit Sælensminde", 45, "Minister Without  Portfolio"
"Freyja Sælensminde", 5, null

This looks almost exactly like the same CSV file would, and that's no accident. However, we now know a few more things about the file. Following JSON's encoding rules, the file is UTF-8. The strings are escaped using JSON rules, so Unicode is simple to deal with, as are embedded new lines and double quotes.

Empty cells can now also be properly explicit making use of null. We would also have access to proper booleans. Never again get telephone numbers and actual numbers confused in processing.
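As a rough illustration of the difference, the Python sketch below reads the same four values once as a CSV row and once as individual JSON cells. The sample values are made up for the example; only the standard library `csv` and `json` modules are used.

```python
import csv
import io
import json

# A CSV reader hands every cell back as text, so the consumer has to guess
# which cells are numbers, which are empty and which are booleans.
csv_row = next(csv.reader(io.StringIO("+6686555123456,45,,true\n")))
print(csv_row)  # ['+6686555123456', '45', '', 'true']

# The same values written as JSON cells keep the type the producer chose:
# the phone number stays a string, the age is a number, the empty cell is
# an explicit null and the flag is a real boolean.
csj_cells = ['"+6686555123456"', "45", "null", "true"]
values = [json.loads(cell) for cell in csj_cells]
print(values)  # ['+6686555123456', 45, None, True]
```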

Because it's a new format we can also be a bit stricter on specifying how we want a last tricky aspect to be dealt with:

  • All dates and times are ISO formatted
  • Times always have a time zone offset associated with them
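A producer could enforce these two rules with something like the Python sketch below. The helper name `csj_timestamp` is hypothetical, and the exact ISO 8601 profile (here Python's default, with a `T` separator) is an assumption rather than something CSJ pins down.

```python
from datetime import datetime, timedelta, timezone


def csj_timestamp(moment: datetime) -> str:
    """Format a time for a CSJ cell, insisting on a time zone offset."""
    if moment.utcoffset() is None:
        # A naive datetime has no offset, so it would break the second rule.
        raise ValueError("refusing to write a time without a time zone offset")
    return moment.isoformat()


bangkok = timezone(timedelta(hours=7))
stamp = csj_timestamp(datetime(2016, 6, 11, 15, 54, 9, tzinfo=bangkok))
print(stamp)  # 2016-06-11T15:54:09+07:00
```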

Notes on consumption and production

Semantically a CSJ file is an array of JSON objects which share a common set of keys.

The media type for CSJ should be application/csj (following on from JSON's media type).

A single line from a CSJ file can be prepended with [ and suffixed by ] and run through a standard JSON parser.
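The bracket trick, together with the "array of objects" view above, can be sketched as a small streaming reader in Python. The function name `read_csj` and the sample data are hypothetical, and two behaviours are assumptions of this sketch rather than part of the format description: blank lines are skipped, and a row whose cell count differs from the header is rejected.

```python
import io
import json
from typing import Iterator, TextIO


def read_csj(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per data line, using the first line as the keys."""
    header = None
    for line in stream:
        if not line.strip():
            continue  # assumption: blank lines carry no data
        # Wrap the line in [ ] and let a standard JSON parser do the work.
        cells = json.loads("[" + line + "]")
        if header is None:
            header = cells
            continue
        if len(cells) != len(header):
            raise ValueError("row has %d cells, header has %d" % (len(cells), len(header)))
        yield dict(zip(header, cells))


sample = io.StringIO(
    '"slug", "title", "tags", "watched__times"\n'
    '"t1","Terminator",["action","sci-fi"],6\n'
    '"t2","Terminator 2: Judgement Day",["action","sci-fi"],null\n'
)
films = list(read_csj(sample))
print(films[0]["tags"])            # ['action', 'sci-fi']
print(films[1]["watched__times"])  # None
```

Only one line is held in memory at a time, so the same loop works for a file or a network response of any size.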

Because it is JSON a cell doesn't have to be a JSON atom. You can embed JSON objects and arrays into a line and everything will work exactly as you'd expect with no ambiguity of how to produce or consume the data.

Like CSV, it can be produced with very low overhead. Our code that turns Postgres SQL statements into CSJ is able to stream the data over an HTTP connection at twice the speed that psql is able to stream the same SQL statement data into /dev/null. On my 32GB desktop, producing JSON output for 10 million rows of half-a-dozen columns requires more RAM than I have. The RAM overhead to produce the CSJ data stream barely registers.
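To show why the memory overhead stays flat, here is a minimal Python sketch of a streaming producer. It is not the Postgres code described above; the generator name `csj_lines` is hypothetical, and rejecting rows of the wrong width is an assumption of the sketch. Each row is encoded and handed on before the next one is looked at.

```python
import json
from typing import Iterable, Iterator, Sequence


def csj_lines(header: Sequence[str], rows: Iterable[Sequence[object]]) -> Iterator[str]:
    """Yield a CSJ document one line at a time."""

    def encode(cells: Sequence[object]) -> str:
        # json.dumps escapes embedded new lines and quotes, so every
        # record stays on a single physical line.
        return ", ".join(json.dumps(cell, ensure_ascii=False) for cell in cells) + "\n"

    yield encode(header)
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row width does not match the header")
        yield encode(row)


lines = list(csj_lines(
    ["name", "age", "job"],
    [["Kirit Sælensminde", 45, "Minister Without Portfolio"],
     ["Freyja Sælensminde", 5, None]],
))
print("".join(lines), end="")
```

When writing the lines to a file, open it with `encoding="utf-8"` so the output matches JSON's encoding rules.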

The example below includes embedded arrays:

"slug", "title", "released", "length_minutes", "created","tags", "watched__last", "watched__times"
"t1","Terminator","1984-10-26",null,"2016-06-11 08:54:09.006744+00",["adventure","action","dystopian","robots","time-travel","sci-fi"],"2016-06-11 08:55:31.54614+00",6
"t2","Terminator 2: Judgement Day","1991-07-01",94,"2016-06-11 08:54:12.895416+00",["adventure","action","dystopian","robots","time-travel","sci-fi"],null,null

It would be really cool if spreadsheets had a “CSJ export” option that avoided the sort of data corruption that all too often occurs.

Then we'd never have to worry about a user loading a file into a spreadsheet, doing some work, and then corrupting half the data we need to process when it's saved.

Onward and upward

We've been introducing this in a few APIs that we've been building recently and are now in the process of building larger systems that use CSJ. Probably at some point in the near future we'll have a little JavaScript library that can be used to process server requests that return data in CSJ.

This is really a small part of a larger ecosystem that we're in the process of building out and you should be hearing more about in the coming months.


Posted: 19th June, 2016 04:28 (UTC)

Fost 5 release 5.16.03.44971 now out

Posted 29th March, 2016 11:11 (UTC), last edited 25th April, 2016 10:39 (UTC)

The most obvious change is that the version bump has finally come through. There's also some new functionality in the crypto wrappers for better password hashes.

There are a couple of new libraries that we're going to include sometime in the next few releases. We're also redesigning the Postgres handling, which will hopefully lead to the removal of most, if not all, of the old O/RM code in favour of a simpler and lighter wrapper around the core Postgres features.

There's also a big change in the pipeline for Android to support the switch from gcc to clang — watch this space.

Building on Linux

git clone --branch=5.16.03.44971 --recursive git@github.com:KayEss/fost-hello.git
cd fost-hello
Boost/build 58 0
Boost/install 58 0
hello/compile
dist/bin/hello-world-d

Download locations

Applications

  • beanbag — Stand alone transactional JSON database server — git@github.com:KayEss/beanbag.git
  • beanbag-seed — Seed project for giving you a starting point to develop web applications using Beanbag — git@github.com:KayEss/beanbag-seed.git
  • fost-hello — Sample seed project — git@github.com:KayEss/fost-hello.git
  • mengmom — Stand alone web server — git@github.com:KayEss/mengmom.git

Libraries

  • f5-threading — Preview of the first Fost 5 library which includes help for threading.
  • fost-aws — Amazon AWS and OpenStack — git@github.com:KayEss/fost-aws.git
  • fost-android — Eclipse project for Android that allows Fost 4 and Beanbags to be used on mobile devices — git@github.com:KayEss/fost-android.git
  • fost-android-ndk — The native code for Android. Includes required parts of Boost configured to use the standard Android build system.
  • fost-beanbag — Transactional JSON database — git@github.com:KayEss/fost-beanbag.git
  • fost-base — Build system and core libraries — git@github.com:KayEss/fost-base.git
  • fost-internet — Internet protocols, servers & clients — git@github.com:KayEss/fost-internet.git
  • fost-meta — All libraries in one wrapper — git@github.com:KayEss/fost-meta.git
  • fost-orm — Object/Relational mapping — git@github.com:KayEss/fost-orm.git
  • fost-postgres — PostgreSQL — git@github.com:KayEss/fost-postgres.git
  • fost-py — Python (2.x) bindings — git@github.com:KayEss/fost-py.git
  • fost-web — Web server libraries — git@github.com:KayEss/fost-web.git

Detailed change log

fost-aws

  • Removed use of deprecated Fost APIs.

fost-base

  • New fostlib::crypto_bytes function returns an array of bytes from a cryptographically secure location.
  • Add new fostlib::array_view that allows for looking at memory areas.
  • Implement PBKDF2 with HMAC and SHA256 that produces a fixed length (64 bytes) derived key and one that produces a derived key whose length can be chosen at run time.
  • Added fostlib::crypto_compare which is intended to be as close as possible to constant time for comparison of memory blocks (byte arrays and strings).
  • Allow HMAC with std::array<unsigned char, n> secrets.
  • gcc doesn't really get [[deprecated]] so disable the warnings.
  • Add a new werror build option which sets the -Werror compile flag and turns warnings into errors. Fix remaining warnings in clang builds.
  • Deprecated the fostlib::exceptions::exception::info members. Added a coercion from fostlib::exceptions::exception to fostlib::json. Add a new forwarded exception type. Add #define FOST_NO_STD_EXCEPTION_PTR to control use of std::exception_ptr.
  • Fix a problem with the variadic insert where it expected a jcursor rather than converted to one.
  • Add digest and HMAC overloads for std::vector<unsigned char>.

fost-beanbag

  • Removed use of deprecated Fost APIs.

fost-internet

  • Removed use of deprecated Fost APIs.
  • Make sure the web server logs an error message when it catches an exception.

fost-orm

  • Improved an error message for file backed JSON databases.
  • Removed use of deprecated Fost APIs.

fost-py

  • Removed use of deprecated Fost APIs.

fost-web

  • Allow multiple configuration files to be loaded by the web server and explicitly load up the logging options.
  • Added Date, Expires to static response headers.
  • Implemented 304 response.
  • Removed use of deprecated Fost APIs.


Fost 4 release 4.15.12.44967 now out

Posted 20th January, 2016 11:17 (UTC), last edited 20th January, 2016 11:17 (UTC)

There has been a good deal of work done on the libraries, but nothing has made it into the master branches yet, so there is nothing to report here. That isn't why this release is a little late, though. Boost 1.60 has just come out, but it appears to have had some sort of regression in the Python wrappers. This hasn't just affected fost-py, and at the moment it seems everybody is still at a loss to understand why it has happened, even the Boost Python library maintainer. I was hoping that there was something fairly simple to get this working, but for the time being if you're using fost-py then you cannot use Boost 1.60.

Building on Linux

git clone --branch=4.15.12.44967 --recursive git@github.com:KayEss/fost-hello.git
cd fost-hello
Boost/build 58 0
Boost/install 58 0
hello/compile
dist/bin/hello-world-d

Download locations

Applications

  • beanbag — Stand alone transactional JSON database server — git@github.com:KayEss/beanbag.git
  • beanbag-seed — Seed project for giving you a starting point to develop web applications using Beanbag — git@github.com:KayEss/beanbag-seed.git
  • fost-hello — Sample seed project — git@github.com:KayEss/fost-hello.git
  • mengmom — Stand alone web server — git@github.com:KayEss/mengmom.git

Libraries

  • f5-threading — Preview of the first Fost 5 library which includes help for threading.
  • fost-aws — Amazon AWS and OpenStack — git@github.com:KayEss/fost-aws.git
  • fost-android — Eclipse project for Android that allows Fost 4 and Beanbags to be used on mobile devices — git@github.com:KayEss/fost-android.git
  • fost-android-ndk — The native code for Android. Includes required parts of Boost configured to use the standard Android build system.
  • fost-beanbag — Transactional JSON database — git@github.com:KayEss/fost-beanbag.git
  • fost-base — Build system and core libraries — git@github.com:KayEss/fost-base.git
  • fost-internet — Internet protocols, servers & clients — git@github.com:KayEss/fost-internet.git
  • fost-meta — All libraries in one wrapper — git@github.com:KayEss/fost-meta.git
  • fost-orm — Object/Relational mapping — git@github.com:KayEss/fost-orm.git
  • fost-postgres — PostgreSQL — git@github.com:KayEss/fost-postgres.git
  • fost-py — Python (2.x) bindings — git@github.com:KayEss/fost-py.git
  • fost-web — Web server libraries — git@github.com:KayEss/fost-web.git

Detailed change log

No changes



Fost 4 release 4.15.09.44960 now out

Posted 29th September, 2015 02:50 (UTC), last edited 29th September, 2015 03:41 (UTC)

The detailed logs look pretty short, but that hides a few fairly big improvements and new features.

The builds are now completely C++14 for both Linux and Android. One consequence of this is that the platform versions of the Boost libraries will no longer work and Boost will need to be re-built.

We've started to sketch out the Fost 5 threading library, a set of very basic building blocks for thread safe storage of data. This is now a dependency for fost-base. There is also a new module system and new performance counters.

The beanbags have also gotten a bit smarter. The old implementation used a thread per beanbag to handle concurrency. Although easy to reason about, this of course led to a lot of threads. The implementation has been changed to use a mutex instead. There is now a possibility of deadlock that wasn't there before if you try to do too much in a transaction.

Building on Linux

git clone --branch=4.15.09.44960 --recursive git@github.com:KayEss/fost-hello.git
cd fost-hello
Boost/build 58 0
Boost/install 58 0
hello/compile
dist/bin/hello-world-d

Download locations

Applications

  • beanbag — Stand alone transactional JSON database server — git@github.com:KayEss/beanbag.git
  • beanbag-seed — Seed project for giving you a starting point to develop web applications using Beanbag — git@github.com:KayEss/beanbag-seed.git
  • fost-hello — Sample seed project — git@github.com:KayEss/fost-hello.git
  • mengmom — Stand alone web server — git@github.com:KayEss/mengmom.git

Libraries

  • f5-threading — Preview of the first Fost 5 library which includes help for threading.
  • fost-aws — Amazon AWS and OpenStack — git@github.com:KayEss/fost-aws.git
  • fost-android — Eclipse project for Android that allows Fost 4 and Beanbags to be used on mobile devices — git@github.com:KayEss/fost-android.git
  • fost-android-ndk — The native code for Android. Includes required parts of Boost configured to use the standard Android build system.
  • fost-beanbag — Transactional JSON database — git@github.com:KayEss/fost-beanbag.git
  • fost-base — Build system and core libraries — git@github.com:KayEss/fost-base.git
  • fost-internet — Internet protocols, servers & clients — git@github.com:KayEss/fost-internet.git
  • fost-meta — All libraries in one wrapper — git@github.com:KayEss/fost-meta.git
  • fost-orm — Object/Relational mapping — git@github.com:KayEss/fost-orm.git
  • fost-postgres — PostgreSQL — git@github.com:KayEss/fost-postgres.git
  • fost-py — Python (2.x) bindings — git@github.com:KayEss/fost-py.git
  • fost-web — Web server libraries — git@github.com:KayEss/fost-web.git

Detailed change log

fost-base

  • Moved the tagged string header.
  • The performance counter now takes a variadic constructor to build the path that it will be recorded into.
  • Changed the jcursor constructors to be properly variadic.
  • Added a mechanism for setting modules that are part of a system. The log messages now make use of this so it's easier to track where log messages come from.
  • fostlib::push_back now accepts a fostlib::json::array_t.
  • Deprecate fostlib::counter.

beanbag/fost-beanbag

  • Improved error reporting when opening a beanbag.
  • Fixed up tests to remove use of std::auto_ptr.

fost-orm

  • Cleaned up some old conditional compilation that is no longer relevant.
  • Use a mutex to serialise access to the beanbag data rather than a separate thread.

mengmom/fost-web

  • Add MIME type for SVG files.
  • Remove uses of std::auto_ptr.
  • Add the ability for the static view to handle DELETE requests when its configuration includes "verbs": {"DELETE": true}


Fost 4 release 4.15.06.44953 now out

Posted 25th June, 2015 04:45 (UTC), last edited 4th July, 2015 04:18 (UTC)

This is the first release for C++14. It includes a preview of a first version of the Fost 5 threading library, but fost-windows has been removed due to the continued inaccessibility of a compiler and test platform. Old versions of Boost should probably be rebuilt so that they are built with C++14.

All of the changes for this switch are going to take a bit of time to settle down. So far we are using C++14 for Android and Linux projects based on this release, but the packaging of it is probably still a bit off in some of the -dev projects.

We've also restricted the Boost versions that we're supporting to 1.55 through 1.58. The Android library is currently using 1.56.

Building on Linux

git clone --branch=4.15.06.44953 --recursive git@github.com:KayEss/fost-hello.git
cd fost-hello
Boost/build 58 0
Boost/install 58 0
hello/compile
dist/bin/hello-world-d

Download locations

Applications

  • beanbag — Stand alone transactional JSON database server — git@github.com:KayEss/beanbag.git
  • beanbag-seed — Seed project for giving you a starting point to develop web applications using Beanbag — git@github.com:KayEss/beanbag-seed.git
  • fost-hello — Sample seed project — git@github.com:KayEss/fost-hello.git
  • mengmom — Stand alone web server — git@github.com:KayEss/mengmom.git

Libraries

  • f5-threading — Preview of the first Fost 5 library which includes help for threading.
  • fost-aws — Amazon AWS and OpenStack — git@github.com:KayEss/fost-aws.git
  • fost-android — Eclipse project for Android that allows Fost 4 and Beanbags to be used on mobile devices — git@github.com:KayEss/fost-android.git
  • fost-android-ndk — The native code for Android. Includes required parts of Boost configured to use the standard Android build system.
  • fost-beanbag — Transactional JSON database — git@github.com:KayEss/fost-beanbag.git
  • fost-base — Build system and core libraries — git@github.com:KayEss/fost-base.git
  • fost-internet — Internet protocols, servers & clients — git@github.com:KayEss/fost-internet.git
  • fost-meta — All libraries in one wrapper — git@github.com:KayEss/fost-meta.git
  • fost-orm — Object/Relational mapping — git@github.com:KayEss/fost-orm.git
  • fost-postgres — PostgreSQL — git@github.com:KayEss/fost-postgres.git
  • fost-py — Python (2.x) bindings — git@github.com:KayEss/fost-py.git
  • fost-web — Web server libraries — git@github.com:KayEss/fost-web.git

Detailed change log

fost-aws

  • Remove explicit types to get rid of auto_ptr uses.

fost-base

  • Use std::rethrow_exception to move an exception between threads.
  • Added variadic versions of fostlib::push_back and fostlib::insert.
  • Add coercions for ascii_string and utf8_string to json.
  • We need libatomic if we're using gcc++ (or clang) for many programs so just have it included.
  • Pretty up some common log patterns when using colour output.
  • Added colour options to the stdout logger.
  • Remove the FOST_HAS_MOVE define because with C++14 we don't need it.
  • Test file names can now end in .tests.cpp as well as -tests.cpp.
  • Removed internal uses of boost::scoped_ptr.
  • Add SHA256 to the crypto wrappers.
  • Replace all uses of boost::filesystem::wpath with path.
  • When building an executable make sure that we pull in at least everything for fost-core.
  • Fix a build error with Boost 1.58.0.
  • Switch to C++14.
  • Improve the exception information where a jcursor::insert fails due to existing data at the requested key.

fost-internet

  • Remove all uses of auto_ptr. Replace some with move semantics, others with unique_ptr.
  • Fixed up the tests to use the new service API on the network connections.
  • Changed the way that Boost ASIO IO services are used so that server accept sockets can be serviced totally independently. This is an ugly workaround for the problem, but does at least cause all requests to be properly serviced.
  • Allow apostrophes in the fostlib::url::filepath_string strings.

fost-orm

  • Add a pre-commit mechanism to the jsondb::local.
  • Make the JSON DB local transaction movable.
  • Add a data member to the local JSON transaction so all of the data can be fetched.
  • Add JSON database post-commit hooks to augment the transaction ones we already had.
  • Fix a build error with Boost 1.58.0.
  • Switch to C++14 and remove auto_ptrs.

fost-py

  • Strip out auto_ptr.
  • Add a new test so we can check the version of Python we're using.
  • Alter the fpython tests so they don't rebuild the host for each one.
  • Updated to C++14.
