author | Ben Sima <ben@bsima.me> | 2018-01-24 18:31:41 -0800 |
---|---|---|
committer | Ben Sima <ben@bsima.me> | 2018-01-24 18:31:41 -0800 |
commit | 8eeda4aa133e6aa3e068b1a9f7d31cd9001475d4 (patch) | |
tree | f20c96ba90eeed88cee92649929f20bbc9ea1634 | |
parent | 57bcca93b1ed1e31d8837ba1a43a872c9e4f757f (diff) |
Fix the slow bootstrapping with `groupUpdates`
Fixes #1
-rwxr-xr-x | main.hs | 25 |
-rw-r--r-- | readme.md | 21 |
2 files changed, 8 insertions, 38 deletions
--- a/main.hs
+++ b/main.hs
@@ -28,6 +28,7 @@
 {-# LANGUAGE RecordWildCards #-}
 
 import Data.Acid
+import Data.Acid.Advanced (groupUpdates)
 import Data.Acid.Local (createCheckpointAndClose)
 import qualified Data.ByteString.Lazy as BSL
 import qualified Data.Csv as Csv
@@ -240,21 +241,9 @@ postNumberR = do
     caller <- liftIO $ update db $ AddCaller _name _number _context
     sendStatusJSON status200 $ caller
 
--- | This takes a while; on my machine it averages 181 records per second. It's
--- IO bound, and in an un-optimized program GHC on Linux uses a single, blocking
--- IO manager thread (on Windows it's non-blocking, apparently). This can be
--- improved with the Control.Concurrent module, it which case we could launch as
--- many IO threads as we want, and do probably 10k records per second. There's
--- definitely an optimal amount of threads here, we'd have to test to find that.
---
--- HOWEVER, you can watch it bootstrap. Hit this endpoint, then use "GET /count"
--- to see it updating. New POSTs will also work and update the database, even
--- while it is bootstrapping, which is kinda cool.
---
--- Try this in bash:
---
---     while sleep 1; do curl -s "localhost:3000/count" | jq '.count'; done
---
+
+callerFromCsv (number, context, name) = AddCaller name number context
+
 postBootstrapR :: Handler RepJson
 postBootstrapR = do
     $logInfo "Initializing the database."
@@ -264,11 +253,7 @@ postBootstrapR = do
     seedData <- liftIO $ BSL.readFile "interview-callerid-data.csv"
     callers <- case Csv.decode Csv.NoHeader seedData of
         Left err -> fail err
-        Right v ->
-            Vector.forM_ (Vector.indexed v) $ \(callerId, record) -> do
-                let (number, context, name) = record
-                c <- liftIO $ update db $ AddCaller name number context
-                return c
+        Right v -> liftIO $ groupUpdates db $ Vector.toList $ Vector.map callerFromCsv v
 
     sendStatusJSON status200 $ ("Bootstrap complete." :: Text)
 
--- a/readme.md
+++ b/readme.md
@@ -2,22 +2,7 @@
 2. Optional: have [nix](https://nixos.org/nix/) installed. If you *don't* use nix, delete line 3 of `main.hs`.
 3. Run `./main.hs` and the server will startup
 
-To bootstrap the db, do `curl -XPOST "localhost:3000/bootstrap"`.
+To bootstrap the db, do `curl -XPOST "localhost:3000/bootstrap"`. This takes
+like 3 seconds or so.
 
-This takes a while; on my machine it averages 181 records per second. It's IO
-bound, and in an un-optimized program GHC on Linux uses a single, blocking IO
-manager thread (on Windows it's non-blocking, [apparently][1]). This can be
-improved with the [Control.Concurrent][1] module, it which case we could launch
-as many IO threads as we want, and do probably 10k records per second. There's
-definitely an optimal amount of threads here, we'd have to test to find that.
-
-*However*, you can watch it bootstrap. In a separate terminal, do `curl
-"localhost:3000/count"` to see it updating. New POSTs will also work and update
-the database, even while it is bootstrapping, which is kinda cool.
-
-Try this in bash:
-
-    while sleep 1; do curl -s "localhost:3000/count" | jq '.count'; done
-
-
-[1]: https://www.stackage.org/haddock/lts-10.3/base-4.10.1.0/Control-Concurrent.html#g:10
+See `test.http` for some other examples.
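
For readers unfamiliar with acid-state, here is a minimal, standalone sketch of the batching pattern this commit switches to. It is not code from this repository: the `Counter` state and the `AddN` / `GetCount` events are hypothetical stand-ins for the real caller database and `AddCaller`, and it uses an in-memory state (`openMemoryState`) instead of the on-disk one so that it runs without touching the filesystem. The idea, as I understand `groupUpdates` from `Data.Acid.Advanced`, is that it schedules every event in the list and only waits on the last one, rather than blocking on each `update` in turn.

```haskell
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}

import Control.Monad.Reader (asks)
import Control.Monad.State (modify)
import Data.Acid (Query, Update, closeAcidState, makeAcidic, query)
import Data.Acid.Advanced (groupUpdates)
import Data.Acid.Memory (openMemoryState)
import Data.SafeCopy (base, deriveSafeCopy)

-- Hypothetical state: a single running total.
newtype Counter = Counter Int

$(deriveSafeCopy 0 'base ''Counter)

-- Hypothetical update event, standing in for AddCaller.
addN :: Int -> Update Counter ()
addN n = modify (\(Counter c) -> Counter (c + n))

-- Hypothetical query event, standing in for the /count lookup.
getCount :: Query Counter Int
getCount = asks (\(Counter c) -> c)

$(makeAcidic ''Counter ['addN, 'getCount])

main :: IO ()
main = do
    acid <- openMemoryState (Counter 0)
    -- One batched call instead of mapM_ (update acid) over the list.
    groupUpdates acid (map AddN [1 .. 100])
    total <- query acid GetCount
    print total
    closeAcidState acid
```

Note that `groupUpdates` discards the result of each event, which is why the new `postBootstrapR` no longer has per-record return values to work with.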