Liking cljdoc? Tell your friends :D

SqlForCql

SQL like capabilities for Apache Cassandra.

This library is born out of frustration of expecting SQL (kind of) queries to work against Cassandra, but they don't. The situation becomes worse when you have to update some data.

Recently, I was working on a data migration task, and due to some logical errors, I had to rollback the changes made to the data, but, sadly, this was not an RDBMS!

Hence, this library, which helps in executing those queries against a Cassandra database which are not possible using plain CQL. Seemingly simple queries, e.g. a query with a like clause or, a query where we have to update multiple rows (which possibly, could've been achieved using a nested SQL query), are made possible using this library.

This library is written in Clojure, so the syntax is far from that of SQL, but it is minimal enough to be usable.

Usage

Clojars Project

There are just a few namespaces in this library:

  1. sqlforcql.core - with functions to connect to and disconnect from Cassandra
  2. sqlforcql.cql - with functions to get / update data from Cassandra
  3. sqlforcql.db - which actually establishes a connection to Cassandra and disconnects from it
  4. sqlforcql.atoms - contains a single default-db-map atom which stores Cassandra connection parameters
  5. sqlforcql.querybuilder - which constructs valid CQL queries to be run against Cassandra
  6. sqlforcql.analyze - with a function to get counts of various tables, and a function to get the difference in the rows of a base table, and it's supporting query table.
  7. sqlforcql.generate - with a function to get insert-statements when there is a difference in the rows of a base table, and it's supporting query table.

Query capabilities

Before we can start executing any queries, we need to connect to a Cassandra instance. This can be done using:

  (core/connect-to-default-db "localhost" "username" "password" "keyspace")

Once we are done, we can disconnect from Cassandra using:

  (core/disconnect-from-default-db)

Sample schema

The tests in this library sqlforcql.schema create the following two tables (with the data as shown):

players table

with nickname as the partitioning key (hereafter, referred to as the PK column)

nickname(*)citycountryfirst_namelast_namezip
fedexBernSwitzerlandRogerFederer3001
naseerAjmerIndiaNaseeruddinShah305001
sonuDubaiUAESonuNigam00000
rafaMadridSpainRafaelNadal28001
kingAbu DhabiUAEShahrukhKhan00000
chintuJodhpurIndiaRishiKapoor305001

players_by_city table

with city and country as the partitioning keys, so country becomes the clustering column (hereafter, referred to as the CK column)

city(*)country(*)first_namelast_namenicknamezip
JodhpurIndiaRishiKapoorchintu305001
BernSwitzerlandRogerFedererfedex3001
Abu DhabiUAEShahrukhKhanking00000
AjmerIndiaNaseeruddinShahnaseer305001
DubaiUAESonuNigamsonu00000
MadridSpainRafaelNadalrafa28001

The following queries are obvious (though nothing which CQL can't do):

  • select * from players; - (cql/get-all "players")

  • select count(*) from players; - (cql/get-count "players")

  • select * from players where city = 'Ajmer' allow filtering; (cql/get-by-non-pk-col "players" {:city "Ajmer"})

  • select * from players where nickname = 'fedex'; (cql/get-by-pk-col "players" {:nickname "fedex"})

Or, from players_by_city table where city and country are PK / CK columns respectively:

  • select * from players_by_city where city = 'Jodhpur' and country = 'India'; (cql/get-by-pk-col "players_by_city" {:city "Jodhpur" :country "India"})

Now this is where (the CQL betrays us) and the fun starts:

Suppose we wanted to execute a SQLish query with a like clause:

  • select * from players where city like 'Dhabi'; (cql/get-by-non-pk-col-like "players" {:city "Dhabi"})

Or, suppose we wanted to update multiple rows based on a criteria involving some non-PK column:

  • update players set city = 'X' where city = 'Y'; (cql/update-by-non-pk-col "players" :nickname {:city "X"} {:city "Y"})

A query similar to the above update query against a table having both PK and CK columns, we have to use:

  • update players_by_city set zip = 411038 where zip = 305001; - (cql/update-by-non-pk-col-with-clustering-col "players_by_city" [:city :country] {:zip 305001} {:zip 411038})

Other capabilities

Typically, when we are using Cassandra, it is very common to have a base table, and a supporting query table similar to the players & players_by_city tables in our example above. In older Cassandra versions, there was support for materialized views where we did not have to worry about keeping the number of rows in the base table and its materialized view, in sync. However, current Cassandra versions have dropped support for materialized views, so it's now our responsibility to keep the rows in the base table, and the supporting query tables in sync (i.e. the data in these related tables should be the same). This is where the analyze and the generate namespaces help.

Getting counts of the number of rows of a few (base, and the supporting query) tables:

  • (analyze/get-counts ["players" "players_by_city"])

Getting difference in the rows of a base table, and it's supporting query table:

  • (analyze/get-diff "players" "players_by_city")

  • NOTE: If the rows are the same, then this above function returns: {:no-difference #{}}

Generating insert statements, if the rows in the base table, and it's supporting query table differ:

  • (generate/get-insert-statements table-name table-rows)

Typically, the output of analyze/get-diff can be fed to the above function if the rows aren't the same:

  • (generate/get-insert-statements (analyze/get-diff "players" "players_by_city"))

License

Copyright © 2020 FIXME

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.

This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.

Can you improve this documentation? These fine people already did:
Manoj Waikar & mmwaikar
Edit on GitHub

cljdoc is a website building & hosting documentation for Clojure/Script libraries

× close