ClickHouse Expert: Refactor PostgreSQL Query Builders to ClickHouse
Upwork·US·Remote Friendly
Posted 1 day ago
Contractor
Apply Now About the Role
--- About Us ---
We run a survey analytics platform that processes ~50 million survey answers per day. Our data is stored in an EAV (Entity-Attribute-Value) model — 500M+ rows, 20,000+ distinct questions, surveys with up to 250,000 responses each.
We've migrated our data layer to ClickHouse Cloud (with real-time replication from PostgreSQL via ClickPipes), but two of our most critical query builders still target PostgreSQL. We need them refactored to ClickHouse.
--- What You'll Be Working With ---
These are NOT static SQL files. Both queries are TypeScript functions that dynamically construct SQL at runtime using template literals and Drizzle ORM. You'll be reading and writing .ts files, not .sql.
The codebase is TypeScript/Node.js running on AWS Lambda.
--- Query 1: Advanced Filter Engine ---
This is the foundation — nearly every analytical feature on the platform depends on it.
The filter engine takes a nested AND/OR condition tree (configured by the user in the UI) and recursively converts it into WHERE clauses. In PostgreSQL, this currently uses JSONB array operations (ANY, &&).
Example business logic:
"Show me users who answered 'Yes' to Question A AND match either of these two groups: (answered 'London' to Question B AND are aged 25-34) OR (answered 'Manchester' to Question B AND are aged 35-44)."
As a condition tree:
AND
├─ Q_A = 'Yes'
└─ OR
├─ AND
│ ├─ Q_B = 'London'
│ └─ Age = '25-34'
└─ AND
├─ Q_B = 'Manchester'
└─ Age = '35-44'
The query builder recursively walks this tree and generates the corresponding WHERE clause against the EAV structure. This must remain generic and reusable — it's called by the segmentation query below, but also by data exports, respondent lists, and other internal tools.
--- Query 2: Cross-Tabulated Segmentation ---
This is the main analytical query. It uses the filter engine (Query 1) to select a user set, then segments those users across multiple dimensions.
Users configure segments in the UI from various dimension types: question responses, profile fields, date ranges, and aggregated values. The query dynamically builds 8+ CTEs based on this configuration.
Example: A user creates segments "Number of cats owned" × "Favourite cat food" × "Income bracket". The query returns the count of users for every combination of values, plus sample sizes.
Technically, the PostgreSQL version uses ARRAY_AGG/UNNEST for per-user aggregation and word_similarity/UNACCENT for fuzzy text matching. Both need ClickHouse-appropriate alternatives.
--- What We Provide ---
• Both queries in their current PostgreSQL/TypeScript form, fully documented
• A ClickHouse Cloud dev environment with realistic test data
• Our table schemas and EAV model documentation
• Direct access to our lead developer for questions and code review
--- What Success Looks Like ---
• Both query builders produce correct, optimized ClickHouse SQL
• Query performance targets: 1-3 seconds for typical analytical workloads
• The filter engine remains generic and reusable
• Code follows our existing TypeScript patterns and passes review
--- Must-Have Skills ---
• Deep ClickHouse experience — you understand MergeTree engine internals, how ORDER BY affects query planning, granule-level index filtering, and how to read EXPLAIN output
• Strong PostgreSQL knowledge — you need to fully understand the source queries to refactor them correctly
• TypeScript proficiency — this is .ts refactoring, not SQL writing
• Experience working with EAV data models or similarly denormalized analytical schemas
--- In Your Proposal, Please Include ---
1. A specific example of a ClickHouse query optimization you've done — what was slow, what you changed, and what the result was
2. How you'd approach replacing PostgreSQL's word_similarity() in ClickHouse
3. Your availability and estimated timeline for this scope
What you'll do
- You'll be reading and writing .ts files, not .sql
- This is the foundation — nearly every analytical feature on the platform depends on it
- The filter engine takes a nested AND/OR condition tree (configured by the user in the UI) and recursively converts it into WHERE clauses
- │ ├─ Q_B = 'London'
- It uses the filter engine (Query 1) to select a user set, then segments those users across multiple dimensions
- Users configure segments in the UI from various dimension types: question responses, profile fields, date ranges, and aggregated values
- Direct access to our lead developer for questions and code review
- The filter engine remains generic and reusable
- Code follows our existing TypeScript patterns and passes review
Requirements
- In PostgreSQL, this currently uses JSONB array operations (ANY, &&)
- "Show me users who answered 'Yes' to Question A AND match either of these two groups: (answered 'London' to Question B AND are aged 25-34) OR (answered 'Manchester' to Question B AND are aged 35-44)."
- │ └─ Age = '25-34'
- └─ Age = '35-44'
- This must remain generic and reusable — it's called by the segmentation query below, but also by data exports, respondent lists, and other internal tools
- -- Query 2: Cross-Tabulated Segmentation ---
- Both need ClickHouse-appropriate alternatives
- Both queries in their current PostgreSQL/TypeScript form, fully documented
- A ClickHouse Cloud dev environment with realistic test data
- Our table schemas and EAV model documentation
- Both query builders produce correct, optimized ClickHouse SQL
- Query performance targets: 1-3 seconds for typical analytical workloads
- Deep ClickHouse experience — you understand MergeTree engine internals, how ORDER BY affects query planning, granule-level index filtering, and how to read EXPLAIN output
- Strong PostgreSQL knowledge — you need to fully understand the source queries to refactor them correctly
- TypeScript proficiency — this is .ts refactoring, not SQL writing
- Experience working with EAV data models or similarly denormalized analytical schemas
- How you'd approach replacing PostgreSQL's word_similarity() in ClickHouse
- Your availability and estimated timeline for this scope