
Aggregations in MongoDB
In MongoDB, aggregation is the process of transforming and processing data from the database to compute aggregated results such as counts, sums, averages, and more. Aggregation operations allow you to process and analyze data more effectively, especially when you need to perform complex queries and transformations on your data.
MongoDB provides an aggregation framework that includes a variety of operators and stages to help perform these operations efficiently.
1. Aggregation Pipeline
The aggregation pipeline is the primary way to perform aggregation in MongoDB. It is made up of a series of stages, where each stage processes the data and passes the result to the next stage. You can think of it as a pipeline of transformations, much like an assembly line, where data flows through different stages to produce the final result.
Each stage of the aggregation pipeline performs a specific operation, such as filtering, grouping, sorting, or projecting data. These stages allow you to build complex queries and calculations in an efficient manner.
The pipeline is defined as an array of stages, with each stage represented by a document (also called an operator). Some common stages include $match
, $group
, $sort
, $project
, and $unwind
.
2. Common Aggregation Stages
Here are some of the most commonly used aggregation stages:
1. $match
- The
$match
stage filters documents based on specified conditions, similar to afind
query but within the context of the aggregation pipeline. - It allows you to filter the data before performing further operations like grouping or sorting.
Example:
db.sales.aggregate([ { $match: { year: 2023 } }])
This stage filters for documents where the year is 2023
.
2. $group
- The
$group
stage is used to group documents by a specific field and perform aggregations on those groups. It is often used for operations like counting, summing, averaging, and more. - The
$_id
field is mandatory in the$group
stage, and it represents the grouping key.
Example:
db.sales.aggregate([ { $match: { year: 2023 } }, { $group: { _id: "$region", total_sales: { $sum: "$amount" } } }])
This groups the documents by the region
field and calculates the total sales ($sum
) for each region.
3. $project
- The
$project
stage reshapes the documents, meaning you can include, exclude, or rename fields. You can also create new fields or perform computations on existing fields.
Example:
db.sales.aggregate([ { $match: { year: 2023 } }, { $project: { region: 1, amount: 1, discount: { $multiply: ["$amount", 0.1] } } }])
This stage selects only the region
and amount
fields, and adds a new field discount
, which is 10% of the amount
.
4. $sort
- The
$sort
stage is used to sort the documents based on one or more fields.
Example:
db.sales.aggregate([ { $match: { year: 2023 } }, { $sort: { amount: -1 } }])
This sorts the documents by the amount
field in descending order (-1
means descending).
5. $unwind
- The
$unwind
stage is used to deconstruct an array field from the input documents, creating one document for each element in the array. It’s useful when you want to work with individual elements of an array.
Example:
db.orders.aggregate([ { $unwind: "$items" }, { $group: { _id: "$items.product", total_quantity: { $sum: "$items.quantity" } } }])
This unwinds the items
array, and then groups by product
to get the total quantity for each product.
6. $limit
and $skip
- The
$limit
stage restricts the number of documents passed to the next stage in the pipeline. - The
$skip
stage skips over a specified number of documents in the pipeline.
Example:
db.sales.aggregate([ { $sort: { amount: -1 } }, { $limit: 5 }])
This sorts the documents by amount
in descending order and then limits the result to the top 5 documents.
7. $addFields
and $set
- The
$addFields
and$set
stages allow you to add new fields to the documents.
Example:
db.orders.aggregate([ { $addFields: { total_price: { $multiply: ["$price", "$quantity"] } } }])
This adds a new field total_price
, which is calculated by multiplying the price
and quantity
fields.
8. $count
- The
$count
stage is used to count the number of documents that pass through the pipeline.
Example:
db.sales.aggregate([ { $match: { year: 2023 } }, { $count: "total_sales" }])
This will return the count of documents where the year is 2023 and label the count as total_sales
.
3. Aggregation Operators
There are various operators that can be used within aggregation stages. Here are some key ones:
1. Accumulator Operators (for $group
):
$sum
: Calculates the sum of a field.$avg
: Calculates the average of a field.$min
/$max
: Returns the minimum or maximum value of a field.$push
: Creates an array of values for each group.$first
/$last
: Returns the first or last value in the group.
Example:
db.sales.aggregate([ { $group: { _id: "$region", total_sales: { $sum: "$amount" } } }])
2. Expression Operators (for $project
or $addFields
):
$multiply
,$add
,$subtract
,$divide
: Perform mathematical operations.$concat
: Concatenates strings.$ifNull
: Returns a value if a field isnull
.
Example:
db.orders.aggregate([ { $project: { total: { $multiply: ["$quantity", "$price"] } } }])
3. Comparison Operators (for $match
):
$eq
: Equal to.$ne
: Not equal to.$gt
,$gte
,$lt
,$lte
: Greater than, greater than or equal to, less than, and less than or equal to.$in
,$nin
: Checks if a field's value is in or not in a list of values.
Example:
db.orders.aggregate([ { $match: { quantity: { $gt: 10 } } }])
4. Faceted Search
MongoDB allows you to perform multiple aggregations in a single query by using the $facet
stage. This enables you to create multiple pipelines and combine their results in one output.
Example:
db.orders.aggregate([ { $facet: { "total_sales": [ { $match: { year: 2023 } }, { $group: { _id: null, total_sales: { $sum: "$amount" } } } ], "product_sales": [ { $unwind: "$items" }, { $group: { _id: "$items.product", total_quantity: { $sum: "$items.quantity" } } } ] } }])
This query generates two aggregated results: total_sales
and product_sales
.
5. Example of Complex Aggregation
Here’s an example of a complex aggregation pipeline that filters, groups, sorts, and calculates an average:
db.sales.aggregate([ { $match: { year: 2023, region: "North" } }, { $group: { _id: "$product", total_sales: { $sum: "$amount" }, avg_sales: { $avg: "$amount" } } }, { $sort: { total_sales: -1 } }, { $project: { _id: 0, product: "$_id", total_sales: 1, avg_sales: 1 } }])
This pipeline:
- Filters sales from the year 2023 and region "North".
- Groups by product and calculates the total sales and average sales.
- Sorts by total sales in descending order.
- Projects the fields for the output.
Conclusion
Aggregation in MongoDB is a powerful tool for processing and analyzing data. The aggregation pipeline allows you to chain multiple stages to filter, group, sort, and transform data in a variety of ways. By using aggregation operators and stages, you can perform complex queries and computations on your MongoDB data efficiently, making it suitable for tasks ranging from data analysis to reporting and complex filtering.