Skip to content

Instantly share code, notes, and snippets.

@smitpatel
Last active October 15, 2021 17:39
Show Gist options
  • Save smitpatel/d4cb3619e5b33e8d9ea24d3f2a88333a to your computer and use it in GitHub Desktop.
Save smitpatel/d4cb3619e5b33e8d9ea24d3f2a88333a to your computer and use it in GitHub Desktop.
This app shows how to do split queries for multiple levels of collection include in EF Core 3.0
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;
namespace EFSampleApp
{
public class Program
{
public static void Main(string[] args)
{
using (var db = new MyContext())
{
// Recreate database
db.Database.EnsureDeleted();
db.Database.EnsureCreated();
// Seed database
db.AddRange(new Customer
{
Address = new Address(),
Orders = new List<Order>
{
new Order
{
OrderDiscount = new OrderDiscount(),
OrderDetails = new List<OrderDetail>
{
new OrderDetail(),
new OrderDetail()
}
},
new Order
{
OrderDiscount = new OrderDiscount(),
OrderDetails = new List<OrderDetail>
{
new OrderDetail(),
new OrderDetail()
}
},
new Order
{
OrderDiscount = new OrderDiscount()
},
new Order()
},
},
new Customer
{
Address = new Address()
},
new Customer());
db.SaveChanges();
}
using (var db = new MyContext())
{
// Run queries
// Tracking and Buffered
Console.WriteLine("Tracking & Buffering");
var query = db.Customers.Include(c => c.Address);
var result = query.ToList();
query.Include(c => c.Orders).ThenInclude(o => o.OrderDiscount).SelectMany(c => c.Orders).Load();
query.SelectMany(c => c.Orders).SelectMany(o => o.OrderDetails).Load();
// Following code is just to print out, above will run queries and stitch up graph
// Since Include is not used for collection navigations,
// the collection properties may be null if no related objects & not initialized
foreach (var customer in result)
{
Console.WriteLine($"CustomerId: {customer.Id}");
Console.WriteLine($"Customer Address: {customer.Address?.Id}");
if (customer.Orders != null)
{
Console.WriteLine($"Customer Orders.Count: {customer.Orders.Count}");
foreach (var order in customer.Orders)
{
Console.WriteLine($"OrderId: {order.Id}");
Console.WriteLine($"Order OrderDiscount: {order.OrderDiscount?.Id}");
if (order.OrderDetails != null)
{
Console.WriteLine($"Order OrderDetails.Count: {order.OrderDetails?.Count}");
foreach (var orderDetail in order.OrderDetails)
{
Console.WriteLine($"OrderDetailId: {orderDetail.Id}");
}
}
}
}
}
}
using (var db = new MyContext())
{
// Run queries
// Tracking and non-buffered
Console.WriteLine("Tracking & Non-buffering");
var customers = db.Customers.Include(c => c.Address);
var orders = customers.Include(c => c.Orders).ThenInclude(o => o.OrderDiscount).SelectMany(c => c.Orders).GetEnumerator();
orders.MoveNext();
var orderDetails = customers.SelectMany(c => c.Orders).SelectMany(o => o.OrderDetails).GetEnumerator();
orderDetails.MoveNext();
// Above will run queries and get enumerators, following code will actually enumerate.
// The following code blocks will move each enumerators upto the point it is needed to generate the current result
// Since Include is not used for collection navigations,
// the collection properties may be null if no related objects & not initialized
foreach (var customer in customers)
{
Console.WriteLine($"CustomerId: {customer.Id}");
Console.WriteLine($"Customer Address: {customer.Address?.Id}");
while (orders.Current?.CustomerId == customer.Id)
{
// Enumerate orders as long as the order is related to customer
if (!orders.MoveNext())
{
break;
}
}
if (customer.Orders != null)
{
Console.WriteLine($"Customer Orders.Count: {customer.Orders.Count}");
foreach (var order in customer.Orders)
{
Console.WriteLine($"OrderId: {order.Id}");
Console.WriteLine($"Order OrderDiscount: {order.OrderDiscount?.Id}");
while (orderDetails.Current?.OrderId == order.Id)
{
// Enumerate orderDetails as long as the orderDetail is related to order
if (!orderDetails.MoveNext())
{
break;
}
}
if (order.OrderDetails != null)
{
Console.WriteLine($"Order OrderDetails.Count: {order.OrderDetails.Count}");
foreach (var orderDetail in order.OrderDetails)
{
Console.WriteLine($"OrderDetailId: {orderDetail.Id}");
}
}
}
}
}
orders.Dispose();
orderDetails.Dispose();
}
using (var db = new MyContext())
{
// Run queries
// Non-tracking
Console.WriteLine("Non-tracking");
var customers = db.Customers.Include(c => c.Address).AsNoTracking();
var orders = customers.Include(c => c.Orders).ThenInclude(o => o.OrderDiscount).SelectMany(c => c.Orders)
.Select(o => new
{
// We connect order to related customer by comparing value of FK to PK.
// If FK property is not shadow then this custom projection is not necessary as you can access o.CustomerId
// If FK property is shadow then project out FK value and use it for comparison.
o.CustomerId, // For shadow property use EF.Property<int>(o, "CustomerId")
o
}).GetEnumerator();
orders.MoveNext();
var orderDetails = customers.SelectMany(c => c.Orders).SelectMany(o => o.OrderDetails)
.Select(od => new
{
od.OrderId,
od
})
.GetEnumerator();
orderDetails.MoveNext();
// Above will run queries and get enumerators, following code will actually enumerate.
// The following code blocks will move each enumerators upto the point it is needed to generate the current result
// And stitch up navigations.
// If you want to buffer the result, create collection to store top level objects.
foreach (var customer in customers)
{
Console.WriteLine($"CustomerId: {customer.Id}");
Console.WriteLine($"Customer Address: {customer.Address?.Id}");
customer.Orders = new List<Order>();
while (orders.Current?.CustomerId == customer.Id)
{
// Add order to collection
customer.Orders.Add(orders.Current.o);
// Set inverse navigation to customer
orders.Current.o.Customer = customer;
// Enumerate orders as long as the order is related to customer
if (!orders.MoveNext())
{
break;
}
}
Console.WriteLine($"Customer Orders.Count: {customer.Orders.Count}");
foreach (var order in customer.Orders)
{
Console.WriteLine($"OrderId: {order.Id}");
Console.WriteLine($"Order OrderDiscount: {order.OrderDiscount?.Id}");
order.OrderDetails = new List<OrderDetail>();
while (orderDetails.Current?.OrderId == order.Id)
{
// Add orderDetail to collection
order.OrderDetails.Add(orderDetails.Current.od);
// Set inverse navigation to order
orderDetails.Current.od.Order = order;
// Enumerate orderDetails as long as the orderDetail is related to order
if (!orderDetails.MoveNext())
{
break;
}
}
Console.WriteLine($"Order OrderDetails.Count: {order.OrderDetails.Count}");
foreach (var orderDetail in order.OrderDetails)
{
Console.WriteLine($"OrderDetailId: {orderDetail.Id}");
}
}
}
orders.Dispose();
orderDetails.Dispose();
}
Console.WriteLine("Program finished.");
}
}
public class MyContext : DbContext
{
public DbSet<Customer> Customers { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
// Select 1 provider
optionsBuilder
.UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=_ModelApp;Trusted_Connection=True;Connect Timeout=5;ConnectRetryCount=0;MultipleActiveResultSets=true");
}
}
public class Customer
{
public int Id { get; set; }
public Address Address { get; set; }
public List<Order> Orders { get; set; }
}
public class Address
{
public int Id { get; set; }
public int CustomerId { get; set; }
public Customer Customer { get; set; }
}
public class Order
{
public int Id { get; set; }
public int CustomerId { get; set; }
public Customer Customer { get; set; }
public OrderDiscount OrderDiscount { get; set; }
public List<OrderDetail> OrderDetails { get; set; }
}
public class OrderDetail
{
public int Id { get; set; }
public int OrderId { get; set; }
public Order Order { get; set; }
}
public class OrderDiscount
{
public int Id { get; set; }
public int OrderId { get; set; }
public Order Order { get; set; }
}
}
@iuridosanjos
Copy link

As far as I've tested, your first example are just redoing the whole select with additional informations when it's supposed to just load the additional informations.

var query = db.Customers.Include(c => c.Address);
var result = query.ToList();
query.Include(c => c.Orders).ThenInclude(o => o.OrderDiscount).SelectMany(c => c.Orders).Load();

Second line will load all Customers JOIN Addresses.
Third line will load all Customers JOIN Addresses JOIN Orders JOIN OrdersDiscount, but it should only load what's missing: Orders and OrdersDiscount.

@smitpatel
Copy link
Author

Third line only loads what is missing, it does not fetch any additional data about Customer/Address from server.

@iuridosanjos
Copy link

iuridosanjos commented Apr 14, 2020

Well, I've mislead myself on the problem. I got a problem using LazyLoadingProxies. If you don't use lazy loading, this solution works well, but when you activate lazy loading and add the "virtual" to the properties, they all get loaded again, like it wasn't already loaded on the context or the tracking is broken. Any advice on that?

For an example you can edit this model to use lazy loading proxies, add "virtual" to relation properties and test the first part "Tracking & Buffering". Whenever the properties not loaded on the first query are used, but you loaded then on the second and third query, the lazy loading will load them again.

I've posted a detailed explanation about the problem using this example: dotnet/efcore#20626

@smitpatel
Copy link
Author

You would need to manually mark navigations as loaded.

@iuridosanjos
Copy link

So, in other words, multiple level queries eager load don't work in a lazy load model?

How can I eager load data considering 1:n relations without having the performance problem that EF Core 3.x is having? I cannot load everything in only one query... and now I cannot load in subsequent queries, at least not on model that supports lazy loading. I can't really find a good solution to this problem.

@smitpatel
Copy link
Author

It works, you will need additional code in above code to explicitly mark the navigation as loaded.

@iuridosanjos
Copy link

How can I do that? Can you show me how?

@laurynasr
Copy link

The "Non-tracking" part also loses ORDER BY clauses, so needs additional call to .OrderBy for customers

@Dunge
Copy link

Dunge commented May 6, 2020

I need more explanation on how to use this query splitting strategy, because it doesn't seems to be functioning here.

I'm aware this isn't StackOverflow and not a place to ask for help with a custom issue, so I'm sorry for that, but I believe it's the only place where I might find an answer. I have a pretty complex model, but I simplified my query to a single include() call and it still loads everything twice.

using (var context = new MyContext())
{
    var query = context.Container.Where(d => d.EnterpriseId.Equals(1));
    var result = query.ToList();
    query.Include("CommunicationSetting").Load();
}

When looking at SQL Profiler, the first ToList() call give me this query:

SELECT [c].[Id], [c].[DeletedDate], [c].[Description], [c].[Enterprise_FK], [c].[Name], [c].[Template_FK], [c].[Type], [c].[CanSupportVip16HybridMode], [c].[HasCompass], [c].[HasGps], [c].[HasMaintenanceLog], [c].[HasReverseGeocoding], [c].[ImageUrl], [c].[PushOptions], [c].[SerialNumber], [c].[StopAutomate], [c].[TldPhasePlanData], [c].[DrivingSide], [c].[Intersection_FK], [c].[Pattern], [c].[PhasePlanVersion], [c].[Script], [c].[GpsData], [c].[Image], [c].[Height], [c].[Width], [c].[Message_FK], [c].[Number], [c].[TransparentColor], [c].[CustomData], [c].[IsActive], [c].[JamLogicVersion], [c].[Periodicity], [c].[WorkflowTemplate_FK], [c].[CameraControlInSystemBrowser], [c].[NormalSpeed], [c].[NormalVehicleCount], [c].[SlowIn], [c].[SlowOut], [c].[SlowSpeed], [c].[SlowVehicleCount], [c].[StoppedSpeed], [c].[StoppedVehicleCount], [c].[VirtualRadarMaxSpeed], [c].[VirtualRadarMinSpeed], [c].[AverageLength], [c].[Length], [c].[MajorTravelTime], [c].[ModerateTravelTime], [c].[CustomData], [c].[TravelTimeOffset], [c].[DefaultMessage_FK], [c].[Height], [c].[Width], [c].[Length], [c].[Capacity], [c].[Locked], [c].[ProductOrder], [c].[SubscriptionEnd], [c].[SubscriptionStart], [c].[NormalSpeed], [c].[CustomData], [c].[SlowSpeed], [c].[StoppedSpeed], [c].[VirtualRadarMaxSpeed], [c].[VirtualRadarMinSpeed], [c].[RtmsDownstreamLanes], [c].[LaneFilterCount], [c].[LaneFilterMinute], [c].[NormalOccupancy], [c].[SlowOccupancy], [c].[StoppedOccupancy], [c].[RtmsUpstream], [c].[RtmsUpstreamLanes]
FROM [Container] AS [c]
WHERE [c].[Type] IN (CAST(32768 AS bigint), CAST(128 AS bigint), CAST(2097152 AS bigint), CAST(8589934592 AS bigint), CAST(68719476736 AS bigint), CAST(262144 AS bigint), CAST(131072 AS bigint), CAST(34359738368 AS bigint), CAST(4294967296 AS bigint), CAST(4194304 AS bigint), CAST(8388608 AS bigint), CAST(1099511627776 AS bigint), CAST(8192 AS bigint), CAST(65536 AS bigint), CAST(512 AS bigint), CAST(536870912 AS bigint), CAST(64 AS bigint), CAST(256 AS bigint), CAST(274877906944 AS bigint), CAST(4398046511104 AS bigint), CAST(2199023255552 AS bigint), CAST(2 AS bigint), CAST(268435456 AS bigint), CAST(1024 AS bigint), CAST(16 AS bigint), CAST(8 AS bigint), CAST(1073741824 AS bigint), CAST(1 AS bigint), CAST(4096 AS bigint), CAST(8796093022208 AS bigint), CAST(17179869184 AS bigint), CAST(1048576 AS bigint), CAST(524288 AS bigint), CAST(33554432 AS bigint), CAST(67108864 AS bigint), CAST(549755813888 AS bigint), CAST(134217728 AS bigint), CAST(4 AS bigint), CAST(2048 AS bigint), CAST(16384 AS bigint), CAST(17592186044416 AS bigint), CAST(137438953472 AS bigint), CAST(2147483648 AS bigint), CAST(32 AS bigint)) AND ([c].[Enterprise_FK] = 1)

When running the next line (Include().Load()), it gives me this query. Problem is, the whole select of all base properties are selected again, as well as the additional properties for the JOIN clause I'm trying to get:

SELECT [c].[Id], [c].[DeletedDate], [c].[Description], [c].[Enterprise_FK], [c].[Name], [c].[Template_FK], [c].[Type], [c].[CanSupportVip16HybridMode], [c].[HasCompass], [c].[HasGps], [c].[HasMaintenanceLog], [c].[HasReverseGeocoding], [c].[ImageUrl], [c].[PushOptions], [c].[SerialNumber], [c].[StopAutomate], [c].[TldPhasePlanData], [c].[DrivingSide], [c].[Intersection_FK], [c].[Pattern], [c].[PhasePlanVersion], [c].[Script], [c].[GpsData], [c].[Image], [c].[Height], [c].[Width], [c].[Message_FK], [c].[Number], [c].[TransparentColor], [c].[CustomData], [c].[IsActive], [c].[JamLogicVersion], [c].[Periodicity], [c].[WorkflowTemplate_FK], [c].[CameraControlInSystemBrowser], [c].[NormalSpeed], [c].[NormalVehicleCount], [c].[SlowIn], [c].[SlowOut], [c].[SlowSpeed], [c].[SlowVehicleCount], [c].[StoppedSpeed], [c].[StoppedVehicleCount], [c].[VirtualRadarMaxSpeed], [c].[VirtualRadarMinSpeed], [c].[AverageLength], [c].[Length], [c].[MajorTravelTime], [c].[ModerateTravelTime], [c].[CustomData], [c].[TravelTimeOffset], [c].[DefaultMessage_FK], [c].[Height], [c].[Width], [c].[Length], [c].[Capacity], [c].[Locked], [c].[ProductOrder], [c].[SubscriptionEnd], [c].[SubscriptionStart], [c].[NormalSpeed], [c].[CustomData], [c].[SlowSpeed], [c].[StoppedSpeed], [c].[VirtualRadarMaxSpeed], [c].[VirtualRadarMinSpeed], [c].[RtmsDownstreamLanes], [c].[LaneFilterCount], [c].[LaneFilterMinute], [c].[NormalOccupancy], [c].[SlowOccupancy], [c].[StoppedOccupancy], [c].[RtmsUpstream], [c].[RtmsUpstreamLanes], [t].[Id], [t].[Connectivity], [t].[Device_FK], [t].[GpsPositionLogin], [t].[GpsPositionPassword], [t].[GpsSourceType], [t].[ModemType], [t].[PowerSourceType], [t].[Type], [t].[ProtocolType], [t].[CommunityName], [t].[NbRetry], [t].[SubAddress], [t].[Timeout], [t].[NetworkAddress], [t].[Port], [t].[ServerId], [t].[SocketType], [t].[BaudRate], [t].[ComPort], [t].[DropAddress], [t].[FeedbackUri], [t].[Login], [t].[Password], [t].[PhoneNumber], [t].[InputType]
FROM [Container] AS [c]
LEFT JOIN (
    SELECT [c0].[Id], [c0].[Connectivity], [c0].[Device_FK], [c0].[GpsPositionLogin], [c0].[GpsPositionPassword], [c0].[GpsSourceType], [c0].[ModemType], [c0].[PowerSourceType], [c0].[Type], [c0].[ProtocolType], [c0].[CommunityName], [c0].[NbRetry], [c0].[SubAddress], [c0].[Timeout], [c0].[NetworkAddress], [c0].[Port], [c0].[ServerId], [c0].[SocketType], [c0].[BaudRate], [c0].[ComPort], [c0].[DropAddress], [c0].[FeedbackUri], [c0].[Login], [c0].[Password], [c0].[PhoneNumber], [c0].[InputType]
    FROM [CommunicationSettings] AS [c0]
    WHERE [c0].[Type] IN (14, 10, 11, 0, 2, 9, 13, 1, 12, 3, 7, 5, 6)
) AS [t] ON [c].[Id] = [t].[Device_FK]
WHERE [c].[Type] IN (CAST(32768 AS bigint), CAST(128 AS bigint), CAST(2097152 AS bigint), CAST(8589934592 AS bigint), CAST(68719476736 AS bigint), CAST(262144 AS bigint), CAST(131072 AS bigint), CAST(34359738368 AS bigint), CAST(4294967296 AS bigint), CAST(4194304 AS bigint), CAST(8388608 AS bigint), CAST(1099511627776 AS bigint), CAST(8192 AS bigint), CAST(65536 AS bigint), CAST(512 AS bigint), CAST(536870912 AS bigint), CAST(64 AS bigint), CAST(256 AS bigint), CAST(274877906944 AS bigint), CAST(4398046511104 AS bigint), CAST(2199023255552 AS bigint), CAST(2 AS bigint), CAST(268435456 AS bigint), CAST(1024 AS bigint), CAST(16 AS bigint), CAST(8 AS bigint), CAST(1073741824 AS bigint), CAST(1 AS bigint), CAST(4096 AS bigint), CAST(8796093022208 AS bigint), CAST(17179869184 AS bigint), CAST(1048576 AS bigint), CAST(524288 AS bigint), CAST(33554432 AS bigint), CAST(67108864 AS bigint), CAST(549755813888 AS bigint), CAST(134217728 AS bigint), CAST(4 AS bigint), CAST(2048 AS bigint), CAST(16384 AS bigint), CAST(17592186044416 AS bigint), CAST(137438953472 AS bigint), CAST(2147483648 AS bigint), CAST(32 AS bigint)) AND ([c].[Enterprise_FK] = 1)

And of course, performances are just worse than a single query.

Can you explain why does it re-select every base properties a second time?
Thanks!

@smitpatel
Copy link
Author

Code above uses SelectMany to iterate over the elements in collection. You are not using SelectMany. Your 2nd query is same as a query without doing any query splitting.

@Dunge
Copy link

Dunge commented May 6, 2020

In my example, the "CommunicationSetting" property is a reference nav (1 to 1), not a collection, same as Address and OrderDiscount in your example where you do not use SelectMany. So I don't think that's it.

But since the question will come next, in the case of collections it's not quite clear how the usage of SelectMany should be used. Every time we need to include a collection we follow with a call to SelectMany? I'm also having issues with inheritance and entities where the collection only exists in the derived type. Seems like it won't let me cast to the derived type to use SelectMany.

@Dunge
Copy link

Dunge commented May 6, 2020

Oh sorry, gotcha. Load all base reference nav in the original query (a join make sense in this case), but collections navs with a separate call. With SelectMany, the properties aren't duplicated.

I'm still having issues with inheritance though, can't use SelectMany if the base type doesn't possess the collections.
Edit: A Simple OfType<>() seems to do.

@jkatsiotis
Copy link

jkatsiotis commented May 14, 2020

Any help on how to work with paging?

If we had a query like this
query = query.Skip(10).Take(50);
var result = query.ToList();

Will this have to change? I assume it wouldn't.
query.Include(c => c.Orders).ThenInclude(o => o.OrderDiscount).SelectMany(c => c.Orders).Load();

How about this ? It should perform the paging in the Orders Table. Is it possible?
query.SelectMany(c => c.Orders).SelectMany(o => o.OrderDetails).Load();

@smitpatel
Copy link
Author

query.Include(c => c.Orders).ThenInclude(o => o.OrderDiscount).Skip(10).Take(50).SelectMany(c => c.Orders).Load();

@jkatsiotis
Copy link

Thanks!

@Dunge
Copy link

Dunge commented Jun 8, 2020

When adding a Take(1) parameter to the original query to simulate a FirstOrDefault() call, all subsequent queries done with SelectMany/Load will also reselect all base properties of the first query in a join with related data. Without Take(1), it only select the related data making it substantially faster.

var query = context.Container.Include("CommunicationSetting").Where(d => d.Id == id).Take(1);
var result = query.ToList();
query.Include(t => t.Labels).SelectMany(t => t.Labels).Load();

Will be much slower than:

var query = context.Container.Include("CommunicationSetting").Where(d => d.Id == id);
var result = query.ToList();
query.Include(t => t.Labels).SelectMany(t => t.Labels).Load();

When using SQL Profiler, I clearly see in the first case that all Container properties are selected a second time when stepping over the line loading Labels. In the second example, they aren't.

Any idea why?

@bwn-z
Copy link

bwn-z commented Jun 23, 2020

I think you need to add a stopwatch to all blocks of running queries. The user must monitor the result of executed queries in different blocks. User can make decisions about the approach used, depending on the performance.

@NN89
Copy link

NN89 commented Oct 15, 2021

Oh sorry, gotcha. Load all base reference nav in the original query (a join make sense in this case), but collections navs with a separate call. With SelectMany, the properties aren't duplicated.

I'm still having issues with inheritance though, can't use SelectMany if the base type doesn't possess the collections. Edit: A Simple OfType<>() seems to do.

I am facing a similar issue with SelectMany if the base type doesn't possess the collections. I wasn't able to figure out your Edit portion of the comment. Can you give me an example of how you address that?

@Dunge
Copy link

Dunge commented Oct 15, 2021

Your best bet would be to upgrade to EFCore5/EFCore6 and use AsSplitQuery() instead of all that, but if you are stuck on EFCore3:

context.BaseEntitySet.OfType<DerivedEntity>().Include(e => e.YourList).SelectMany(t => t.YourList).Load();  

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment