Blade He
357bb6d580
1. support dynamically showing fund-level data examples.
2. optimize for minimum_initial_investment data point
2025-02-25 10:35:53 -06:00
Blade He
e60e1fd546
move configuration files for all datapoints to "all_datapoints" folder
2025-02-24 15:23:16 -06:00
Blade He
590f7e2249
1. back up data point configurations
2. simplify data point configurations for the 11 important data points.
2025-02-24 15:21:32 -06:00
Blade He
75ea383354
support identifying AUS prospectus document category: MIS or Super
2025-02-24 15:08:15 -06:00
Blade He
f7d53acdde
support getting sqlpass api by configuration
2025-02-19 14:37:21 -06:00
Blade He
bb6862b179
update a little
2025-02-19 14:32:08 -06:00
Blade He
705933bbdd
optimized for phase 2 data
2025-02-18 18:52:26 -06:00
Blade He
353bc28599
update a little
2025-02-11 11:49:53 -06:00
Blade He
01e2a0e38d
add configuration for datapoints data types
update configuration for minimum initial investment
support applying value to all funds for minimum initial investment
2025-02-05 12:08:12 -06:00
Blade He
a8810519f8
optimize instructions configuration
optimize drilldown part logic
2025-02-04 15:29:24 -06:00
Blade He
f9ef4cec96
update sql_query cache file store location
Cache files for at most 5 days, then clean them from local disk.
2025-01-31 10:59:54 -06:00
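A minimal sketch of the 5-day cache cleanup described in this commit, assuming the cache is a flat directory of files and that age is judged by file modification time. The function name `clean_expired_cache` and its parameters are hypothetical and not taken from the repository.

```python
import time
from pathlib import Path

MAX_CACHE_AGE_DAYS = 5  # retention window stated in the commit message


def clean_expired_cache(cache_dir, max_age_days=MAX_CACHE_AGE_DAYS, now=None):
    """Delete files in cache_dir last modified more than max_age_days ago.

    Returns the sorted names of the deleted files.
    `now` can be injected (seconds since the epoch) to make the check testable.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    for path in Path(cache_dir).glob("*"):
        # Only plain files are considered; subdirectories are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Calling this once before reading or writing the sql_query cache would keep the directory bounded; whether the real code cleans on startup or on each query is not stated in the log.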
Blade He
7f37f3532f
switch example document
2025-01-27 14:59:26 -06:00
Blade He
6f831e241c
Merge branch 'aus_prospectus_ravi'
2025-01-27 12:32:42 -06:00
Blade He
41f8c307ff
a little change
2025-01-27 12:32:36 -06:00
Blade He
47c41e492f
1. only get name mapping data from document mapping
2. Compare name mapping metrics between Ravi's and mine.
2025-01-27 12:29:49 -06:00
Blade He
d9b0bed39a
a little change
2025-01-22 09:57:42 -06:00
Blade He
350550d1b0
fix issue for removing item from list
2025-01-21 17:24:05 -06:00
Blade He
e2b9bcbdbc
initial abbreviation configurations
2025-01-21 17:09:45 -06:00
Blade He
b15d260a58
migrate name mapping algorithm from Ravi
2025-01-21 16:55:08 -06:00
Blade He
d41fae3dba
prepare for 100 multi-fund document samples
2025-01-17 16:26:31 -06:00
Blade He
b93a8d55e8
update for output data as template
2025-01-17 11:41:58 -06:00
Blade He
f10ff8ee33
update for deployment
2025-01-16 20:34:43 -06:00
Blade He
fb4a6402f0
support outputting merged data format
2025-01-16 16:31:04 -06:00
Blade He
2eace81f51
support more configurable parts
2025-01-16 13:54:45 -06:00
Blade He
db0827435b
supplement EMEA AR configuration files
2025-01-16 11:30:44 -06:00
Blade He
9f0e77a11e
support loading configurations by doc_source parameter
2025-01-16 11:17:48 -06:00
Blade He
acc30d4b72
if getting text via the pdf to html API fails, fall back to getting text with pymupdf.
2025-01-15 18:36:02 -06:00
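The fallback in this commit can be sketched roughly as follows. This is an illustration only: `get_text_with_fallback` and `extract_text_with_pymupdf` are hypothetical names, the pdf to html API call is represented by a caller-supplied function because its interface is not shown in the log, and treating an empty result as a failure is an assumption.

```python
def extract_text_with_pymupdf(pdf_path):
    """Extract plain text from every page with PyMuPDF."""
    # Imported lazily so the fallback wrapper works without PyMuPDF installed.
    # Older PyMuPDF releases expose the same API as `import fitz`.
    import pymupdf

    with pymupdf.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def get_text_with_fallback(pdf_path, primary, fallback=extract_text_with_pymupdf):
    """Try primary(pdf_path); if it raises or returns no text, use fallback(pdf_path).

    `primary` stands in for the pdf to html API client (assumed interface:
    takes a path, returns a string).
    """
    try:
        text = primary(pdf_path)
    except Exception:
        text = None
    if text and text.strip():
        return text
    return fallback(pdf_path)
```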
Blade He
ace0ac2674
a little change
2025-01-15 18:22:08 -06:00
Blade He
a89aa9c4de
support fetching data from Prospectus
2025-01-14 16:21:48 -06:00
Blade He
e230a5bf15
a little change
2025-01-09 12:19:24 -06:00
Blade He
91c86bb983
update AUS Prospectus relevant configuration
2025-01-08 17:40:57 -06:00
Blade He
0a867dcf07
complete configuration for AUS Prospectus
2025-01-07 16:25:13 -06:00
Blade He
201a809ffa
comment remove_abundant_data function
2025-01-06 15:27:43 -06:00
Blade He
c335992ced
update requirements.txt
2025-01-06 13:56:09 -06:00
Blade He
9348e32caa
support more performance fee keywords
2025-01-06 13:14:20 -06:00
Blade He
65e752e25a
implement merge_output_data function; whether to output in this format depends on confirmation with the data/developer teams
2024-12-18 09:19:55 -06:00
Blade He
309bb714f6
fix issue for parsing data via Vision Function.
2024-12-11 16:49:04 -06:00
Blade He
d673a99e21
switch back to extracting data from the image stream directly, instead of first getting text from the image stream and then extracting data from that text.
The reason is: the quality of the text obtained from the image stream is not good enough.
2024-12-10 16:17:47 -06:00
Blade He
f71e2968cc
simplify code
2024-12-09 22:24:40 -06:00
Blade He
75ea5e70de
1. support fetching data from garbled-text (messy-code) pages via the ChatGPT4o Vision function.
2. multilingual share features configuration
2024-12-09 17:47:42 -06:00
Blade He
d96f77fe00
Split share class names when multiple share classes appear on the same line
2024-12-06 16:31:42 -06:00
Blade He
d79b05885d
optimize prompts for TOR
2024-12-06 14:50:34 -06:00
Blade He
a25991e2bb
1. Set TOR reported name priority
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He
95c386911c
Clean fund name after getting response from ChatGPT
2024-12-04 22:08:09 -06:00
Blade He
70362b554f
Fix issue for "The last fund name of previous PDF page" logic:
If the current page's fund name starts with "The last fund name of previous PDF page" and more content follows it, then remove "The last fund name of previous PDF page".
2024-12-04 16:57:52 -06:00
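A minimal sketch of the rule in this fix, assuming the marker appears as a literal prefix of the extracted fund name and is separated from the real name by a colon or whitespace. The function name `clean_fund_name` is hypothetical.

```python
MARKER = "The last fund name of previous PDF page"


def clean_fund_name(fund_name, marker=MARKER):
    """Remove the leading marker only when more content follows it.

    If the value is just the marker with nothing after it, it is returned
    as-is (whitespace-trimmed), since the commit only describes removal
    when more content follows.
    """
    stripped = fund_name.strip()
    if stripped.startswith(marker):
        # Assumed separators between the marker and the real name.
        remainder = stripped[len(marker):].lstrip(" :\n")
        if remainder:
            return remainder
    return stripped
```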
Blade He
36fbaa946e
Add this statement when passing the last fund name of the previous PDF page to the next page:
The last fund name of previous PDF page:
page_text = f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}"
2024-12-03 11:50:31 -06:00
Blade He
a11a99fdc3
1. Optimize instructions: do not fetch data from "up to" statements.
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He
bc32860f87
remove_abundant_data
2024-12-02 17:16:56 -06:00
Blade He
c146497052
optimize share feature judgment logic:
accumulation with capitalisation and institutional
income with distribution
Document: 337293427
2024-12-02 13:11:49 -06:00
Blade He
352886ade2
update instructions for TER, OGC, Performance Fees
2024-12-02 11:45:19 -06:00